In This Issue:
- A Note from the Scientific Director
- New Sample Size Calculator for SMART Trials on Web
- Featured Scientist: Alena Ixmocané Scott
- Announcing New Software for Variable Selection
- Recent Activity in The Methodology Center
- Ask a Methodologist
Welcome back to The Methodology Center Perspective!
Linda Collins, Director of the Methodology Center, is on sabbatical during a very exciting year in the Center. As Scientific Director, I have the pleasure of keeping in touch with all of you during her leave. I am pleased to announce that Bethany Bray, a recent pre-doctoral fellow in the NIDA-funded Prevention and Methodology Training program at Penn State, has accepted the position of Assistant Director of the Methodology Center.
NIH has awarded the Methodology Center four new grants. This summer, NIMH awarded Susan Murphy five years of funding to focus on improving methods for adaptive treatment strategies. This fall, NIH awarded us two grants in response to an NIH Roadmap request for proposals focused on facilitating interdisciplinary research via methodological and technical innovation in the behavioral and social sciences. These two projects are truly interdisciplinary, with each involving two principal investigators from different disciplines. Linda Collins (a quantitative psychologist) and Daniel Rivera (a chemical engineer) will collaborate on developing engineering approaches for improving behavioral interventions. Runze Li (a statistician) and Lisa Dierker (a psychologist) will develop new methods for analyzing intensive longitudinal data. In addition, NIDA awarded me two years of funding to model multiple risks for adverse behavioral outcomes.
The Center continues to develop new software products to assist you in your research. PROC SCADLS is the latest in our set of SAS procedures (including PROC LCA and PROC LTA) and will be available soon from our website. This new procedure can be used to select a parsimonious and well-fitting subset of variables from a large number of predictors of an outcome, and to fit an adjusted regression model using the selected variables. In addition, we have launched our first Web-based application, which calculates the sample size necessary for a SMART design under specified conditions. You can read more about these products in this issue. Be sure to check out our website’s new features (such as a historical timeline of our Center) and new look at methodology.psu.edu!
Our 12th Summer Institute on Longitudinal Methods, held in June 2007 at Penn State, was a great success due in large part to our well-known presenters! The Institute featured Donald Hedeker of the University of Illinois at Chicago, who taught a hands-on workshop on mixed models for longitudinal data, and Joseph Schafer of the Methodology Center, who presented a workshop on causal inference. I am excited to announce that David MacKinnon of Arizona State University will present on mediation modeling at the next Institute, to be held in June 2008. Online registration will begin in February. Watch our website, or send a note to MC@psu.edu and request that we add you to our Summer Institute listserv.
We have organized an exciting suite of one-credit courses in methodology to be offered this spring at Penn State. Michael Rovine, Professor of Human Development and Family Studies, will teach Time Series Analysis; Donna Coffman, Research Associate in the Methodology Center, will offer a course on Mediation Analysis; and Rhonda BeLue, Assistant Professor of Health Policy and Administration, will teach Complex Adaptive Systems. Due to popular demand, we anticipate continuing to offer a spring suite of courses on advanced methodological topics for years to come.
Upon her return from sabbatical in summer 2008, Dr. Collins will become the new President of the Society for Prevention Research. We all look forward to her leadership as the Society continues to grow. Until then, may you all have compliant participants, clean data and the methodological tools you need to continue your exciting work in prevention and treatment sciences!
Stephanie T. Lanza, Ph.D.
Scientific Director, The Methodology Center
Penn State University
The Methodology Center is pleased to announce a new Web application for calculating the sample size necessary to identify the best strategy when using data from a SMART experimental design. The application assumes two first-stage treatments followed by either one or two second-stage treatments.
To use the sample size calculator, three quantities must be entered: the standardized effect size one wants to detect, the desired probability of correctly choosing the best strategy, and the maximum sample size one can afford or has available. Based on these values, the sample size necessary to identify the best strategy is reported. Check out the new applet at methodology.psu.edu!
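The applet's exact formulae are not reproduced here, but the flavor of such a calculation can be sketched for a simplified case. Suppose only two strategies are compared, each assigned to n subjects, and outcomes are approximately normal. Then the probability that the truly better strategy also shows the better sample mean is roughly Φ(δ√(n/2)), where δ is the standardized effect size; solving for n gives a rough sample size per strategy. This is an illustrative simplification with a made-up function name, not the applet's actual method, which handles the full SMART structure:

```python
import math
from statistics import NormalDist

def n_per_strategy(effect_size, p_correct):
    """Rough normal-approximation sketch: smallest n per strategy such
    that, when the true standardized difference between two strategies
    is effect_size, the better strategy has probability p_correct of
    also having the better sample mean.  Solves
    Phi(effect_size * sqrt(n / 2)) = p_correct for n."""
    z = NormalDist().inv_cdf(p_correct)
    return math.ceil(2 * (z / effect_size) ** 2)

# A 90% chance of correctly picking the better of two strategies when
# the true standardized effect is 0.5:
print(n_per_strategy(0.5, 0.9))  # 14 subjects per strategy in this simplified case
```

The real applet additionally accounts for second-stage randomization, which is why it also asks for the maximum sample size one can afford.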
Alena Ixmocané Scott
Alena Scott received her Ph.D. in statistics from Rice University in Houston, Texas. Currently, she is a postdoctoral research fellow at the Institute for Social Research at the University of Michigan. Her research focuses on developing statistical methodology for a new clinical trial design to answer questions about adaptive treatment strategies, which are evidence-based rules that help customize the sequencing and timing of treatments for an individual patient. Alena’s previous work involved using new techniques in density estimation to develop adaptive thresholding algorithms for denoising signals contaminated with Gaussian noise.
Alena’s interest in adaptive treatment strategies is motivated by her interest in the treatment of drug dependence and mental illness. These diseases present a challenge for the treating clinician due to the heterogeneity in patient response to treatment, the chronic nature of these diseases, and the high probability of relapse after a response. These characteristics require the clinician to make many treatment decisions at sequential points in time. Specialized experimental designs, sequential multiple assignment randomized trials (SMART), have been proposed as a way to develop such sequences of treatments.
Alena is working with Susan Murphy at the University of Michigan to develop statistical methodology for guiding the design and analysis of a SMART trial. In a recently submitted publication, they present sample size formulae and test statistics for using a SMART design to answer four specific research questions about individual components of an adaptive treatment strategy as well as treatment strategies as a whole. This work was recently presented at two conferences, the Albert J. Silverman Research Conference and the SAMSI Summer 2007 Program on Challenges in Dynamic Treatment Regimes and Multistage Decision-Making. Another paper to extend these results is in progress.
We are pleased to announce the upcoming release of a beta version of PROC SCADLS, a SAS1 procedure for automated variable selection in linear regression when there are many candidate predictor variables. This procedure, developed by The Methodology Center, is near completion and will be available free of charge on our website. It implements the methods described in Fan and Li (2001) for using penalized least squares with the Smoothly Clipped Absolute Deviation (SCAD) penalty as an alternative to older techniques such as stepwise regression. The SCAD penalty combines the automatic deletion of non-significant coefficients (as in significance testing or subset selection) with a small amount of shrinkage (as in ridge regression), which may provide a more statistically stable alternative to the older variable selection methods in higher-dimensional datasets. It accomplishes model selection and coefficient estimation in a single process.
The size of the penalty (and thus the size of the predicted model) is chosen using either the Generalized Cross-Validation or Bayesian Information Criterion fit measures. Variables considered to be of special importance can be forced into the model using a FORCEIN statement. The coefficient estimates and estimated standard errors for the included variables are provided, and can also be saved in separate SAS datasets.
A more general version of the procedure is planned for development in 2008. It will allow continuous, dichotomous, or count outcomes (the current version handles continuous outcomes only), as well as a CLASS statement to handle recoding of categorical predictors automatically.
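To illustrate the penalty itself, here is a minimal Python sketch of the SCAD thresholding rule from Fan and Li (2001) applied to a single least-squares coefficient under an orthonormal design. The function name is illustrative and this is not PROC SCADLS syntax; the tuning constant a = 3.7 is the value suggested in the paper:

```python
import math

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule of Fan and Li (2001) for an ordinary
    least-squares coefficient z (orthonormal-design case) with penalty
    parameter lam: small coefficients are deleted, moderate ones are
    shrunk, and large ones pass through unpenalized (near-unbiasedness)."""
    az = abs(z)
    if az <= 2 * lam:                                 # soft-thresholding zone
        return math.copysign(max(az - lam, 0.0), z)
    if az <= a * lam:                                 # transition zone
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z                                          # no shrinkage for large |z|

# A small coefficient is set exactly to zero (variable deleted), while a
# large one is kept without shrinkage:
print(scad_threshold(0.3, lam=0.5))  # 0.0
print(scad_threshold(5.0, lam=0.5))  # 5.0
```

This single rule is what gives SCAD its combination of automatic deletion and stable estimation in one step.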
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
1This procedure was developed for Version 9.1 of the SAS System for Windows. Copyright © 2002-2003, SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.
As part of the NIH Roadmap Initiative, NIH has awarded two four-year grants to scientists in the Methodology Center. Both grants share the goal of facilitating interdisciplinary research via methodological and technical innovation in the behavioral and social sciences. Accordingly, the grants each have two principal investigators. Runze Li and Lisa Dierker (Associate Professor of Psychology at Wesleyan University) are Principal Investigators of a grant to develop new models for analyzing intensive longitudinal data. Linda Collins and Daniel Rivera (Associate Professor of Chemical Engineering at Arizona State University) are Principal Investigators of a grant to study engineering approaches to improving behavioral interventions.
The National Institute of Mental Health awarded Susan Murphy (PI), Joelle Pineau (Assistant Professor of Computer Science, McGill University), A. John Rush (Professor of Clinical Sciences, University of Texas Southwestern Medical School) and Scott Stroup (Associate Professor of Psychiatry, University of North Carolina at Chapel Hill) the R01, ‘Learning Adaptive Treatment Strategies in Mental Health.’ The objective is to develop methodological tools for analyzing and evaluating the adaptive, sequential decision making that occurs in clinical practice, particularly in the management of patients with chronic mental health disorders.
The National Institute on Drug Abuse awarded Stephanie Lanza (PI) an R03 to explore multiple risks for adverse developmental outcomes. She will use mixture models to quantify key multilevel profiles of risk that precede substance use, risky sexual behavior, and other problem behaviors.
Linda Collins has been named President-elect of the Society for Prevention Research.
Runze Li and Hui Zou authored a paper entitled ‘One-step sparse estimates in nonconcave penalized likelihood models,’ which will appear in the Annals of Statistics as a featured discussion piece.
A new article demonstrating the use of PROC LCA for latent class analysis appeared in Structural Equation Modeling:
Lanza, S. T., Collins, L. M., Lemmon, D. R., & Schafer, J. L. (2007). PROC LCA: A SAS procedure for latent class analysis. Structural Equation Modeling, 14(4), 671-694.
John Dziak and Runze Li authored the chapter ‘An overview on variable selection for longitudinal data’ in the book Quantitative medical data analysis using mathematical tools and statistical techniques (edited by D. Hong & Y. Shyr, published in 2007 by World Sciences Publisher).
This fall, the Methodology Center will release PROC SCADLS, a new SAS procedure that implements the Smoothly Clipped Absolute Deviation penalty (SCAD) variable selection method proposed by Fan and Li (2001). The goal is to select a parsimonious and well-fitting subset of variables from a large number of predictors of a continuous outcome variable, and then fit an adjusted regression equation to this subset. This procedure represents an attractive alternative to traditional variable selection criteria, such as adjusted R-square, AIC and BIC, along with stepwise/best subset variable selection algorithms. PROC SCADLS will be available for download on the Center's website.
The Methodology Center developed a new Web application that calculates the sample size necessary to identify the best strategy when using data from a SMART experimental design. See the Center website for more details and to give it a try!
Linda Collins attended and presented at two NIH meetings focusing on the etiology, prevention and treatment of overweight and obesity. Both meetings brought an interdisciplinary group of scientists together to share ideas about how to approach this growing public health issue. The first meeting was organized by the National Heart, Lung, and Blood Institute and was held in Bethesda on August 21-22, 2007. The second meeting was organized by the National Institute of Child Health and Human Development and was held on October 10-12, 2007. At both meetings, Linda stressed that many new methodological issues are arising from behaviorally oriented research on overweight and obesity, and that supporting methodological work that is integrated with etiology, prevention and treatment will help the obesity field move forward faster.
The 12th annual Summer Institute on Longitudinal Methods, funded by the National Institute on Drug Abuse, was held at Penn State June 4-6, 2007. Donald Hedeker presented Mixed Models for Longitudinal Continuous and Dichotomous Data and Joseph Schafer presented Practical Tools for Causal Inference.
I have been hearing a lot lately about propensity scores. What are they, and how can I use them? — Signed, Lost Cause
Propensity scores are useful when trying to draw causal conclusions from observational studies where the “treatment” (i.e., the “independent variable” or alleged cause) was not randomly assigned. For simplicity, let’s suppose the treatment variable has two levels: treated (T=1) and untreated (T=0). The propensity score for a subject is the probability, given the subject’s baseline characteristics, that the subject was treated, P(T=1). In a randomized study, the propensity score is known; for example, if the treatment was assigned to each subject by the toss of a coin, then the propensity score for each subject is 0.5. In a typical observational study, the propensity score is not known, because the treatments were not assigned by the researcher. In that case, the propensity scores are often estimated by the fitted values (p-hats) from a logistic regression of T on the subjects’ baseline (pre-treatment) characteristics.
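That last step can be sketched numerically. Below is a minimal one-covariate logistic regression fit by Newton-Raphson whose fitted values serve as the estimated propensity scores; the function name and data are made up for illustration, and in practice one would use standard statistical software with many baseline covariates:

```python
import math

def estimate_propensities(x, t, iters=25):
    """Fit a one-covariate logistic regression of treatment t (0/1) on a
    baseline variable x by Newton-Raphson, and return the fitted values
    (the estimated propensity scores, or p-hats)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += ti - p                    # score (gradient) terms
            g1 += (ti - p) * xi
            w = p * (1.0 - p)               # information (Hessian) terms
            h00 += w; h01 += w * xi; h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step for intercept
        b1 += (h00 * g1 - h01 * g0) / det   # Newton step for slope
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Made-up data in which treatment uptake tends to rise with the baseline
# variable; subjects with larger x get larger estimated propensities.
print(estimate_propensities([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1]))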
In an observational study, the treated and untreated groups are not directly comparable, because they may systematically differ at baseline. The propensity score plays an important role in balancing the study groups to make them comparable. Rosenbaum and Rubin (1983) showed that treated and untreated subjects with the same propensity scores have identical distributions on the baseline variables used to compute the score. This “balancing property” means that, if we control for the propensity score when we compare the groups, we have effectively turned the observational study into a randomized block experiment, where “blocks” are groups of subjects with the same propensities.
You may be wondering: why control for the propensity score rather than controlling for the baseline variables directly? When we regress the outcome on T and other baseline characteristics, the coefficient for T is an average causal effect only under two very restrictive conditions: that the relations between the response and the baseline variables are linear, and that all of the slopes are the same whether T=0 or T=1. More elaborate analysis of covariance (ANCOVA) models can give better results (Schafer & Kang, under review), but they make other assumptions. Propensity scores provide alternative ways to estimate the average causal effect of T without strong assumptions about how the response is related to the baseline variables.
So, how do we use the propensity scores to estimate the average causal effect of T? Because the propensity score has the balancing property, we can divide the sample into subgroups (e.g., quintiles) based on the propensity scores. Then we can estimate the effect of T within each subgroup by an ordinary t-test, and pool the results across subgroups (Rosenbaum & Rubin, 1984). Alternatives to subclassification include matching and weighting. In matching, we find a subset of untreated individuals whose propensity scores are similar to those of the treated persons, or vice-versa (Rosenbaum, 2002). In weighting, we compare weighted averages of the response for treated and untreated persons, weighting the treated ones by 1/P(T=1) and the untreated ones by 1/P(T=0) (Lunceford & Davidian, 2004). However, very large weights can make estimates unstable.
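The weighting estimator just described can be sketched on toy data. Here each made-up record is (treated, outcome, propensity), with propensities treated as known for simplicity (in practice they would be estimated as above); the function name is illustrative, and normalized weighted averages are used to keep the estimate stable:

```python
# Toy records: (T, outcome Y, propensity score P(T=1)).
data = [
    (1, 10.0, 0.8), (1, 12.0, 0.8), (0, 7.0, 0.8),
    (1, 5.0, 0.2), (0, 3.0, 0.2), (0, 4.0, 0.2),
]

def ipw_effect(records):
    """Inverse-propensity-weighted difference in mean outcomes: treated
    units get weight 1/P(T=1), untreated units get weight
    1/P(T=0) = 1/(1 - P(T=1)); each group's weighted mean is normalized
    by its total weight before differencing."""
    wt_t = wt_c = y_t = y_c = 0.0
    for t, y, e in records:
        if t == 1:
            w = 1.0 / e
            wt_t += w; y_t += w * y
        else:
            w = 1.0 / (1.0 - e)
            wt_c += w; y_c += w * y
    return y_t / wt_t - y_c / wt_c

print(round(ipw_effect(data), 3))  # 1.167
```

Notice how the weights up-weight the rare cases (the untreated subject with propensity 0.8, the treated subject with propensity 0.2), which is exactly where instability arises when weights become very large.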
Lunceford, J. K., & Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine, 23, 2937-2960.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79, 516-524.
Rosenbaum, P. R. (2002). Observational studies (2nd ed.). New York: Springer-Verlag.
Schafer, J. L., & Kang, J. D. Y. (under review). Average causal effects: A practical guide and simulated case study.