### Latent Class Analysis (LCA)

**Cluster Analysis:** Algorithm for assigning individuals into groups (clusters) so that the most similar objects are grouped. Cluster analysis can be done using many different algorithms, and typically is based on responses to multiple continuous variables.

**Configural Frequency Analysis (CFA):** Exploratory analysis of contingency table data to detect “types” and “antitypes” (cells occurring more/less frequently than expected by chance).

**Factor Analysis:** Measurement model that posits that continuous underlying latent variables explain patterns of association among multiple (typically continuous) observed variables.

**Finite Mixture Model:** A probabilistic model that represents unobserved sub-populations within an overall population based on responses to multiple observed variables. LCA is a type of finite mixture model.

**Indicator:** Observed variable used in a measurement model such as LCA to measure a latent variable (often referred to as an item). For example, if the latent variable is “teen delinquency,” the indicators might include shoplifting, lying to parents, property damage, and carrying a gun.

**Item:** See *Indicator*.

**Latent Class Analysis:** Finite mixture model used to identify underlying (latent) subgroups within a population based on individuals’ responses to multiple observed variables. Factor analysis is based on continuous latent variables, whereas LCA is based on categorical latent variables.

**Latent Transition Analysis:** Finite mixture model that extends LCA to longitudinal data, enabling scientists to estimate the incidence of transitions between underlying subgroups (latent statuses) over time.

**Latent Variable:** An unobserved variable posited to explain a set of observed responses to indicators; in LCA the latent variables are categorical, whereas in factor analysis the latent variables are continuous. For example, you might use observed values of shoplifting, lying to parents, property damage, and carrying a gun to estimate delinquency latent classes.
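To make the link between indicators and latent classes concrete, here is a minimal Python sketch of how a fitted LCA could assign class membership probabilities to one respondent. The two classes, their prevalences, and the item-response probabilities are all made-up assumptions for illustration, and the sketch assumes binary indicators that are independent within a class.

```python
from math import prod

# Hypothetical two-class model with four binary delinquency indicators
# (shoplifting, lying to parents, property damage, carrying a gun).
# All numbers are invented for illustration, not estimates from data.
class_prevalence = [0.7, 0.3]  # P(class)

# item_prob[c][j] = P(indicator j endorsed | class c)
item_prob = [
    [0.10, 0.30, 0.05, 0.01],  # "low delinquency" class
    [0.60, 0.80, 0.50, 0.20],  # "high delinquency" class
]


def posterior_class_probs(responses):
    """Return P(class | responses) for a list of 0/1 responses (Bayes' rule)."""
    joint = []
    for prevalence, probs in zip(class_prevalence, item_prob):
        # Likelihood of the response pattern, assuming indicators are
        # independent within a class.
        likelihood = prod(p if y == 1 else 1 - p for p, y in zip(probs, responses))
        joint.append(prevalence * likelihood)
    total = sum(joint)
    return [value / total for value in joint]


# A respondent who reports shoplifting and lying to parents only.
posterior = posterior_class_probs([1, 1, 0, 0])
print(posterior)
```

In practice the prevalences and item-response probabilities are estimated from data rather than specified by hand.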

### Latent Transition Analysis (LTA)

**Finite Mixture Model:** A probabilistic model that represents unobserved sub-populations within an overall population based on responses to multiple observed variables. LCA is a type of finite mixture model.

**Hidden Markov Model (HMM):** A statistical model for recovering a series of states over time from repeated observations. Certain HMMs based on multiple indicators of the state at each time are analogous to LTA.

**Indicator:** Observed variable used in a measurement model such as LTA to measure a latent variable (often referred to as an item). For example, if the latent variable is “teen delinquency,” the indicators might include shoplifting, lying to parents, property damage, and carrying a gun.

**Item:** See *Indicator*.

**Latent Class Analysis:** Finite mixture model used to identify underlying (latent) subgroups within a population based on individuals’ responses to multiple observed variables. Factor analysis is based on continuous latent variables, whereas LCA is based on categorical latent variables.

**Latent Transition Analysis:** Finite mixture model that extends LCA to longitudinal data, enabling scientists to estimate the incidence of transitions between underlying subgroups (latent statuses) over time.
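As a rough illustration of what "transitions between latent statuses" means numerically, the sketch below applies a transition probability matrix to the status prevalences at one time point to get the prevalences at the next. The two statuses and every number are hypothetical.

```python
# Hypothetical latent status prevalences at time 1:
# [non-delinquent, delinquent]. Values are invented for illustration.
status_prevalence_t1 = [0.8, 0.2]

# transition[i][j] = P(status j at time 2 | status i at time 1).
# Each row sums to 1.
transition = [
    [0.9, 0.1],
    [0.3, 0.7],
]


def next_prevalence(prevalence, transition_matrix):
    """Prevalence of each status at the next time point."""
    n = len(prevalence)
    return [
        sum(prevalence[i] * transition_matrix[i][j] for i in range(n))
        for j in range(n)
    ]


status_prevalence_t2 = next_prevalence(status_prevalence_t1, transition)
print(status_prevalence_t2)
```

In an actual LTA, both the prevalences and the transition probabilities are estimated from the indicators measured at each time point.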

### Causal Inference

**Broken Randomized Trial:** A study that was designed to be a randomized controlled trial, but after treatment assignment was completed, some individuals received a different treatment than they were assigned to. One example is randomizing children to receive a particular preschool curriculum, but parents switch their children to a different center.

**Causal Effect:** An observed outcome that occurs as the result of a prior treatment or exposure.

**Causal Inference:** Detection of a causal relationship between an occurrence *T* (often called a treatment or an exposure) and an outcome *Y*.

**Causal Mediation:** Building on the mediation model (see *Mediation Model*), causal mediation is used to adjust for the fact that levels of the mediator *M* are not randomized. This approach allows for estimation of the causal effect of treatment/exposure *T* on the outcome that is transmitted through mediator *M*.

**Confounder:** A variable that affects both the treatment/exposure *T* and the outcome *Y*, potentially biasing an estimate of the causal effect of *T* on *Y*. Confounders may be observed (i.e., measured in a study) or unobserved; observed confounders can be accounted for statistically when drawing causal inference.

**Counterfactual:** A potential outcome that did not occur. For example, if we study the impact of attending preschool on later academic achievement, we would need to determine the achievement of a subject who went to preschool and what his achievement would have been if he had not attended preschool. Since it is impossible for him to retroactively NOT attend preschool, this outcome is the counterfactual.

**Mediation Model:** The standard mediation model, in which the treatment/exposure *T* affects the outcome *Y* indirectly through a mediator *M* (see *Mediator*).

**Mediator:** Sometimes a treatment impacts the outcome in an indirect manner, via a third variable, called a mediator. Mediation models posit that the treatment/exposure variable *T* causes change in the mediator *M*, which, in turn, causes change in the outcome *Y*.

**Moderator:** A variable that interacts with the effect of the treatment/exposure *T* on the outcome *Y*, or with the effect of the mediator *M* on the outcome *Y*. If different groups within the study sample experience different levels of the moderator, the outcomes for the groups will be different.

**Nonrandomized:** When participants in a study are not randomly assigned to treatment/exposure, as in an observational study.

**Observational Study:** A study where the participants are not randomly assigned to the treatment. This usually occurs in studies where it would be impossible or unethical to randomize participants (e.g., it would be unethical to assign a nonsmoker to start smoking).

**Potential Outcomes Framework:** A framework for measuring the impact of a treatment/exposure by considering what would have happened if the treatment had been different (i.e., a counterfactual outcome). For example, to consider the effect of marriage on substance use, you would need to observe outcomes for individuals given their actual marital status and also given the other status. Since the outcome can only be observed under one status, POTENTIAL outcomes are considered.

**Propensity Score:** An individual’s probability of treatment/exposure modeled as a function of many possible confounding variables. These scores are a useful tool for comparing groups in different treatment/exposure conditions when the sample is not randomized.

**Randomized Controlled Trial (RCT):** The gold standard for drawing causal inference about the effect of a treatment *T* on an outcome *Y*. Randomizing individuals into treatment conditions typically creates balance on all measured and unmeasured confounders, so that the effect of *T* on *Y* can be interpreted as causal.

**Rubin Causal Model:** (a.k.a. “Potential Outcomes Framework”) General framework for drawing causal inference based on potential outcomes.

**Selection Bias:** Bias in causal inference that results from the fact that individuals often self-select their treatment/exposure status. For example, college attendees may have lower alcohol use in adulthood than their non-college counterparts, but we cannot infer causality because individuals self-select into college.
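The sketch below illustrates one common way propensity scores are used to compare nonrandomized groups: inverse probability weighting. This particular technique is an assumption of the example and is not prescribed by the definitions above. The data are made up, and the propensity scores are taken as known rather than estimated from a model.

```python
# Made-up records: (confounder x, treatment t, outcome y, propensity score e).
# Outcomes were constructed as y = 2*t + 3*x, so the true effect of t is 2.
# Individuals with x = 1 are more likely to be treated (e = 0.75).
records = (
    [(1, 1, 5.0, 0.75)] * 3
    + [(1, 0, 3.0, 0.75)]
    + [(0, 1, 2.0, 0.25)]
    + [(0, 0, 0.0, 0.25)] * 3
)


def naive_difference(rows):
    """Difference in mean outcome between treated and untreated groups."""
    treated = [y for _, t, y, _ in rows if t == 1]
    control = [y for _, t, y, _ in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(rows):
    """Inverse-probability-weighted estimate of the average treatment effect."""
    n = len(rows)
    weighted_treated = sum(t * y / e for _, t, y, e in rows)
    weighted_control = sum((1 - t) * y / (1 - e) for _, t, y, e in rows)
    return (weighted_treated - weighted_control) / n


naive = naive_difference(records)  # distorted by the confounder x
adjusted = ipw_effect(records)     # weights each group back to the full sample
print(naive, adjusted)
```

Because the confounder raises both the chance of treatment and the outcome, the naive comparison overstates the effect, while the weighted estimate recovers the value built into the toy data.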


### Intensive Longitudinal Data (ILD)

**Ecological Momentary Assessment (EMA):** Repeated sampling of study participants’ experiences, emotions, behaviors, and/or situations in real time, within the context of their life (not a laboratory). Smartphones are often used to collect EMA data.

**Intensive Longitudinal Data (ILD):** Repeated measures data involving many (20 or more, often hundreds) measurements over time. Such data are often gathered using smartphones, web-based assessments, and diary studies.

**Multilevel Data:** (a.k.a. “nested data”) Data from units that are structurally inside other units. Common examples include students nested within classrooms and repeated measures nested within individuals.

**Survival Data:** (a.k.a. “time-to-event data”) Data reflecting the time to an event occurrence.

**Time-Invariant Covariate:** Covariates with values that remain the same over the course of a study. Examples include gender, race/ethnicity, family structure, family history of drug abuse, and intervention condition (for experiments with a single point of randomization).

**Time-Invariant Effect:** The effect of a covariate is constant across the study. In a study of smoking behavior that follows students from age 12 to age 18, if boys were 15% more likely than girls to smoke during the entire study, gender would have an effect, but that effect would be time-invariant.

**Time-Varying Covariate:** Covariates with values that change over time (unlike covariates such as gender, which are time-invariant). For example, if you study teen drinking behavior, stress will be a time-varying covariate, because each teen’s stress level can vary from moment to moment and day to day.

**Time-Varying Effect:** The *effect* of a covariate may change over time, whether or not the covariate itself varies over time. For example, in a smoking cessation intervention study, assignment to a nicotine replacement therapy condition might show an immediate effect on cravings, but that effect may diminish with time.

**Time-Varying Effect Model (TVEM):** A natural extension of linear regression models where the *coefficients* can vary over time. This flexible approach allows the mean trajectory and effects of covariates to vary with time without assuming parametric (e.g., linear or quadratic) functions.
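One way to write the idea of time-varying coefficients, for a single covariate, is shown below. The notation is an assumption of this sketch: $y_{ij}$ is the outcome for individual $i$ at measurement occasion $j$, $t_{ij}$ is the time of that occasion, and $x_{ij}$ is the covariate.

$$
y_{ij} = \beta_0(t_{ij}) + \beta_1(t_{ij})\,x_{ij} + \varepsilon_{ij}
$$

Here $\beta_0(\cdot)$ is the mean trajectory when $x_{ij} = 0$ and $\beta_1(\cdot)$ is the effect of the covariate, both treated as smooth functions of time rather than fixed numbers. Ordinary linear regression is the special case in which both functions are constant.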

### Adaptive Interventions/SMART

**Adaptive Intervention:** (a.k.a. “adaptive treatment strategy” or “dynamic treatment regimen” or “treatment policy”) A course of time-varying treatment designed to adapt to an individual’s changing life circumstances, response to treatment, or other designated indicator. The treatment is adapted via decision rules that input the values of the tailoring variables and output recommended treatment. The decision rules are developed from previous studies.

**Adaptive Treatment Strategy:** (a.k.a. “adaptive intervention” or “dynamic treatment regimen” or “treatment policy”) A course of time-varying treatment designed to adapt to an individual’s changing life circumstances, response to treatment, or other designated indicator. The treatment is adapted via decision rules that input the values of the tailoring variables and output recommended treatment. The decision rules are developed from previous studies.

**Dynamic Treatment Regimen:** (a.k.a. “adaptive intervention” or “adaptive treatment strategy” or “treatment policy”) A course of time-varying treatment designed to adapt to a participant’s changing life circumstances, response to treatment, or other designated indicator. The treatment is adapted via decision rules that input the values of the tailoring variables and output recommended treatment. The decision rules are developed from previous studies.

**Inference:** The process of drawing conclusions based on observed data; this may involve confidence intervals or hypothesis testing.

**Multi-Stage Decision Making:** Some intervention programs can be divided into multiple stages, where at each stage a decision is made regarding treatment. Examples of decisions include how to deliver the treatment, the type of treatment, and whether to augment treatment. Each decision relies on current and past information on the individual and their life circumstances. The goal of these decisions is to achieve high-quality outcomes.

**Reinforcement Learning:** An area of computer science that focuses on how best to use data to inform multi-stage decision making.

**Tailoring Variable:** The patient information used by an adaptive intervention to adapt and re-adapt treatment to the individual.

**Treatment Policies:** (a.k.a. “adaptive intervention” or “adaptive treatment strategy” or “dynamic treatment regimen”) A course of time-varying treatment designed to adapt to an individual’s changing life circumstances, response to treatment, or other designated indicator. The treatment is adapted via decision rules that input the values of the tailoring variables and output recommended treatment. The decision rules are developed from previous studies.
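A decision rule of the kind described above can be pictured as a small function from tailoring variables to a recommended treatment. The sketch below is purely hypothetical: the tailoring variable (heavy drinking days during the first stage), the threshold, and the treatment labels are invented for illustration and do not come from any particular study.

```python
def decision_rule(heavy_drinking_days: int, threshold: int = 2) -> str:
    """Hypothetical second-stage decision rule.

    Input: the value of one tailoring variable (heavy drinking days
    observed during the first stage of treatment).
    Output: the recommended second-stage treatment.
    """
    if heavy_drinking_days >= threshold:
        # Treated as a non-responder: step up the intervention.
        return "augment treatment"
    # Treated as a responder: keep the current course.
    return "continue current treatment"


recommendation_a = decision_rule(heavy_drinking_days=0)
recommendation_b = decision_rule(heavy_drinking_days=4)
print(recommendation_a, "|", recommendation_b)
```

An adaptive intervention is then a sequence of such rules, one per decision stage.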


### Optimizing Behavioral Interventions/MOST

**Aliasing:** A phenomenon that occurs in fractional factorial designs whereby certain effects can be estimated only in combination.

**Behavioral Intervention:** A program aimed at modifying behavior for the purpose of preventing/treating disease, promoting health, and/or enhancing well-being.

**Continuous Optimization Principle:** The principle that optimization is an ongoing process. Every behavioral intervention is a perennial work in progress. Once an intervention has been optimized, a cycle of MOST should be started again to effect further improvements. An intervention can always be made more potent or more efficient.

**Dummy Coding:** An approach to coding independent variables in regression analysis in which the categories are assigned values of either 0 or 1.

**Effect Coding:** An approach to coding independent variables in regression analysis in which the categories are assigned values of 1, 0, or -1.

**Factorial Experiment:** An experimental design in which two or more categorical independent variables are each studied at two or more levels (Vogt, 1993, p. 90).

**Fractional Factorial Experiment:** Special case of factorial design. Fractional factorials always involve a simple fraction (e.g., ½, ¼) of the experimental conditions in the corresponding complete factorial design.

**Interaction:** An interaction is present in a factorial experiment when the joint effect of two or more factors cannot be expressed as the simple sum of their main effects.

**Intervention Component:** Any part of an intervention that can reasonably be separated out for study. This definition is meant to be both broad and practical, because what constitutes a component can be very different in different situations.

**Main Effect:** The effect of one factor averaged across all levels of all other factors in a factorial experiment.

**Multiphase Optimization Strategy (MOST):** An engineering-inspired approach to building, optimizing, and evaluating multicomponent behavioral interventions.

**Optimize:** According to the online *Concise Oxford Dictionary of Mathematics*, the optimized solution is “the best possible solution… subject to given constraints.” **Note that optimized does not mean best in an absolute or ideal sense.** The constraints are always part of the definition.

**Optimization Criterion:** An operational definition of what is meant by “best possible solution, subject to constraints.” For examples, see the FAQ question “What is an optimization criterion?”

**Randomized Clinical Trial (RCT):** An experimental design in which a limited number of experimental conditions (often two) are directly compared.

**Resource Management Principle:** The principle that research resources (money, experimental subjects, time, equipment, personnel, etc.) must be managed strategically to move intervention science forward fastest.
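The coding and aliasing definitions above can be illustrated with a small sketch. It assumes three hypothetical two-level components A, B, and C, uses effect coding (-1 = off, +1 = on), and builds a half-fraction design with the defining relation C = A × B, which is one standard construction and is an assumption of this example.

```python
from itertools import product

# Two ways to code a two-level factor ("off"/"on"):
dummy_coding = {"off": 0, "on": 1}    # values 0 or 1
effect_coding = {"off": -1, "on": 1}  # values -1 or 1

# Complete 2 x 2 x 2 factorial for hypothetical components A, B, C,
# written in effect coding: 8 experimental conditions.
full_factorial = list(product([-1, 1], repeat=3))

# Half fraction: keep only conditions where C equals the product A * B.
half_fraction = [(a, b, c) for a, b, c in full_factorial if c == a * b]

# In the half fraction, the column for the main effect of C is identical
# to the column for the A x B interaction, so the two effects are aliased:
# they can be estimated only in combination.
c_column = [c for _, _, c in half_fraction]
ab_column = [a * b for a, b, _ in half_fraction]

print(len(full_factorial), len(half_fraction))
print(c_column == ab_column)
```

The half fraction needs half as many experimental conditions, at the cost of being unable to separate the main effect of C from the A × B interaction.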


### High-Dimensional Data Analysis and Variable Selection

**High-Dimensional Data:** Data in which the number of variables is greater than the sample size.

**Nonconvex Penalized Least Squares:** Variable selection procedures for linear regression models based on penalized least squares with a nonconvex penalty such as the SCAD.

**Nonconcave Penalized Likelihood:** Penalized likelihood with a nonconcave penalty such as the SCAD that selects significant variables in likelihood-based models such as generalized linear models. This procedure simultaneously estimates coefficients and selects variables for a model with high-dimensional predictors. The procedure uses singularities at the origin to produce sparse solutions.

**Smoothly Clipped Absolute Deviation (SCAD) Variable Selection Procedures:** Penalized least squares and penalized likelihood with the SCAD penalty to remove insignificant variables by estimating their coefficients as 0.

**Ultrahigh-Dimensional Data:** Data with an extremely high number of variables, much greater than the sample size.
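For readers who want to see the shape of the SCAD penalty, here is a minimal sketch of the penalty function in its usual piecewise form. The function name and the default `a = 3.7` (a commonly used value) are choices made for this example; the sketch evaluates the penalty only and does not fit a model.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient (requires lam > 0 and a > 2)."""
    size = abs(theta)
    if size <= lam:
        # Near zero: behaves like the L1 (lasso) penalty, which is not
        # differentiable at the origin and so can set coefficients to 0.
        return lam * size
    if size <= a * lam:
        # Transition zone: the penalty grows more and more slowly.
        return (2 * a * lam * size - size**2 - lam**2) / (2 * (a - 1))
    # Large coefficients: the penalty is constant, so they are not
    # shrunk further.
    return lam**2 * (a + 1) / 2


penalties = [scad_penalty(theta, lam=1.0) for theta in (0.0, 0.5, 2.0, 10.0)]
print(penalties)
```

The linear piece near the origin is what produces sparse solutions, while the flat piece for large coefficients avoids penalizing large effects more heavily than moderate ones.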