LCA Applied Example | The Methodology Center

LCA Applied Example

The latent variable “youth risk behavior” is measured by the observed variables “sex,” “drinking,” “smoking,” and “other drugs.”

What is a Latent Class?

A latent class variable is a categorical variable that indicates underlying subgroups of individuals based on their observed characteristics. Each subgroup is a latent class. Membership is said to be “latent” because it cannot be directly observed.


Introductory Example: Teen Health-Risk Behavior Data

This example is based on publicly available data from the Youth Risk Behavior Surveillance System. If you have experience analyzing data, you can download the data and PROC LCA and perform the analysis yourself. This example is explained in detail in chapter 2 of Latent Class and Latent Transition Analysis by Collins & Lanza (2010).



LCA Mathematical Model (in brief terms)

The analysis was completed using a SAS procedure developed by The Methodology Center, PROC LCA. PROC LCA is easy to use and requires minimal syntax.
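
As a brief sketch of the model named in the heading above, the standard latent class model can be written as follows. The notation is an assumption borrowed from common usage in the LCA literature (for example, Collins & Lanza, 2010) and is not taken from this page: C is the number of classes, J the number of items, R_j the number of response categories for item j, γ_c the prevalence of class c, and ρ the item-response probabilities.

```latex
% Probability of observing response pattern y = (y_1, ..., y_J)
P(\mathbf{Y} = \mathbf{y})
  = \sum_{c=1}^{C} \gamma_c
    \prod_{j=1}^{J} \prod_{r_j=1}^{R_j}
    \rho_{j,\, r_j \mid c}^{\, I(y_j = r_j)},
\qquad \sum_{c=1}^{C} \gamma_c = 1
```

Here I(y_j = r_j) equals 1 when the response to item j is r_j and 0 otherwise, so each class contributes the product of the probabilities of the responses actually given. Items are assumed to be independent within a class.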


Read about the LCA mathematical model


Selecting the Proper Number of Classes

The model is selected using information about fit.

To select the number of classes for the model, specify and run a 2-class model, then repeat with 3 classes, 4 classes, and so on, up to the highest plausible number of classes. Fit information from each run (including the log-likelihood, degrees of freedom, G², AIC, BIC, and CAIC) is then compared across models to identify the optimal one. The bootstrap likelihood ratio test can also be used to compare models; The Methodology Center created a SAS macro that performs this test for PROC LCA users.
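
The comparison described above can be sketched in a few lines of Python. This is an illustration rather than PROC LCA output: it assumes all items are binary, uses the G²-based forms of the information criteria (AIC = G² + 2p, BIC = G² + p·ln N, CAIC = G² + p·(ln N + 1), where p is the number of free parameters), and the G² values are hypothetical placeholders, not the fitted results for these data. The function name `lca_fit_summary` is invented for this example.

```python
import math


def lca_fit_summary(g2, n_classes, n_items, n_obs, n_categories=2):
    """Summarize fit for one latent class model.

    Assumes every item has the same number of response categories.
    g2 is the likelihood-ratio statistic G^2 reported for the model.
    """
    # Free parameters: (C - 1) class prevalences plus
    # C * J * (R - 1) item-response probabilities.
    n_params = (n_classes - 1) + n_classes * n_items * (n_categories - 1)
    # Degrees of freedom: cells in the full contingency table,
    # minus 1, minus the number of free parameters.
    df = n_categories ** n_items - 1 - n_params
    return {
        "classes": n_classes,
        "params": n_params,
        "df": df,
        "AIC": g2 + 2 * n_params,
        "BIC": g2 + n_params * math.log(n_obs),
        "CAIC": g2 + n_params * (math.log(n_obs) + 1),
    }


# HYPOTHETICAL G^2 values for illustration only (not the real results).
hypothetical_g2 = {2: 5000.0, 3: 4000.0, 4: 3500.0, 5: 3200.0, 6: 3150.0}

# 12 binary items and N = 13,840 match the table described on this page.
summaries = [
    lca_fit_summary(g2, n_classes, n_items=12, n_obs=13840)
    for n_classes, g2 in hypothetical_g2.items()
]

# Lower values of an information criterion indicate a better trade-off
# between fit and parsimony.
best_by_bic = min(summaries, key=lambda s: s["BIC"])

for s in summaries:
    print(s["classes"], s["params"], s["df"], round(s["AIC"], 1), round(s["BIC"], 1))
```

In practice the information criteria do not always agree with each other, so interpretability of the classes is usually weighed alongside them.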



The analysis reveals the classes; the researcher interprets and labels them.

In LCA, the responses of all participants to all items are analyzed. A specified latent class model is fit to the data, and the parameter estimates are obtained. Once the number of classes is selected, the output includes, for each latent class, the probability of each response to each risk item in the inventory. In other words, you will see the probability that members of each class engaged in each risky behavior. (See the table below.) For this analysis, a five-class model was selected, which means that the analysis revealed five latent subgroups in the population of teens. The scientists interpreted the results and assigned the following labels to the groups:


  • 67% of the respondents fell into the Low Risk class.
  • 14% were in the Binge Drinkers class.
  • 9% were in the Early Experimenters class.
  • 5% were in the High Risk class.
  • 4% were in the Sexual Risk-Takers class.


Note that the totals add up to 100% (within rounding) because in theory, every individual belongs to one and only one class.
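
Although each individual is assumed to belong to exactly one class, that class is not observed, so membership is usually expressed as a posterior probability given the person's responses. The sketch below applies Bayes' rule to a two-class, three-item model. All parameter values are hypothetical and chosen only for illustration, and the function names are invented for this example.

```python
def joint_probabilities(gamma, rho, responses):
    """Return P(class c AND the observed response pattern) for each class.

    gamma[c]  : prevalence of class c
    rho[c][j] : probability that a member of class c answers "yes" to item j
    responses : list of 0/1 answers (1 = "yes"), one per item
    """
    joint = []
    for prevalence, item_probs in zip(gamma, rho):
        p = prevalence
        for prob_yes, y in zip(item_probs, responses):
            # Items are assumed independent within a class.
            p *= prob_yes if y == 1 else 1.0 - prob_yes
        joint.append(p)
    return joint


def posterior_membership(gamma, rho, responses):
    """Return P(class c | observed response pattern) for each class."""
    joint = joint_probabilities(gamma, rho, responses)
    total = sum(joint)  # marginal probability of the response pattern
    return [p / total for p in joint]


# HYPOTHETICAL parameters: two classes, three yes/no items.
gamma = [0.8, 0.2]
rho = [
    [0.1, 0.1, 0.1],  # class 0: low probability of "yes" on every item
    [0.8, 0.7, 0.6],  # class 1: high probability of "yes" on every item
]

# A respondent who answered yes, yes, no.
post = posterior_membership(gamma, rho, [1, 1, 0])
print([round(p, 3) for p in post])
```

The posterior probabilities sum to 1 for every respondent, which is the individual-level counterpart of the class prevalences summing to 100%.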


But what do these categories mean, and how were the labels arrived at? Below is the table of item-response probabilities, which indicate the likelihood that teens in a given class reported engaging in a particular risky behavior. These probabilities provide the basis for labeling the classes.


Five-Latent-Class Model of Health Risk Behaviors: Probability of engaging in each behavior, by subgroup

(Youth Risk Behavior Surveillance System Data; N = 13,840)


Latent classes (table columns): Low Risk, Early Experimenters, Binge Drinkers, Sexual Risk-Takers, High Risk

Risk behaviors (table rows; the cell values are missing from this copy of the table):

  • Smoked first cigarette before age 13
  • Smoked daily for 30 days
  • Has driven when drinking
  • Had first drink before age 13
  • ≥5 drinks in a row in past 30 days
  • Tried marijuana before age 13
  • Used cocaine in life
  • Sniffed glue in life
  • Used meth in life
  • Used ecstasy in life
  • Had sex before age 13
  • Had sex with four or more people

*Item-response probabilities >.5 in bold to facilitate interpretation.


The analysis provides information about patterns of risky behavior and the prevalence of those patterns.

In the table, a probability larger than .50 means that members of that group were more likely to answer “yes” than “no” to the item (.50 would indicate that half of the group members said “yes” and half said “no”). For example, .04 next to “Smoked first cigarette before age 13” in the Low Risk column means that members of the Low Risk group had a 4% chance of saying they had smoked before age 13. The table shows that members of the Low Risk group were very unlikely to report any risk behavior; their most prevalent behavior was having had an alcoholic drink before age 13 (14%). Members of the Binge Drinkers group were most likely to have had 5 or more drinks on one occasion (74%) and were also far more likely to have driven while drinking (42%) than members of the Low Risk group (1%). Early Experimenters had a high probability of experimenting with alcohol and tobacco before age 13, but they had a lower than 50% chance of engaging in each of the other risk behaviors. Still, Early Experimenters had a higher likelihood of engaging in each risk behavior than members of the Low Risk group. This analysis provides information about the combinations of risks youth are likely to be exposed to and the proportion of youth exposed to each combination.
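
The reading of the table described above can be sketched as a small Python helper that flags the items with probabilities above .50 for each class. Only the probabilities quoted in the preceding paragraph are included; the other cells of the table are deliberately left out rather than guessed. The helper name `characteristic_items` is invented for this example.

```python
# Only the item-response probabilities quoted in the text are listed here;
# all other table cells are intentionally omitted.
quoted_probabilities = {
    "Low Risk": {
        "Smoked first cigarette before age 13": 0.04,
        "Had first drink before age 13": 0.14,
        "Has driven when drinking": 0.01,
    },
    "Binge Drinkers": {
        "≥5 drinks in a row in past 30 days": 0.74,
        "Has driven when drinking": 0.42,
    },
}


def characteristic_items(item_probs, threshold=0.5):
    """Return (item, probability) pairs above the threshold, highest first."""
    flagged = [(item, p) for item, p in item_probs.items() if p > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


for label, probs in quoted_probabilities.items():
    print(label, characteristic_items(probs))
```

With the quoted values, no item is flagged for the Low Risk class and only the binge-drinking item is flagged for the Binge Drinkers class, which is consistent with the labels the scientists chose.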


NOTE: The names of the groups were assigned by the scientists based on the results of the LCA. The analysis divides the groups empirically; scientists label the groups based on what the groups indicate about the data.




See our recommended reading for LCA.


Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences. Hoboken, NJ: John Wiley & Sons, Inc.
