What is a Latent Class?
A latent class is a variable indicating underlying subgroups of individuals based on observed characteristics. Membership in the subgroup is said to be “latent” because membership in a class cannot be directly observed.
Introductory Example: Teen Health-Risk Behavior Data
This example is based on publicly available data from the Youth Risk Behavior Surveillance System. If you have experience analyzing data, you can download the data and PROC LCA and perform the analysis yourself. This example is explained in detail in chapter 2 of Latent Class and Latent Transition Analysis by Collins & Lanza (2010).
About this data
 Measured 12 health-risk behaviors
 13,840 US high school students (grades 9-12)
 Collected in 2005
Participants responded to questions about sexual behavior, smoking behavior, alcohol-consumption behavior, and previous usage of other prevalent drugs, including marijuana, ecstasy, and cocaine. The 12 questions in the table below were the "items" used to identify the latent classes.
The table below shows, for example, that 25% of the students had five or more drinks in a row during the month prior to data collection, and that 7% had sex before the age of 13. This information is interesting and potentially useful on its own, but it is more useful if we can see common patterns of behavior among groups of students.
Proportion of Students Reporting Each Health Risk Behavior (Youth Risk Behavior Surveillance System, 2005; N = 13,840)

Health Risk Behavior                    Proportion Responding Yes
Smoked first cigarette before age 13    .15
Smoked daily for 30 days                .12
Has driven when drinking                .11
Had first drink before age 13           .26
≥5 drinks in a row in past 30 days      .25
Tried marijuana before age 13           .09
Used cocaine in life                    .08
Sniffed glue in life                    .12
Used meth in life                       .06
Used ecstasy in life                    .06
Had sex before age 13                   .07
Had sex with four or more people        .17

Note. Proportions are based on the N responding to each question. The amount of missing data varied across questions.
LCA Mathematical Model (in brief terms)
The analysis was completed using a SAS procedure developed by The Methodology Center, PROC LCA. PROC LCA is easy to use and requires minimal syntax.
Read about the LCA mathematical model
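For orientation, the standard latent class model can be sketched as follows. The notation follows the convention in Collins & Lanza (2010) and is an assumption here, since this page does not define symbols: C is the number of latent classes, J the number of items, R_j the number of response categories for item j, \gamma_c the prevalence of class c, and \rho_{j,r_j|c} the probability of giving response r_j to item j for a member of class c (the item-response probability).

```latex
P(\mathbf{Y} = \mathbf{y})
  = \sum_{c=1}^{C} \gamma_c
    \prod_{j=1}^{J} \prod_{r_j=1}^{R_j}
    \rho_{j,r_j \mid c}^{\,I(y_j = r_j)},
\qquad
\sum_{c=1}^{C} \gamma_c = 1,
\qquad
\sum_{r_j=1}^{R_j} \rho_{j,r_j \mid c} = 1 .
```

Here I(y_j = r_j) equals 1 when the response to item j is r_j and 0 otherwise. The product over items reflects the model's assumption that responses are independent of one another within a class.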
Selecting the Proper Number of Classes
The model is selected using information about fit.
To select the number of classes for the model, specify and run a 2-class model, then repeat with 3 classes, 4 classes, and so on, up to the highest plausible number of classes. Fit information from the results (including the log-likelihood, degrees of freedom, G², AIC, BIC, and CAIC) is then compared across models to identify the optimal one. The bootstrap likelihood ratio test can also be used to compare models. The Methodology Center created a SAS macro to perform the bootstrap likelihood ratio test for PROC LCA users.
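The bookkeeping behind these fit statistics can be sketched in a few lines of Python. This is a minimal illustration, not PROC LCA's implementation: the function names are hypothetical, the criteria use the common log-likelihood-based definitions, and no log-likelihood values from the actual analysis are assumed.

```python
import math


def lca_free_parameters(n_classes, n_items, n_categories=2):
    """Free parameters in an LCA with equal-category items (an assumption)."""
    prevalences = n_classes - 1  # class prevalences sum to 1
    item_probs = n_classes * n_items * (n_categories - 1)
    return prevalences + item_probs


def lca_degrees_of_freedom(n_classes, n_items, n_categories=2):
    """Cells in the full response-pattern table, minus 1, minus parameters."""
    n_cells = n_categories ** n_items
    return n_cells - 1 - lca_free_parameters(n_classes, n_items, n_categories)


def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, BIC, and CAIC from a fitted model's log-likelihood (lower is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
    }


# Parameter counts for candidate models with 12 yes/no items.
for k in range(2, 8):
    p = lca_free_parameters(k, 12)
    df = lca_degrees_of_freedom(k, 12)
    print(f"{k}-class model: {p} parameters, {df} degrees of freedom")
```

For the 12 yes/no items in this example, a five-class model has 64 free parameters and 4,031 degrees of freedom. Some software reports AIC and BIC based on G² rather than on the log-likelihood; for a given data set the two versions differ only by a constant, so the ranking of models is the same.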
Results
The analysis reveals the classes; the researcher interprets and labels them.
In LCA, the responses of all participants to all items are analyzed. A specified latent class model is fit to the data, and the parameter estimates are obtained. Once the number of classes is selected, the output includes the probability of a response to each risk item in the inventory for each latent class. In other words, you will see the probability that members of each class engaged in each risky behavior. (See the table below.) For this analysis, a five-class model was selected, which means that the analysis revealed five latent subgroups in the population of teens. The scientists interpreted the results and assigned the following labels to the groups:
 67% of the respondents fell into the Low Risk class.
 14% were in the Binge Drinkers class.
 9% were in the Early Experimenters class.
 5% were in the High Risk class.
 4% were in the Sexual Risk-Takers class.
Note that the totals add up to 100% (within rounding) because in theory, every individual belongs to one and only one class.
But what do these categories mean, and how were the labels arrived at? Below is the table of item-response probabilities, which indicate the likelihood that teens in a given class reported engaging in a particular risky behavior. These probabilities provide the basis for labeling the classes.
Five-Latent-Class Model of Health Risk Behaviors: Probability of engaging in each behavior for each subgroup (Youth Risk Behavior Surveillance System Data; N = 13,840)

                                        Latent Class
Health Risk Behavior                    Low Risk  Early Experimenters  Binge Drinkers  Sexual Risk-Takers  High Risk
Class prevalence                        67%       9%                   14%             4%                  5%
Smoked first cigarette before age 13    .04       .76*                 .11             .17                 .64*
Smoked daily for 30 days                .02       .31                  .27             .12                 .66*
Has driven when drinking                .01       .15                  .42             .11                 .45
Had first drink before age 13           .14       .79*                 .21             .39                 .68*
≥5 drinks in a row in past 30 days      .08       .48                  .74*            .16                 .79*
Tried marijuana before age 13           .01       .46                  .03             .22                 .55*
Used cocaine in life                    .00       .07                  .19             .03                 .88*
Sniffed glue in life                    .06       .22                  .19             .04                 .58*
Used meth in life                       .00       .02                  .10             .01                 .73*
Used ecstasy in life                    .00       .06                  .11             .06                 .64*
Had sex before age 13                   .01       .18                  .00             .81*                .30
Had sex with four or more people        .06       .24                  .29             .83*                .56*

*Item-response probabilities > .5 are marked with an asterisk to facilitate interpretation.
The analysis provides information about patterns of risky behavior and the prevalence of those patterns.
An item-response probability larger than .50 means that members of the class were more likely to reply “yes” than “no” (.50 would indicate that half of the class members said “yes” and half said “no”). So .04 next to “Smoked first cigarette before age 13” in the Low Risk column means that members of the Low Risk group had a 4% chance of saying they had smoked prior to age 13. The table shows that members of the Low Risk group were very unlikely to report any risk behavior; their most prevalent behavior was having had an alcoholic drink before age 13 (14%). Members of the Binge Drinkers group were most likely to have had 5 or more drinks on one occasion (74%) and were also far more likely to have driven while drinking (42%) than the Low Risk group (1%). Early Experimenters had a high probability of experimenting with alcohol and tobacco before age 13, but they had a lower than 50% chance of participation in all other risks. Still, Early Experimenters had a higher likelihood of engaging in each risk behavior than members of the Low Risk group. This analysis provides information about the combinations of risks youth are likely to be exposed to, and the proportion of youth exposed to the risks.
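One way to see how the class prevalences and the item-response probabilities fit together is to recombine them: weighting each class's probability of a "yes" by the class prevalence and summing across classes should roughly reproduce the overall proportions reported in the first table. The Python sketch below does this with the rounded numbers shown on this page. The variable and function names are illustrative, and the prevalences are renormalized because the rounded percentages sum to 99%.

```python
# Class order: Low Risk, Early Experimenters, Binge Drinkers,
# Sexual Risk-Takers, High Risk (rounded prevalences from the table).
CLASS_PREVALENCE = [0.67, 0.09, 0.14, 0.04, 0.05]

# item: (observed proportion "yes", item-response probabilities by class)
ITEMS = {
    "Smoked first cigarette before age 13": (0.15, [0.04, 0.76, 0.11, 0.17, 0.64]),
    "Smoked daily for 30 days": (0.12, [0.02, 0.31, 0.27, 0.12, 0.66]),
    "Has driven when drinking": (0.11, [0.01, 0.15, 0.42, 0.11, 0.45]),
    "Had first drink before age 13": (0.26, [0.14, 0.79, 0.21, 0.39, 0.68]),
    ">=5 drinks in a row in past 30 days": (0.25, [0.08, 0.48, 0.74, 0.16, 0.79]),
    "Tried marijuana before age 13": (0.09, [0.01, 0.46, 0.03, 0.22, 0.55]),
    "Used cocaine in life": (0.08, [0.00, 0.07, 0.19, 0.03, 0.88]),
    "Sniffed glue in life": (0.12, [0.06, 0.22, 0.19, 0.04, 0.58]),
    "Used meth in life": (0.06, [0.00, 0.02, 0.10, 0.01, 0.73]),
    "Used ecstasy in life": (0.06, [0.00, 0.06, 0.11, 0.06, 0.64]),
    "Had sex before age 13": (0.07, [0.01, 0.18, 0.00, 0.81, 0.30]),
    "Had sex with four or more people": (0.17, [0.06, 0.24, 0.29, 0.83, 0.56]),
}


def implied_proportion(prevalence, item_probs):
    """Prevalence-weighted average of the class-specific probabilities."""
    total = sum(prevalence)  # renormalize the rounded prevalences
    return sum(p / total * rho for p, rho in zip(prevalence, item_probs))


implied = {
    name: implied_proportion(CLASS_PREVALENCE, probs)
    for name, (_, probs) in ITEMS.items()
}

for name, (observed, _) in ITEMS.items():
    print(f"{name:<38} observed {observed:.2f}  implied {implied[name]:.2f}")
```

With these rounded inputs, the implied proportions land within .02 of the observed ones for every item. Small gaps are expected, since the published values are rounded and the amount of missing data varied across questions.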
NOTE: The names of the groups were assigned by the scientists based on the results of the LCA. The analysis divides the groups empirically; scientists label the groups based on what the groups indicate about the data.
Read about more Center applied LCA research or Center methodological & technical LCA research.
See our recommended reading for LCA.
Reference
Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences. Hoboken, NJ: John Wiley & Sons, Inc.