# LCA Mathematical Model

This analysis was completed using SAS software and The Methodology Center's PROC LCA.

*NOTE: After you read this page, you may want to return to selecting the proper number of classes on the example page.*

Latent class analysis relies on a contingency table created by cross-tabulating all indicators of the latent class variable. Suppose we estimate a latent class model with $n_c$ classes from a set of $M$ dichotomous items. Suppose also that we include in the model a covariate denoted $X$, which may be either continuous or dichotomous (0 or 1 coded). Let the vector $Y_i = (Y_{i1}, \ldots, Y_{iM})$ represent individual $i$'s responses to the $M$ items, where the possible values of $Y_{im}$ are $1, \ldots, r_m$. Let $L_i = 1, 2, \ldots, n_c$ be the latent class membership of individual $i$, and let $I(y = k)$ be the indicator function; that is, a function that equals 1 if $y$ equals $k$, and 0 otherwise. Suppose we let the last class be the reference class. Let $X_i$ represent the value of the covariate for individual $i$; the covariate may be related to the probability, $\gamma$, of membership in each latent class, but is assumed to be otherwise unrelated to $Y_i$. Then the contribution by individual $i$ to the likelihood is

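$$
P(Y_i = y \mid X_i = x) = \sum_{l=1}^{n_c} \gamma_l(x) \prod_{m=1}^{M} \prod_{k=1}^{r_m} \rho_{mk \mid l}^{I(y_m = k)},
$$

where $\gamma_l(x) = P(L_i = l \mid X_i = x)$ is the probability of membership in latent class $l$ given covariate value $x$, and $\rho_{mk \mid l}$ is the probability of response $k$ to item $m$ conditional on membership in class $l$.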
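As a rough illustration of how one individual's likelihood contribution is assembled, the sketch below sums, over latent classes, the class membership probability times the product of that class's item-response probabilities for the observed responses. The function name `likelihood_contribution`, the array layout `rho[l, m, k-1]` for the probability of response $k$ to item $m$ in class $l$, and all numeric values are hypothetical choices made for this example; none of them comes from PROC LCA. The class membership probabilities are passed in as fixed numbers, as if already evaluated at the individual's covariate value.

```python
import itertools

import numpy as np


def likelihood_contribution(y, gamma, rho):
    """Likelihood contribution of one response pattern.

    y     : responses to the M items, coded 1..r_m
    gamma : class membership probabilities, shape (n_c,)
    rho   : item-response probabilities, rho[l, m, k-1] = P(Y_m = k | L = l)
    """
    y = np.asarray(y)
    items = np.arange(y.size)
    # For each class, pick the probability of the observed response to each
    # item; the indicator function in the formula does exactly this selection.
    per_class = rho[:, items, y - 1].prod(axis=1)  # shape (n_c,)
    # Mix the class-specific pattern probabilities by class membership.
    return float(np.dot(gamma, per_class))


# Hypothetical two-class model with three dichotomous items (illustrative values).
rho = np.array([
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],  # class 1
    [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],  # class 2
])
gamma = np.array([0.4, 0.6])

value = likelihood_contribution([1, 1, 2], gamma, rho)
print(value)  # approximately 0.1188

# The contributions of all 2**3 possible response patterns sum to one.
total = sum(
    likelihood_contribution(pattern, gamma, rho)
    for pattern in itertools.product([1, 2], repeat=3)
)
print(total)
```

Because the items are assumed independent within a class, each class contributes a simple product over items, and the only coupling between items comes from mixing over the classes.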

The $\beta$ parameters are the coefficients in logistic regressions using the covariate $X$ to predict latent class membership. The $\gamma$ parameters can be expressed as functions of the $\beta$ parameters as follows:

$$
\gamma_l(x) = P(L_i = l \mid X_i = x) = \frac{\exp(\beta_{0l} + \beta_{1l} x)}{\sum_{j=1}^{n_c} \exp(\beta_{0j} + \beta_{1j} x)} = \frac{\exp(\beta_{0l} + \beta_{1l} x)}{1 + \sum_{j=1}^{n_c - 1} \exp(\beta_{0j} + \beta_{1j} x)}
$$

for $l = 1, \ldots, n_c$. Note that the last two expressions on the right are equal because we assume that the last (i.e., the $n_c$th) class is used as the reference class. The reference class has its $\beta$s constrained to zero, because the relative probabilities of being in the other classes are being compared to the probability of this reference class. It is necessary to choose one class and set its $\beta$s to zero for the sake of model identifiability, because of the natural constraint that the probabilities for all classes must sum to one for each individual. The choice of reference class does not affect the final fitted probability estimates for any individual or class.

This model allows us to estimate the log odds that individual $i$ falls in latent class $l$ relative to the baseline class. For example, if class 2 is the reference class, then the log odds of membership in class 1 relative to class 2 for an individual with value $x$ on the covariate is

$$
\log\left(\frac{\gamma_1(x)}{\gamma_2(x)}\right) = \beta_{01} + \beta_{11} x.
$$

Exponentiated $\beta$ parameters are odds ratios, reflecting the multiplicative change in the odds of class membership (relative to reference class $n_c$) corresponding to a one-unit increase in the covariate. Note that multiple covariates can be included simultaneously, just as in logistic regression.
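The link between the logistic regression coefficients, the class membership probabilities, and the odds ratios can be checked numerically. The following is a minimal sketch, assuming a three-class model with one covariate, an intercept and a slope for each non-reference class, and the last class as reference. The helper names (`softmax`, `linear_predictors`, `gamma_from_beta`) and the coefficient values are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(eta):
    """Turn linear predictors into probabilities that sum to one."""
    w = np.exp(eta - np.max(eta))  # subtract the max for numerical stability
    return w / w.sum()


def linear_predictors(beta0, beta1, x):
    """Linear predictor for every class; the reference (last) class gets 0."""
    return np.append(np.asarray(beta0) + np.asarray(beta1) * x, 0.0)


def gamma_from_beta(beta0, beta1, x):
    """Class membership probabilities at covariate value x."""
    return softmax(linear_predictors(beta0, beta1, x))


# Hypothetical coefficients for classes 1 and 2; class 3 is the reference.
beta0 = np.array([0.5, -1.0])  # intercepts
beta1 = np.array([0.8, 0.3])   # slopes on the covariate

gamma_x0 = gamma_from_beta(beta0, beta1, 0.0)
gamma_x1 = gamma_from_beta(beta0, beta1, 1.0)

# Log odds of class 1 relative to the reference class at x = 0 and x = 1.
log_odds_x0 = np.log(gamma_x0[0] / gamma_x0[2])  # intercept: 0.5
log_odds_x1 = np.log(gamma_x1[0] / gamma_x1[2])  # intercept + slope: 1.3

# A one-unit increase in x multiplies the odds by exp(slope).
odds_ratio = np.exp(log_odds_x1 - log_odds_x0)

# Re-expressing the same model with class 1 as the reference shifts every
# linear predictor by the same amount and leaves the probabilities unchanged.
eta = linear_predictors(beta0, beta1, 1.0)
gamma_alt_reference = softmax(eta - eta[0])

print(gamma_x1, gamma_alt_reference)
print(odds_ratio, np.exp(beta1[0]))
```

Fixing the reference class's coefficients at zero is what makes the denominator take the "one plus a sum over the other classes" form, and the last check shows why the choice of reference class changes the coefficients but not the fitted probabilities.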