September 18, 2012
Although textbooks commonly point it out, researchers sometimes forget that the way a categorical variable is coded determines how its beta coefficient should be interpreted in a regression analysis. In a new technical report, “Effect Coding Versus Dummy Coding in Analysis of Data From Factorial Experiments,” Methodology Center researchers Kari Kugler, Jessica Trail, John Dziak, and Linda Collins explain the differences between effect coding and dummy coding when the multiple regression approach is used to perform an ANOVA.
This report is a tutorial for researchers on the impact of using different coding methods. The two most common coding schemes, dummy coding (0, 1) and effect coding (-1, 1), are compared and contrasted. In short, the two schemes yield the same omnibus F; however, they yield different estimates, F statistics, and p-values for the individual effects (except for the highest-order interaction). The authors note that while neither coding scheme is right or wrong, it is important to understand exactly which effects are being estimated when a categorical variable is quantified. In addition, they discuss which estimates are provided by different statistical software packages.
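To make the contrast concrete, here is a minimal sketch in Python with NumPy. It is not taken from the report: the 2x2 design, the cell means, and the helper `fit` are all made up for illustration. It fits the same data twice, once with each factor coded (0, 1) and once with each coded (-1, 1), and compares the results.

```python
import numpy as np

# Hypothetical balanced 2x2 factorial (illustrative numbers, not from the report):
# two observations per cell, placed at cell mean - 1 and cell mean + 1.
cell_means = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 16.0, (1, 1): 26.0}
rows = [(a, b, m + d) for (a, b), m in cell_means.items() for d in (-1.0, 1.0)]
a, b, y = (np.array(col, dtype=float) for col in zip(*rows))


def fit(x1, x2, y):
    """OLS of y on intercept, x1, x2, and x1*x2.

    Returns the coefficients, their t statistics, and the fitted values.
    """
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se, fitted


# Dummy coding: levels are 0 and 1.
dummy_beta, dummy_t, dummy_fit = fit(a, b, y)
# Effect coding: the same levels recoded as -1 and 1.
effect_beta, effect_t, effect_fit = fit(2 * a - 1, 2 * b - 1, y)

print("dummy  coefficients:", np.round(dummy_beta, 3))
print("effect coefficients:", np.round(effect_beta, 3))
print("dummy  t statistics:", np.round(dummy_t, 3))
print("effect t statistics:", np.round(effect_t, 3))
print("same fitted values: ", np.allclose(dummy_fit, effect_fit))
```

With these made-up cell means, the dummy-coded coefficients are 10, 6, 2, and 8: the mean of the reference cell, the effect of each factor when the other is at its 0 level, and the interaction. The effect-coded coefficients are 16, 5, 3, and 2: the grand mean, half of each main effect averaged over the other factor, and the interaction on a smaller scale. The fitted values are identical, which is why the omnibus F does not change. The interaction coefficient differs only by a constant factor of 4, so its t statistic is the same under both codings, while the t statistics for the two factors differ because they test different effects.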