Factorial Experiments: Frequently Asked Questions


 

Read an introduction to factorial designs aimed at investigators with a background in the RCT, or refer to Collins, Dziak, Kugler, and Trail (in press).

 

Collins, L. M., Dziak, J. J., Kugler, K. C., & Trail, J. B. (in press). Factorial experiments: Efficient tools for evaluation of intervention components. American Journal of Preventive Medicine.

Fractional factorial designs are noteworthy special cases of factorial designs. They are called fractional factorials because they always involve a simple fraction (e.g., ½, ¼) of the experimental conditions in the corresponding complete design. Statisticians have worked out the properties of these designs in detail. One property is that when there are equal numbers of subjects in each experimental condition, the design is balanced, meaning that every level of a factor appears the same number of times at every level of each of the other factors. This balance property is shared by complete factorial designs, and is why fractional and complete factorial experiments make very efficient use of experimental subjects.
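The balance property can be checked directly. The following is a minimal Python sketch, not taken from the cited papers: it assumes three two-level factors coded (-1,1), one subject per condition, and a hypothetical helper named is_balanced. It compares a complete 2^3 design, a half fraction, and an arbitrary four-condition subset.

```python
from itertools import combinations, product


def is_balanced(design):
    """True if, for every pair of factors, each of the four level
    combinations appears in the same number of conditions."""
    k = len(design[0])
    for i, j in combinations(range(k), 2):
        counts = {}
        for run in design:
            key = (run[i], run[j])
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != 4 or len(set(counts.values())) != 1:
            return False
    return True


# Complete 2^3 factorial: all 8 combinations of three two-level factors.
complete = list(product([-1, 1], repeat=3))

# Half fraction: keep the 4 conditions whose codes multiply to +1.
half = [run for run in complete if run[0] * run[1] * run[2] == 1]

# An arbitrary 4-condition subset (an incomplete factorial), for contrast.
arbitrary = complete[:3] + [complete[7]]

print(len(complete), is_balanced(complete))    # 8 True
print(len(half), is_balanced(half))            # 4 True
print(len(arbitrary), is_balanced(arbitrary))  # 4 False
```

Both the complete design and the half fraction pass the check, while the arbitrary subset with the same number of conditions does not.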

 

As laid out in Collins, Dziak, and Li (2009), any design that can be arrived at by starting with a complete factorial and removing experimental conditions is an incomplete factorial. The term “fractional factorial” refers to incomplete factorials that share the balance property of the corresponding complete factorial. Thus, by this definition, all fractional factorials are incomplete factorials, but not all incomplete factorials are fractional factorials.

 

As shown in Collins et al. (2009), many designs in common use in intervention science are incomplete factorials but are not fractional factorials. One example is the comparative treatment design (e.g., Behar & Borkovec, 2003). Typically, incomplete factorials that are not fractional factorials involve fewer experimental conditions, which can be an advantage if experimental condition overhead costs are high. However, these designs are not balanced, and typically require many more subjects than a comparable factorial design to achieve the same level of power. They also have some properties that may be less desirable than those of fractional factorials. Finally, it should be noted that designs like the comparative treatment design do not estimate main effects, although the effects they estimate may be what is desired in a given situation. For more about all this, see Collins et al. (2009).

 

References

Behar, E. S., & Borkovec, T. D. (2003). Psychotherapy outcome research. Handbook of psychology, 213–240.

Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14(3), 202-224. doi:10.1037/a0015826 PMCID: PMC2796056

 

Everything we say about factorial experiments on this web site is based on using effect (-1,1) coding. However, a lot of behavioral scientists have been trained to use dummy (0,1) coding. Dummy coding has its place, particularly in one-way ANOVA and in non-experimental situations, but for component selection experiments in MOST it is essential to use effect coding. It may not seem like using (-1,1) instead of (0,1) would be a very big deal, but it turns out that it is. Oddly, this difference seems to be discussed only rarely and not very explicitly.

 

Whether you use effect codes or dummy codes to perform an ANOVA within a regression framework (say, PROC GLM in SAS), you will be interpreting the b-weights associated with the vectors of codes. When effect codes are used, these b-weights correspond to the textbook definitions of main effects and interactions. However, when dummy codes are used, the b-weights do not necessarily correspond to these textbook definitions. (Note the implication here: If you are doing hypothesis testing based on dummy codes, you may not be testing hypotheses about main effects and interactions.) In fact, under most circumstances, dummy-coded effects should not be referred to as main effects and interactions. We prefer to maintain a clear distinction by calling dummy-coded effects first-order effects, second-order effects, etc. (For simplicity, from now on we will refer to second-order effects, third-order effects, etc. as higher-order effects.)
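To make the difference concrete, here is a small Python sketch for a 2x2 design. The cell means are hypothetical numbers chosen only for illustration, and the tiny solve helper stands in for a regression routine such as PROC GLM (with one mean per cell and a saturated model, solving the equations exactly gives the same b-weights). The names solve and b_weights are assumptions of this sketch.

```python
from fractions import Fraction


def solve(A, y):
    """Solve the square system A b = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(t)] for row, t in zip(A, y)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[-1] for row in M]


# Hypothetical cell means, keyed by (A on?, B on?).
means = {(0, 0): 10, (1, 0): 14, (0, 1): 12, (1, 1): 22}


def b_weights(off, on):
    """b-weights for intercept, A, B, and A*B under the given coding."""
    code = {0: off, 1: on}
    X = [[1, code[a], code[b], code[a] * code[b]] for (a, b) in means]
    return solve(X, list(means.values()))


dummy = b_weights(0, 1)     # dummy (0,1) coding
effect = b_weights(-1, 1)   # effect (-1,1) coding

# Textbook main effect of A: difference between the marginal means.
main_effect_A = (means[1, 0] + means[1, 1]) / 2 - (means[0, 0] + means[0, 1]) / 2
# Simple effect of A when B is off.
simple_effect_A = means[1, 0] - means[0, 0]

print("dummy b for A: ", float(dummy[1]), " simple effect:", simple_effect_A)
print("effect b for A:", float(effect[1]), " main effect:  ", main_effect_A)
```

With these means, the dummy-coded b-weight for A is 4, which is the effect of A only when B is off. The effect-coded b-weight is 3.5, exactly half of the textbook main effect of 7 (half because the two codes are two units apart).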

 

People trained in using dummy coding often make two assertions about factorial ANOVA. The first is that it is impossible to interpret main effects if there are any substantial interactions. The second is that there is always less statistical power for tests of interactions than for main effects, with power decreasing as the number of factors involved in the interaction increases (e.g., less power for three-way interactions than two-way interactions). These statements may be attributable to a failure to distinguish between first-order effects and main effects, and between higher-order effects and interactions. It is true that when dummy coding is used, it is impossible to interpret first-order effects if there are any substantial higher-order effects. When dummy coding is used, the first-order effects and higher-order effects are correlated. You can see how this could make interpretation of a first-order effect difficult if the higher-order effects were substantial.
 
However, dummy-coded first-order effects and higher-order effects are not usually the same as effect-coded main effects and interactions. When there are equal ns in each experimental condition, all of the effect-coded main effects and interactions are orthogonal. Although interactions must always be taken into account thoughtfully when interpreting main effects, this orthogonality means that a main effect is not necessarily contaminated or rendered uninterpretable by interactions. In addition, when effect coding is used, the power associated with each effect, whether a main effect or interaction, is identical given the same effect size (when effect size is expressed as a regression coefficient). Thus with effect coding there is not necessarily less power available for interactions than for main effects; this will depend on effect size. Of course, very little is known about interactions in behavioral science, so we don’t know whether they are likely to have effect sizes comparable to main effects, or whether the effect sizes for interactions are likely to be smaller. If the effect sizes for the interactions are smaller than those for the main effects, the power associated with the interactions will be correspondingly smaller.
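The correlation structure described above can be verified with a few lines of Python. This sketch assumes a 2x2 design with equal ns (one row per condition is enough to show the pattern); the helper names pearson and columns are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length code vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def columns(off, on):
    """Code vectors for A, B, and the A*B product term in a 2x2 design."""
    cells = list(product([off, on], repeat=2))
    A = [a for a, b in cells]
    B = [b for a, b in cells]
    AB = [a * b for a, b in cells]
    return A, B, AB


for name, (off, on) in {"dummy": (0, 1), "effect": (-1, 1)}.items():
    A, B, AB = columns(off, on)
    print(name,
          "r(A,B) =", round(pearson(A, B), 3),
          "r(A,AB) =", round(pearson(A, AB), 3),
          "r(B,AB) =", round(pearson(B, AB), 3))
```

Under dummy coding each first-order vector correlates about 0.58 with the product term, whereas under effect coding all three correlations are zero.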

 

There are several reasons to consider conducting a factorial experiment:

  • There are several intervention components you wish to examine experimentally.
  • The costs associated with subjects are high relative to the overhead costs associated with each experimental condition, or subjects are difficult to get or limited in number.
  • You are interested in examining interactions between intervention components.

 

Here are some reasons you might want to consider a design other than a factorial experiment:

  • You need the results of an experiment on one component before you can move to the next.
  • You are certain you can implement only a very small number of experimental conditions, say 2 or 3.
  • Not every combination of intervention components is implementable, because some combinations do not make sense, can be expected to be toxic, etc. (Note that often the components can be reframed to make a factorial experiment possible.)
  • Each intervention component is expected to exert such a small main effect that it would be prohibitive to power the experiment to detect these effects. (Note that hypothesis testing per se is not necessarily the point of component selection experiments.)
  • The intervention is to consist of a single component and you need to decide which one, OR there are only a few (say, 3-4) combinations of intervention components under consideration. (In this case you may want a comparative treatment design, but this is likely to require more subjects than a factorial experiment.)

 

You might want to consider a fractional factorial experiment if a complete factorial design is conceptually suitable for your research questions and one or more of the following is true:

  • there is an upper limit on the number of experimental conditions you can implement, and this upper limit is 8 or greater;
  • you want to take advantage of the efficiency factorial designs offer in terms of use of subjects, and also want to economize on experimental condition overhead costs; and/or
  • you have to use cluster randomization and don’t have enough clusters to populate a complete factorial design. (More about this can be found in Dziak, Nahum-Shani, & Collins, 2012.)

 

Reference

Dziak, J. J., Nahum-Shani, I., & Collins, L. M. (2012). Multilevel factorial experiments for developing behavioral interventions: Power, sample size, and resource considerations. Psychological Methods. Advance online publication. doi: 10.1037/a0026972 PMCID: PMC3351535

This decision is made on statistical grounds, not intuitive grounds. A wide variety of fractional factorial designs has already been determined and tabled, each with a specific set of experimental conditions. The experimenter chooses one of these designs that has desirable properties for the situation at hand. The choice is made either by looking up designs in a book (e.g., Wu & Hamada, 2000) or, more likely, using software such as PROC FACTEX in SAS.

 

Some important considerations when selecting a fractional factorial design are how many experimental conditions the design requires, which effects are aliased with the effects of primary scientific interest, and how many effects are aliased with each effect of primary scientific interest.

 

The best fractional factorial design is the one that is the most economical WHILE enabling satisfactory estimation of the effects of primary scientific interest.

 

Reference

Wu, C. F., & Hamada, M. (2000). Experiments: Planning, analysis, and parameter design optimization. New York: Wiley.

 

Whenever experimental conditions are removed from a factorial design, either in a fractional factorial design or an incomplete factorial design (e.g., a comparative treatment design), some effects become combined, or aliased (for more details, see Collins, Dziak, & Li, 2009, or Chakraborty et al., 2009). In fractional factorial designs it is known which effects are aliased, enabling the investigator to choose a design that involves aliasing that the investigator finds tolerable.

 

To make effective use of fractional factorial designs, it is necessary

  • that the effects of primary scientific interest be main effects and lower-order interactions, and
  • that higher-order interactions can be assumed to be negligible in size.

 

The strategy is to choose a design that aliases the effects of primary scientific interest with the negligible higher-order interactions. Then (assuming the assumptions are correct) the estimates of the effects of scientific interest will not be very different from what they would have been in a much more expensive complete factorial experiment.
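A small numerical sketch of this strategy, in Python: it assumes four two-level factors with effect (-1,1) coding, a half fraction chosen so that the main effect of A is aliased with the B x C x D interaction, and hypothetical true coefficients (the values in true_b are invented purely for illustration). The helper names cell_mean and estimate are likewise assumptions of this sketch.

```python
from itertools import product
from math import prod

# Hypothetical effect-coded coefficients, keyed by tuples of factor indices
# (0 = A, 1 = B, 2 = C, 3 = D). The three-way B x C x D term is small.
true_b = {(0,): 3.0, (1,): 2.0, (2,): 1.0, (3,): 0.5, (1, 2, 3): 0.125}


def cell_mean(run):
    """Expected outcome in one experimental condition."""
    return sum(b * prod(run[i] for i in term) for term, b in true_b.items())


def estimate(design, term):
    """Effect-coded estimate: average of (contrast code x cell mean)."""
    return sum(prod(run[i] for i in term) * cell_mean(run)
               for run in design) / len(design)


complete = list(product([-1, 1], repeat=4))         # 16 conditions
half = [run for run in complete if prod(run) == 1]  # 8 conditions; A = BCD

a_complete = estimate(complete, (0,))
a_half = estimate(half, (0,))

print(len(complete), a_complete)  # 16 3.0
print(len(half), a_half)          # 8 3.125
```

The half fraction returns the main effect of A plus the aliased three-way term (3.0 + 0.125). If that interaction really is negligible, the estimate is close to what the complete factorial would have given, using half the experimental conditions.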

 

A fractional factorial design can be identified by using tables in books (such as Wu & Hamada, 2000), looking them up online, or (the easiest for most people) using software such as PROC FACTEX in SAS.

 

References

Chakraborty, B., Collins, L. M., Strecher, V. J., & Murphy, S. A. (2009). Developing multicomponent interventions using fractional factorial designs. Statistics in Medicine, 28(21), 2687-2708. doi:10.1002/sim.3643

Collins, L. M., Chakraborty, B., Murphy, S. A., & Strecher, V. J. (2009). Comparison of a phased experimental approach and a single randomized clinical trial for developing multicomponent behavioral interventions. Clinical Trials, 6(1), 5-15. doi:10.1177/1740774508100973

Wu, C. F., & Hamada, M. (2000). Experiments: Planning, analysis, and parameter design optimization. New York: Wiley.

 

A complete factorial experiment involves no aliasing of effects. On the other hand, it can be expensive if experimental condition overhead costs are high.

 

A fractional factorial experiment is usually less costly than a complete factorial. On the other hand, it will always involve aliasing of effects.

 

Here is a slightly different perspective on this:

 

Given the level of resources available, you can investigate more factors with a fractional factorial experiment than with a complete factorial experiment. The “price” you pay for the additional factors is that some effects will be aliased. If the aliasing is likely to make your scientific conclusions shaky, then it is not worth it. Only you can judge this. See "What is the resource management principle?" in the MOST FAQ section.

 

There are many different fractional factorial designs. The strategy in using fractional factorial designs is to choose one that aliases the effects of primary scientific interest with other effects that are not of scientific interest and are likely to be negligible in size. For example, in Collins et al. (2011) we chose a design that aliases the main effects with five-way interactions, and the two-way interactions with four-way interactions. We did not have any reason to predict that there would be large four-way or five-way interactions, so we feel comfortable with this. Thus when we look at a main effect estimate, even though it will actually be a combination of the main effect and a five-way interaction, we are willing to assume the effect is due primarily to the main effect.
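An alias pattern like this one can be worked out mechanically. The Python sketch below assumes six factors labeled A through F and a half fraction defined by I = ABCDEF, which is one design with the alias pattern just described; the source does not state the defining relation, so treat it as an assumption. The function name alias is hypothetical.

```python
def alias(effect, defining_word="ABCDEF"):
    """Alias of an effect in a half fraction: multiply the effect by the
    defining word; any letter that appears twice cancels out."""
    return "".join(sorted(set(effect) ^ set(defining_word)))


print(2 ** 6 // 2)   # 32 experimental conditions in the half fraction
print(alias("A"))    # BCDEF: a main effect is aliased with a five-way interaction
print(alias("AB"))   # CDEF: a two-way interaction is aliased with a four-way interaction
print(alias("ABC"))  # DEF: three-way interactions are aliased with each other
```

Listing the aliases this way makes it easy to confirm that no effect of primary scientific interest is aliased with another effect expected to be sizeable.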

 

This table summarizes the risks and potential payoffs. Note that there is a potential cost associated with the fractional factorial design, but also a potential opportunity cost associated with the "safer" complete factorial. 

 

Decision | Reality: Higher-order interactions are negligible | Reality: Some higher-order interactions are non-negligible
--- | --- | ---
Assume higher-order interactions negligible, conduct fractional factorial | Correct decision. Payoff: Good management of resources enables gaining additional scientific information and moving science forward faster | Incorrect decision. Cost: Possibility of some incorrect scientific conclusions
Assume higher-order interactions non-negligible, conduct complete factorial | Incorrect decision. Cost: Resources wasted; missed opportunity to gain more scientific information | Correct decision. Payoff: Accurate estimates of main effects and interactions enable moving science forward faster

If your theory predicts that an interaction will be substantial, or is particularly interesting, then you definitely do not want to choose a design that aliases it with another effect that is expected to be sizeable. 

 

If you are examining a set of k intervention components and your theory is silent about higher-order interactions, you have two choices.

  • You can assume that all of the interactions up to the k-way interaction are likely to be non-negligible in size, or declare them all to be of scientific interest. In this case, you should invest in a complete factorial experiment. Of course, you will want to make sure you have sufficient power to detect the interactions because you have declared them to be scientifically important and large. 
  • You can assume that none of the interactions are likely to be sizeable, and take advantage of the economy of a fractional factorial experiment.

 

Some investigators choose designs like the comparative treatment design because they do not want to deal with interactions. However, in these designs the interactions do not go away. Instead, they become confounded with main effects. This is not necessarily a bad thing, but it is good to be aware of it so you can make an informed decision about the design you want. For an explanation of this see Collins, Dziak, and Li (2009).
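The following Python sketch illustrates the point for two components, A and B. It assumes an effect-coded model with hypothetical coefficients (the defaults of cell_mean are invented for illustration) and a comparative treatment design that includes a control condition and an A-only condition.

```python
def cell_mean(a, b, b0=14.5, bA=3.5, bB=2.5, bAB=1.5):
    """Expected outcome with effect codes a, b in {-1, 1} (hypothetical values)."""
    return b0 + bA * a + bB * b + bAB * a * b


# Comparative treatment design: A-only condition versus control.
control = cell_mean(-1, -1)
a_only = cell_mean(1, -1)
comparison = a_only - control

# Factorial main effect of A: difference between marginal means over B.
main_effect = ((cell_mean(1, -1) + cell_mean(1, 1))
               - (cell_mean(-1, -1) + cell_mean(-1, 1))) / 2

print(comparison)   # 4.0 = 2*bA - 2*bAB
print(main_effect)  # 7.0 = 2*bA
```

The A-versus-control comparison equals 2*bA - 2*bAB, so the interaction has not gone away; it is folded into the comparison. The two quantities agree only when the interaction is zero.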

 

Reference

Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14(3), 202-224. doi:10.1037/a0015826 PMCID: PMC2796056

You often recommend examining several intervention components in a single experiment. If I have five intervention components, this would involve 32 experimental conditions. I don’t see how a 32-condition experiment could have sufficient power without requiring a massive sample size.

 

A 32-condition RCT would require massive resources. However, a 32-condition factorial experiment is quite different from a 32-condition RCT. In Collins et al. (2011) we describe a 32-condition fractional factorial experiment (with six factors) that is quite adequately powered with N=512.
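A rough calculation shows why. In a balanced factorial experiment each main effect compares the half of the subjects at one level of a factor with the half at the other level, so every test uses all N subjects. The Python sketch below uses a normal approximation to power for a two-sided comparison of two means; the standardized effect size d = 0.25 and alpha = .05 are assumptions made here for illustration and are not the power analysis reported in Collins et al. (2011).

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided comparison of two group
    means, ignoring the tiny chance of rejecting in the wrong direction."""
    z = NormalDist()
    noncentrality = d * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z.inv_cdf(1 - alpha / 2))


N, conditions, d = 512, 32, 0.25   # d is a hypothetical effect size
per_condition = N // conditions    # 16 subjects in each condition

# Factorial main effect: 256 subjects at each level of the factor.
factorial_power = approx_power(d, N // 2)

# Comparing two individual conditions, as in a multi-arm RCT: 16 versus 16.
two_arm_power = approx_power(d, per_condition)

print(per_condition, round(factorial_power, 2), round(two_arm_power, 2))
```

Under these assumptions the main-effect test has power of roughly .8, while a comparison of any two individual conditions has power of roughly .1.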

 

Read an informal introduction to factorial experiments aimed at those with a background in the RCT or refer to Collins, Dziak, Kugler, and Trail (in press).

 

You may also be interested in the page about writing grant proposals using MOST.

 

Reference

Collins, L. M., Baker, T. B., Mermelstein, R. J., Piper, M. E., Jorenby, D. E., Smith, S. S., Schlam, T. R., Cook, J. W., & Fiore, M. C. (2011). The multiphase optimization strategy for engineering effective tobacco use interventions. Annals of Behavioral Medicine, 41, 208-226. PMCID: PMC3053423

Collins, L. M., Dziak, J. J., Kugler, K. C., & Trail, J. B. (in press). Factorial experiments: Efficient tools for evaluation of intervention components. American Journal of Preventive Medicine.

 

Yes! This is possible even when cluster randomization is necessary. For more about this see Dziak, Nahum-Shani, and Collins (2012).

 

Reference

Dziak, J. J., Nahum-Shani, I., & Collins, L. M. (2012). Multilevel factorial experiments for developing behavioral interventions: Power, sample size, and resource considerations. Psychological Methods. Advance online publication. doi: 10.1037/a0026972 PMCID: PMC3351535
