Spotfire Statistica® Design of Experiments


    Experimental methods are used in agriculture, food & beverage, chemical, health sciences, manufacturing, marketing, power, and many other industries to optimize processes. Specifically, the goal of these methods is to identify the optimum settings for the different factors that affect the process. This article describes the classes of designs that are typically used in experimentation. The business goal may be to:

    • decrease variability in clothing dye
    • determine the optimum texture of fish patties as a function of the relative proportions of different types of fish (mullet, sheepshead, and croaker)
    • quickly identify which factors are important and most likely to yield improvements
    • test a marketing campaign
    • maximize the yield of a chemical reaction
    • optimize cash flow

    Note: DOE and Distributions & Simulation modules can be used together to run experiments with simulated data.

    The major classes of designs that are typically used in experimentation are: 2(k-p) (two-level, multi-factor) designs, 2-level screening (Plackett-Burman) designs for large numbers of factors, 3(k-p) (three-level, multi-factor) designs (mixed designs with 2- and 3-level factors are also supported), central composite and non-factorial response surface designs, Latin square and Greco-Latin square designs, Taguchi robust design experiments (orthogonal arrays), mixture designs and triangular surfaces, D- and A-optimal designs, and special procedures for constructing experiments in constrained experimental regions.

    Additionally, the Bayesian reliability approach, as put forth by Peterson (2004), is available via the workspace. This approach explicitly takes into account the correlation structure of the data, the variability of the process distribution, and the model parameter uncertainty.

    Interestingly, many of these experimental techniques have made their way from the production plant into management, and successful implementations have been reported in profit planning in business, cash-flow optimization in banking, etc. (e.g., see Yokoyama and Taguchi, 1975).

    2(k-p) Fractional Factorial Designs at 2 Levels

    In many cases, it is sufficient to consider the factors affecting the production process at two levels. For example, the temperature for a chemical process may either be set a little higher or a little lower, the amount of solvent in a dyestuff manufacturing process can either be slightly increased or decreased, etc. The experimenter would like to determine whether any of these changes affect the results of the production process. The most intuitive approach to studying those factors would be to vary the factors of interest in a full factorial design, that is, to try all possible combinations of settings. This would work fine, except that the number of necessary runs in the experiment (observations) would increase geometrically with the number of factors.

    For example, if you want to study 7 factors, the necessary number of runs in the experiment would be 2^7 = 128. To study 10 factors you would need 2^10 = 1,024 runs in the experiment. Because each run may require time-consuming and costly setting and resetting of machinery, it is often not feasible to require that many different production runs for the experiment. Under these conditions, fractional factorials are used that "sacrifice" interaction effects so that main effects may still be computed correctly.

    2(k-p) Fractional Factorial Designs at 2 Levels - Plackett-Burman (Hadamard Matrix) Designs

    When you need to screen a large number of factors to identify those that may be important (i.e., those that are related to the dependent variable of interest), you want to employ a design that allows one to test the largest number of factor main effects with the least number of observations, that is, to construct a resolution III design with as few runs as possible. One way to design such experiments is to confound all interactions with "new" main effects. Such designs are also sometimes called saturated designs, because all information in those designs is used to estimate the parameters, leaving no degrees of freedom to estimate the error term for the ANOVA. Because the added factors are created by equating (aliasing) the "new" factors with the interactions of a full factorial design, these designs always will have 2^k runs (e.g., 4, 8, 16, 32, and so on).

    Plackett and Burman (1946) showed how full factorial designs can be fractionalized in a different manner, to yield saturated designs where the number of runs is a multiple of 4, rather than a power of 2. These designs are also sometimes called Hadamard matrix designs. Of course, you do not have to use all available factors in those designs, and, in fact, sometimes you want to generate a saturated design for one more factor than you are expecting to test. This will allow you to estimate the random error variability and test for the statistical significance of the parameter estimates.
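
    The construction itself is simple, as the following minimal sketch illustrates (Python with NumPy; this is only an illustration of the cyclic construction, not Statistica's implementation). The 12-run design is built by cyclically shifting the generator row published for N = 12 and finishing with a run in which every factor is at its low level:

        import numpy as np

        def plackett_burman_12():
            # Generator row for N = 12 as published by Plackett and Burman (1946)
            gen = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])
            rows = [np.roll(gen, shift) for shift in range(11)]   # 11 cyclic shifts
            rows.append(-np.ones(11, dtype=int))                  # final run: all factors at -1
            return np.array(rows)

        design = plackett_burman_12()
        print(design.shape)        # (12, 11): 12 runs for up to 11 two-level factors
        print(design.T @ design)   # 12 * identity: the factor columns are mutually orthogonal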

    2(k-p) Maximally Unconfounded and Minimum Aberration Designs

    Users can search for designs of maximum resolution that are maximally unconfounded, or of maximum resolution with minimum aberration, whichever best suits the problem.

    2(k-p) fractional factorial designs are often used in industrial experimentation because of the economy of data collection that they provide. For example, suppose an engineer needed to investigate the effects of varying 11 factors, each with 2 levels, on a manufacturing process. Let us call the number of factors k, which would be 11 for this example. An experiment using a full factorial design, where the effects of every combination of levels of each factor are studied, would require 2^k experimental runs, or 2^11 = 2,048 runs for this example. To minimize the data collection effort, the engineer might decide to forego investigation of higher-order interaction effects of the 11 factors and focus instead on identifying the main effects of the 11 factors and any low-order interaction effects that could be estimated from an experiment using a smaller, more reasonable number of experimental runs. There is another, more theoretical reason for not conducting huge, full factorial 2-level experiments. In general, it is not logical to be concerned with identifying higher-order interaction effects of the experimental factors while ignoring lower-order nonlinear effects, such as quadratic or cubic effects, which cannot be estimated if only 2 levels of each factor are employed. So although practical considerations often lead to the need to design experiments with a reasonably small number of experimental runs, there is also a logical justification for such experiments.

    The alternative to the 2^k full factorial design is the 2(k-p) fractional factorial design, which requires only a "fraction" of the data collection effort required for full factorial designs. For our example with k = 11 factors, if only 64 experimental runs can be conducted, a 2(11-5) fractional factorial experiment would be designed with 2^6 = 64 experimental runs. In essence, a full factorial experiment in the k - p = 6 base factors is designed, with the levels of the p = 5 remaining factors being "generated" by the levels of selected higher-order interactions of those 6 factors. Fractional factorials "sacrifice" higher-order interaction effects so that lower-order effects may still be computed correctly. However, different criteria can be used in choosing the higher-order interactions to be used as generators, with different criteria sometimes leading to different "best" designs. You can use different criteria and options in searching for the "best" 2(k-p) design that suits your needs.
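
    As a concrete illustration (a minimal Python/NumPy sketch, not Statistica's design search), the classical 8-run design for 7 two-level factors starts from a full factorial in three base factors and generates the remaining four factors from interactions, here with the common generators D = AB, E = AC, F = BC, and G = ABC. The 7-factor case mentioned earlier can thus be screened in 8 runs instead of 128:

        import itertools
        import numpy as np

        base = np.array(list(itertools.product([-1, 1], repeat=3)))   # full 2^3 factorial in A, B, C
        A, B, C = base.T
        design = np.column_stack([A, B, C,
                                  A * B,        # D = AB
                                  A * C,        # E = AC
                                  B * C,        # F = BC
                                  A * B * C])   # G = ABC
        print(design)              # 8 runs x 7 factor columns; all main effects are estimable
        print(design.T @ design)   # 8 * identity: the columns remain mutually orthogonal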

    2(k-p) fractional factorial designs can also include blocking factors. In some production processes, units are produced in natural "chunks" or blocks. To make sure that these blocks do not bias your estimates of the effects of the k factors, blocking factors can be added as additional factors in the design. Consequently, you may "sacrifice" additional interaction effects to generate the blocking factors, but these designs often have the advantage of being statistically more powerful, because they allow you to estimate and control the variability in the production process that is due to differences between blocks. 

    3(k-p), Box-Behnken, and Mixed 2 and 3 Level Factorial Designs

    In some cases, factors that have more than 2 levels have to be examined. For example, if one suspects that the effect of the factors on the dependent variable of interest is not simply linear, then at least 3 levels are needed to test for the linear and quadratic effects (and interactions) of those factors. Also, sometimes some factors may be categorical in nature, with more than 2 categories. For example, you may have three different machines that produce a particular part.

    The Experimental Design module contains a complete implementation of the standard (blocked) 3(k-p) designs enumerated by Connor and Zelen and mixed 2 and 3-level designs described by Connor and Young (see McLean and Anderson, 1984) for the National Bureau of Standards of the U.S. Department of Commerce. 

    Bayesian Reliability Optimization for Continuous/Binary Response

    The Bayesian Reliability Optimization nodes address problems with current frequentist response optimization methods. The nodes implement a Bayesian reliability approach as put forth by Peterson (2004) that explicitly takes into account the correlation structure of the data, the variability of the process distribution, and the model parameter uncertainty. There are two nodes available depending on the type of response variables, continuous and binary.

    Central Composite and Non-Factorial Response Surface Designs

    The 2(k-p) and 3(k-p) designs all require that the levels of the factors are set at, for example, 2 or 3 levels. In many instances, such designs are not feasible, because, for example, some factor combinations are constrained in some way (e.g., factors A and B cannot be set at their high levels simultaneously). Also, for reasons related to efficiency, which will be discussed shortly, it is often desirable to explore the experimental region of interest at particular points that cannot be represented by a factorial design.

    The designs (and how to analyze them) discussed in this section all pertain to the estimation (fitting) of response surfaces, following the general model equation:

          y = b0 + b1*x1 + ... + bk*xk + b12*x1*x2 + b13*x1*x3 + ... + b(k-1)k*x(k-1)*xk + b11*x1² + ... + bkk*xk²

    Put into words, one is fitting a model to the observed values of the dependent variable y, which includes (1) main effects for factors x1, ..., xk, (2) their interactions (x1*x2, x1*x3, ..., x(k-1)*xk), and (3) their quadratic components (x1², ..., xk²). No assumptions are made concerning the "levels" of the factors, and you can analyze any set of continuous values for the factors.
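
    To make the model concrete, the following sketch (Python with NumPy; the factor settings and responses are simulated purely for illustration) builds the corresponding model matrix (intercept, main effects, two-way interactions, and squared terms) and estimates the b coefficients by ordinary least squares:

        import itertools
        import numpy as np

        def quadratic_model_matrix(X):
            """Columns: intercept, main effects, all two-way interactions, squared terms."""
            n, k = X.shape
            cols = [np.ones(n)]
            cols += [X[:, i] for i in range(k)]                                          # b_i * x_i
            cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]  # b_ij * x_i * x_j
            cols += [X[:, i] ** 2 for i in range(k)]                                     # b_ii * x_i^2
            return np.column_stack(cols)

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(30, 3))      # 3 factors at arbitrary continuous settings
        y = 5 + 2*X[:, 0] - X[:, 1] + 0.5*X[:, 0]*X[:, 2] + X[:, 2]**2 + rng.normal(0, 0.1, 30)
        b, *_ = np.linalg.lstsq(quadratic_model_matrix(X), y, rcond=None)
        print(np.round(b, 2))                     # estimated b0, b1..b3, b12, b13, b23, b11..b33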

    There are some considerations concerning design efficiency and bias that have led to standard designs ordinarily used when attempting to fit these response surfaces (e.g., see Box, Hunter, and Hunter, 1978; Box and Draper, 1987; Khuri and Cornell, 1987; Mason, Gunst, and Hess, 1989; Montgomery, 1991). However, in the context of constrained surface designs and D- and A-optimal designs, these standard designs sometimes cannot be used for practical reasons. The central composite design analysis options, however, do not make any assumptions about the structure of your data, that is, the number of distinct factor values or their combinations across the runs of the experiment. These options can be used to analyze any type of design, to fit the data to the general model described above.

    Constructing D- and A-Optimal Designs

    In standard factorial designs (2(k-p) and 3(k-p)) and central composite designs, the property of orthogonality of factor effects is important. When the factor level settings for two factors in an experiment are uncorrelated (i.e., varied independently of each other), they are said to be orthogonal to each other. In terms of matrix and vector algebra, two column vectors X1 and X2 in the design matrix are orthogonal if X1'*X2 = 0. Intuitively, it is clear that one can extract the maximum amount of information regarding a dependent variable from the experimental region (the region defined by the settings of the factor levels) if all factor effects are orthogonal to each other. Conversely, suppose one ran a four-run experiment for two factors as follows:

    [Figure: a four-run design table in which the factor-level settings for X1 and X2 are identical in every run]

    The columns of factor settings for X1 and X2 are identical to each other (their correlation is 1), and there is no way in the results to distinguish between the main effect for X1 and the main effect for X2.

    The D- and A-optimal design procedures provide various options to select, from a list of valid (candidate) points (i.e., combinations of factor settings), those points that will extract the maximum amount of information from the experimental region, given the respective model that you expect to fit to the data. You need to supply the list of candidate points, for example, the vertex and centroid points computed by the Designs for constrained surfaces and mixtures option within Statistica. Then you will need to specify the type of model you expect to fit to the data, and the number of runs for the experiment. The Experimental Design module will then construct a design with the desired number of cases that provides as much orthogonality between the columns of the design matrix as possible.
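
    The criteria themselves are easy to state: for the model matrix X built from a candidate subset of runs, a D-optimal design maximizes det(X'X) and an A-optimal design minimizes trace((X'X)^-1). The sketch below (Python/NumPy, using an exhaustive search over a deliberately tiny candidate list; real implementations use the exchange-type algorithms referenced below) picks a 6-run D-optimal design for a two-factor model with interaction from a 3 x 3 grid of candidate points:

        import itertools
        import numpy as np

        candidates = np.array(list(itertools.product([-1, 0, 1], repeat=2)))   # 9 candidate points

        def model_matrix(points):
            x1, x2 = points[:, 0], points[:, 1]
            return np.column_stack([np.ones(len(points)), x1, x2, x1 * x2])    # intercept, x1, x2, x1*x2

        def d_criterion(points):
            X = model_matrix(points)
            return np.linalg.det(X.T @ X)            # D-optimality: maximize this determinant

        def a_criterion(points):
            X = model_matrix(points)
            return np.trace(np.linalg.inv(X.T @ X))  # A-optimality: minimize this trace

        best = max(itertools.combinations(range(len(candidates)), 6),
                   key=lambda idx: d_criterion(candidates[list(idx)]))
        chosen = candidates[list(best)]
        print(chosen)                                # the 6 runs selected from the candidate list
        print(d_criterion(chosen), a_criterion(chosen))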

    The reasoning behind D- and A-optimality is discussed, for example, in Box and Draper (1987, Chapter 14). The different algorithms used for searching for optimal designs are described in Dykstra (1971), Galil and Kiefer (1980), and Mitchell (1974a, 1974b). A detailed comparison study of the different algorithms is discussed in Cook and Nachtsheim (1980).

    D-Optimal Split Designs

    The standard split plot design is characterized by two sizes of experimental units. Split plot designs began in agriculture where one factor was typically applied to one large plot of land (e.g. fertilizer). This factor was called the whole plot factor. Another factor was applied within the whole plot (e.g. seed variety). This factor was referred to as the sub plot factor. The two experimental units in this case are the whole plot and the sub plot.  Since there are two sizes of experimental units, there are two sources of experimental error. This extra source of error affects the subsequent hypothesis tests that are performed.

    Split plot designs extend beyond the agricultural setting in which they were originally conceived. It is quite common to encounter these designs in an industrial setting where an experimenter has two factors, one that is hard to change and one that is easy to change. The hard-to-change factor may not be reset every experimental run because of practical complications, whereas the easy-to-change factor is reset every run.

    The following illustration comes from Kowalski and Potcner (2003). Consider an experiment where you are trying to determine the water resistance property of wood. There are two factors that are considered: pretreatment and stain. There are two types (levels) of pretreatment and four types (levels) of stain that can be applied to the wood. It is easiest to randomly apply the pretreatments to a whole board and then divide the board into four individual pieces.  The stain is then randomly applied to each individual piece. Due to the different levels of randomization, there are two distinct sizes of experimental units, the board and the individual pieces. Since there are two different sizes of experimental units there are two sources of error.  

    [Figure: illustration of the split-plot wood experiment from Kowalski and Potcner (2003)]

    For a split-plot experiment with sample size n and b whole plots, the linear mixed model can be written as:

             y = Xb + Zg + e

    where y is the n-dimensional vector of responses, X represents the n×p fixed design matrix containing the settings of both the whole-plot factors w and the sub-plot factors s and their model expansions, b is a p-dimensional vector containing the p fixed effects in the model, Z is an n×b matrix of zeroes and ones assigning the n runs to the b whole plots, g is the b-dimensional vector containing the random effects of the b whole plots, and e is the n-dimensional vector containing the random errors.
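
    A small sketch of what the Z matrix looks like (Python/NumPy; the run and whole-plot counts below are illustrative only, not the actual design from the wood example): with n = 8 runs grouped into b = 2 whole plots of four runs each, Z simply assigns every run to its whole plot, and the two error sources appear as two terms in the covariance of the responses.

        import numpy as np

        n, b = 8, 2                                          # 8 runs in 2 whole plots (illustrative)
        whole_plot_of_run = np.repeat(np.arange(b), n // b)  # [0, 0, 0, 0, 1, 1, 1, 1]
        Z = np.zeros((n, b), dtype=int)
        Z[np.arange(n), whole_plot_of_run] = 1               # n x b matrix of zeroes and ones
        print(Z)

        # The two sources of error show up in the response covariance:
        # Var(y) = sigma_wholeplot**2 * Z @ Z.T + sigma_error**2 * np.eye(n)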

    Split plot designs can be analyzed in Statistica using either the Variance Estimation and Precision or the General Linear Models (GLM) module. Variance Estimation and Precision offers two methods of parameter estimation:

    1. Traditional ANOVA method
    2. Restricted Maximum Likelihood or REML.

    REML is a newer method that is typically recommended to analyze these designs as well as other more general mixed models (i.e. models that contain both random and fixed effects). GLM offers the traditional ANOVA approach only.

    Note: Spotfire Statistica Professional and Spotfire Statistica Expert - Data Science products do not include the Variance Estimation and Precision module. If this type of analysis is important, then Spotfire Statistica Expert - Quality Control or Spotfire Statistica Enterprise should be purchased.

    Designs for Constrained Surfaces and Mixtures

    In the context of mixture designs, it often happens in real-world studies that the experimental region of interest is constrained: not all settings of one factor can be combined with all settings of the other factors in the study. The Experimental Design module contains an implementation of an algorithm suggested by Piepel (1988) and Snee (1985) for finding the vertices and centroids of such constrained regions.

    Latin Square Designs

    Latin square designs (the term Latin square was first used by Euler, in 1782) are used when the factors of interest have more than two levels and you know ahead of time that there are no (or only negligible) interactions between factors. For example, if you wanted to examine the effect of 4 fuel additives on the reduction in oxides of nitrogen and had 4 cars and 4 drivers at your disposal, you could of course run a full 4 x 4 x 4 factorial design, resulting in 64 experimental runs. However, you are not really interested in any (minor) interactions between the fuel additives and drivers, fuel additives and cars, or cars and drivers. You are mostly interested in estimating the main effects, in particular the one for the fuel additives factor. At the same time, you want to make sure that the main effects of drivers and cars do not affect (bias) your estimate of the main effect of the fuel additive.

    If you labeled the additives with the letters A, B, C, and D, the Latin square design that would allow you to derive unconfounded main effects estimates could be summarized as follows (see also Box, Hunter, and Hunter, 1978, page 263):

    [Figure: Latin square design sample structure]
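
    One valid square of this form can be written down with a simple cyclic construction, as in the short sketch below (plain Python; rows index drivers, columns index cars, and in practice the rows, columns, and letter assignments would also be randomized):

        additives = "ABCD"
        # Cyclic construction: each additive appears exactly once in every row and every column
        square = [[additives[(row + col) % 4] for col in range(4)] for row in range(4)]
        for row in square:
            print(" ".join(row))
        # A B C D
        # B C D A
        # C D A B
        # D A B C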

    Mixture Designs and Triangular Surfaces

    Special issues arise when analyzing mixtures of components that must sum to a constant. For example, if you wanted to optimize the taste of a fruit-punch, consisting of the juices of 5 fruits, then the sum of the proportions of all juices in each mixture must be 100%. Thus, the task of optimizing mixtures commonly occurs in food-processing, refining, or the manufacturing of chemicals. A number of designs have been developed to address specifically the analysis and modeling of mixtures (see, for example, Cornell, 1990a, 1990b; Cornell and Khuri, 1987; Deming and Morgan, 1993; Montgomery, 1991).
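
    One of the simplest of these designs is the simplex-lattice, in which every component takes the proportions 0, 1/m, ..., 1 and only blends that sum to exactly 1 (100%) are kept. The sketch below (plain Python; choosing q = 5 juices and m = 2 for brevity) enumerates the candidate blends for the fruit-punch example:

        from itertools import product
        from fractions import Fraction

        def simplex_lattice(q, m):
            """All blends of q components on the {q, m} simplex lattice (proportions sum to 1)."""
            levels = [Fraction(i, m) for i in range(m + 1)]
            return [blend for blend in product(levels, repeat=q) if sum(blend) == 1]

        blends = simplex_lattice(q=5, m=2)
        print(len(blends))   # 15 blends: the 5 pure juices plus the 10 half-and-half two-juice mixes
        for blend in blends:
            print([str(x) for x in blend])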

    Parameter Estimation / Unbalanced Design

    The parameter estimation for 2(k-p) and Plackett-Burman designs, 3(k-p) and Box-Behnken designs, mixed 2 and 3 level full and fractional factorial designs, central composite and response surface designs, and mixture designs is accomplished via sweeping (e.g., see Dempster, 1969). Therefore, the design does not need to be balanced to estimate the parameters; however, if it is unbalanced, the parameter estimates are not independent of each other.
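
    The point about balance can be illustrated without the sweep operator itself (the toy design below is a minimal Python/NumPy sketch, not Statistica's estimation routine): for a balanced two-factor, two-level design the inverse of X'X is diagonal, so the effect estimates are uncorrelated; delete one run and off-diagonal terms appear, meaning the estimates are no longer independent of each other.

        import numpy as np

        balanced = np.array([[1, -1, -1],
                             [1, -1,  1],
                             [1,  1, -1],
                             [1,  1,  1]])          # columns: intercept, factor A, factor B
        unbalanced = balanced[:3]                   # the same design with one run lost

        print(np.linalg.inv(balanced.T @ balanced))      # diagonal: estimates are independent
        print(np.linalg.inv(unbalanced.T @ unbalanced))  # off-diagonal terms: estimates are correlated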

    Profiling Predicted Responses and Response Desirability

    A typical problem in product development is to find a set of conditions, or levels of the input variables, that produce the most desirable product in terms of its characteristics, or responses on the output variables. The procedures used to solve this problem generally involve two steps: 1) predicting responses on the dependent, or Y variables, by fitting the observed responses using an equation based on the levels of the independent, or X variables, and 2) finding the levels of the X variables that simultaneously produce the most desirable predicted responses on the Y variables. Derringer and Suich (1980) give, as an example of these procedures, the problem of finding the most desirable tire tread compound. There are a number of Y variables, such as PICO Abrasion Index, 200 percent modulus, elongation at break, and hardness. The characteristics of the product in terms of the response variables depend on the ingredients, the X variables, such as hydrated silica level, silane coupling agent level, and sulfur. The problem is to select the levels for the X's that will maximize the desirability of the responses on the Y's. The solution must take into account the fact that the levels for the X's that maximize one response may not maximize a different response.
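
    A minimal sketch of the idea (Python/NumPy; only the one-sided "larger is better" form is shown, and the limits and predicted responses are invented for illustration): each predicted response is translated to a desirability between 0 and 1, and the overall desirability is their geometric mean, so a recipe that is very poor on any single response scores poorly overall.

        import numpy as np

        def desirability_larger_is_better(y, low, target, s=1.0):
            """0 at or below `low`, 1 at or above `target`, a power curve in between."""
            base = np.clip((y - low) / (target - low), 0.0, 1.0)
            return float(base ** s)

        def overall_desirability(ds):
            return float(np.prod(ds) ** (1.0 / len(ds)))    # geometric mean of the individual d's

        # e.g., predicted abrasion index, modulus, and elongation for one candidate recipe
        ds = [desirability_larger_is_better(120, low=110, target=170),
              desirability_larger_is_better(1000, low=900, target=1300),
              desirability_larger_is_better(475, low=400, target=600)]
        print(ds, overall_desirability(ds))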

    Response/desirability profiling is available as an option when analyzing 2(k-p) (two-level factorial) designs, 2-level screening designs, 2(k-p) maximally unconfounded and minimum aberration designs, 3(k-p) and Box-Behnken designs, mixed 2 and 3 level designs, central composite designs, and mixture designs. Response/desirability profiling allows you to inspect the response surface produced by fitting the observed responses using an equation based on levels of the independent variables. You can inspect the predicted values for the dependent variables at different combinations of levels of the independent variables, specify desirability functions for the dependent variables, and search for the levels of the independent variables that produce the most desirable responses on the dependent variables.

    Residuals Analysis

    Extended residuals analysis is available as an option when analyzing 2(k-p) (two-level factorial) designs, 2-level screening designs, 2(k-p) maximally unconfounded and minimum aberration designs, 3(k-p) and Box-Behnken designs, mixed 2 and 3 level designs, central composite designs, and mixture designs. The option provides for extensive residuals analyses, allowing you to use a variety of diagnostic tools in inspecting different residual and predicted values with the goal of:

    1. examining the adequacy of the prediction model
    2. deciding whether transformations of the variables in the model are needed
    3. detecting outliers in the data.

    Residuals are the deviations of the observed values on the dependent variable from the predicted values, given the current model. The ANOVA models used in analyzing responses on the dependent variable make certain assumptions about the distributions of residual (but not predicted) values on the dependent variable. These assumptions can be summarized by saying that the ANOVA model assumes normality, linearity, homoscedasticity, and independence of residuals. All of these properties of the residuals for a dependent variable can be inspected.
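
    As a minimal sketch of this kind of inspection (Python with NumPy, SciPy, and matplotlib; the straight-line data are simulated purely for illustration), two of the most common displays are residuals against predicted values, to check linearity and homoscedasticity, and a normal probability plot of the residuals, to check normality:

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy import stats

        rng = np.random.default_rng(1)
        x = rng.uniform(-1, 1, 40)
        y = 3 + 2 * x + rng.normal(0, 0.2, 40)           # simulated response, illustrative only
        X = np.column_stack([np.ones(40), x])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)        # fit the current model
        predicted = X @ b
        residuals = y - predicted

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
        ax1.scatter(predicted, residuals)
        ax1.axhline(0.0, linestyle="--")
        ax1.set(xlabel="predicted value", ylabel="residual", title="Residuals vs. predicted")
        stats.probplot(residuals, dist="norm", plot=ax2)  # normal probability plot of residuals
        plt.tight_layout()
        plt.show()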

    Taguchi Methods: Robust Design Experiments

    Taguchi methods have become popular. The documented examples of sizable quality improvements that resulted from implementations of these methods (see, for example, Phadke, 1989; Noori, 1989) have added to the interest among American manufacturers. In fact, some of the leading manufacturers in the United States have used these methods with success. For example, AT&T has used these methods in the manufacture of very large scale integrated (VLSI) circuits, and Ford Motor Company has gained significant quality improvements due to these methods (American Supplier Institute, 1984 to 1988). However, as the details of these methods are becoming more widely known, critical appraisals are also beginning to appear (for example, Bhote, 1988; Tribus and Szonyi, 1989).

    Taguchi's robust design methods are set apart from traditional quality control procedures and industrial experimentation in various respects. Of particular importance are:

    1. The concept of quality loss functions,
    2. The use of signal-to-noise (S/N) ratios, and
    3. The use of orthogonal arrays.

    Several books have been published on these methods, for example, Peace (1993), Phadke (1989), Ross (1988), and Roy (1990), to name a few. It is recommended that you refer to those books for further specialized discussions. Introductory overviews of Taguchi's ideas about quality and quality improvement can also be found in Barker (1986), Garvin (1987), Kackar (1986), and Noori (1989).
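
    For reference, the three signal-to-noise ratios most often quoted in this literature can be computed directly, as in the short sketch below (Python/NumPy; the repeated measurements are invented, and a larger S/N ratio is better in each formulation):

        import numpy as np

        def sn_smaller_is_better(y):
            return -10 * np.log10(np.mean(np.square(y)))

        def sn_larger_is_better(y):
            return -10 * np.log10(np.mean(1.0 / np.square(y)))

        def sn_nominal_is_best(y):
            return 10 * np.log10(np.mean(y) ** 2 / np.var(y, ddof=1))

        y = np.array([10.2, 9.8, 10.1, 9.9])   # repeated measurements at one factor-level combination
        print(sn_smaller_is_better(y), sn_larger_is_better(y), sn_nominal_is_best(y))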

    Analyzing Data Collected in Experiments

    Statistica has additional methods for analyzing data collected in experiments and for fitting ANOVA/ANCOVA-like designs to continuous or categorical outcome variables. Or data can be simulated and then analyzed. 

    • Linear/Non-Linear Models
      • General Linear Models (GLM) and General Regression Models (GRM): Sophisticated model-building procedures (stepwise and best-subset selection of predictor effects)
      • Generalized Linear Models (GLZ): Stepwise and best-subset selection of predictor effects in ANOVA/ANCOVA-like designs for various popular alternatives to linear least squares models such as logit, multinomial logit, and probit models. 
    • Multivariate Exploratory Techniques
      • General Discriminant Analysis Models (GDA): ANOVA/ANCOVA-like experimental designs for classification, and stepwise and best-subset selection of predictor effects. GDA includes desirability profiler and response optimization methods, which can be used to determine the factor combinations, levels, and/or values that maximize the posterior classification probabilities for one or more categories of the dependent variable.
