Spotfire Statistica® General Linear Models

The General Linear Models (GLM) module provides techniques for analyzing any univariate or multivariate Analysis of Variance (ANOVA), regression, or Analysis of Covariance (ANCOVA) design. GLM uses the least squares methods of the general linear model to estimate and test hypotheses about effects. This article describes the features of the GLM module.

Analysis types

Analysis of Covariance, Factorial ANOVA, Factorial Regression, General MANOVA/MANCOVA, Homogeneity-of-Slopes Model, Huge Balanced ANOVA, Main Effects ANOVA, Mixture Surface Regression, Multiple Regression, Nested Design ANOVA, One-Way ANOVA, Polynomial Regression, Repeated Measures ANOVA, Response Surface Regression, Separate Slopes Model, Simple Regression

Analysis of incomplete designs

Since Type IV sums of squares sometimes generate misleading results for incomplete designs (e.g., Milliken & Johnson, 1992; Searle, 1987; see also Hocking, 1985), two other options are provided. (1) "Type V sums of squares" is used in industrial experimentation. It involves a combination of the methods employed in computing Type I and Type III sums of squares. (2) "Type VI sums of squares" is identical to the effective hypotheses approach described by Hocking (1985). This approach applies to the sigma-restricted solution.

Cross-Validation and Prediction Samples

A very important step when fitting models to be used for the prediction of future observations is to cross-validate the results, i.e., to apply the current results to a new set of observations that were not used to compute those results (estimate the parameters). GLM offers very flexible methods for computing detailed predicted values and residual statistics for observations that were not used in the computations for fitting the current model and that either (1) have observed values for the dependent variables (the cross-validation sample), or (2) have missing data for the dependent variables (the prediction sample).
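The idea can be sketched outside Statistica in a few lines of Python. This is only an illustration, not the GLM module's interface: the data are made up, and the helper names `fit_simple_regression` and `predict` are hypothetical. The sketch fits a simple regression on an analysis sample, then computes predicted values and residuals for a cross-validation sample and predicted values for a prediction sample.

```python
def fit_simple_regression(x, y):
    """Least squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(params, x):
    """Predicted values for new predictor values, given fitted parameters."""
    intercept, slope = params
    return [intercept + slope * xi for xi in x]


# Hypothetical analysis sample: used to estimate the parameters.
x_fit = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [2.1, 3.9, 6.2, 7.8, 10.1]

# (1) Cross-validation sample: not used for fitting, observed y available.
x_cv = [6.0, 7.0]
y_cv = [12.2, 13.8]

# (2) Prediction sample: not used for fitting, y is missing.
x_new = [8.0]

params = fit_simple_regression(x_fit, y_fit)

# Residuals can only be computed where the dependent variable was observed.
cv_pred = predict(params, x_cv)
cv_resid = [obs - pred for obs, pred in zip(y_cv, cv_pred)]

# For the prediction sample only predicted values are available.
new_pred = predict(params, x_new)

print("parameters:", params)
print("cross-validation residuals:", cv_resid)
print("predictions for new cases:", new_pred)
```

Large residuals in the cross-validation sample, relative to those in the analysis sample, would suggest that the fitted model does not generalize well to new observations.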

Designs

The user can choose simple or highly customized one-way, main-effect, factorial, or nested ANOVA or MANOVA designs, repeated measures designs, simple, multiple, and polynomial regression designs, response surface designs (with or without blocking), mixture surface designs, simple or complex analysis of covariance designs (e.g., with separate slopes), or general multivariate MANCOVA designs. Factors can be fixed or random (in which case synthesized error terms will be computed). All of these designs can be efficiently specified via any of the three types of user interfaces described above, and customized in various ways (e.g., you can drop effects, specify custom hypotheses, etc.). GLM can also handle large analysis designs; for example, repeated measures factors with 1000 levels can be specified, models may include 1000 covariates, and huge between-group designs can be analyzed efficiently.

Desirability Profiles and Response Optimization

After fitting a model, it is often desirable to determine an optimum setting for the dependent variable or combination of dependent variables. For example, in the manufacture of tires, one might be interested in the hardness of the tire, indices of abrasion, and effectiveness during braking. Each one of these characteristics may add to the desirability in particular (and often non-linear) ways (e.g., there may be an optimum level of hardness). The user can define the desirability (function) for the dependent variables, and then review the combined desirability for all dependent variables over the levels (or user-defined values) of the predictor variables. The option to automatically find the optimum desirability is also provided.
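To make the notion of a combined desirability concrete, here is a minimal Python sketch. It assumes one common formulation, in which each predicted response is mapped to an individual desirability between 0 and 1 and the overall desirability is their geometric mean. The exact functions used by the GLM module are not stated here, so this is an assumption. The two "fitted models" for tire hardness and braking effectiveness, the predictor `x`, and all numeric limits are hypothetical.

```python
def target_desirability(y, low, target, high):
    """Two-sided desirability: 0 outside (low, high), 1 at target, linear between."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)


def larger_is_better(y, low, high):
    """One-sided desirability: 0 at or below low, 1 at or above high."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def overall_desirability(values):
    """Geometric mean; it is 0 as soon as any single desirability is 0."""
    product = 1.0
    for d in values:
        product *= d
    return product ** (1.0 / len(values))


# Hypothetical fitted models for two tire characteristics as functions of
# one predictor x (for example, the amount of an additive).
def predicted_hardness(x):
    return 50.0 + 10.0 * x


def predicted_braking(x):
    return 90.0 - 5.0 * x


def combined(x):
    d_hardness = target_desirability(predicted_hardness(x), 60.0, 70.0, 80.0)
    d_braking = larger_is_better(predicted_braking(x), 70.0, 90.0)
    return overall_desirability([d_hardness, d_braking])


# Simple grid search over user-defined predictor values 0.0, 0.1, ..., 4.0.
grid = [i / 10 for i in range(41)]
best_x = max(grid, key=combined)
best_d = combined(best_x)

print("best predictor setting:", best_x)
print("overall desirability:", best_d)
```

The geometric mean is a deliberate choice in this sketch: a setting that is completely unacceptable on one characteristic gets an overall desirability of zero, no matter how good the other characteristics are.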

Efficient Computations for Balanced ANOVA Designs

GLM contains an option to "instruct" the program that the design is balanced, so that more efficient computational methods can be used. Even very large designs with effects with degrees of freedom in the hundreds can thus be analyzed in mere seconds, while the general computational procedures that do not assume a balanced design may take several minutes to produce the same results.

Hypothesis Testing

Contained Effects, Type I Sums of Squares, Type II Sums of Squares, Type III Sums of Squares, Type IV Sums of Squares, Type V Sums of Squares, Type VI (Effective Hypothesis) Sums of Squares

Overparameterized model (coding of categorical predictors)

Nested designs and separate slope designs are best analyzed using the overparameterized model. This is the most common way to estimate variance components, and to compute synthesized error terms in mixed model ANOVA.

Planned Comparisons of Least Squares Means

The user can specify planned comparisons for testing hypotheses about (estimable) population marginal means. GLM also has flexible options for testing hypotheses about linear combinations of effects.

Post-Hoc Tests for Repeated Measures Effects

This module includes options for performing post-hoc comparisons on the observed means in interaction effects. For effects involving repeated measures factors, or interactions between repeated measures factors and between-group factors, GLM allows the user to choose between different possible estimates of the population error variance (sigma, which is necessary to compute the post-hoc p values). Specifically, by default, in interaction effects involving both between and within-subject (repeated measures) effects, the program will choose one of the following: (1) the between error term for comparisons of means within the levels of the repeated measures factors, (2) the respective within error term for comparisons of means within the levels of the between factors, or (3) a pooled estimate derived from both for comparisons of means across the levels of the between and within (repeated measures) factors. These methods are described in detail in Winer, Brown, and Michels (1991, pp. 526-531) and Milliken and Johnson (1992, pp. 322-350). An option is also provided for entering a user-defined estimate of sigma.

Sigma-restricted model (coding of categorical predictors)

Factorial designs with large numbers of factors are best analyzed using the sigma-restricted model. In short, a simple 2-way interaction of two two-level factors requires only a single column in the design matrix using the sigma-restricted parameterization, but 4 columns in the overparameterized model. As a result, analyzing, for example, an 8-way full factorial design with GLM requires only a few seconds.
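The column counts quoted above can be reproduced with a short Python sketch. It assumes the usual textbook codings: +1/-1 contrast codes for the sigma-restricted model and one 0/1 indicator per cell for the overparameterized model. The level names and function names are hypothetical, and the totals for the 8-way design are this sketch's own arithmetic rather than figures from the GLM documentation.

```python
from itertools import product
from math import comb

levels_a = ["a1", "a2"]
levels_b = ["b1", "b2"]
cells = list(product(levels_a, levels_b))  # the four A x B cells


def sigma_restricted_interaction(a, b):
    """One column: the product of the +1/-1 codes of the two factors."""
    code_a = 1 if a == levels_a[0] else -1
    code_b = 1 if b == levels_b[0] else -1
    return [code_a * code_b]


def overparameterized_interaction(a, b):
    """Four columns: one 0/1 indicator for each A x B cell."""
    return [1 if (a, b) == cell else 0 for cell in cells]


for a, b in cells:
    print(a, b,
          sigma_restricted_interaction(a, b),
          overparameterized_interaction(a, b))


def total_columns(k, columns_per_effect):
    """Design matrix width for a full factorial of k two-level factors.

    There are comb(k, m) effects of order m (m = 0 is the intercept);
    columns_per_effect(m) gives the columns needed by one such effect.
    """
    return sum(comb(k, m) * columns_per_effect(m) for m in range(k + 1))


# Sigma-restricted: every effect of two-level factors needs one column.
sigma_total = total_columns(8, lambda m: 1)
# Overparameterized: an m-way effect needs one column per cell, 2**m.
over_total = total_columns(8, lambda m: 2 ** m)

print("8-way full factorial, sigma-restricted columns:", sigma_total)
print("8-way full factorial, overparameterized columns:", over_total)
```

Under these assumptions the 8-way full factorial needs 256 columns with the sigma-restricted coding and 6561 with the overparameterized coding, which illustrates why the former is the more economical choice for large factorial designs.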

Tests of Assumptions, Residual Statistics

After fitting a particular model, it is always extremely important to carefully inspect the results with regard to any serious violations of assumptions for the respective statistical tests and procedures. This module includes options to aid in this task, including plots of means versus standard deviations, and various tests of the homogeneity of variances. The user can easily check for outliers by computing the extended list of predicted value and residual statistics and sorting the observations by a chosen residual statistic (e.g., the Mahalanobis distance, deleted residual value, leverage value, etc.).
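As an illustration of sorting observations by a residual statistic, the following Python sketch computes raw residuals, leverage values, and deleted residuals for a simple regression and ranks the cases by the absolute deleted residual. It uses the standard textbook formulas for simple regression (leverage h = 1/n + (x - mean)^2 / Sxx, deleted residual = e / (1 - h)). The data are made up, and the sketch is not meant to reproduce the module's own output.

```python
# Hypothetical data; the last case is far from the others on x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares fit of y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]
residual = [yi - pi for yi, pi in zip(y, predicted)]

# Leverage: how far each case lies from the center of the predictor.
leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Deleted residual: the residual a case would have if the model were
# refitted without that case.
deleted = [e / (1.0 - h) for e, h in zip(residual, leverage)]

# Rank the cases by the chosen statistic, largest first.
ranked = sorted(enumerate(deleted), key=lambda item: abs(item[1]), reverse=True)

for index, value in ranked:
    print(f"case {index}: x={x[index]}, leverage={leverage[index]:.3f}, "
          f"deleted residual={value:.3f}")
```

In this made-up sample the last case ranks first: its raw residual is unremarkable, but its high leverage makes the deleted residual large, which is exactly the kind of observation such a sorted listing is meant to expose.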

Historical note: It is the emergence of the theory of algebraic invariants in the 1800s that made the general linear model, as we know it today, possible. The theory of algebraic invariants developed from the groundbreaking work of 19th-century mathematicians such as Gauss, Boole, Cayley, and Sylvester. The development of the linear regression model in the late 19th century, and the development of correlational methods shortly thereafter, are clearly direct outgrowths of the theory of algebraic invariants. Regression and correlational methods, in turn, serve as the basis for the general linear model.
