Spotfire Statistica® Discriminant Function Analysis



    Discriminant function analysis is used to determine which variables discriminate between two or more naturally occurring groups. A medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.

    Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA).

    The module performs forward stepwise analysis, in which variables are evaluated at each step and the variable that contributes most to the discrimination between groups is added to the model. Backward stepwise analysis is also available: all variables are first included in the model, and at each step the variable that contributes least to the discrimination between groups is removed. Alternatively, the user can enter user-specified blocks of variables.

    In the case of a single variable, the significance test of whether or not a variable discriminates between groups is the F-test. F is essentially computed as the ratio of the between-groups variance in the data to the pooled average within-group variance. If the between-groups variance is significantly larger, then there must be significant differences between group means.
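The single-variable F statistic described above can be computed directly. The following is a minimal sketch in plain Python; the group labels and data values are purely illustrative (echoing the medical-recovery example), not taken from the module:

```python
# Sketch of the one-way F statistic: between-groups mean square divided by
# the pooled within-group mean square. Data below are illustrative only.

def f_statistic(groups):
    """F = between-groups mean square / pooled within-group mean square."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total sample size
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-groups sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Pooled within-group sum of squares (n - k degrees of freedom)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    return (ss_between / (k - 1)) / (ss_within / (n - k))

complete = [5.1, 4.8, 5.5, 5.0]   # group 1: recovered completely
partial  = [4.2, 4.0, 4.4, 4.1]   # group 2: recovered partially
none     = [3.1, 3.3, 2.9, 3.0]   # group 3: did not recover

print(f"F = {f_statistic([complete, partial, none]):.1f}")
```

Because the three group means are well separated relative to the spread within each group, F comes out large, indicating significant differences between means.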

    Usually, you include several variables in a study in order to see which ones contribute to the discrimination between groups. In this case, the analysis builds a matrix of total variances/covariances and a matrix of pooled within-group variances/covariances. The two matrices can be compared via multivariate F-tests to determine whether there are any significant differences (with regard to all variables) between groups.
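The two matrices being compared can be sketched as follows. This is an illustrative construction with simulated data (the group means and sample sizes are assumptions), not the module's own code:

```python
# Sketch of the total covariance matrix (ignoring groups) versus the pooled
# within-group covariance matrix. Simulated data; values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
# Two variables measured in three groups whose means differ
groups = [rng.normal(loc=mu, scale=1.0, size=(30, 2)) for mu in (0.0, 1.5, 3.0)]

all_data = np.vstack(groups)
total_cov = np.cov(all_data, rowvar=False)       # total variances/covariances

# Pooled within-group covariance: weight each group's covariance matrix by
# its degrees of freedom, then divide by the total within-group df.
n_total, k = all_data.shape[0], len(groups)
pooled_within = sum((len(g) - 1) * np.cov(g, rowvar=False)
                    for g in groups) / (n_total - k)

# When group means differ, total variance exceeds within-group variance.
print(np.diag(total_cov))
print(np.diag(pooled_within))
```

The gap between the two diagonals is exactly what the multivariate F-tests assess: variance attributable to group differences on top of the variance within groups.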

    Output includes the Wilks' lambdas, partial lambdas, F to enter (or remove), the p levels, the tolerance values, and the R-square.

    Canonical Analysis can also be performed to report the raw and cumulative eigenvalues for all roots, and their p levels, the raw and standardized discriminant (canonical) function coefficients, the structure coefficient matrix (of factor loadings), the means for the discriminant functions, and the discriminant scores for each case.
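The raw eigenvalues reported by the canonical analysis can be illustrated with a standard textbook construction: they are the eigenvalues of the within-group scatter matrix inverse times the between-group scatter matrix. This is a sketch on simulated data, not the module's implementation:

```python
# Sketch of the canonical (eigenvalue) step: roots are eigenvalues of
# inv(W) @ B, where W is the within-group and B the between-group scatter
# matrix. Simulated data; group means and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(2)
# Three groups of two variables whose means differ along one direction
groups = [rng.normal(loc=(mu, mu), scale=1.0, size=(40, 2))
          for mu in (0.0, 2.0, 4.0)]
grand_mean = np.vstack(groups).mean(axis=0)

# Within-group (W) and between-group (B) scatter matrices
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean) for g in groups)

# One raw eigenvalue per root; eigenvectors give the (raw) discriminant
# function coefficients.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
roots = np.sort(eigvals.real)[::-1]
print(roots)
```

Because the group means here lie nearly along a single direction, the first root dominates and the second is close to zero, mirroring how the cumulative eigenvalues indicate how many discriminant functions carry real discriminating power.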

     
    Historical note: In the two-group case, discriminant function analysis can also be thoughtted of as (and is analogous to) multiple regression. Two-group discriminant analysis is also called Fisher linear discriminant analysis, after Fisher (1936). Sir Ronald Aylmer Fisher has been described as "a genius who almost single-handedly created the foundations for modern statistical science" and "the single most important figure in 20th century statistics".
     

    In the two-group case, the user fits a linear equation: 

           Group = a + b1*x1 + b2*x2 + ... + bm*xm

    where a is a constant and b1 through bm are regression coefficients. The interpretation of the results of a two-group problem is straightforward and closely follows the logic of multiple regression: those variables with the largest regression coefficients are the ones that contribute most to the prediction of group membership.
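The regression view of the two-group case can be sketched by coding group membership as 0/1 and fitting ordinary least squares. The data and variable roles below are illustrative assumptions, not output from the module:

```python
# Sketch of two-group discriminant analysis as regression: fit
# Group = a + b1*x1 + b2*x2 by least squares on a 0/1 group code.
# Simulated data; which variable discriminates is an assumption.
import numpy as np

rng = np.random.default_rng(1)
n = 50
group = np.repeat([0, 1], n)                  # 0/1 group membership code
x1 = rng.normal(loc=group * 2.0, scale=1.0)   # x1 separates the groups
x2 = rng.normal(size=2 * n)                   # x2 is pure noise

# Design matrix with an intercept column, solved by ordinary least squares
X = np.column_stack([np.ones(2 * n), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, group, rcond=None)[0]

print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

As the text says, the variable that actually discriminates (x1) receives the larger coefficient, while the noise variable's coefficient stays near zero.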

    See Jennrich (1977) for another description of the computations involved.

