Whenever one needs to classify a "mountain" of information into manageable, meaningful piles, cluster analysis is of great utility. Cluster analysis is considered a data-mining technique and is frequently used to explore the problem space. It is typically applied when there are no a priori hypotheses; in that sense, cluster analysis finds the most significant solution possible from the data itself, so statistical significance testing is not needed.
This module contains three clustering methods: k-means, hierarchical clustering, and two-way joining. Data can be processed from either raw data files or matrices of distance measures. The user can cluster cases, variables, or both, based on a wide variety of distance measures, including Euclidean, squared Euclidean, city-block (Manhattan), Chebychev, power distances, percent disagreement, and 1 − Pearson r. Several amalgamation (linkage) rules are available, including single, complete, weighted and unweighted group average (or centroid), and Ward's method. Cluster membership data can be appended to the current data set for further analysis or model building.
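The distance measures listed above are simple to state directly. The following is a minimal sketch in plain Python; the function names and signatures are illustrative, not the module's actual API.

```python
import math

def euclidean(x, y):
    """Straight-line distance: sqrt(sum of squared coordinate differences)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Euclidean without the square root; weights distant points more heavily."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def city_block(x, y):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    """Largest absolute difference over any single dimension."""
    return max(abs(a - b) for a, b in zip(x, y))

def power_distance(x, y, p=2.0, r=2.0):
    """Generalized metric (sum |x_i - y_i|**p) ** (1/r); the parameters p and r
    tune how much weight large differences receive (p = r = 2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / r)

def percent_disagreement(x, y):
    """Fraction of positions where categorical values differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def one_minus_pearson_r(x, y):
    """1 - Pearson correlation: small when the two profiles have similar shape,
    regardless of their absolute levels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)
```

For example, between the points (0, 0) and (3, 4), the Euclidean distance is 5, the city-block distance is 7, and the Chebychev distance is 4; the 1 − Pearson r measure of two perfectly proportional profiles is 0.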
Visualization options include customizable tree diagrams, discrete contour-style two-way joining matrix plots, plots of amalgamation schedules, plots of means in k-means clustering, and many others.
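To make the "amalgamation schedule" concrete: in hierarchical clustering, the two closest clusters are merged at each step, and the schedule records the distance at which each merge occurred (the quantity such plots display). A minimal single-linkage sketch in plain Python, assuming Euclidean distance between points; names and data are illustrative:

```python
def single_linkage_schedule(points):
    """Repeatedly merge the two closest clusters, recording the merged member
    indices and the merge distance at each step (the amalgamation schedule)."""
    # Start with one singleton cluster per point.
    clusters = [frozenset([i]) for i in range(len(points))]

    def dist(i, j):
        # Euclidean distance between two points.
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    def cluster_dist(c1, c2):
        # Single linkage: distance between the nearest pair of members.
        return min(dist(i, j) for i in c1 for j in c2)

    schedule = []
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, a, b = min((cluster_dist(c1, c2), a, b)
                      for a, c1 in enumerate(clusters)
                      for b, c2 in enumerate(clusters) if a < b)
        merged = clusters[a] | clusters[b]
        schedule.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return schedule
```

Running this on two well-separated pairs of points, e.g. `[(0, 0), (0, 1), (5, 5), (5, 6)]`, produces two early merges at small distances followed by one late merge at a large distance; that jump in the schedule is the visual cue for the number of clusters.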
Cluster analysis was first used by Zubin in 1938 and by Tryon in 1939, both in the field of psychology. Tryon is well known for his selective-breeding experiment with "maze-bright" and "maze-dull" rats, which suggested that quick or slow maze problem solving was potentially hereditary in rats.