When the number of different items (categories) in the data is very large and not known ahead of time...

When the "factorial degree" of important association rules is not known ahead of time...

Then pivot tables and cross-tabulations are too cumbersome to use or may not be applicable. For example, a three-way association would not be visible in a two-way cross-tabulation.
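The point about three-way associations can be illustrated with a small sketch. The data here are hypothetical and constructed for the purpose: three binary variables A, B, and C, where C equals 1 exactly when A and B differ. The helper name `crosstab` is an assumption of this sketch, not part of any product API.

```python
from collections import Counter
from itertools import product

# Hypothetical data: C is 1 exactly when A and B differ (an XOR pattern).
# Each of the four (A, B) combinations appears 25 times, giving 100 rows.
rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)] * 25

def crosstab(rows, *cols):
    """Count how often each combination of values occurs in the given columns."""
    return Counter(tuple(row[c] for c in cols) for row in rows)

two_way_ac = crosstab(rows, 0, 2)    # A by C
two_way_bc = crosstab(rows, 1, 2)    # B by C
three_way = crosstab(rows, 0, 1, 2)  # A by B by C

print("A x C:", dict(two_way_ac))
print("B x C:", dict(two_way_bc))
print("A x B x C:", dict(three_way))
```

In both two-way tables every cell holds the same count, so neither A nor B appears related to C. The three-way table is half empty: only four of the eight possible cells occur, which is where the association shows up.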

The a priori algorithm implemented in Spotfire Statistica® Association Rules not only automatically detects the relationships ("crosstabulation tables") that are important (i.e., crosstabulation tables that are not sparse, meaning they do not contain mostly zeros), but also determines the factorial degree of the tables that contain the important association rules.

The Association Rules module can find rules of the kind *If X then (likely) Y* where X and Y can be single values, items, words, etc., or conjunctions of values, items, words, etc. (e.g., *if (Car=Porsche and Gender=Male and Age<20) then (Risk=High and Insurance=High)*). The program can be used to analyze simple categorical variables, dichotomous variables, and/or multiple response variables. The algorithm will determine association rules without requiring the user to specify the number of distinct categories present in the data, or any prior knowledge regarding the maximum factorial degree or complexity of the important associations. In a sense, the algorithm will construct crosstabulation tables without the need to specify the number of dimensions for the tables or the number of categories for each dimension. Hence, this technique is particularly well suited for data and text mining of huge databases.
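The general idea can be sketched in a few lines of Python. This is a minimal illustration of a level-wise, a priori style search in the spirit of Agrawal and Srikant (1994), not the implementation used in the Association Rules module. The function names, the example transactions, and the `min_support` and `min_confidence` thresholds are all assumptions made for this sketch. Support is taken as the fraction of cases containing an itemset, and confidence as the support of the whole rule divided by the support of its "if" part.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, keeping only frequent ones."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1: candidates are the single items seen in the data,
    # so no list of categories has to be supplied up front.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates and
        # prune any candidate that has an infrequent k-item subset.
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent

def association_rules(frequent, min_confidence):
    """Split each frequent itemset into 'if body then head' rules above min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for body in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / frequent[body]
                if confidence >= min_confidence:
                    rules.append((body, itemset - body, support, confidence))
    return rules

# Hypothetical example data, one set of "variable=value" items per case.
transactions = [
    {"Car=Porsche", "Gender=Male", "Risk=High"},
    {"Car=Porsche", "Gender=Male", "Risk=High"},
    {"Car=Porsche", "Gender=Female", "Risk=Low"},
    {"Car=Sedan", "Gender=Male", "Risk=Low"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
for body, head, support, confidence in rules:
    print(f"if {sorted(body)} then {sorted(head)} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

The search stops on its own once no larger itemset reaches the support threshold, which is how the maximum "factorial degree" is determined from the data rather than specified in advance. In this toy data set the largest frequent itemset has three items, and one of the resulting rules is *if (Car=Porsche and Gender=Male) then (Risk=High)*.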

For additional information, see Agrawal and Swami, 1993; Agrawal and Srikant, 1994; Han and Lakshmanan, 2001; Witten and Frank, 2000.
