  • Spotfire® Tips & Tricks: Hierarchical Cluster Analysis in few clicks with Spotfire®


    This article describes strategies for hierarchical clustering and shows how to run a hierarchical cluster analysis in Spotfire®.

    Introduction

    Hierarchical cluster analysis (HCA) is a widely used method of data analysis that seeks to identify clusters, often without a priori information about the data structure or the number of clusters.

    Strategies for hierarchical clustering generally fall into two types:

    Agglomerative: This is a bottom-up approach where each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

    Divisive: This is a top-down approach where all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

    Hierarchical Cluster Analysis in Spotfire

    The algorithm used for hierarchical clustering in Spotfire® is a hierarchical agglomerative method. For row clustering, the cluster analysis begins with each row placed in a separate cluster. The distance between every pair of rows is then calculated using the selected distance measure. The two most similar clusters are grouped together to form a new cluster. In subsequent steps, the distance between the new cluster and all remaining clusters is recalculated using the selected clustering method. The number of clusters is thereby reduced by one in each iteration step. Eventually, all rows are grouped into one large cluster. The order of the rows in the dendrogram is defined by the selected ordering weight. The cluster analysis works the same way for column clustering.
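
    The merge loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Spotfire's implementation: the function names, the toy rows, and the choice of Euclidean distance with average linkage are assumptions made for the example, and cluster distances are recomputed from the rows at every step rather than updated incrementally.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, rows):
    """Mean pairwise Euclidean distance between two clusters of row indices."""
    return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(rows, cluster_distance=average_linkage):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(rows))]  # each row starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], rows),
        )
        d = cluster_distance(a, b, rows)
        # Replace the two clusters with their union: one cluster fewer per step.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges


# Hypothetical toy data, chosen only to make the merges easy to follow.
rows = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (5.0, 5.0)]
for left, right, distance in agglomerate(rows):
    print(left, "+", right, f"at distance {distance:.2f}")
```

    The merge history is what a dendrogram draws: each entry is one join, and its distance is the height of that join.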

    Distance Measures: The following measures can be used to calculate the distance or similarity between rows or columns:

    • Correlation
    • Cosine Correlation
    • Tanimoto Coefficient
    • Euclidean Distance
    • City Block Distance
    • Square Euclidean Distance
    • Half Square Euclidean Distance
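
    For reference, the textbook definitions behind these names can be written as short Python functions. This is a sketch under assumptions: the function names are illustrative, and correlation, cosine correlation, and the Tanimoto coefficient are shown as similarities. A tool typically converts a similarity to a distance (for example as 1 minus the similarity), and the exact conversion Spotfire applies is not stated here.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def square_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def half_square_euclidean(a, b):
    return 0.5 * square_euclidean(a, b)


def cosine_similarity(a, b):
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))


def correlation(a, b):
    # Pearson correlation equals the cosine similarity of the mean-centered vectors.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def tanimoto(a, b):
    return dot(a, b) / (dot(a, a) + dot(b, b) - dot(a, b))


a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # hypothetical rows; b is a scaled copy of a
print(city_block(a, b), square_euclidean(a, b), half_square_euclidean(a, b))
print(round(correlation(a, b), 6), round(cosine_similarity(a, b), 6))
```

    Because b is a scaled copy of a, the correlation and cosine measures treat the two rows as identical in shape, while the Euclidean-type measures still report a nonzero distance. That difference is usually the deciding factor when choosing a measure.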

    Clustering Methods: The following clustering methods are available in Spotfire®:

    • UPGMA
    • WPGMA
    • Single Linkage
    • Complete Linkage
    • Ward's Method
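
    These five methods differ only in how the distance from an existing cluster k to a newly merged cluster (i joined with j) is recalculated. One standard way to express all of them is the Lance-Williams update formula, sketched below. The function name and method keys are hypothetical, and this is the textbook formulation rather than a description of Spotfire's internals. For Ward's method the inputs are assumed to be squared Euclidean distances.

```python
def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Distance from cluster k to the merger of clusters i and j.

    d_ki, d_kj, d_ij are the current inter-cluster distances;
    n_i, n_j, n_k are the cluster sizes.
    """
    if method == "single":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "complete":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "upgma":
        # Size-weighted average: every original row counts equally.
        a_i, a_j = n_i / (n_i + n_j), n_j / (n_i + n_j)
        beta, gamma = 0.0, 0.0
    elif method == "wpgma":
        # Plain average: the two merged clusters count equally.
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.0
    elif method == "ward":
        total = n_i + n_j + n_k
        a_i, a_j = (n_i + n_k) / total, (n_j + n_k) / total
        beta, gamma = -n_k / total, 0.0
    else:
        raise ValueError(f"unknown method: {method}")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


for name in ("single", "complete", "upgma", "wpgma", "ward"):
    print(name, lance_williams(name, d_ki=2.0, d_kj=5.0, d_ij=3.0, n_i=1, n_j=3, n_k=2))
```

    With these coefficients, single linkage reduces to the smaller of the two distances and complete linkage to the larger, which is why single linkage tends to produce long chains and complete linkage compact clusters.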

    Spotfire also provides options to normalize data and to replace empty values before clustering is performed.
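
    To show why these two preprocessing steps matter, here is a minimal Python sketch of one common choice for each: replacing empty values with the column average and rescaling with a z-score. Both choices, and the function names, are assumptions for illustration. The specific replacement and normalization methods Spotfire offers are not listed in this article.

```python
from statistics import mean, pstdev


def fill_empty_with_mean(column):
    """Replace None entries with the mean of the non-empty values."""
    present = [v for v in column if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in column]


def z_score(column):
    """Rescale a column to mean 0 and (population) standard deviation 1.

    Assumes the column is not constant, otherwise the deviation is zero.
    """
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


column = [2.0, None, 4.0, 6.0]  # hypothetical column with one empty value
filled = fill_empty_with_mean(column)
print(filled)
print([round(v, 3) for v in z_score(filled)])
```

    Without normalization, a column measured on a larger scale dominates distance measures such as Euclidean distance. Without empty value replacement, distances involving incomplete rows cannot be computed at all.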

    [Image: hctool.thumb.png]

    Hierarchical Clustering tool - Walkthrough 

    The Iris data set was used to demonstrate clustering with the Hierarchical Clustering tool.

    Select Tools > Hierarchical Clustering...

    Select the data table, then click Select Columns...

    The Sepal Length, Sepal Width, Petal Length, and Petal Width columns were selected.

    [Image: hc_-select_columns.thumb.png]

    Next, the Cluster rows check box was selected in order to produce row dendrograms.

    Click the Settings... button to open the Edit Clustering Settings dialog and select a Clustering method and Distance measure. In this case, default options were selected.

    The hierarchical clustering calculation is performed, and a heat map visualization with the specified dendrograms is created in just a few clicks. A cluster column is also added to the data table and made available in the filters panel. The bar chart uses the Cluster ID column to display Species. With the pruning line set to 3 clusters, Setosa was recovered correctly as a single cluster. Some Virginica and Versicolor rows, however, did not fall into the right cluster. This is a known issue, since these two species overlap in the measured features.
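
    The comparison that the bar chart makes visually, cluster ID against species, can also be expressed as a simple cross-tabulation. The sketch below uses a handful of hypothetical cluster assignments purely for illustration. They are not the results from the analysis above, and the function names are assumptions.

```python
from collections import Counter


def cross_tab(cluster_ids, labels):
    """Count how many rows of each label fall into each cluster."""
    return Counter(zip(cluster_ids, labels))


def purity(cluster_ids, labels):
    """Fraction of rows whose label is the majority label of their cluster."""
    best = {}
    for (cluster, _label), count in cross_tab(cluster_ids, labels).items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(labels)


# Hypothetical assignments for illustration only (not the article's results).
cluster_ids = [1, 1, 2, 2, 2, 3, 3, 2]
species = ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica",
           "Virginica", "Virginica", "Versicolor"]

for (cluster, label), count in sorted(cross_tab(cluster_ids, species).items()):
    print(f"cluster {cluster}: {label} x {count}")
print("purity:", purity(cluster_ids, species))
```

    A cluster that contains a single species shows up as one entry in the table, while a mixed cluster shows up as several, which mirrors the stacked segments in the bar chart.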

    [Image: hca-iris.thumb.png]

     

    How do I learn more?

    You can also request a featured session on Hierarchical Clustering on Dr. Spotfire Office Hours by:

