## Introduction to Beeswarm Plot

Addendum - January 2024 - Please also consider using the Violin Plot Mod for Spotfire!

Beeswarm plots show both the overall distribution of a variable and the individual data points. They look like one-dimensional scatter plots that display individual measures as points, except that a placement logic keeps the plotted points closely packed without overlapping. They are useful for showing the distribution of a single numeric metric across one or more categories, for example as an information-dense summary of how the top features in a dataset impact a model's output.

For readers interested in learning more about dot plots/beeswarm plots and models for distributing the points on the chart, this article by Leland Wilkinson should prove useful: http://moderngraphics11.pbworks.com/f/wilkinson_1999.DotPlots.pdf

## Creating a Beeswarm Plot in Spotfire®

This configuration works with all versions of Spotfire. Creating a beeswarm plot relies on the generous configuration flexibility of the native Scatter Plot in Spotfire, which has also provided the foundation for chromosome maps, Gantt charts, swim lanes, ternary plots and many others in the past. The challenge with configuring a beeswarm plot is twofold: the points must be assigned to horizontal bands, and then distributed evenly within each band.

This tutorial uses a fictional dataset with measurement variables for 3 different categories of data. Download this dataset at the end of the page and open it in Spotfire.

Open the visualizations panel and add a Scatter Plot to the canvas.

Select a categorical column on the X-axis and the quantitative metric of interest on the Y-axis. In this example we use the (Group) column as X-axis and (Metric) as Y-axis.

We want to allow the number of horizontal bands to be easily adjusted using a slider control. Create a Text Area and add a Property Control of type Slider. In the Slider settings, click New… and add a new Document Property named "Bins" of type Integer with a default value of 40. Then set the property values through a "Numerical Range" from 10 to 100 in intervals of 10 and click OK.

Now we will use a calculated column to create the beeswarm horizontal bins based on the slider selection. Go to Data > Add calculated column… and paste in this expression:

BinByEvenIntervals([Metric],${Bins})

We will name this new column "Binned Metric".

Back in the scatter plot, change the Y-axis to (Binned Metric). Now changing the binning value with the slider will change it in the scatter plot.
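To make the binning step more concrete, here is a minimal Python sketch of equal-width binning. It is only an illustration of the idea behind `BinByEvenIntervals`, not Spotfire's implementation: the function name `bin_by_even_intervals`, the choice to label each bin by its upper edge, and the handling of boundary values are all assumptions.

```python
def bin_by_even_intervals(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals.

    Returns, for each value, the upper edge of the interval it falls in.
    Labelling bins by their upper edge is an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so there is one bin.
        return [hi for _ in values]
    width = (hi - lo) / n_bins
    binned = []
    for v in values:
        # Interval index; the maximum value is clamped into the last interval.
        idx = min(int((v - lo) / width), n_bins - 1)
        binned.append(lo + (idx + 1) * width)
    return binned


metric = [0.0, 1.0, 2.5, 4.9, 5.0, 7.5, 10.0]
print(bin_by_even_intervals(metric, 4))
```

Raising `n_bins` (the role played by the "Bins" slider) produces more, narrower horizontal bands; lowering it packs more points into each band.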

Next we need to evenly distribute the points within each horizontal band. To achieve this we will use another calculated column. Go back to Data > Add calculated column… and paste in this expression:

Rank([Metric],"asc",[Binned Metric],"ties.method=first") - (Max(Rank([Metric],"asc",[Binned Metric],"ties.method=first")) over ([Binned Metric]) / 2)

We will name this new column "Distributed Metric".

Back in the scatter plot, change the X-axis to (Distributed Metric) and also remove the Color-by configuration. Now the scatter plot is configured as a beeswarm plot and you can show more or less bins using the slider.

The (Distributed Metric) calculated column ranks our metric within each bin and then subtracts half the max rank (that is, half the record count) for that bin to distribute the records around a central axis. With this method, the values of the metric increase from left to right within each horizontal band.
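The rank-and-center logic can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not Spotfire code: the function name `distributed_metric` is hypothetical, and it assumes that `ties.method=first` breaks ties by original row order.

```python
from collections import defaultdict


def distributed_metric(metric, binned):
    """Centered rank of each record within its bin.

    Mirrors the expression: rank within the bin minus half the bin's max rank.
    """
    rows_by_bin = defaultdict(list)
    for i, b in enumerate(binned):
        rows_by_bin[b].append(i)

    offsets = [0.0] * len(metric)
    for rows in rows_by_bin.values():
        # sorted() is stable, so tied values keep their original row order
        # (assumed here to match "ties.method=first").
        ordered = sorted(rows, key=lambda i: metric[i])
        half = len(ordered) / 2
        for rank, i in enumerate(ordered, start=1):
            offsets[i] = rank - half
    return offsets


metric = [1.2, 1.0, 1.1, 5.0, 5.3]
binned = [2.5, 2.5, 2.5, 7.5, 7.5]
print(distributed_metric(metric, binned))
```

For a bin with n records the offsets run from 1 - n/2 to n/2, so each band is spread around the central axis with the smallest metric value on the left.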

## Additional Configurations

### Trellising a beeswarm plot

We want to compare the distribution for the 3 different groups that we have in our dataset using a trellis.

We will use a new calculated column to combine the group category and the (Binned Metric) column. Go to Data > Add calculated column… and paste in this expression:

Concatenate([Binned Metric], [Group])

We will name this new column "Binned Metric Group".

Next we need, again, to evenly distribute the points within each horizontal band. To achieve this we will use another calculated column. Go back to Data > Add calculated column… and paste-in this function:

Rank([Metric],"asc",[Binned Metric Group],"ties.method=first") - (Max(Rank([Metric],"asc",[Binned Metric Group],"ties.method=first")) over ([Binned Metric Group]) / 2)

We will name this new column "Distributed Metric Group".

Back in the scatter plot, change the X-axis column to (Distributed Metric Group) and go to Properties > Trellis to use the (Group) column to trellis by.
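The effect of the combined key can be illustrated with a short Python sketch: records are ranked within each (bin, group) pair rather than within each bin alone, so every trellis panel gets its own centered bands. The function name `centered_rank_by_key` and the tuple layout are hypothetical, chosen only for this example.

```python
from itertools import groupby


def centered_rank_by_key(records):
    """records: list of (group, binned, metric) tuples.

    Returns one offset per record, in input order, ranking each record
    within its (binned, group) combination.
    """
    def key(i):
        return (records[i][1], records[i][0])

    # Sort row indices by the composite key, then by metric; the sort is
    # stable, so tied metric values keep their original row order.
    order = sorted(range(len(records)), key=lambda i: (*key(i), records[i][2]))

    offsets = [0.0] * len(records)
    for _, rows in groupby(order, key=key):
        rows = list(rows)
        half = len(rows) / 2
        for rank, i in enumerate(rows, start=1):
            offsets[i] = rank - half
    return offsets


records = [("A", 2.5, 1.0), ("B", 2.5, 1.1), ("A", 2.5, 1.2), ("B", 7.5, 6.0)]
print(centered_rank_by_key(records))
```

In this sample, the two "A" records in bin 2.5 are ranked against each other only; the "B" record in the same bin starts its own band.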

### Drawing results as a bar chart

We want to compare the distribution as bars. Right-click the beeswarm plot and switch the visualization to a Bar Chart. Then right-click the visualization again to change it to Horizontal Bars, and use the (Binned Metric) column as X-axis and the (Distributed Metric Group) column as Y-axis with the Sum aggregation.

Now we need to change the Color-by expression. Click on the Color-by selector and paste-in the following expression:

If([Distributed Metric Group]<=0,"left","right")

And now use the same color for both left and right.

### Adding lines to the beeswarm plot

Switching the Y-axis to the raw (Metric) column makes it possible to add lines for mean, Q1, Q3, etc. As a side effect, the plot shows the metric rising from left to right within each band. Each band remains distinct because the X-axis still uses a calculated column that takes the (Binned Metric) column into account.

### Drawing results as a dot plot

A similar approach can be used to draw the results in a bar chart or to produce a dot plot, which is like a histogram but records are represented as individual points rendered and grouped into bins (bars).

To do that, we need to create a new calculated column. Go back to Data > Add calculated column… and paste-in this function:

Rank([Binned Metric],"asc",Concatenate([Binned Metric],[Group]),"ties.method=first")

We will name this new column "Distributed Metric Oneside".

Open the visualization panel and add a Scatter Plot to the canvas. Select the (Binned Metric) column as X-axis and (Distributed Metric Oneside) on the Y-axis.

Now change the trellis configuration to trellis by Rows and select the (Group) column.

You can also switch this visualization to a bar chart. Just right-click the visualization and select Bar Chart.
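Because every record inside a (bin, group) combination has the same binned value, the one-sided rank amounts to a running count per combination. A minimal Python sketch of that idea follows; the function name `stack_positions` is hypothetical, and it assumes `ties.method=first` numbers tied records in order of appearance.

```python
from collections import Counter


def stack_positions(keys):
    """Running count of each key in order of appearance: 1, 2, 3, ..."""
    seen = Counter()
    positions = []
    for k in keys:
        seen[k] += 1
        positions.append(seen[k])
    return positions


# Each key is a (group, binned metric) pair, one per record.
keys = [("A", 2.5), ("A", 2.5), ("B", 2.5), ("A", 5.0), ("A", 2.5)]
print(stack_positions(keys))
```

Plotting the binned value on the X-axis and this position on the Y-axis stacks the points upward from the baseline, one point per record, which gives the dot plot its histogram-like shape.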
