  • Clustering markers on maps with Spotfire


    This article first shows how to set up clustering of markers, and then shows how to combine multiple map layers so that clustered markers appear at remote zoom levels and non-clustered markers appear at detailed zoom levels.

    Introduction

    This article shows how to group markers on maps into clusters using the R function for k-means clustering. When you visualize each row as a marker on a map, you get a detailed view of the data. However, when many markers overlap in certain areas, it can be hard to understand the general distribution of the data.

    One technique that can help you see the general distribution of the data is to cluster markers based on geographical proximity. This can help get a sense of the data at a remote zoom level.

     

    Step 1 - clustering markers

    In this example, we use data from the Fatality Analysis Reporting System (FARS), made available by the National Highway Traffic Safety Administration (NHTSA). The file contains data from 2015 only.

    Here we have plotted every accident on a map. At this zoom level, some regions have so many accidents that it is hard to judge their distribution, and even harder to find and mark a particular accident marker.

     

    [Image: all accidents plotted as non-clustered markers]

     

    So, in this case, we want to group markers based on proximity, to end up with fewer markers that make it easier to discern the distribution. For the clustering, we call the k-means function in TERR (TIBCO Enterprise Runtime for R) from the expression on the Marker By axis of the map chart, and we cluster the markers by geographical proximity (longitude/latitude).

     

    Clustering Markers

    This is the expression we put on the Marker By axis of the marker layer:

    <TERR_Integer("output <- kmeans(data.frame(input1,input2),${NumberOfClusters})$cluster",[LONGITUD],[LATITUDE]) AS [CLUSTERS]>

    The ${NumberOfClusters} part refers to a document property, so that the user can change the number of clusters interactively from the UI.
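
    To make it clearer what the expression returns, here is a minimal Python sketch of k-means on longitude/latitude pairs. It is only an illustration: the function name kmeans_labels and the iterations and seed parameters are hypothetical, and it uses a plain Lloyd-style iteration, which is not necessarily the algorithm that the kmeans function in TERR uses internally. Like the $cluster part of the expression, it returns one integer cluster label per row.

```python
import random


def kmeans_labels(points, k, iterations=20, seed=0):
    """Return a 1-based cluster label for each (lon, lat) point."""
    rng = random.Random(seed)
    # Start from k input points chosen at random (hypothetical init scheme).
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    # Shift to 1-based labels, matching how R numbers clusters.
    return [lab + 1 for lab in labels]
```

    Calling kmeans_labels(points, 2) on a list of (longitude, latitude) tuples gives a list of labels 1 or 2, which plays the role of the [CLUSTERS] column in the expression above.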

     

    Positioning the clustered markers

    When we cluster a number of accidents, the Map Chart needs to position each cluster marker at the average longitude/latitude of the accidents in that cluster. To do this, set the aggregation of the coordinate columns to Avg(), as shown below.

    [Image: positioning settings with the coordinate columns aggregated by Avg()]
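
    The effect of aggregating the coordinate columns with Avg() per cluster can be sketched as follows. This is a hedged illustration of the arithmetic only, not of how Spotfire computes it; the function name cluster_positions is hypothetical.

```python
from collections import defaultdict


def cluster_positions(lons, lats, labels):
    """Average longitude and latitude for each cluster label."""
    # Per label: [sum of longitudes, sum of latitudes, row count].
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for lon, lat, lab in zip(lons, lats, labels):
        acc = sums[lab]
        acc[0] += lon
        acc[1] += lat
        acc[2] += 1
    # One (avg_lon, avg_lat) position per cluster marker.
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}
```

    Each key of the result corresponds to one clustered marker, and the value is where that marker would be drawn.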

     

    Note: One could consider using a weighted average based on for example the number of fatalities in each accident, but we won't consider this here.
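
    For reference, a weighted average of this kind could look like the sketch below. The function name weighted_position and the weights argument are hypothetical; the weights would be something like a per-accident fatality count, and the article itself does not use this approach.

```python
def weighted_position(lons, lats, weights):
    """Weighted average position of one cluster's accidents."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Each coordinate contributes in proportion to its weight.
    lon = sum(x * w for x, w in zip(lons, weights)) / total
    lat = sum(y * w for y, w in zip(lats, weights)) / total
    return lon, lat
```

    Compared with a plain average, the marker is pulled toward the accidents with the larger weights.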

    To experiment with the number of clusters, we introduce a document property that we pass as input to the k-means function, and create an input field in a text area where the value of the document property can be edited.

    Since we aggregate a number of rows into each marker, we may want to use pie markers to display some aspects of the aggregated data. In this case, we use the DRUNK_DR column to visualize the proportion of fatalities in each pie that involved drunk drivers.
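
    As a rough sketch of what such a pie represents, the following computes, for each cluster, the share of rows per DRUNK_DR value. It assumes that sector size is based on row count, which is an assumption made for illustration (the sector size could equally be another aggregation); the function name pie_shares is hypothetical.

```python
from collections import Counter, defaultdict


def pie_shares(labels, drunk_dr):
    """Share of rows per DRUNK_DR value within each cluster."""
    counts = defaultdict(Counter)
    for lab, value in zip(labels, drunk_dr):
        counts[lab][value] += 1
    # Convert the per-cluster counts into fractions that sum to 1.
    return {
        lab: {value: n / sum(c.values()) for value, n in c.items()}
        for lab, c in counts.items()
    }
```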

    Here are some screenshots with different numbers of clusters. You can also try this yourself by downloading the attached DXP file and using the map on the first page, called "single layer clustered map".

     

    Screenshot with 40 Clusters

    [Image: clustered data with 40 clusters]

     

    Screenshot with 400 Clusters

    [Image: clustered data with 400 clusters]

     

    Step 2 - combining clustered and non-clustered map layers to enhance the user experience

    As becomes obvious when using the map on the first page of the attached DXP (single layer clustered map), no single number of clusters works well at all zoom levels. As the user zooms in, one would want to increase the number of clusters, and eventually use no clustering at all. This is possible with the Zoom visibility setting in Spotfire map charts.

    In this case, we have two clustered layers and one non-clustered layer, as shown in the Zoom visibility settings below:

     

    [Image: Zoom visibility settings for the three layers]

     

    The "accident clustered" layer uses 400 clusters. The "accident clustered 4000" layer uses 4000 clusters and is activated when the user zooms to a more granular level. Eventually, the non-clustered "accident" layer is made active.
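
    The idea behind zoom visibility can be sketched as a lookup from zoom level to visible layers. The zoom thresholds below are made-up values for illustration only; the actual ranges are the ones configured in the Zoom visibility settings of each layer, and the function name visible_layers is hypothetical.

```python
# (layer name, lowest zoom level shown, zoom level where it is hidden again)
# The numeric thresholds are hypothetical, not taken from the DXP file.
LAYERS = [
    ("accident clustered", 0, 6),
    ("accident clustered 4000", 6, 9),
    ("accident", 9, None),  # None: visible at every closer zoom level
]


def visible_layers(zoom, layers=LAYERS):
    """Names of the layers whose zoom range contains the given zoom level."""
    return [
        name
        for name, low, high in layers
        if zoom >= low and (high is None or zoom < high)
    ]
```

    With non-overlapping ranges like these, exactly one layer is visible at any zoom level, so the map moves from coarse clusters to finer clusters to individual accidents as the user zooms in.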

    You can try this on the second page of the attached DXP file, called "clustered multi-layer map".

    External data

    When using external data in Spotfire, you cannot use TERR or other data functions. However, the technique of using different map layers and aggregating the data to different levels still works nicely with external data. The Zoom visibility feature also works with external data, so that one can show aggregated data only at a high level and, as the user zooms in closer, show progressively less aggregated and eventually unaggregated data.
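
    One way to aggregate to different levels without k-means is to group rows into grid cells of a fixed size, which only needs simple arithmetic and grouping. The sketch below is an assumption about how such an aggregation could be done, not a method described in this article; the function name grid_aggregate and the cell_degrees parameter are hypothetical.

```python
import math
from collections import defaultdict


def grid_aggregate(lons, lats, cell_degrees):
    """Group points into square grid cells; return average position and count."""
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for lon, lat in zip(lons, lats):
        # The cell key is the integer grid index in each direction.
        key = (math.floor(lon / cell_degrees), math.floor(lat / cell_degrees))
        acc = cells[key]
        acc[0] += lon
        acc[1] += lat
        acc[2] += 1
    # One aggregated marker per cell: (avg_lon, avg_lat, row count).
    return {key: (sx / n, sy / n, n) for key, (sx, sy, n) in cells.items()}
```

    A larger cell_degrees value gives fewer, coarser markers for a remote zoom level, and a smaller value gives more detail for a layer that becomes visible as the user zooms in.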

    Hope this is useful. Please use the comments feature if there are any questions or comments.

    clustering_markers_on_a_map_-_wiki_article.zip

