  • Spotfire analysis of water environments around the world - Gartner Catalyst 2019 conference


    This article describes Spotfire's contribution to the Gartner bake-off at the Gartner Catalyst 2019 conference: an analysis of water environments around the world.

    Spotfire was one of three vendors invited to the Gartner Catalyst Conference 2019 for an analytics show-down session, appearing onstage along with IBM and Looker. The analysis pertains to water quality and the environment, covering water resources (consumption and drought) and water quality (pollution and improvements). The aim is to demonstrate the power of using analytics and machine learning to help learn about, investigate and better the environment. This fits in well with Spotfire's drive to use #data4good, so we gladly accepted the invitation from Gartner.

    Our presenters, Michael O'Connell (Chief Analytics Officer, Spotfire) and the data science team, cover three use cases that are relevant to today's environmental concerns and that show the power of analytics and machine learning in these fields. We use Spotfire combined with geoanalytics, machine learning and statistics to show global trends and to investigate highly local issues. The three use cases are:

    1. Worldwide analysis of water consumption compared to World Bank development data using AI Recommendations in Spotfire
    2. Historical and future trend analysis of the River Kelvin in Glasgow, Scotland using Spotfire's embedded R engine with geoanalytics
    3. Identification and pattern of bear and other wildlife appearances at Katmai National Park, Alaska using image recognition through Spotfire, AWS and Python

    Worldwide analysis of water consumption compared to World Bank development data using AI recommendations in Spotfire

    In this analysis we bring in three disconnected data sources and combine them in Spotfire to provide further insights across the globe. We start by reading in the World Bank development data, which has many variables on economic and social growth per country. Spotfire's inbuilt artificial intelligence recommendations engine then automatically finds relationships in this data and suggests best-practice visuals for analyzing each relationship:


    Spotfire's AI Recommendations suggesting relationships and visuals for a user

    We can see Spotfire has found relationships between internet users and population growth, as well as phone subscriptions. We now want to bring in and merge data on global water consumption which we get from the Our World in Data website. Here Spotfire is able to merge several datasets intelligently, allowing the user to view the data flow, and edit this as well as add data transformations. This is all displayed through Spotfire's data canvas, which provides a fully interactive data wrangling interface:

    Spotfire reading multiple CSVs while automatically suggesting joins



    Spotfire's Data Canvas allows visual interaction and transformation of data in many ways
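
    The join that the data canvas performs can be pictured as a key-based merge on country. The sketch below is only an illustration in plain Python, not what Spotfire runs internally; the column names, the sample values and the `left_join` helper are all hypothetical.

```python
# Hypothetical sample rows, for illustration only. The real inputs are the
# World Bank development data and the Our World in Data water CSVs.
development = [
    {"country": "Kenya", "internet_users_pct": 17.8},
    {"country": "Norway", "internet_users_pct": 96.5},
    {"country": "Chile", "internet_users_pct": 82.3},
]
water = [
    {"country": "Norway", "water_m3_per_capita": 600.0},
    {"country": "Kenya", "water_m3_per_capita": 75.0},
]


def left_join(left, right, key):
    """Attach the columns of `right` to every row of `left` sharing `key`."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        merged = dict(row)
        match = lookup.get(row[key], {})
        # Copy every non-key column from the matching right-hand row, if any.
        merged.update({k: v for k, v in match.items() if k != key})
        joined.append(merged)
    return joined


combined = left_join(development, water, "country")
for row in combined:
    print(row)
```

    Rows without a match (Chile in this made-up sample) keep their original columns only, which mirrors how a left join leaves unmatched values empty.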

    Once we have combined our economic and water data in Spotfire, we can use the AI recommendations again to find links between water consumption and factors such as population growth and percentage of population under 14. We can also use Spotfire's inbuilt advanced data relationship functions, which allow us to investigate specific relationships using techniques such as regression and clustering. Here we use regression to show how variables in our data relate to water consumption:


    Spotfire Data Relationships and Linear Regression
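
    To make the regression step concrete, here is a minimal sketch of an ordinary least squares fit between one explanatory variable and water consumption. It is an assumption about the kind of calculation involved, not Spotfire's implementation; `fit_line` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variance in y explained by the fitted line.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)
    )
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical per-country values, for illustration only.
population_growth_pct = [0.5, 1.0, 1.5, 2.0, 2.5]
water_use_index = [3.1, 3.9, 5.2, 5.8, 7.1]

slope, intercept, r_squared = fit_line(population_growth_pct, water_use_index)
print(f"slope={slope:.3f} intercept={intercept:.3f} r_squared={r_squared:.3f}")
```

    Ranking candidate variables by a fit statistic such as R squared is one simple way to surface the strongest relationships first.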

    We can then expand upon these relationships by mapping the data. Spotfire has automatic geocoding which allows us to map this data easily:


    Automatic geocoding in Spotfire - automatically mapped countries

    We can further augment this analysis by using another of Spotfire's inbuilt data relationship functions: K-means clustering. This helps us identify related countries in terms of the important factors related to water consumption:



    K-means clustering in Spotfire through Data Relationships

    Here we find interesting clusters: the most developed countries are grouped together, while the other clusters highlight other levels of development versus water consumption.
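
    For readers unfamiliar with K-means, the following sketch shows the basic algorithm (Lloyd's iterations) in plain Python. It is a simplified illustration, not the Spotfire function: the deterministic choice of the first k points as starting centroids and the sample feature values are assumptions made for brevity.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm.

    Initial centroids are simply the first k points (an assumption made to
    keep the sketch deterministic; real tools use better initialization).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Hypothetical, already-scaled features per country:
# (development index, water consumption index).
countries = [
    (0.9, 0.8), (0.2, 0.3), (0.85, 0.75),
    (0.25, 0.2), (0.95, 0.9), (0.15, 0.25),
]
labels, centroids = kmeans(countries, k=2)
print(labels)
print(centroids)
```

    In practice the input variables would be standardized first so that no single indicator dominates the distance calculation.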

    In this use case we have been able to quickly analyze relationships across the world between water consumption and social and economic indicators, identifying the strongest relationships and then deriving clusters of countries that show distinct patterns of economic growth and development. This was all performed using Spotfire's inbuilt augmented AI recommendations engine, data relationships and automated geocoding.

    Historical and future trend analysis of the River Kelvin in Glasgow, Scotland using TERR/R in Spotfire with geoanalytics

    In this use case we perform an analysis of time series data from Scotland. The Rivers Clyde and Kelvin in Glasgow suffered greatly from the industrialization of the city from the early 1900s onwards. However, in recent decades there has been considerable effort to identify, tackle and improve the water to restore its ecological status and habitats. Here we will analyze this past and project future trends utilizing machine learning and statistical methods.

    First let's build our story. In Spotfire, we can include web pages for viewing which will help provide a context for our data science analysis:



    Embedded web pages in Spotfire

    Using Spotfire we rapidly built a parameterized dashboard that allows us to map different pollutants across the River Kelvin catchment, explore trends over time and also forecast into the future. This is powered by Spotfire's interactive and dynamic controls, and by Spotfire's embedded R statistical engine, TERR (now called Spotfire Enterprise Runtime for R). TERR is an enterprise version of R that provides greater flexibility and performance than the open source version of R:


    Parameterized dashboard and forecasting in Spotfire

    For instance, above we can see that ammonia levels have improved since the 1960s, and the forecast (using Holt-Winters) shows this trend is expected to continue into the future. We can compare each measure of water quality in the River Kelvin to see which pollutants are worsening or improving, to test whether the previous investments had the desired impact and whether future work is still required:


    Comparing multiple parameters in Spotfire
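
    The Holt-Winters forecast mentioned above for ammonia can be approximated in a few lines. The sketch below is a simplification, not the TERR code behind the dashboard: it implements Holt's linear-trend smoothing, that is, Holt-Winters without the seasonal component, which is a reasonable stand-in for annual readings. The function name, the smoothing parameters and the sample readings are all assumptions.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=5):
    """Holt's linear-trend exponential smoothing (no seasonal term).

    alpha smooths the level, beta smooths the trend. Returns `horizon`
    forecasts that continue the last smoothed level and trend.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # New level blends the observation with the previous projection.
        level = alpha * value + (1 - alpha) * (level + trend)
        # New trend blends the latest change in level with the old trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical annual ammonia readings (mg/l), for illustration only.
ammonia = [2.4, 2.1, 2.2, 1.8, 1.6, 1.5, 1.2, 1.1]
print(holt_forecast(ammonia, horizon=3))
```

    Because the projection is linear, a long horizon on a falling series will eventually go below zero, so forecasts of concentrations should be clipped or read with care.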

    We can see from the above trends that phosphate levels (in blue) take a dramatic turn towards lower, and therefore less polluted, levels after 2003. This was when the large investment works were completed. This is shown even more prominently when looking at the Bothlin Burn on the bottom right chart. In contrast, nitrate, which often comes from farming activities, is forecast to increase, so improvements are clearly still required to mitigate this effect. However, we can't forecast some sections of the river because data are missing for some years. Using machine learning methods we can resolve this by imputing values to fill in these gap years:


    Imputing temporal values in Spotfire

    This temporal imputation was performed using the MICE R library, called directly in Spotfire via its embedded R engine (TERR). We can also analyze and remove outliers using Spotfire's ability to interactively filter out or remove data based upon conditions we specify:


    Outlier detection and removal in Spotfire
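
    The gap filling and condition-based filtering described above can be illustrated with a much simpler stand-in. The sketch below does not reproduce MICE: it swaps in plain linear interpolation between the nearest observed years, and a generic condition filter for outliers. Function names, the threshold and the sample readings are hypothetical.

```python
def fill_gap_years(readings):
    """Linearly interpolate missing years between observed years.

    `readings` maps year -> value. This is a simple stand-in for a
    multiple-imputation method such as MICE, not an equivalent.
    """
    years = sorted(readings)
    filled = dict(readings)
    for start, end in zip(years, years[1:]):
        span = end - start
        for year in range(start + 1, end):
            weight = (year - start) / span
            filled[year] = readings[start] + weight * (
                readings[end] - readings[start]
            )
    return dict(sorted(filled.items()))


def remove_where(readings, condition):
    """Drop every reading whose value satisfies `condition`."""
    return {year: v for year, v in readings.items() if not condition(v)}


# Hypothetical nitrate readings (mg/l) with gap years and one spike.
nitrate = {2000: 1.0, 2001: 1.2, 2004: 1.8, 2005: 9.5, 2006: 2.0}

cleaned = remove_where(nitrate, lambda value: value > 5.0)
complete = fill_gap_years(cleaned)
print(complete)
```

    Removing the outlier before interpolating keeps the spike from leaking into the imputed years.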

    Finally, we want to expand on our mapping of this river catchment to give a fuller picture of water quality and how it changes spatially. To do this we use geoanalytics in Spotfire to extract points from the river polygons, which allows us to map the data better. We map the sampling points where we have data to the stations on the rivers closest to them. Again we can use TERR and geoanalytics to solve this problem easily in Spotfire:

    Extracting polygon points from lines in Spotfire


    Imputation of points on a map using geoanalytics
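
    Matching each sampling point to its closest river station is a spatial nearest-neighbor join. The following sketch shows the idea with great-circle (haversine) distances in plain Python; it is not the TERR geoanalytics code used in the dashboard, and the station names and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_station(sample, stations):
    """Return the station closest to `sample` (dicts with lat and lon)."""
    return min(
        stations,
        key=lambda s: haversine_km(
            sample["lat"], sample["lon"], s["lat"], s["lon"]
        ),
    )


# Hypothetical coordinates in the Glasgow area, for illustration only.
stations = [
    {"name": "station_a", "lat": 55.881, "lon": -4.291},
    {"name": "station_b", "lat": 55.950, "lon": -4.100},
]
sample = {"name": "sample_1", "lat": 55.880, "lon": -4.290}

match = nearest_station(sample, stations)
print(sample["name"], "->", match["name"])
```

    This brute-force search compares every sample with every station, which is fine for a single catchment; larger datasets would usually use a spatial index.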

    In summary, we have been able to use a powerful combination of statistics, such as time series forecasting and imputation, and geoanalytics in Spotfire to provide a historical and future trend analysis. We have shown that while improvements have had a great impact, there are still areas of concern, as highlighted in our analysis story at the start.

    Identification and pattern of bear and other wildlife appearances at Katmai National Park, Alaska using image recognition through Spotfire, AWS and Python

    Our final demonstration shows how we can utilize advanced machine learning techniques such as image recognition in Spotfire, and the massive potential that cloud services bring to data science. In this example we use Spotfire's Python data function to call AWS S3 and the AWS Rekognition image recognition service to analyze the habits of bears and other wildlife in Katmai National Park in Alaska. The images are snapshots taken every 10 minutes over an evening from Explore.org's live bear cam feed. Here we can plot a timeline of when the animals appear and in what abundance:



    Interactive image recognition in Spotfire using AWS
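
    As a rough idea of what such a Python data function might contain, the sketch below counts wildlife labels for snapshots stored in S3 using Rekognition's `detect_labels` call. It is an assumption about the approach, not the code used in the demonstration: the helper names, the label list and the confidence threshold are hypothetical, and the client is passed in (for example one created with `boto3.client("rekognition")`) so that the functions themselves do not need AWS credentials to be defined.

```python
from collections import Counter


def count_wildlife(rekognition_client, bucket, key,
                   wanted=("Bear",), min_confidence=80.0):
    """Count occurrences of the `wanted` labels in one S3-hosted image.

    `rekognition_client` is expected to expose detect_labels, as the
    boto3 Rekognition client does.
    """
    response = rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    counts = Counter()
    for label in response["Labels"]:
        if label["Name"] in wanted:
            # Some labels list one "Instances" entry per bounding box;
            # when none are listed, count the label as one sighting.
            counts[label["Name"]] = max(len(label.get("Instances", [])), 1)
    return counts


def build_timeline(rekognition_client, bucket, snapshot_keys,
                   wanted=("Bear",)):
    """Return (key, counts) pairs in snapshot order, ready for plotting."""
    return [
        (key, count_wildlife(rekognition_client, bucket, key, wanted))
        for key in snapshot_keys
    ]
```

    With snapshots named by timestamp, the output of `build_timeline` maps directly onto a timeline of sightings per 10-minute interval.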

    You can read an in-depth write-up of this demonstration in this community blog, or watch a live explanation and overview in the accompanying video.

    Colin Gray - TIBCO Data Science - Aug 2019.

