The first blog post introduced Chatper 5, "Practical Applications of Spotfire Visualizations". This post covers some content from Chapter 10, "Information Links and Data Connectors" - specifically the part of the chapter that discusses the use of Spotfire's custom expressions when working with in database data.
But first, a quick recap on in-memory versus in database analytics. I quote from the book:
Part of the challenge with in-database analytics, is that the full suite of Spotfire custom expressions is not available to you. You fancy using OVER expressions, for example, to calculate a cumulative sum? This can appear challenging at first sight. This is because the Spotfire data connectors query the underlying database directly - they build queries to extract the summary information - just enough to visualize the data at some aggregated level, but databases don't support the OVER expressions so you can't use them directly.
Let's go back to the hypothetical scenario described above. You could imagine that Spotfire needs to show total sales for each month - the query passed to the database might be something like this:
SELECT Sum(Sales) as Sales, month([saleDate]) as [Month] FROM SalesData GROUP BY month([saleDate])
This query sums the total sales for each month and would produce a bar chart that looks like this:
Remember - you, as the user, do not have to enter the SQL query - Spotfire does all that for you automatically when you choose what you want to show on each of the axes!
Now, let's say we wanted to calculate the cumulative sum. It could be that the database doesn't support cumulative sum or might be rather inefficient in doing so. For example a quick Internet search on cumulative sum, SQL Server reveals some interesting insights. Apparently, a lot of database administrators will be very upset if you try to calculate something like this on their database inefficiently! It seems that calculating cumulative sum is rather difficult for SQL Server...
Better yet, let's use Spotfire's custom expression language to calculate the cumulative sum. After all, we know the values of each of the individual bars, we just need to sum them over time. We'll be using the standard axis selector to choose the cumulative sum aggregation, then inspecting the resultant custom expression to find out what's going on under the hood.
We can choose Cumulative Sum like this:
The Spotfire visualization will now update like this:
Now this visualization is using the combined power of in-database and in-memory calculations - first of all, the database is being queried to return the raw sums for each month; then Spotfire is calculating the cumulative sum in memory. Therefore, we get the best of both worlds: a small amount of data is returned to Spotfire; the cumulative sum calculation is performed efficiently in memory, rather than inefficiently within the database.
Inspecting the custom expression yields the following:
Sum([sales]) THEN Sum([Value]) OVER (AllPrevious([Axis.X]))
So, what's going on here? The key is the THEN keyword. It leads to what is known as a post-aggregation expression - i.e., there is an initial aggregation, followed by a further aggregation. Using post-aggregation expressions makes OVER expressions much easier to work with than otherwise. You can read more about OVER expressions in Chapter 7 of the book. Its title is "Source Data is Never Enough".
The initial expression of Sum([sales]) is passed directly to the underlying database (Spotfire will automatically add the necessary SELECT, Group By, etc. clauses to the SQL). The database will return a small number of rows to Spotfire as an in-memory data table for visualizing the bar chart. Although the table isn't shown to you (it's hidden), you can imagine it like this:
The post-aggregation part of the expression then operates on the Value column (in this case it's sum of sales). It's simply:
Sum([Value]) OVER (AllPrevious([Axis.X]))
All this is doing is performing a cumulative sum. The cumulative sum is calculated in-memory against the (hidden) data table that drives the visualization.
Now that we understand how the combination of hybrid in-database and in-memory aggregations work, we can then apply all sorts of other post-aggregation expressions to the value axis of the bar chart - moving averages, difference year-on-year, etc.. You can even apply the post aggregation expressions to streaming (live) data!
It's critically important to understand that post-aggregation expressions can only be used on the value axis of a visualization and even then, only if the x-axis is categorical (or binned). It's something I have not realized in the past!
To summarize:
Spotfire gives us the ability to perform hybrid in-database and in-memory aggregations. We need to use in-database aggregations when working with large data volumes. When we do this, we can then operate on the data returned by the database using Spotfire's hugely powerful expression language. By combining in-database and in-memory aggregations, we get the best of all worlds. Fast performance on large data; small volumes of data being transferred; powerful and flexible calculations.
The TIBCO Community page on the book is a great starting place for further exploration of topics covered in the book. Another related and very useful page on the community is: TIBCO Spotfire® Enablement Hub - this is a launchpad for all sorts of great stuff!
This is the second of a series of blog articles on the book - watch this space for more excerpts and related concepts. I'll be picking various topics and discussing them - providing introductions to those topics in some cases; expanding on the content in the book in others.
Pick up a copy of the book today from Amazon
Recommended Comments
There are no comments to display.