How to manage a Spotfire table with several data functions? ... Use a blob!

Introduction

Have you ever wanted to write a quick Spotfire data function to make a small modification to a data table, only to be stymied because the data table was generated by an earlier data function? This is a tips & tricks blog on how you can manage a Spotfire table with several data functions.

A binary Spotfire document property can be used to pass data among multiple data functions.

Example: sharing a table the "obvious" way ... what happens

For example, you might use one data function to collect some data and initialize a simple Spotfire table "UserInfo" that contains a running balance of some sort for some users:

You might want to use a second data function to update UserInfo, based on some recent user activity:

A couple of things occur:

The second data function will need to use "UserInfo" as both an input and an output. Spotfire will warn of a "Cyclic Dependency" when it detects this. A data function that incorporates a cyclic dependency will still work, but it cannot be set to run automatically; it will need to be triggered manually.
The first time the new data function is executed, Spotfire will produce another warning, saying that the data table will be overwritten. When this occurs, the first data function becomes disassociated from the table UserInfo, even though it generated this table originally. In fact, if "UserInfo" is the only output from the original data function, the data function itself will be deleted!

What's happening is that only one data function in a dxp file can output a given data table to Spotfire in such a way that the table gets completely overwritten. As a result, each data table is strongly linked to the specific data function that generated it:

Initial method that might be tried to share a Spotfire data table between two data functions.
(top): Initialization data function creates a data table.
(bottom): A second data function updates the data table, overwriting it. The first data function can now no longer write to the data table. The first data function will be deleted from the Spotfire analysis unless it has other outputs.

Solution: Use a Blob

One handy way to deal with this situation is to store the information about one or more data tables in a binary object (or "blob"). This blob can be stored in Spotfire as a Document Property; one advantage of this strategy is that any number of data functions can freely update, modify or overwrite the document property and its contents. So one data function can be used to initialize the blob, one or more further data functions can then update the blob and its contents, and finally a dedicated extraction data function pulls the tables out of the blob and sends them to Spotfire:

Illustration of updating data with two or more data functions (boxes), using a binary object (Blob)

The main elements of this strategy:

The data tables are stored inside the binary object (Blob) as R data frames. The blob is stored as a binary Spotfire document property, and can be modified by any number of data functions (boxes).
The Spotfire data table is extracted from the blob using a dedicated data function. The data tables only interact with this one data function.
The extraction data function does not need any cyclic dependencies and can be set up to be triggered automatically whenever the blob changes.
The Update data function involves a cyclic dependency, as the blob is used as both input and output. The cyclic dependency is permitted, although this data function must be triggered manually.
The Initialize data function can be run again at any time as needed, to refresh the binary object.

Here is some simple code to illustrate the initialization TERR data function, with a single table "UserInfo". This initialization data function has no inputs and only outputs the binary object UserInfoBlob. The data here is randomly generated but it could be pulled from a database. The key line is the last one, where UserInfoBlob is created:

# [TERR] Initialize data

# Data function to initialize user table
# Inputs (none)
# Output
#   UserInfoBlob

# ----- Function definition(s) ------------------------------------
GetData = function(){
  # generic wrapper for data.  e.g. pull from database etc
  set.seed(1)
  N=6
  i.users.all = 1:N
  UserInfo = data.frame(
    UserID = paste0("User ",sprintf("%03.0f",i.users.all)),
    Balance = rpois(n=N,lambda=80),
    check.names = FALSE
  )
  UserInfo
}
# ----- end functions ----------------------------------------------

UserInfo = GetData()

# library(SpotfireUtils)

UserInfoBlob = SObjectToBlob(UserInfo)

The command "library(SpotfireUtils)" is commented out: this library is loaded automatically at runtime when the data function runs but is needed when running the code interactively for development.
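One way to let the same script run both inside the data function and in an interactive session is to load the library only when its functions are missing. The sketch below is not part of the original example. It assumes requireNamespace() is available in your session, and the serialize()-based fallbacks are hypothetical stand-ins for local testing outside TERR only; they are not claimed to produce blobs that Spotfire can read.

```r
# Load SpotfireUtils only if the blob helpers are not already available.
# Inside a Spotfire data function they are already loaded, so nothing happens.
if (!exists("SObjectToBlob")) {
  if (requireNamespace("SpotfireUtils", quietly = TRUE)) {
    library(SpotfireUtils)
  } else {
    # Hypothetical stand-ins so the script can be exercised outside TERR.
    # Assumption: base-R serialization is good enough for local testing only.
    SObjectToBlob = function(x) serialize(x, connection = NULL)
    BlobToSObject = function(blob) unserialize(blob)
  }
}

UserInfo = data.frame(UserID = c("User 001", "User 002"), Balance = c(80, 75))
UserInfoBlob = SObjectToBlob(UserInfo)   # pack the table into a blob
Restored = BlobToSObject(UserInfoBlob)   # unpack it again
```

With a guard like this, the library() line no longer needs to be commented in and out when switching between Spotfire and an interactive session.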

The TERR data function for extracting the data table from the blob might look like this:

# [TERR] Extract User Info
# Inputs
#   UserInfoBlob
# Output
#   UserInfo (the only place the table is returned to Spotfire)

UserInfo = BlobToSObject(UserInfoBlob)

This data function simply extracts the data frame from the UserInfoBlob object as UserInfo and returns it to Spotfire. It can be set up to run automatically whenever the input UserInfoBlob changes.

Finally, here is a simple example of a TERR data function that modifies the blob, so the blob appears as both input and output.

# [TERR] Update user table
# Inputs
#   UserInfoBlob
# Output
#   UserInfoBlob

# ----- functions ----------------------------------------------------------
GetUserActivity = function(UserInfo){
  set.seed(1)
  N=round(nrow(UserInfo)/2) # number to update
  UserActivity = data.frame(
    UserID =   sample(UserInfo$UserID,N), # randomly select users to update
    Activity = -rpois(n=N,lambda=8)
  )
  UserActivity
}
# ----- end functions -----------------------------------------------------

UserInfo = BlobToSObject(UserInfoBlob)

UserActivity = GetUserActivity(UserInfo)

# Locate the rows with activity
irow.activity = match(UserActivity$UserID, UserInfo$UserID)

# Update these rows
UserInfo$Balance[irow.activity] = 
      UserInfo$Balance[irow.activity] + UserActivity$Activity

UserInfoBlob = SObjectToBlob(UserInfo)

In this data function script:

A simple dummy function, GetUserActivity(), is defined; here we simply generate some random user activity, but in general the activity might be pulled from another source, or be passed into the data function through additional input arguments.
The data frame UserInfo is extracted using BlobToSObject().
The UserActivity data frame is generated using our dummy function GetUserActivity().
The matching rows are found and stored in irow.activity.
These rows of the UserInfo table are updated.
Finally, the modified UserInfo data table is converted back to UserInfoBlob, which is returned to Spotfire.
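The update step above works because each UserID appears at most once in UserActivity and always exists in UserInfo. With real activity data, repeated IDs would overwrite one another in the indexed assignment, and unknown IDs would yield NA row indices. If that is a concern, a variant along the following lines may be safer. This is a minimal sketch; ApplyActivity() is a hypothetical helper, not part of the original data function.

```r
# Hypothetical helper: apply activity to balances, tolerating repeated
# or unknown UserIDs in UserActivity.
ApplyActivity = function(UserInfo, UserActivity) {
  # Total the activity per user so repeated UserIDs are all counted
  total = tapply(UserActivity$Activity,
                 as.character(UserActivity$UserID), sum)
  # Match the per-user totals to rows of UserInfo
  irow = match(names(total), as.character(UserInfo$UserID))
  known = !is.na(irow)   # ignore users that are not in UserInfo
  UserInfo$Balance[irow[known]] =
    UserInfo$Balance[irow[known]] + as.numeric(total[known])
  UserInfo
}

UserInfo = data.frame(UserID = c("User 001", "User 002", "User 003"),
                      Balance = c(100, 50, 20))
UserActivity = data.frame(
  UserID = c("User 001", "User 001", "User 003", "User 999"),
  Activity = c(-5, -10, -1, -7))
Updated = ApplyActivity(UserInfo, UserActivity)
# Updated$Balance is now 85, 50, 19
```

Here "User 001" receives both of its updates, and "User 999", which is not in UserInfo, is silently skipped; whether skipping or raising an error is the right behavior depends on your data.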

When setting up this Update data function, Spotfire will detect the cyclic dependency and show a warning message like this:

This is perfectly normal and you can click "Yes" to continue. However, this data function must be run manually; it cannot be set to refresh automatically.

Whenever the Update data function is run, it will update the Spotfire binary document property UserInfoBlob. The Extract data function will then automatically run and the data table appearing in Spotfire will automatically update.

Coding Hint

In practice, at the beginning of each TERR data function, I usually include a snippet of R code that looks like this:

TimeStamp=paste(date(),Sys.timezone())
if(file_test("-d", "C:/Temp")) save(list=ls(), file="C:/Temp/mydata.in.RData", RFormat=T )
# remove(list=ls()); load(file='C:/Temp/mydata.in.RData'); print(TimeStamp)

This code simply saves a copy of whatever variables are present as the data function starts up to a temporary file on disk (provided the folder C:/Temp exists), and then carries on. Because the input variables have already been defined and mapped to Spotfire objects at that point, the saved file contains a copy of the data function's inputs.

In an interactive TERR session in RStudio, I'll run the line that is normally commented out; this clears my variables and loads the objects that were just saved from the data function. The TimeStamp is a basic sanity check to make sure I'm looking at data from the expected run.

This strategy makes a good starting point for developing a data function. I typically start by defining and mapping the input variables, including a code stub like this one, and running the data function once; I then switch to the interactive TERR session to write the actual code, which I'll copy/paste into the data function. Later, if an error or an unexpected result occurs, I can always load the fresh data and step through the R code to reproduce and correct the analysis. I'll then delete or comment out the stub once the code is working.

Sometimes it can be valuable to save the variables that are present as the data function finishes up, in which case similar code can be placed at the very end of the data function but using "mydata.out.RData" as a name. There are two reasons for doing this:

Sometimes the R objects do not return to Spotfire as expected, so this gives you a last chance to look at them before they are sent back.
This provides a convenient way to develop additional code in your interactive session, without having to execute the entire data function.
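The "in" and "out" snapshots can share one small helper, sketched below. SaveSnapshot() is a hypothetical name, not part of the original snippet, and the RFormat argument used in the original line is omitted here so that the sketch also runs in open-source R; add it back if you need it in TERR.

```r
# Hypothetical helper: save a data function's variables for later debugging.
# tag is "in" or "out"; nothing is written if the folder does not exist.
SaveSnapshot = function(tag, dir = "C:/Temp", envir = parent.frame()) {
  if (!file_test("-d", dir)) return(invisible(NULL))
  path = file.path(dir, paste0("mydata.", tag, ".RData"))
  # Save every variable visible in the calling environment
  save(list = ls(envir), file = path, envir = envir)
  invisible(path)
}

# First line of the data function:
SaveSnapshot("in")
# ... body of the data function ...
# Last line of the data function:
SaveSnapshot("out")
```

In the interactive session, load(file = "C:/Temp/mydata.out.RData") then restores the state as it was just before the outputs were handed back to Spotfire.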

Closing remarks

This example has addressed a common situation, where the user wants to update a data table with a data function that was not involved in the data's initialization.

The binary object, however, can contain more complex objects. For example, you might be working with two data tables and an R model object that might be useful later on in a data function; you can easily package up these objects into an R list object and save this as a binary object ("dataBlob"), which can be returned to Spotfire as a binary document property, and used and modified by other data functions:

dataList = list(
  table1 = table1,
  table2 = table2,
  model1 = model1
)

dataBlob = SObjectToBlob(dataList)

Illustration of storing different types of objects in a Blob, using the list structure above.
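A later data function can then unpack the list, change just one element, and repack it. The sketch below illustrates this under stated assumptions: the example tables and model are made up, and the serialize()-based functions are hypothetical stand-ins that are only defined when SpotfireUtils is not loaded, so the code can be tried outside TERR.

```r
# Hypothetical stand-ins, used only when SpotfireUtils is not loaded
if (!exists("SObjectToBlob")) {
  SObjectToBlob = function(x) serialize(x, connection = NULL)
  BlobToSObject = function(blob) unserialize(blob)
}

# Example objects (made up for illustration)
table1 = data.frame(id = 1:3, x = c(1.0, 2.5, 2.9))
table2 = data.frame(id = 1:2, y = c(10, 20))
model1 = lm(x ~ id, data = table1)

dataBlob = SObjectToBlob(list(table1 = table1, table2 = table2, model1 = model1))

# In a later data function: unpack, change one element, repack
dataList = BlobToSObject(dataBlob)
dataList$table1$x = dataList$table1$x * 2
dataBlob = SObjectToBlob(dataList)
```

Only table1 is touched here; table2 and model1 travel through the blob unchanged, which is what makes the list a convenient container for state shared between data functions.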

Peter Shaw - TIBCO Data Science - Oct 2019

Peter Shaw is a data scientist in the TIBCO Data Science team, based in Seattle. His interests include geospatial analysis, mapping, pattern recognition, optimization, time series and routing. He views data science as a contact sport, with the analyst, the data, and analytical models as the players. Other interests include photography, drawing, music, and partner dancing.
