Introduction
When you create an analysis in Spotfire®, you can extend its capabilities with data functions. Data functions let you call languages such as Python, Spotfire® Enterprise Runtime for R (TERR), open-source R, or even MATLAB. Spotfire users can then call TERR interactively from the Spotfire client or the web player to enhance their analytics. This guide summarizes tips and tricks for developing TERR/R data functions and for using them in your analyses to improve the insights and data science in Spotfire.
Looking for tips on using Python data functions? See the equivalent wiki article: Tips for working with Python in Spotfire
Different inputs than expected
When you create an R script to run in a Spotfire® Enterprise Runtime for R (TERR) data function or expression function, you might encounter problems that did not occur when you ran the same R script outside of Spotfire®. The simplest way to troubleshoot these problems is to save the R objects that Spotfire passes in to your script to a known location, and then load those objects into an interactive session of TERR or R. It is typically much easier to debug the issue in that environment. After you have corrected the issue, you can update the script in the data function.
To do this, add a line of code to the top of the R script to save the input parameters to an RData file:
save(list=ls(), file="C:/debug.RData", RFormat=TRUE)
Then, in RStudio or the TERR console, load the RData file and run the R script so that the exact same inputs are used as when the R script ran in the TERR data function.
load(file="C:/debug.RData")
Notes
- If you are using an open-source R data function, you should omit the flag RFormat=TRUE.
- Rather than writing to the C:/ drive directly, you might want to adjust the path to another known location, such as C:/temp/, but make sure that location exists before you attempt to run the script. Alternatively, use a path under your home directory, such as "~/Dump.RData".
- If you are running this on a Linux system, you need to adjust the path accordingly.
- After you have resolved the issue, be sure to delete or comment out the line in your script, to avoid consuming unnecessary resources when you share or deploy the script.
- In some cases, it might be useful to save the objects again at the end of your script. Do this by saving the objects to a separate file with another, similar line of code. You can then inspect how the objects have changed during your script, and what is getting passed back to Spotfire.
- The attached documents on supported data types are also useful references, as well as the discussion of Data Type mapping in the TERR Help files. In particular, note that Long Integers in Spotfire are not supported in data functions, and so passing in a Long Integer causes an error.
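The save-and-compare workflow described in the notes above might look like the following minimal sketch. The objects input_table and scale_factor are hypothetical stand-ins for inputs that Spotfire would pass in, and the debug folder under tempdir() is an assumed location. The sketch omits RFormat=TRUE so that it also runs in open-source R.

```r
# Hypothetical inputs standing in for what Spotfire would pass to the script.
input_table <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
scale_factor <- 2

# Assumed debug location; create it first so that save() does not fail.
debug_dir <- file.path(tempdir(), "spotfire_debug")
dir.create(debug_dir, showWarnings = FALSE, recursive = TRUE)

# Snapshot of everything visible at the top of the script.
save(list = ls(), file = file.path(debug_dir, "inputs.RData"))

# Body of the data function (illustrative only).
output_table <- input_table
output_table$y <- output_table$y * scale_factor

# Second snapshot at the end of the script, written to a separate file.
save(list = ls(), file = file.path(debug_dir, "outputs.RData"))

# Later, in an interactive session: load each snapshot into its own
# environment so the two sets of objects do not overwrite each other.
before <- new.env()
after <- new.env()
load(file.path(debug_dir, "inputs.RData"), envir = before)
load(file.path(debug_dir, "outputs.RData"), envir = after)

# Objects created by the script, that is, candidates for output parameters.
new_objects <- setdiff(ls(after), ls(before))
print(new_objects)
```

Loading into separate environments, rather than into the global workspace, keeps the "before" and "after" versions of each object side by side for inspection.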
For more information on how to debug TERR data functions and expression functions, see the attached document Top 10 Tricks for Spotfire® Enterprise Runtime for R Data Functions, and download the file below. For in-depth training, see the online course Spotfire Analyst Extended with R - Spotfire® Enterprise Runtime for R (available in resources).
Setting the Debug option in Spotfire
You can get information about problems with a data function from the data function debugger. By default, this option is disabled, because running it can degrade performance.
To enable data function debugging in Spotfire, follow these steps.
- From the menu, click Tools > Options.
- In the Options dialog box, in the left pane, scroll down and click Data Functions.
- Select the Enable Data Function Debugging option, and then click OK to accept the changes.
When you run a data function, notice that the Details icon is displayed in the Notifications panel in the lower left corner of the Spotfire window.
Click Details to open the debugging dialog box. The debugging results show all of the information that passed between TERR and Spotfire, including any errors or warnings that result from running the data function. You can review the results of running the data function, or export the results to a text editor to examine them more carefully.
After you are finished debugging using the data function debugger, be sure to return to the Tools > Options dialog box and turn off the data function debugger.
Checking whether the script is being run when expected
To confirm whether a script is being run when you expect it to, use the technique above to save the objects to an .RData file, but also add this line of code before the save() call, so that the timestamp is included in the saved objects:
TimeStamp=paste(date(),Sys.timezone())
Then, when you load the RData file into an interactive session, you can check the timestamp with this line of code:
print(TimeStamp)
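Put together, the timestamp check might look like the sketch below. The file location under tempdir() and the name debug_file are assumptions made for illustration. The last line reads the file's modification time, which offers a second, independent indication of when the script last ran.

```r
# In the data function: create the timestamp BEFORE saving, so that it is
# one of the objects captured by ls().
TimeStamp <- paste(date(), Sys.timezone())

# Assumed debug location (hypothetical); adjust to a known folder.
debug_file <- file.path(tempdir(), "debug_timestamp.RData")
save(list = ls(), file = debug_file)

# In the interactive session: load into a separate environment and inspect.
session <- new.env()
load(debug_file, envir = session)
print(session$TimeStamp)

# The file's last-modified time also shows when the script last wrote it.
print(file.info(debug_file)$mtime)
```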
Different packages installed
Another common cause of issues is different packages available in the different environments. In brief, if you want a data function or an expression function to use functionality from an external package, that package must be installed in the right environment (either under the copy of the TERR engine used by Spotfire on the desktop, or on Spotfire® Statistics Services), and it must be loaded in the data function or expression function. This can cause confusion, especially if you have multiple versions of TERR installed on a server, or on a desktop (for example, if you installed Spotfire, you have the embedded TERR engine, and you have separately installed the Spotfire® Enterprise Runtime for R Developer Edition).
To determine which packages you have loaded in an interactive session of TERR or R, you can use these commands:
- search() lists all the packages that are attached, hence directly usable by the user.
- loadedNamespaces() lists all the loaded packages, which also include packages that are only accessible from other packages or by using the pkgName::objectName syntax.
To get this information within your data function, add two lines of code to the top of the R script to save the package lists to a text file:
pkgs <- list(search = search(), loadedNamespaces = loadedNamespaces())
dump("pkgs", file="C:/debugPackages.txt")
Compare this information to what you see when you run the commands interactively in TERR or R.
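One way to make that comparison less manual is sketched below. It records the library paths and the full search paths along with the package lists, then reads the dumped file back in an interactive session. The file location under tempdir() and the names debug_file, restored, and missing_here are assumptions made for illustration.

```r
# In the data function: record package and library information.
pkgs <- list(
  search = search(),
  searchpaths = searchpaths(),            # absolute paths of attached packages
  loadedNamespaces = loadedNamespaces(),
  libPaths = .libPaths()                  # library locations that are searched
)

# Assumed debug location (hypothetical); adjust to a known folder.
debug_file <- file.path(tempdir(), "debugPackages.txt")
dump("pkgs", file = debug_file)

# In the interactive session: dump() writes R code, so source() restores
# the pkgs object. A separate environment avoids overwriting anything.
restored <- new.env()
source(debug_file, local = restored)

# Namespaces that were loaded in the data function but are not loaded here.
missing_here <- setdiff(restored$pkgs$loadedNamespaces, loadedNamespaces())
print(missing_here)

# Library paths used by the data function that this session does not use.
print(setdiff(restored$pkgs$libPaths, .libPaths()))
```

When both halves run in the same session, as here, the differences are empty. The output becomes informative when the dump file comes from Spotfire and the second half runs in RStudio or the TERR console.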
Notes
- Rather than writing to the C:/ drive directly, you might wish to adjust the path to another known location, such as C:/temp/, but make sure that location exists before attempting to run the script.
- If you are running this on a Linux system, you will need to adjust the path accordingly.
- After you have resolved the issue, be sure to delete or comment out the line in your script, to avoid consuming unnecessary resources when you share or deploy the script.
- In some cases, it can be useful to save the list of packages again at the end of your script, by saving the list to a separate file with another, similar line of code. You can then inspect how the search path has changed during your script.
- Sometimes you might have the same package installed in two different places, or your .libPaths() might differ between RStudio and Spotfire. To help determine this, consider using searchpaths(), which returns the absolute file path, instead of search().
These packages are loaded automatically in a Spotfire environment, but would need to be loaded explicitly in an interactive debugging environment, if any of their functions are used in your script:
- SpotfireConnector
- SpotfireData
- SpotfireUtils
For full details on installing and using packages, see Spotfire Package Management.
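In an interactive debugging session, the three packages listed above could be attached with a sketch along these lines. It assumes the packages are installed in the engine you are debugging with. Where one is not found (for example, in open-source R), the sketch reports it instead of stopping with an error. The names spotfire_pkgs and available are made up for illustration.

```r
# Packages that Spotfire loads automatically, per the list above.
spotfire_pkgs <- c("SpotfireConnector", "SpotfireData", "SpotfireUtils")

# Check which ones can be loaded in this session without raising an error.
available <- vapply(spotfire_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach the ones that were found.
for (p in spotfire_pkgs[available]) {
  library(p, character.only = TRUE)
}

# Report any that are missing from this environment.
if (any(!available)) {
  message("Not installed here: ", paste(spotfire_pkgs[!available], collapse = ", "))
}
```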
Special versions of packages required
Another occasional source of confusion can come from packages that have special versions. To solve certain problems found with CRAN packages running in TERR, TIBCO has modified the packages and reposted them for the convenience of TERR users.
As of TERR 4.0, for the version of TERR you are running, install.packages() first checks TRAN.tibco.com for any special versions, and if no special package is found, it installs the package from CRAN. (TERR Tools in Spotfire also first checks TRAN.tibco.com for special package versions.) In most cases, these special package versions are no longer required in later TERR releases, and the standard version from CRAN is installed.
Improving data function readability
As data function scripts get longer, they can become difficult to review and understand. Here are a few tips for improving readability:
- Encapsulate sections of code, especially sections that are reused, into functions. You will need to have the function definition at the top of your script.
- Better yet, encapsulate functions into packages. This allows you to easily share reusable code between data functions by loading the package into your data function.
- Use the Collapse features in the script editing dialog, to collapse functions and if/then clauses into a single line.
- Define sections of the code you want to collapse and hide by wrapping the section in { }, and then add a comment with a title.
In the example below, we label and collapse a section of code called "Sample Section 1".
[Figure: the "Sample Section 1" code section, shown first expanded and then collapsed.]
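As a rough text illustration of the readability tips in this section, the following sketch defines a helper function at the top of a script and wraps a labeled section in braces. The names rescale, values, scaled, and total are made up for illustration, and the exact placement of the title comment that the Spotfire script editor displays when collapsed is an assumption.

```r
# Helper function defined at the top of the script so that it exists
# before any code that calls it.
rescale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical input values.
values <- c(10, 20, 40)

{ # Sample Section 1
  # Braces at the top level do not create a new scope, so objects
  # assigned here remain available to the rest of the script.
  scaled <- rescale(values)
  total <- sum(scaled)
}

print(total)
```

Because the braces only group statements, wrapping a section this way changes how the script can be folded in the editor without changing what it computes.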
Slower than expected
Sometimes you may not know why your function is running slowly. A simple way to determine which part of the function takes the longest is to add a set of debugging messages containing timestamps. First make sure that debug messages are enabled using the instructions on this page, then sprinkle statements throughout your data function along these lines:
cat("Function starting", date(), "\n")
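Extending that idea, a small helper can print both a timestamp and the time elapsed since the previous message, which makes the slow step easier to spot. This is a sketch only. The helper name log_step and the placeholder computation are hypothetical.

```r
# Hypothetical helper: prints a label, the current date and time, and the
# seconds elapsed since the previous call. Returns the current time.
log_step <- function(label, since = NULL) {
  now <- Sys.time()
  if (is.null(since)) {
    cat(label, date(), "\n")
  } else {
    secs <- as.numeric(difftime(now, since, units = "secs"))
    cat(label, date(), sprintf("(%.3f s since previous step)", secs), "\n")
  }
  invisible(now)
}

t0 <- log_step("Function starting")

# Placeholder for an expensive step in the data function.
result <- sum(sqrt(seq_len(1e5)))

t1 <- log_step("Step 1 finished", since = t0)

# Total elapsed time for the steps so far, in seconds.
elapsed <- as.numeric(difftime(t1, t0, units = "secs"))
```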
terr_function_input_data_types.pdf
terr_function_output_data_types.pdf