
Random Forest - Data Function for Spotfire® 1.0



Random forests are an ensemble decision tree machine learning method for classification and regression.


Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
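The aggregation step described above can be sketched in a few lines of Python. This is only an illustration of "mode for classification, mean for regression"; the function names and sample values are hypothetical and are not part of the data function, which relies on the CRAN randomForest package.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the most common class label among the per-tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the mean of the per-tree numeric predictions."""
    return mean(tree_predictions)


# Hypothetical predictions from five trees for one observation.
votes = ["churn", "active", "churn", "churn", "active"]
forest_class = aggregate_classification(votes)

# Hypothetical numeric predictions from three trees for one observation.
estimates = [10.0, 12.0, 14.0]
forest_value = aggregate_regression(estimates)

print(forest_class, forest_value)
```

Each tree casts one vote per observation; the forest reports the majority class, or the average when the target is numeric.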

Random forests can be used in many areas, such as modelling and predicting a binary response variable (for example offer acceptance, customer churn, financial fraud or product/equipment failure), as well as explaining detected anomalies.

This data function includes the R/TERR code for the random forest model, missing-data imputation, and Random Over-Sampling Examples (ROSE) to resolve class imbalance for binary response variables.  It uses the CRAN randomForest package within the Spotfire interface.  It focuses on supervised classification with a binary response variable.  Random forests can also be used for unsupervised machine learning, but this release does not address that.
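To show why over-sampling helps with an unbalanced binary response, here is a minimal Python sketch. It is deliberately simplified: it duplicates minority-class rows at random until the classes are the same size, which is plain random over-sampling and not the ROSE technique used by the data function. The function name and sample rows are hypothetical.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes match the largest one.

    Simplified stand-in for illustration only; ROSE itself generates new
    synthetic examples rather than duplicating existing rows.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra rows (with replacement) only for under-represented classes.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical data: five "active" customers and one "churn" customer.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [9.0]]
labels = ["active", "active", "active", "active", "active", "churn"]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(balanced_labels.count("active"), balanced_labels.count("churn"))
```

After balancing, a classifier can no longer reach high accuracy simply by always predicting the majority class.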

The distribution also includes an IronPython script to filter dependent variables based on the selection of the independent variable.


Data Function Documentation

Description of Input parameters to the data function:





Data table with an arbitrary number and names of explanatory columns (column type: integer/real/string)

Suggested: send multiple columns using the Spotfire expression $map("[AnalysisData].[${ExplanatoryColumns}]", ","), where ExplanatoryColumns is a document property limited through (datatype:real or datatype:integer or datatype:string) and isIncluded:TRUE and not depColumn

Data table with a binary response in integer (1/0) or string (churn/active) format

Suggested: send the response column using the Spotfire expression [AnalysisData].[${depColumn}], where depColumn is a Spotfire document property

Character string that indicates the true state (the event happens) in resp.Co
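The role of the true-state string can be illustrated with a small Python sketch: it marks which response value counts as the event, so that a 1/0 or churn/active column can be handled the same way. The function names and sample values are hypothetical; the data function itself does this in R/TERR.

```python
def encode_response(values, true_state):
    """Map each response value to 1 if it matches the true state, else 0."""
    return [1 if str(value) == str(true_state) else 0 for value in values]


def percent_true(encoded):
    """Percentage of observations in the true state."""
    return 100.0 * sum(encoded) / len(encoded)


# Hypothetical string response, with "churn" as the true state.
response = ["churn", "active", "active", "churn", "active"]
encoded = encode_response(response, "churn")

# Hypothetical integer response, with "1" as the true state.
encoded_int = encode_response([1, 0, 0, 1, 0], "1")

print(encoded, percent_true(encoded))
```

Comparing values as strings lets the same true-state parameter work for both integer and string response columns.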


Description of output parameters to the data function:






Percentage of true state in the response variable

Table with 4 columns: information on true positive/true negative/false positive/false negative counts and percentages

Table with 4 columns: variable importance table with mean decrease in accuracy and mean decrease in Gini index

Table with 2 columns: information for generating the ROC curve

String value: fatal error message raised during function execution

String value: warning message from the data cleansing stage

Binary value (blob): random forest model object
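As a rough guide to reading two of these outputs, the Python sketch below computes true/false positive/negative counts and percentages, plus the points of an ROC curve, from actual labels and predicted scores. It assumes the two ROC columns are the false positive rate and the true positive rate, which the listing does not state; all names and sample values are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 1/0 labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per distinct score threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical actual labels and predicted probabilities of the true state.
actual = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]

# Classify with an assumed 0.5 cut-off.
predicted = [1 if s >= 0.5 else 0 for s in scores]
counts = confusion_counts(actual, predicted)
percentages = {name: 100.0 * n / len(actual) for name, n in counts.items()}
curve = roc_points(actual, scores)

print(counts, percentages, curve)
```

Plotting the second element of each pair against the first gives the ROC curve; a curve that rises quickly towards the top-left corner indicates a model that separates the two classes well.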

Spotfire demo (.dxp) file

Using the dxp file with your own data:

The distribution also includes a Spotfire .dxp file.  The primary function of the .dxp file is to provide an example illustrating how the embedded data function could be wired up to your data in your own .dxp. It is not intended to provide a complete analysis solution.  However, you can still replace the embedded data with your own data using the following procedure:

1) The input in the dxp is the AnalysisData table. You can start with the provided dxp and simply replace AnalysisData with your own data.

2) Go to the variable selection tab and select the independent and dependent variables.

3) Click the Refresh Model button.

Release P1.0

Published: March 2017

Initial Release
