    Model Evaluation Templates for Spotfire® and Spotfire® Data Science - Team Studio



    Overview

    Model Evaluation templates in Spotfire® make it easy for Data Scientists and Business Analysts alike to view model outcomes, assess fit, and compare parameter subspaces. The templates are built with Spotfire® and, through the Team Studio Data Function integration, serve as a companion to Spotfire® Data Science - Team Studio, providing an interactive filter/mark/zoom visual extension to the Team Studio platform. Currently the templates cover regression and classification models from Team Studio, as described below.

    To learn how to connect and use these templates, please use the forums on this community to ask your questions. 

    Correlation and Importance


    Feature correlation helps assess the high- and low-collinearity regions of the parameter subspace, and helps the data scientist focus on particular areas where the model may be weak. Feature importance is also computed to help understand how the model is impacted by different features.
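
    As an illustration of the underlying computations, the following minimal Python sketch shows how comparable correlation and importance values could be derived outside the templates using pandas and scikit-learn. The DataFrame df, the input file name, and the column name "target" are hypothetical placeholders, not part of the templates.

        # Minimal sketch (not part of the templates): feature correlation and
        # importance values comparable to what the template visualizes.
        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor

        df = pd.read_csv("training_data.csv")   # hypothetical input file
        X = df.drop(columns=["target"])         # "target" is a placeholder name
        y = df["target"]

        # Pairwise feature correlation, the basis of a collinearity heatmap.
        corr_matrix = X.corr()

        # Impurity-based feature importance from a forest model, analogous to
        # the Feature Importance output of tree/forest operators.
        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        importance = pd.Series(model.feature_importances_, index=X.columns)
        print(importance.sort_values(ascending=False).head(10))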

    Regression


    The Predictor operator in Spotfire® Data Science - Team Studio returns a real-valued prediction for each regression model configured on the operator. Spotfire® Data Science - Team Studio performs regression with modeling operators that include Random Forest, Gradient Boosted Trees, Neural Networks, Linear Regression with regularization, and more. Shared methods are available to evaluate the assumptions, fit, and biases across these regression models.

    The Evaluation Metrics table helps identify models with lower mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), and root mean squared error (RMSE). The plots are also essential for seeing how the models fit on test or validation data collectively. The Feature Importance table (an output of any tree or forest model in Spotfire® Data Science - Team Studio) lists the top variable importances for those models. When a Spotfire® filter is used to change the subspace of the data being fitted (here, it is done for the variable with the highest feature importance), we can see whether certain models perform better or worse on this data subset. The Histogram of Residuals illustrates how spread out the errors are, and the scatterplot of predictions against the target shows how close predictions are to the true values. Together, these visuals and the metrics table can be used to choose the best models and to identify areas of bias where the models might have higher errors on certain data subsets.
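
    For reference, the statistics in the Evaluation Metrics table can be reproduced with a few lines of Python. The sketch below assumes hypothetical arrays y_true and y_pred of true and predicted values, and uses scikit-learn (mean_absolute_percentage_error requires scikit-learn 0.24 or later).

        # Minimal sketch (not part of the templates): MAE, MAPE, MSE, RMSE and
        # residuals for one model, from hypothetical arrays of values.
        import numpy as np
        from sklearn.metrics import (mean_absolute_error,
                                     mean_absolute_percentage_error,
                                     mean_squared_error)

        y_true = np.array([3.1, 2.4, 5.0, 4.2])   # placeholder data
        y_pred = np.array([2.9, 2.7, 4.6, 4.4])

        mae = mean_absolute_error(y_true, y_pred)
        mape = mean_absolute_percentage_error(y_true, y_pred)
        mse = mean_squared_error(y_true, y_pred)
        rmse = np.sqrt(mse)

        # Residuals feed a histogram like the template's Histogram of Residuals.
        residuals = y_true - y_pred
        print(f"MAE={mae:.3f} MAPE={mape:.3%} MSE={mse:.3f} RMSE={rmse:.3f}")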

    Classification


    The Classifier operator in Spotfire® Data Science - Team Studio returns the probability of each possible class for every classification model configured on the operator. Spotfire® Data Science - Team Studio performs classification with modeling operators that include Logistic Regression, Gradient Boosted Trees, Random Forest, and Neural Networks, with regularization, grid searching, and more. Shared methods are available to evaluate goodness of fit and variable importance, and to display confusion matrices across classification models.

    In the Classification template, several methods are available to review and evaluate the models' predictions and performance. Correlations and feature importance are assessed and can be explored using interactive, filterable visuals such as heatmaps and network charts, showing which parameters influence the model and where relationships within the data exist. Model evaluation is then available in two forms: general classification evaluation and binary (two-class) classification. In the general classification view, ROC curves are shown for each model, along with key metrics for comparing the models, including AUC, accuracy, and F measures; a confusion matrix is also supplied. Using Spotfire's filters on the predictions and test data, these metrics can be viewed for specific subspaces, revealing where the models are strongest and where they may not perform as well. This also helps identify bias in the data by highlighting subspaces the model overly favors when predicting classes. All of the metrics can be viewed per classification label, allowing further evaluation of model performance for each label in the data.
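
    As a point of reference, the general classification metrics named above (AUC, accuracy, F measure, and the confusion matrix) can be computed as in the minimal Python sketch below. The arrays y_true and y_score are hypothetical labels and positive-class probabilities, and the 0.5 cutoff is only the conventional default.

        # Minimal sketch (not part of the templates): AUC, accuracy, F1 and a
        # confusion matrix from hypothetical labels and probabilities.
        import numpy as np
        from sklearn.metrics import (roc_auc_score, accuracy_score,
                                     f1_score, confusion_matrix)

        y_true = np.array([0, 1, 1, 0, 1, 0])                # placeholder labels
        y_score = np.array([0.2, 0.8, 0.6, 0.4, 0.9, 0.3])   # placeholder scores

        y_pred = (y_score >= 0.5).astype(int)                # default 0.5 cutoff

        print("AUC:", roc_auc_score(y_true, y_score))
        print("Accuracy:", accuracy_score(y_true, y_pred))
        print("F1:", f1_score(y_true, y_pred))
        print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))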

    Finally, a further suite of exploration and evaluation functionality is available specifically for binary classification models, i.e. models with only two outcomes, such as true/false. Here the template displays the distribution of predicted probabilities per class in the Prediction Probability Distribution visual. The bar chart type can be customized (for example, stacked or 100% stacked), which helps evaluate how the predicted probabilities are spread across the two classes from different viewpoints. In addition, a Spotfire slider on the left alters the cutoff at which a positive or negative classification occurs, allowing fine tuning of the balance of predictions toward more positive or more negative assignments. As with the previous evaluations, the Spotfire filters also allow subspace exploration of the impact of this probability cutoff.
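
    The effect of the probability cutoff slider can be illustrated with a short Python sketch: moving the threshold re-balances positive versus negative assignments, which is visible in the confusion matrix counts. The arrays below are hypothetical placeholders.

        # Minimal sketch (not part of the templates): how a probability cutoff
        # shifts the balance between positive and negative classifications.
        import numpy as np
        from sklearn.metrics import confusion_matrix

        y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])                       # placeholder
        y_score = np.array([0.15, 0.7, 0.55, 0.45, 0.85, 0.3, 0.4, 0.6])  # placeholder

        for cutoff in (0.3, 0.5, 0.7):
            y_pred = (y_score >= cutoff).astype(int)
            tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
            print(f"cutoff={cutoff}: TP={tp} FP={fp} TN={tn} FN={fn}")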

    Please contact datascience@spotfire.com to request access to these templates.

