How to build Statistica data functions in Spotfire


    This page helps users build Spotfire Statistica® (or simply Statistica) data functions for Spotfire®. We present several examples of data functions together with the Spotfire applications that use them. You can download all of these examples, try them out, and learn by building them.

    This page focuses on examples; it does not describe how to enable integration between Spotfire and Statistica or what a Statistica data function is. For these topics, please visit this community page.

     

    Examples

    Example 1 (Correlation)

    Materials: You can download the example files (underlying workspace and dxp file) from this Community Exchange article (we refer here to the Correlation Example in that Exchange item). You can see the whole process of building this example in this video:

    Description: This Statistica data function computes a correlation matrix from the input data. The correlation matrix is recomputed automatically whenever the marking in a Spotfire visualization changes, so the input data depend on interactive choices made within the Spotfire dashboard.

    It uses this simple workspace (if you are not familiar with Statistica workspaces, please see this article):

    [Figure: Statistica workspace for the correlation example]

    What this example shows:

    • Example of a single computation whose results are brought back to Spotfire
    • Example of a data function with one input and one output (no parameterization involved)
    • Example of triggering the data function automatically after the marking changes
    • Example of a universal data function (can be applied to different data without changes)

    The dashboard contains the Correlation matrix table, whose values are the results computed by the data function:

    [Figure: Correlation matrix table computed by the data function]

     

    Feature highlights: Automatic triggering on change and responsiveness according to marking.

    • This is enabled through the Refresh function automatically and Limit by Marking check boxes:

    [Figure: Refresh function automatically and Limit by Marking settings]

    After changing the Marking in another visualization, the correlation matrix is recomputed automatically.
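    For readers who want to see what the data function computes, here is a minimal Python sketch of the same idea; it is not the Statistica workspace itself. It assumes the input arrives as rows of numeric columns plus a hypothetical per-row `marked` flag that stands in for Limit by Marking, so only the marked rows enter the Pearson correlations.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant column
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows: List[Dict[str, float]],
                       marked: List[bool]) -> Dict[str, Dict[str, float]]:
    """Correlation matrix over the rows whose hypothetical `marked` flag is True."""
    subset = [row for row, is_marked in zip(rows, marked) if is_marked]
    columns = list(subset[0])  # every column is assumed to be numeric
    data = {c: [row[c] for row in subset] for c in columns}
    return {a: {b: pearson(data[a], data[b]) for b in columns} for a in columns}


rows = [
    {"x": 1.0, "y": 2.0, "z": 5.0},
    {"x": 2.0, "y": 4.0, "z": 3.0},
    {"x": 3.0, "y": 6.0, "z": 1.0},
    {"x": 10.0, "y": 0.0, "z": 0.0},  # not marked, so it is ignored
]
matrix = correlation_matrix(rows, [True, True, True, False])
print(round(matrix["x"]["y"], 6), round(matrix["x"]["z"], 6))  # prints 1.0 -1.0
```

    Calling `correlation_matrix` again with a different `marked` list corresponds to the automatic recomputation after a marking change.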

     

    Example 2 (Clustering)

    Materials: You can download the example files (underlying workspace and dxp files) from this Community Exchange article.

    Description: This Statistica data function performs clustering; together with its various additional outputs, it also provides a good foundation for anomaly detection. From the Spotfire application, you can set the number of clusters or let the software choose the optimal number of clusters.

    It uses this simple workspace to compute k-means clustering on all variables in the file:

    [Figure: Statistica workspace for k-means clustering]
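    As an illustration of the k-means step and of how its output can support anomaly detection, here is a minimal Python sketch. It is not Statistica's implementation: it assumes plain Lloyd's iterations with a deterministic farthest-point initialization, and it uses the distance of each row to its own cluster center (a hypothetical `anomaly_scores` helper) as the anomaly indicator.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int,
           iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means with farthest-point initialization; returns (centers, labels)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next initial center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def anomaly_scores(points: List[Point], centers: List[List[float]],
                   labels: List[int]) -> List[float]:
    """Distance of each point to its own cluster center; large values are candidate anomalies."""
    return [dist(p, centers[lab]) for p, lab in zip(points, labels)]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (5.0, 20.0)]
centers, labels = kmeans(points, k=2)
scores = anomaly_scores(points, centers, labels)
```

    In this toy data the isolated point (5, 20) receives the largest score, which is the kind of additional output that makes clustering useful for anomaly detection.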

    What this example shows:

    • Example of a single computation whose results are brought back to Spotfire
    • Example of a data function with one input and multiple outputs, with parameterization involved
    • Example of triggering the data function on demand by pressing an action button
    • Example of a universal data function (can be used on new data without changes)

    The video dedicated to this example ends with a dashboard that has no action controls.

    How to define the action controls is described below.

    Feature highlights: Parameterization of the data function and action controls for triggering the data function.

    • Parameterization: You can choose the parameters you want to use on the Tools->Statistica->Modify data function definition->Parameters tab:

    [Figure: Parameters tab with the clustering parameters]

    In our example there are two parameters: Number of clusters and V-fold cross-validation.

    You can assign values to these parameters on the Tools->Statistica->Configure data function parameters->Input tab, or via Data->Data Function properties->Edit Parameters...->Input tab:

    [Figure: Assigning values to the clustering parameters]

    You can assign a concrete value or, as in our example, the value of a document property: here CV (for the V-fold cross-validation parameter) and numberOfclusters (for the Number of clusters parameter). Assigning a document property has the advantage that users can set its value interactively using sliders, list boxes, or check boxes, as in the Parameters for Clustering section of the example dashboard:

    [Figure: Parameters for Clustering section of the dashboard]

    This text area is built using Insert Property Control (green) and Action Control (blue):

    [Figure: Text area settings]

    The button that triggers the data function is defined as follows:

    [Figure: Action control definition for the clustering button]

    And the slider definition:

    [Figure: Slider definition]
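    When the number of clusters is not fixed, the V-fold cross-validation parameter lets the software choose it. The Python sketch below shows one way such a search can work. The stopping rule (stop once the cross-validated cost improves by less than a fraction `min_decrease` of the one-cluster cost) and all function names are assumptions for illustration, not Statistica's documented algorithm.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_centers(points: List[Point], k: int, iters: int = 50) -> List[List[float]]:
    """Compact Lloyd's k-means with farthest-point initialization; returns the centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(dist(p, c) for c in centers))))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist(p, centers[c]))].append(p)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def cv_cost(points: List[Point], k: int, v: int, seed: int = 0) -> float:
    """Mean distance of held-out points to the nearest center fitted on the other v-1 folds."""
    idx = list(range(len(points)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(v):
        held_out = idx[f::v]  # every point is held out exactly once
        held = set(held_out)
        centers = fit_centers([points[i] for i in idx if i not in held], k)
        total += sum(min(dist(points[i], c) for c in centers) for i in held_out)
    return total / len(points)


def choose_k(points: List[Point], v: int = 5, max_k: int = 8,
             min_decrease: float = 0.05) -> int:
    """Grow k until the cost drops by less than min_decrease times the one-cluster cost."""
    baseline = previous = cv_cost(points, 1, v)
    best_k = 1
    for k in range(2, max_k + 1):
        cost = cv_cost(points, k, v)
        if previous - cost < min_decrease * baseline:
            break
        best_k, previous = k, cost
    return best_k


offsets = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (0.25, 0.25),
           (-0.5, 0), (0, -0.5), (-0.5, -0.5), (0.25, -0.25), (-0.25, 0.25)]
pts = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (20, 0), (0, 20)] for dx, dy in offsets]
chosen = choose_k(pts, v=5, max_k=6, min_decrease=0.1)
```

    Here `v` and the fixed-k alternative correspond to the CV and numberOfclusters document properties in the dashboard; the threshold value is a free choice in this sketch.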

     

    Example 3 (Variable Importance)

    Materials: You can download the example files (underlying workspace and dxp files) from this Community Exchange page.

    Description: This example dxp uses a Statistica data function to identify the best predictors for a classification task. Before the best predictors are evaluated, some basic data cleaning is applied inside the data function. Variable importance is computed on an interactively selected subset of the data, and the variables entering the analysis can be chosen from drop-down lists.

    Inside the data function, this workspace is used:

    [Figure: Statistica workspace for the variable importance example]

    The data function is used mainly on the Investigation page of the dashboard:

    [Figure: Variable importance on the Investigation page]

    You can choose the points to include in the analysis by marking them in the upper-left graph, and the variables to investigate in the upper-right variable lists. Pressing the button triggers the data function, and the lower-right graph then shows the most important variables.

    What this example shows:

    • Example of multiple computation steps inside the data function (data cleaning and variable importance)
    • Example of a data function with one input and one output, with parameterization involved
    • Example of parameterizing variable names
    • Example of on-demand triggering of a data function computed on a subset of data defined by marking

    Feature highlights: Parameterization of variable selection.

    • This example parameterizes the number of best predictors (this parameter is passed in the same way as the parameters in Example 2). In addition, the variable selection is passed to the data function for use inside the Feature Selection node. How to set up this parameter is described in detail in this article.
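    To make the two computation steps concrete, here is a Python sketch of basic cleaning followed by predictor ranking for a categorical response. It assumes continuous predictors ranked by a one-way ANOVA F statistic; Statistica's Feature Selection node may use different statistics, and `predictors` and `n_best` are hypothetical stand-ins for the variable selection and the number-of-best-predictors parameter passed from Spotfire.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[object]]


def clean(rows: Sequence[Row], columns: Sequence[str]) -> List[Row]:
    """Basic data cleaning: drop rows with a missing value in any chosen column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def anova_f(values: Sequence[float], classes: Sequence[object]) -> float:
    """One-way ANOVA F statistic of a continuous predictor across response classes."""
    groups: Dict[object, List[float]] = {}
    for value, cls in zip(values, classes):
        groups.setdefault(cls, []).append(value)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {cls: sum(g) / len(g) for cls, g in groups.items()}
    ss_between = sum(len(g) * (means[cls] - grand) ** 2 for cls, g in groups.items())
    ss_within = sum((x - means[cls]) ** 2 for cls, g in groups.items() for x in g)
    if ss_within == 0:
        return math.inf  # the predictor separates the classes perfectly
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def best_predictors(rows: Sequence[Row], response: str, predictors: Sequence[str],
                    n_best: int) -> List[Tuple[str, float]]:
    """Clean the data, rank the selected predictors by F, and keep the n_best strongest."""
    data = clean(rows, [response, *predictors])
    classes = [r[response] for r in data]
    scores = {p: anova_f([float(r[p]) for r in data], classes) for p in predictors}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n_best]


rows = [
    {"cls": "a", "x1": 1.0, "x2": 5.0},
    {"cls": "a", "x1": 1.2, "x2": 3.0},
    {"cls": "a", "x1": 0.8, "x2": 4.0},
    {"cls": "b", "x1": 5.0, "x2": 4.5},
    {"cls": "b", "x1": 5.2, "x2": 3.5},
    {"cls": "b", "x1": 4.8, "x2": None},  # removed by the cleaning step
    {"cls": "b", "x1": 5.1, "x2": 4.0},
]
top = best_predictors(rows, "cls", ["x1", "x2"], n_best=1)
```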

     

    Example 4 (Statistical process control charts)

    Materials: You can download the example files (dxp file) from this Community Exchange page.

    Description: This template lets users build a wide range of quality control charts inside the Spotfire application and define the chart specifications interactively according to their needs. It showcases how a broad range of Statistica functionality can be replicated inside Spotfire without the end user having to interact with the Statistica environment.

    What this example shows:

    • Example of a complex, ready-to-use application built on the results of a Statistica data function
    • Example of a template (you can use your own data for analysis without changing the dashboard or the Statistica workspace)
    • Example of a data function with one input, multiple outputs, and multiple parameters
    • Example of triggering the data function automatically after its parameters change; the parameters are exposed to the data function via various drop-down lists and sliders
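    As an example of the kind of computation behind one chart type in such a template, here is a Python sketch of X-bar and R chart limits. It uses the standard Shewhart constants for subgroup sizes 2 to 5; `subgroup_size` plays the role of an interactive chart specification, and the function is illustrative only, not the template's workflow.

```python
from statistics import mean
from typing import Dict, Sequence

# Standard Shewhart constants (A2, D3, D4) for subgroup sizes 2 to 5.
CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
}


def xbar_r_limits(values: Sequence[float], subgroup_size: int) -> Dict[str, Dict[str, float]]:
    """Center lines and 3-sigma control limits for X-bar and R charts."""
    if subgroup_size not in CONSTANTS:
        raise ValueError("subgroup size must be between 2 and 5 in this sketch")
    a2, d3, d4 = CONSTANTS[subgroup_size]
    n_groups = len(values) // subgroup_size  # a trailing partial subgroup is ignored
    groups = [values[i * subgroup_size:(i + 1) * subgroup_size] for i in range(n_groups)]
    xbarbar = mean(mean(g) for g in groups)  # grand mean of the subgroup means
    rbar = mean(max(g) - min(g) for g in groups)  # average subgroup range
    return {
        "xbar": {"lcl": xbarbar - a2 * rbar, "center": xbarbar, "ucl": xbarbar + a2 * rbar},
        "range": {"lcl": d3 * rbar, "center": rbar, "ucl": d4 * rbar},
    }


limits = xbar_r_limits([10, 12, 11, 13, 9, 11, 12, 10], subgroup_size=2)
```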

    Inside the template, this workspace is used:

    [Figure: Statistica workspace for the SPC template]
    Remark: The underlying workspace uses some custom nodes.

    One of the pages in the output Spotfire template:

    [Figure: One page of the SPC dashboard]

     

    Example 5 (Model Diagnostic plots)

    Materials: You can download the example files (underlying workspace and dxp files) from this Community Exchange page.

    Description: This template calculates various model-quality diagnostic plots from input data containing predicted probabilities. It is also a good educational showcase of how the various diagnostic charts depend on each other, as you can see in the screenshot below.

    [Figure: Diagnostic plots in the Spotfire template]

    What this example shows:

    • Example of a data function using complex data transformation logic
    • Example of a data function using multiple input and multiple output data sets; in addition, parameterization of variable selection is used
    • Example of triggering the data function by pressing a button
    • Example of a universal data function (can be applied to different data without changing any settings or the underlying workflow)

    Feature highlights: Complex data transformation operations within the data function workflow.

    The data function is rather complex because it needs to work with any type of data set (any variable names, category names, etc.). In addition, optimal and random models are added; they typically serve as baselines for comparison but need to be handled differently in the workflow. For educational purposes, the workflow is purely transformation-based, so you can see how every point in the charts is computed from the input probability table. Every node in the data function contains an annotation that explains its part and purpose.

    [Figure: Diagnostic plots workflow in Statistica]
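    The following Python sketch illustrates the transformation idea for one diagnostic chart: a cumulative gains curve computed from a probability column, the actual response, and an event code, together with the random and optimal reference models. The function name and the binning into equal population fractions are assumptions; the Statistica workflow may compute its charts differently.

```python
from typing import List, Sequence, Tuple


def gains_curve(probabilities: Sequence[float], actual: Sequence[str], event_code: str,
                bins: int = 10) -> List[Tuple[float, float, float, float]]:
    """Return (population fraction, model, random, optimal) cumulative gains per bin."""
    events = [a == event_code for a in actual]
    n, total_events = len(events), sum(events)
    if total_events == 0:
        raise ValueError("the event code does not occur in the response")
    # Model ordering: cases sorted from the highest to the lowest event probability.
    ranked = [e for _, e in sorted(zip(probabilities, events), key=lambda t: t[0], reverse=True)]
    curve = []
    for b in range(1, bins + 1):
        cut = round(n * b / bins)
        fraction = cut / n
        model = sum(ranked[:cut]) / total_events
        random_model = fraction  # a random ordering captures events in proportion
        optimal = min(cut, total_events) / total_events  # every event ranked first
        curve.append((fraction, model, random_model, optimal))
    return curve


curve = gains_curve([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
                    ["y", "y", "n", "y", "n", "n", "n", "n", "n", "n"],
                    event_code="y", bins=5)
```

    Lift at each point follows as `model / fraction`, which is one example of how the diagnostic charts are derived from one another.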

    Let us describe the property controls in Spotfire behind this data function in detail. The whole prediction data table is sent as input to the data function, but the very first node after that filters it so that only the variables highlighted as the response and probability variables (Step 1 and Step 3) are used; this relies on the same concept of variable string definition as in Example 4. One more input parameter, called Variables, uses the information from the variable selection in Step 3: it is the parameter for the Stacking node, so that the probability variables are stacked under their proper names.

    [Figure: Diagnostic plots property control settings, part 1]
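    The filtering and stacking steps can be sketched in Python as follows, assuming each row is a dictionary keyed by column name. The output column names `Variable` and `Probability` are hypothetical; in the workflow, the names of the stacked variables come from the Variables parameter.

```python
from typing import Dict, List, Sequence


def select_and_stack(rows: Sequence[Dict[str, object]], response: str,
                     probability_vars: Sequence[str]) -> List[Dict[str, object]]:
    """Keep only the response and chosen probability columns, stacked into long form.

    Every input row yields one output row per probability column, tagged with that
    column's name so each model stays identifiable after stacking.
    """
    stacked: List[Dict[str, object]] = []
    for row in rows:
        for var in probability_vars:
            stacked.append({"Variable": var, "Probability": row[var], response: row[response]})
    return stacked


predictions = [
    {"Target": "y", "p_tree": 0.9, "p_logit": 0.7, "id": 1},
    {"Target": "n", "p_tree": 0.2, "p_logit": 0.4, "id": 2},
]
long_form = select_and_stack(predictions, "Target", ["p_tree", "p_logit"])
```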

    For the data function to work properly, it is also essential to define the event code (the response category of interest). There are several ways to do this; we decided to send a data table with a single column that contains only the chosen code, the one highlighted by marking (this input parameter is called event code and is substituted into the event code node). The code is defined by highlighting it in Step 2; the response distribution shown there depends on what is chosen in Step 1. We use the $esc function together with the response document property to get an expression string in the form [data.predictions].[responsevariablename]. Please note that Marking (2) is applied to that data input.

    [Figure: Diagnostic plots property control settings, part 2]
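    Here is a Python sketch of the two ideas in this step: building the bracketed expression and turning the single marked event code into an event indicator. The `esc` helper mirrors our understanding of what Spotfire's $esc function does (double any `]` and wrap the name in brackets); all helper names are hypothetical and are not part of any Spotfire API.

```python
from typing import List, Sequence


def esc(name: str) -> str:
    """Bracket-escape a name: double every ']' and wrap the result in '[' and ']'."""
    return "[" + name.replace("]", "]]") + "]"


def response_expression(table: str, response: str) -> str:
    """Expression of the form [table].[response], e.g. [data.predictions].[Target]."""
    return esc(table) + "." + esc(response)


def event_indicator(actual: Sequence[str], event_code_column: Sequence[str]) -> List[int]:
    """1 where the response equals the single marked event code, otherwise 0."""
    codes = set(event_code_column)
    if len(codes) != 1:
        raise ValueError("expected exactly one marked event code")
    (code,) = codes
    return [1 if value == code else 0 for value in actual]


expression = response_expression("data.predictions", "Target")
flags = event_indicator(["y", "n", "y"], ["y"])
```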

     

    Additional examples  (Statistica Enterprise objects)

    Description: If you have a well-established Statistica Server installation, you might benefit from objects already created in the Statistica Enterprise meta-repository. These objects can also be accessed through the Spotfire data function feature. These features are described in more detail in this community article and showcased in this video:


    Materials: You can download applications using these integration features from these Community Exchange pages:


    Remarks

    • Each example dxp contains additional explanations on its Cover page.
    • Note that starting with the 13.6 release of Spotfire Statistica, you can save dxp files with data functions to the library and use them from there. Consumers running dxp files in a web browser can also use the defined Statistica data functions.

     

    Important links

