Advanced Workspace Execution in Statistica

Once you have analytics templates in the form of workspaces, there are several ways to get more out of that existing work, for example through conditional runs, parameterization, or looping. This article describes in detail how to carry out these tasks in Statistica.

Introduction

The purpose of this article is to show several ways to leverage your existing analytics templates in the form of visual programming workflows (the Workspace object in Statistica). In some situations you need special behavior from your computation, such as running conditionally, running several workflows as one job, or running a workflow in a loop with different parameters. This how-to article explains how to achieve this. For these tasks we will mainly use a special node called Run Workspaces (which you can download from this Exchange page; the downloadable resources also include the examples shown in this article).

The tips and tricks below will help you templatize, parameterize, and reuse your workflows so that you are more productive, have simpler administration, and can deliver important new use cases.

Run workspace as part of other workspace (Situation 1)

Task: Run workspace as part of other workspace

Why: This is relevant if you want to reuse one workspace in several analyses (e.g. when a sequence of data preparation steps is the same for various analyses). When you reference another workspace instead of copying all the needed nodes, maintenance becomes simpler: if something changes, you only need to update the referenced workspace (the one with the data preparation steps), not every workflow that uses it.

Solution/Example:

In general, there are two ways to do that:

Use the out-of-the-box “Execute External Workspace” node.
Equivalently, use the “Run Workspaces” node (which can be downloaded for free from our Exchange page).

Both of these can be used to run another workspace as part of your workspace.

You can learn how to do that from Example 1 below (which uses the Run Workspaces node).

Before jumping into the example, a few words about the "Run Workspaces" node. The node can have several input nodes: the first defines what will be run and how, and the other nodes connected to Run Workspaces transfer the input data set(s) and/or parameters for the execution. This can make the structure of a workspace look more complicated, but in return you can use the same node for many other powerful use cases.

Example 1: Simplest example – Run one external workspace

The simplest use of the “Run Workspaces” node is to run one other workspace, optionally sending inputs and getting back the outputs.

For that, we need to define three settings: which workspace to run, how the input(s) are assigned, and the name of the output table (in the called workspace).
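As a rough illustration, the three settings can be pictured as one row of a definition table. The Python sketch below is not Statistica code: the field names, the example path, and the input assignment text are assumptions made for illustration, and the actual columns of the "Reference Workspaces" table may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunDefinition:
    """One row of a run definition table (field names are hypothetical)."""
    workspace_path: str    # which workspace to run
    input_assignment: str  # how parent inputs are assigned to the child workspace
    output_table: str      # name of the output table in the called workspace


def missing_settings(row: RunDefinition) -> list[str]:
    """Return the names of the settings that were left empty."""
    return [name for name, value in vars(row).items() if not value.strip()]


# Placeholder values; only the "Clustering results" name comes from this article.
example_row = RunDefinition(
    workspace_path="path/to/clustering_workspace",
    input_assignment="parent input 1 -> child input 1",
    output_table="Clustering results",
)

print(missing_settings(example_row))
```

A check like `missing_settings` is only a convenience of this sketch; it shows that a run definition is complete once all three cells are filled in.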

To apply this run definition, the “Run Workspaces” node needs to be set up this way:

The node takes these three variables from the "Reference Workspaces" table. Here is what the main workspace looks like:

Under the hood, we call the following workspace, which runs a cluster analysis on all variables and returns the cluster assignments (the “Clustering results” output).

“Clustering results” in the parent workspace is the output calculated by the child workspace from the data sent to it.

Example 2: Execute external workspace vs Run Workspaces

In this example, we show what the same scenario looks like with “Execute External Workspace” instead. This node can run one workspace defined by its path, assign inputs to this child workspace, and bring back the output file and the results in a Reporting document. See the settings of that node:

Settings for “Run Workspaces” equivalent to “Execute External Workspace”:

The workspaces showcasing both approaches are here:

Example 3: Any result data function

This example uses the feature together with Spotfire. You can use a simple workflow with the “Run Workspaces” node (like in Example 1) as a Spotfire data function. The use case is to retrieve any result from any defined workspace (from Statistica Enterprise) into Spotfire for further interactive analysis.

From Spotfire you pass the input as a one-row table that states which workspace should be run and which result should be retrieved. This table takes the place of the "Reference Workspaces" node, which defines what will be run.

The “Result” output table of the node is returned to Spotfire. You can see this scenario in this data function.

If you are interested in more Statistica+Spotfire use cases, feel free to check this page.

Run Multiple Workspaces (Situation 2)

Task: Run multiple workspaces

Why: Sometimes you want to run several analyses on one data set, or simply have a job that runs a set of workspaces in a defined order.

Solution/Example: The "Run Workspaces" node handles this task easily.

Example 4: Run multiple workspaces

One “Run Workspaces” node can run multiple workspaces in sequence. Each row in the run definition table triggers one external workspace. The following example runs three different analyses defined in three different workspaces; the input data are passed down to be used for these calculations. If you want, you can also send different input data to each workspace.
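The row-by-row behavior can be sketched in Python as follows. This is a simplified model, not the node's implementation: the `workspace` column name, the `fake_runner` stand-in, and the choice to continue after a failed row are all assumptions of the sketch.

```python
from typing import Callable


def run_in_sequence(definitions: list[dict],
                    run_workspace: Callable[[dict], None]) -> list[dict]:
    """Run one workspace per definition row, in row order, and collect a status table."""
    status = []
    for row_number, row in enumerate(definitions, start=1):
        try:
            run_workspace(row)  # stand-in for the real execution
            outcome = "OK"
        except Exception as error:  # assumption: a failed row does not stop the job
            outcome = f"Failed: {error}"
        status.append({"row": row_number, "workspace": row["workspace"], "status": outcome})
    return status


def fake_runner(row: dict) -> None:
    """Pretend to run a workspace; fails only for the hypothetical name 'broken'."""
    if row["workspace"] == "broken":
        raise RuntimeError("workspace could not be run")


# Three hypothetical analyses, one per row of the definition table.
definitions = [{"workspace": "analysis_1"}, {"workspace": "analysis_2"}, {"workspace": "analysis_3"}]
status_table = run_in_sequence(definitions, fake_runner)
for line in status_table:
    print(line)
```

Each entry of `status_table` carries the row number and workspace name, mirroring how the runs are identified by workspace and row in the settings table.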

Here are the workspaces triggered by the main workspace:

Here is the setting of “Run Workspaces” node:

All results are brought back into one Reporting document in the parent workspace. Folders separate the individual runs and are identified by the workspace name and the row in the settings table. If an output table is defined, it is brought back as a spreadsheet for further analysis (in this example, each workspace returns one output).

Note that a Status table with feedback about the whole execution is output as well:

Conditional run (Situation 3)

Task: Run workspace conditionally

Why: In some cases you want to run one analysis for a specific result and a different analysis for another result. Or you may want to stop execution and skip any further calculations in some cases.

How: We will again leverage the “Run Workspaces” node. Because what will be run is defined in a table, you can change this table on the fly and decide dynamically, based on results or calculations, what will be run. This is a powerful feature that enables calculations with dynamic execution logic.

Example 5: Conditional run

This example shows a scenario where you want to save results into Excel when there is an alarm, but skip that action when no alarm is identified.

In this example, we use control chart alarms to decide whether the workspace for saving into Excel (called Save to Excel) should be run.

First, we merge the “Sample ID” of an alarm with the run definition (Sample ID will be empty if there is no alarm).

After that, we insert the path to the workspace that saves the results if Sample ID is not empty. If it is empty, we insert an empty string, which means no workspace will be run.

You can compare the “Status” spreadsheet when there is an alarm (the child workspace is executed):

And the “Status” spreadsheet when there is no alarm (no workspace is executed):

Remark: If for some reason you want the Run Workspaces node to produce output even when no workspace is executed, put the string ‘Downstream Input Data’ in place of the path to the workspace. This helps when you want to perform additional steps after the Run Workspaces output.
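The decision made in this example can be summarized in a few lines of Python. This is only a sketch of the logic; in Statistica it is built from nodes, and the function name, argument names, and example values here are hypothetical.

```python
from typing import Optional

# String quoted in the remark above for keeping an output without running anything.
DOWNSTREAM_PLACEHOLDER = "Downstream Input Data"


def workspace_to_run(sample_id: Optional[str], save_workspace_path: str,
                     keep_output: bool = False) -> str:
    """Return the value to place in the workspace path cell of the run definition."""
    has_alarm = sample_id is not None and sample_id.strip() != ""
    if has_alarm:
        return save_workspace_path  # alarm found: run the saving workspace
    # No alarm: an empty string means nothing is run.
    return DOWNSTREAM_PLACEHOLDER if keep_output else ""


# Hypothetical sample ID and placeholder path, for illustration only.
print(repr(workspace_to_run("S-17", "path/to/save_to_excel")))
print(repr(workspace_to_run("", "path/to/save_to_excel")))
print(repr(workspace_to_run(None, "path/to/save_to_excel", keep_output=True)))
```

The point of the design is that the condition never touches the “Run Workspaces” node itself; it only changes one cell of the table the node reads.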

Parameterization (Situation 4)

Task: Parameterization of the workflow

Why: Parameterizing an external workspace lets you use that workspace as a template. The workspace can then be used in more situations, and fewer steps need to be replicated across different workspaces.

How: We will again leverage the “Run Workspaces” node. One of its advanced options is to define an additional input table containing a set of parameters.

Example 6: Parameterize external workspace

The “Run Workspaces” node can run a child workspace with parameters, so you can change any setting of the child workspace. Examples of use include defining your own path for the “Export to Excel” node from Example 5 or defining the number of clusters to be created in Example 1. Another use case is controlling the parameters of a Spotfire data function through a table rather than through a set of document properties. Below is an example showing the clustering use case.

As already mentioned, you use parameters by sending a special table with parameters into the “Run Workspaces” node and enabling that feature in the node (by defining the name of the table in the settings spreadsheet for “Run Workspaces”). See the example:

The last table is the most important one because it defines which parameters are controlled. In our case, we can control the number of clusters in the clustering (line 1) and a True/False value for using cross-validation (CV) to let the algorithm pick the best number of clusters.

If we run with the settings in the screenshot, we get this cluster frequency result:

So, the CV picked 4 clusters as the best.

If we change the “Parameters” spreadsheet this way:

We will get 5 clusters in the results:

So, the child workspace is influenced by the settings in the parent workspace.
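Conceptually, the parameters table overrides settings stored in the child workspace. The Python sketch below models only that idea: the parameter names, default values, and table layout are hypothetical, and the real Parameters table follows the Statistica object model, which is not reproduced here.

```python
def apply_parameters(child_defaults: dict, parameters_table: list[dict]) -> dict:
    """Return the child settings after applying each (name, value) row of the table."""
    settings = dict(child_defaults)  # copy, so the stored defaults stay untouched
    for row in parameters_table:
        if row["name"] not in settings:
            raise KeyError(f"unknown parameter: {row['name']}")
        settings[row["name"]] = row["value"]
    return settings


# Hypothetical defaults stored in the child workspace.
child_defaults = {"number_of_clusters": 3, "use_cross_validation": True}

# Hypothetical parameters table sent from the parent workspace.
parameters_table = [
    {"name": "number_of_clusters", "value": 5},
    {"name": "use_cross_validation", "value": False},
]

print(apply_parameters(child_defaults, parameters_table))
```

Rejecting unknown names is a choice of this sketch; it reflects the practical difficulty that a parameter only takes effect if it is addressed exactly as the child workspace represents it.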

Remark: The tricky part here is finding out how the parameters are represented (how to get the proper shape of the Parameters table). If you understand the underlying Statistica object model, you can look in the help of the “Run Workspaces” node. We have also prepared a script that does the job automatically; you can find it as part of the release.

Loop (Situation 5)

Task: Run the same analysis many times with different settings

Why: You have one task and want to repeat it several times with different settings, e.g. running the same method with different hyperparameters or running the same analysis on multiple data sources.

How: This is the most advanced usage of the “Run Workspaces” node. You need to combine parameterization, running multiple workspaces, and the update feature (which has not been shown yet).

Example 7: Loop

We will show a simple clustering example. In each loop step we calculate results with a different number of clusters, so we call the same workspace (the clustering workspace) many times, each time with a different parameter.

For this implementation, we need to introduce the feature of updating a file in the parent workspace after a child workspace has run.

Above is the execution schedule. We run the same “K-Means for loop” workspace 8 times. The first run is slightly different: it takes the definition table “Parameter0” (which sets the number of clusters to 2) as its parameter. “Parameter0” defines the number of clusters in the clustering node of the child workspace and, at the same time, is substituted for the “Parameter” table in the child workspace. The transformation node “new parameter” increases the number of clusters by 1, and the “Update assignment” definition ensures that in the next run the “Parameter” table (of the parent workspace) is updated with the “new parameter” table (from the current run of the child workspace).
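The update mechanism can be modeled as a loop in which each child run hands the next parameter back to the parent. The Python sketch below is one reading of the schedule described above, not the node's implementation; the function names and the dictionary layout are hypothetical, and the clustering itself is left out.

```python
def child_run(parameter: dict) -> dict:
    """Stand-in for one child run: use the parameter, then return the next one."""
    clusters = parameter["number_of_clusters"]
    # The real child workspace would run K-means with `clusters` clusters here.
    return {"number_of_clusters": clusters + 1}  # role of the "new parameter" node


def run_loop(parameter0: dict, runs: int) -> list[int]:
    """Run the child `runs` times, updating the parent parameter after each run."""
    parameter = dict(parameter0)  # the first run starts from "Parameter0"
    used = []
    for _ in range(runs):
        used.append(parameter["number_of_clusters"])
        parameter = child_run(parameter)  # role of the "Update assignment"
    return used


cluster_counts = run_loop({"number_of_clusters": 2}, runs=8)
print(cluster_counts)
```

Under these assumptions, eight runs starting from 2 clusters use 2 through 9 clusters, one value per run.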

In the Reporting document, you can see the results from all runs, each with a different number of clusters.

The execution log looks like this:

Remark: Be aware that the updated table has one peculiarity. If you open it directly (in the parent workspace), it never shows the updated value. The table is updated, but the change is not visible when viewing that table directly. If you place any node after the updated table, you will see that the table is in fact updated. In our example, run the node “Only to check value of updated parameter”.
