Statistica Workspace - a graphical UI - Spotfire Statistica / Data Science

This article describes the graphical user interface (UI) of Statistica, which is called the Workspace environment.

Introduction

A workspace is a white canvas onto which you drag and drop nodes and connect them logically. Each node represents one function of the software. By adding nodes and connections you build your data analysis process. In a workspace you can load, clean, and merge data, prepare it for analysis, apply statistical methods to gain insight from it, build predictive models, and finally save the results.

Advantages

Node-based UI without scripting.

Visual - the whole data analysis workflow is visualized, from data input to results. It is understandable to more users than script code.
Repeatable - run the workspace multiple times as the data are updated, or even on new data sets.
Reproducible - project steps are fixed and can be explored to see exactly what was done to obtain the results; a wide range of documentation features is also available.
Flexible - the workspace offers the same analysis options as the original interactive analyses.
Customizable - custom nodes can be created for the workspace and shared with colleagues.

Basic information

The workspace is the main object holding the analysis definitions.
A saved workspace document has the suffix sdm.
By default, a saved workspace document contains the set of steps with all their settings, but it is saved in non-executed form (computed outputs are not part of the saved document).
By default, all results are sent to one workbook node called Reporting Documents.
The workspace is one of the most important objects that can be stored in the Enterprise meta-repository (which means it can be versioned and scheduled, access permissions can be assigned to it, etc.). To save to Enterprise, use the Deploy button or the equivalent menu option: Enterprise/Deploy to Enterprise/Workspace.

Nodes

Nodes are flexible and easy to use together in a single workspace project. Workspace nodes can be accessed from the Node Browser, and from the tab/menu structure. When a workspace is active, the orange highlighting shows available tabs with workspace nodes.

The set of available nodes is governed by the configuration selected from the drop-down menu in the upper-right corner of the workspace.

Basic information and possibilities:

Each node represents one function of the software.
A node or a set of nodes can be copied (Ctrl+C, Ctrl+V) from one workspace to another.
Nodes can be deleted (Delete key).
Nodes can be moved (arrow keys or dragging).
A node can be renamed.

Node Properties

Three types of nodes are available: standard nodes, which are fully compatible with the interactive UI functionalities; legacy nodes (scripted/SVB nodes); and the general code node, which looks like a standard node but is a container for other types of code (R, Python, Spark Scala, C#, and SVB). See also a detailed video about the node types.

First, let us describe standard nodes:

Most functionalities have a standard node variant. The node menu of these nodes offers the same options as the interactive UI (where functionalities are run separately, outside the workspace). The advantage is that a user who can work interactively will have no trouble starting to work in the workspace environment. An example for the Descriptive Statistics functionality is below (interactive dialog on the left, node on the right).

The choice of variables and all the settings are done in the node itself.

Icons

The node icon has a pictogram whose color indicates the kind of node: green for operations with data, blue for statistical analyses, orange for modeling, dark blue for graphs, red for exporting functionalities, and purple for Reporting Documents.

Orange background - the node has not been run, or its settings have changed. To execute such a node, all previous nodes need to be run.
White background - the node was executed without problems.
Red color - there was a problem running this node. Clicking the triangle icon with an exclamation mark shows the error message.

Upper-left corner (gear) - a left mouse click opens the dialog with the node settings.
Lower-left corner (green arrow) - runs this node and all previous non-executed nodes in the same branch.
Upper-right corner (documents) - opens the Reporting Documents workbook at the results of this node.
Lower-right corner (sheet) - this is the downstream document; this spreadsheet can be used as the input table in subsequent nodes.
Middle of the right edge of the icon (yellow diamond) - the arrow starts here; pressing the left mouse button and dragging to another node creates a connection between the two nodes.
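
The status colors and the behavior of the green arrow can be pictured as a small dependency graph. The following Python sketch is only an illustrative model, not Statistica code: the `Node` class, the `Status` names, and the `run` function are hypothetical, and the sketch assumes that running a node first runs every upstream node that has not yet been executed, and that a failed upstream node blocks the nodes after it.

```python
from enum import Enum


class Status(Enum):
    NOT_RUN = "orange"   # never run, or settings changed
    EXECUTED = "white"   # ran without problems
    ERROR = "red"        # running the node failed


class Node:
    """Hypothetical stand-in for a workspace node."""

    def __init__(self, name, action=None):
        self.name = name
        self.action = action or (lambda: None)  # work done by the node
        self.parents = []                        # upstream nodes
        self.status = Status.NOT_RUN

    def connect_to(self, child):
        """Model an arrow drawn from this node to `child`."""
        child.parents.append(self)
        return child


def run(node, log):
    """Run `node`, first running any upstream node that is not executed."""
    for parent in node.parents:
        if parent.status is not Status.EXECUTED:
            run(parent, log)
        if parent.status is not Status.EXECUTED:
            return  # an upstream failure blocks this node
    try:
        node.action()
        node.status = Status.EXECUTED
        log.append(node.name)
    except Exception:
        node.status = Status.ERROR


# Example workflow: Data -> Filter -> Descriptive Statistics
data = Node("Data")
flt = data.connect_to(Node("Filter"))
stats = flt.connect_to(Node("Descriptive Statistics"))

log = []
run(stats, log)  # runs Data and Filter first, then the statistics node
```

In this model, asking only for the last node is enough to bring the whole branch from orange to white, which mirrors the description of the green arrow above.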

Special tabs in nodes

Each standard node has two special tabs:

The Downstream tab, where the user can define one resulting spreadsheet that will be available for further analyses (that is, which file appears in the lower-right corner of the node).
The Home tab, where the Description section holds the default description of the node functionality. It also shows the dates of creation and modification, which can be useful for tracing changes. The Annotation section is for your own description, which can make the workflow easier to understand.

SVB Nodes

The second set of nodes is called legacy or SVB nodes. Users will recognize them by the letters SVB in the upper part of the node. If you change the Node Browser to Legacy Procedures, your tabs will contain only SVB nodes.

These nodes are built with predefined outputs and predefined parameters exposed to the user. The user does not need to choose each and every option as in the standard type of nodes.

The essential difference is that an SVB node typically has no variable selection options. Variables need to be selected beforehand in a previous node; the Select Variables node is appropriate for this. Thanks to this property, SVB nodes are not limited to only one input and one output table. Output spreadsheets appear as separate nodes. See below a comparison of the standard and legacy node implementations on a simple example.

Tips and tricks

Please review the page with the most used nodes, together with a wide range of tips and tricks connected with them.
When you create a new workspace, there is a set of workspace templates (quick-starts) that you can choose instead of a blank workspace.
An example folder with workspaces is available: click File/Open Examples/Workspaces.
A good practice is to insert annotations and additional information for a better understanding of the flow. This can consist of renaming nodes appropriately, inserting descriptions in the Home tabs of the nodes, and inserting annotations into the workspace (right-click on the canvas and choose the Annotations option). When a workspace is saved into the Enterprise meta-repository with document versioning enabled, there is a section for entering a description when creating a new version of the workspace: use it to describe the changes from the previous version.
You can run another workspace as part of your workflow by using the Execute External Workspace node (to reuse already created work or to split complex workflows into parts).
You can use a workspace as a data function in Spotfire - see Statistica Data Function in Spotfire.
You can control the parameters available to citizen data scientists, who do not need to see the workspace but need to choose parameters for their analysis. The functionalities that control this are User View and Designer View (more can be found in this Knowledge Base article).
You can create your own Ribbon bar menu with an arbitrary arrangement of node functionalities: go to the Node Browser, click Create a New Node Browser configuration, and create your folders with the desired functionalities (when you open the Node Browser you can see that the folders and functionalities in a Node Browser configuration have exactly the same structure as the Ribbon bar menu - highlighted in orange).
For each node, you can run an additional macro (script), typically to change the output graph options or the format of output spreadsheets: use the Customize the Output option after right-clicking the node (for more details please see this Knowledge Base article).
If you are not sure where to find a particular node, you can use the Feature Finder search functionality.
Order of running nodes: In some cases, the order in which the nodes are executed might be important. You can fully manage the execution order via the options in the Edit tab. The Root Node Order option sets the order of the main branches, that is, of the nodes without a predecessor. The Child Node Order option defines the order of the child nodes of the currently selected node.
To have a more flexible workspace, you can benefit from 'wild-card' variable selection in the nodes.
A set of nodes can be copied to another workspace or within the same workspace by selecting them and pressing Ctrl+C, Ctrl+V. To move selected nodes within the workspace, use the arrow keys.
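
The tip on the order of running nodes can be illustrated with a small traversal. The Python sketch below is a simplified model under stated assumptions, not a description of how Statistica schedules nodes internally: the function `execution_order` and the node names are hypothetical, the traversal is assumed to be depth-first, and nodes with several inputs are not treated specially. It only shows how a root order and a per-node child order together determine one overall sequence.

```python
def execution_order(root_order, child_order):
    """Return node names in the order a depth-first run would visit them.

    root_order  -- names of nodes without a predecessor, in the chosen order
    child_order -- dict mapping a node name to its children, in the chosen order
    """
    visited = []

    def visit(name):
        if name in visited:
            return  # a node reachable from two branches is listed only once
        visited.append(name)
        for child in child_order.get(name, []):
            visit(child)

    for root in root_order:
        visit(root)
    return visited


# Hypothetical workspace with two root nodes (two data sources).
roots = ["Sales data", "Customer data"]
children = {
    "Sales data": ["Clean", "Histogram"],
    "Clean": ["Regression"],
    "Customer data": ["Summary"],
}
order = execution_order(roots, children)
```

With these inputs, `order` lists the whole Sales data branch before the Customer data branch; swapping the entries in `roots` or in one of the child lists changes the sequence accordingly.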

References and Important Links

Most important nodes in Statistica Workspaces
Glossary of Statistica terms and shortcuts
Statistica Data Function in Spotfire
