In Spotfire Statistica® (or simply Statistica) you can work in an interactive way with one functionality at a time or through the workflow type of environment when you are "recording" the steps for further usage or for automated running next time in the form of visual workflow. This article will focus on the workflow type of user interface (called Workspace).
Functionalities
We will demonstrate several merging functionalities through the examples and you can pick the functionality closest to your use case situation. All examples mentioned below can be found with all the settings in the attached example Workspace files.
There are in fact several functionalities for merging tables (Merge, ETL, and Query Spreadsheets). Let us start with various options in the tab Data->Merge.
Merging two tables
Important notice for all the nodes from the Merge menu is that these are always merging two tables. Nevertheless, there are lot of merging types you can use. We will show all merging types with the most typical settings but this article is not an exhaustive list of these options.
Concatenate Variables (Example 1)
This feature is for merging two tables. This means that the merging node you will use should always have two active arrows heading to it in order to be set up properly. Please note that thanks to visual workflows, you can do any data prep and transformation steps before actual merging.
With the default setting of the node, the tables are simply put together next to each other, merging key for rows is simply the row number. How the tables are merged is clear from the picture below (an example is only for demonstration, typically you would merge tables with a different structure for this type of merge):
You can find this example in attached workspace file (node name is "Example 1 - Concatenate Variables").
Merge Cases (Example 2)
This functionality can be used for example when you have the same structure of the tables and want to create one table with all information. It is adding rows from the second table after the rows from the first file.
For this, you should choose the option Match by variable name inside the node:
You can find this example in attached workspace file (node name is "Example 2 - Merge Cases").
Merge Cases (Example 3)
We will show another example of the same functionality and setting as in Example 2 in order to demonstrate what the situation looks like when we have different variable names and the same ID values (as you can see this functionality is not taking into account any key for matching).
You can find this example in attached workspace file (node name is "Example 3 - Merge Cases").
Merge Variables (Example 4)
This type of merging is used when you want to join tables according to some variable (some key or ID variable for example). We will show several settings of the functionality Merge Variables:
Merging of that kind can be achieved with these settings in the node (variables used as joining key can be defined in section "Input")
Merge Variables (Example 5)
If you change the setting in Example 4 to have the Copy down option in Multiple cases section, you will have data without missing values:
Merge Variables (Example 6)
On the other hand, if you do not want multiplicities for IDs, you can use only first occurrences in the tables per ID (important setting here is in section Drop multiples).
Cartesian Variables (Example 7)
Functionality is simply multiplying whole table by options in other table:
Match Cases
We won't show this type of merging as an example. It is the same as Merge Variables but the key variable for merging is here case name (header for each row which can be numeric - row number or in the form of Text).
Merging more tables
What if we would like to merge more than 2 tables in one step?
Query Spreadsheets (Example)
Query Spreadsheets node is an elegant way to merge various numbers of tables using SQL query type manner. Each file that is connected with (heading to) Query Spreadsheet node will appear in the Advance Query Builder (or Query Builder) and form some kind of database of inputs that can be used for the construction of an SQL query. It is possible to work in a drag-and-drop way without needing to know SQL, or you can write SQL code for the required merging operation.
Remarks about the dialog: In the section on the right you can see the tables/spreadsheets available for merge, in the middle is the schema of the join (for example we can see that in the output we will not take any information from Table2). Connections between spreadsheets are done by dragging and connecting of joining keys in both tables. After double-clicking on the table connection, you can specify the type of join. When you are ready with the query go to Action->Return data.
Original data and joined table of the example above are here:
You can find this example in the attached workspace file (node name is "Example 8 - Query Spreadsheets").
Remark: Query Spreadsheets node can be used as well to do merging described in Example 2 (also for more data tables) utilizing Union operation.
Remark: Query Spreadsheets node can be used in addition also for filtering and aggregation.
Merging larger amount of csv files
Let us mention one additional merging option. If you want to perform an import of multiple csv files into one table (the data structure of all csv files is expected the same), you can use the custom Folder CSV Import node which can be downloaded from Folder CSV Import Node for Spotfire® Data Science - Workbench in the Exchange section of the community. In other words, this is solving the merging situation of Example 2 for a large number of tables.
ETL (Extract, Transform and Load)
This differs in functionality compared to previous ones because it is not only about merging but also about aggregating the data in one step. Statistica ETL provides two menu options for aligning data:
-
Time-indexed
- Aggregates data from multiple data sources based on a date/time stamp variable
- Aligned data by minute, hour, day, week, month, quarter, or year
-
ID-based
- Aggregates data from multiple data sources based on an identifier variable and an optional time variable
- Optionally aligns data by N equal intervals or N user-specified intervals
This functionality can be used in various ways. Typically, you need to define ID or/and the time identifiers in each merged file, you can define summarization characteristics for aggregated variables for each file and you need to define time variable interval settings for each file.
We will not describe in detail examples of ETL functionality, in the attachment you can find two examples of ID-based ETL in ETL examples.sdm Workspace. In addition to that, there are examples available in the help (section Statistica ETL Examples), where all the steps for settings are described in detail.
Relevant Links
- Statistica Workspace - a graphical UI - Statistica Workspace introduction
- Most important nodes in Statistica Workspaces
- Folder CSV Import Node for Spotfire® Statistica - Spotfire Community Exchange Item
- Statistica Data Function in Spotfire
- Statistica terms and shortcuts
- Spotfire Statistica® Enablement Hub
Download Examples
You can download examples used in this article in this archive: merging_examples.zip
Recommended Comments
There are no comments to display.