How to consume and organize IOT or any data from big data sets in TIBCO Spotfire®?
In this context of big data sets we mean especially the following:
- Lots of tables in one or different databases;
- Lots of columns;
- Very high frequency of new records in these tables (data from machines that record state or output in real-time, for example, 1 record per second or at least per minute).
- Poorly organized data tables with technical non-descriptive names.
We often get access to such data environments and the first challenge is to indicate tables and fields that really matter. It is important to name them in a way that end users understand the meaning of the underlying data. We have in mind hundreds of data tables which include thousands of columns in total. In most cases, we see very technical names of columns like machinetemperatureactual or machinetemperatureset or even data1, data2, data3, data4..... In some cases, the content of these columns is not precisely known in advance until users see the data. And here comes the challenge.
You can set up column naming in SQL view, connections, TDV, information designer, or inside Spotfire, but that requires access to these features. Changing names in the early stages usually means a lot of work in each step that follows and it is very time-consuming. Sometimes this is an ongoing process in many loops so this approach is better to be avoided. How to make the column naming and displaying data in Spotfire dynamic?
Developers can set up one or two tables with descriptions of each table and columns inside. To each technical name, we can assign a human-readable column name. This table can be maintained within Spotfire through different extensions or independently in any other application or even excel. First, we import this table into Spotfire.
Next, we import different tables into Spotfire with original technical names through data connection or information link. Both will work fine but the data connection is maybe faster when adding new data sources. We do not change column names in Spotfire but we create expressions with a data function that generates a string on marking. Properties are used to store those values. Each marking inserts values in two properties which are then consumed in the expression. For example: sum([property1]) as [property2] or final result: sum([data1]) as [Actual machine temperature]. Both those property values come from our master data but they are applied in the data table from machinery.
When underlying master data changes, all connections, information links, or queries are not changed. For example, someone would like to change the name of data1 - Actual machine temperature to Machine temperature (act.). The change is done only by changing this line in the master data. When end users reload data into Spotfire, changes are visible immediately.
What about long data sets where tables consist of millions of records? Use data on demand in this case. Usually, some date/time properly is used for that with some other additional parameters like department code, machine Id, or similar. In this way, only segments of data are loaded into Spotfire. With machinery data that is a requirement in most cases anyway.
Recommended Comments
There are no comments to display.