Issue
This issue is often asked about in other ways, for example:
- What should I use for a primary (unique) key when my data doesn't have one?
- I have time-series streaming data that I want to visualize with Spotfire®. Is there a way to get an auto-incrementing field to use as a primary key?
- How to use the auto-increment feature to generate primary key values for streaming data that has no natural primary key
Solution
All LiveView tables require a primary key. A primary key is required to uniquely identify each row of data in the table (akin to a primary key in a relational database). It is the case that some data has a field, or a set of fields that combined, uniquely identify each row of data. One such example is a social security number. However, not all data sets have a naturally occurring primary key, even when multiple fields from the dataset are combined. In such cases where there is no naturally occurring unique key, one can be synthesized.
Spotfire® Data Streams, also referred to as LiveView tables, have an autoincr feature. This feature is designed to be used to generate a primary key for a LiveView table.
If your data is in JSON format on a Kafka, MQTT or TCM bus, you can use the connectivity wizard to generate the project that will read the data off the bus and create a data stream for you. Within the wizard dialogs, there is a choice of time-series data with an auto-incremented primary key. A default name for the primary key is provided (EventID) but can be changed:
This choice will add the named field (EventID, type long) to the defined schema for your data stream, mark it as the primary key for the table and set it to use the autoincr feature.
If not using the Connectivity Wizard, you can still take advantage of this feature to generate a primary key for your data. To do so, you must name a field in your schema that will represent the primary key; its type needs to be long:
On the following dialog, this designated field (UID, type long) is selected as the primary index (primary key):
Once the table is defined, manually modify the table definition file (.lvconf) for the field designated to be the primary key:
Using the lvconf editor within StreamBase Studio, on the Source tab, change the primary key tag from its original form:
<primary-key> <field ref="UID"/> </primary-key>
to use the autoincr directive:
<primary-key> <field ref="UID" autoincr="true"/> </primary-key>
Each time a new row of data is added to the table, the UID field in the new row will be populated with a value that is unique. The value of that field is automatically incremented to be one more than the previous row, and thus unique.
Advantages
-
Ease of use
- Simply add the autoincr designation to the table's definition (.lvconf file) for the field named as the primary key. The uniqueness of the key is maintained for you. No additional components need be added to the publisher application.
-
Recovery
- If the implementation is configured to recover from a peer, the autoincr field is recovered from the table with peer recovery.
- If the implementation is configured to recover from logs, on restart the autoincr long starts at the next highest value recovered from the logs.
Additional Notes
Though not required, the autoincr field can be combined with whatever other key the author likes to think of as "primary" (but may not be). Restated: the field with autoincr directive is unique. As such, it need not be combined with any other piece of data, but could if you wanted.
If the user wants to recover a Spotfire® Data Stream (LiveView table) via some message bus, or other data source of their own, then the autoincr key does not work as intended.
It is true that an incrementing long is not "cluster friendly"; however, by the time a tuple has arrived at a LiveView table it's actually not a cluster anymore; rather, it's just a table. At that point in time, the only requirement is that the field designated to be the primary key be unique, which an incrementing long will be.
Recommended Comments
There are no comments to display.