Deleting rows using a row filter data transformation
If you have rows that you know will not be useful in your analysis going forward and you expect these rows to stay excluded in future user sessions and after data reloads, then deleting rows using a filter transformation is most probably the way to go.
The filter rows data transformation allows you to remove rows based on what you have selected in filters. If you know that the values you have selected in a filter are the only values of interest going forward, you can remove all other corresponding values (rows) with one click. This automatically creates a new 'Filter rows' data transformation. The data transformation is automatically documented in the data canvas, and can be edited just like other data transformation types.
In the image below, we have created a filter and selected values to filter on. Once done, right-click anywhere on the filter and select Create Filter Transformation.
The filter transformation is automatically created and you can continue with your analysis. The transformation will be applied and your rows will be removed also when you reload data or when the analysis file is opened the next time.
If you would like to review, edit or remove the filter transformation, just navigate to the data canvas and click on the transformation step in the source view.
The expression defines what values to include. For a value column, the expression could look something like the example below.
(([fare_amount]>=20.0) AND ([fare_amount]<=150.0)) OR ([fare_amount] IS NULL)
You can also remove rows based on dates, as seen below.
The expression editor is reached through the data canvas, as for all data transformations.
This is an example of a filter expression on an integer column:
([payment_type] NOT IN (1, 2)) OR ([payment_type] IS NULL)
You can also insert a filter rows transformation manually from the summary view when adding imported data, or later, from the data canvas.
Deleting rows in a visualization (session-based delete rows)
When marking data in an in-memory visualization you can delete marked rows from the visualization and from the final data table, see image below. This feature is designed as a per session feature. This means that information about which rows were deleted is not stored in the analysis file. The next time you open the analysis file, the rows might be visible again. This example dives deeper into this behavior. We will also explain why the rows actually continue to be deleted when the final data table is embedded in the analysis.
The image below shows that the delete rows operation is visible in the source view of the data canvas:
Two ways of storing data
In this first example, data is stored in the analysis. This has been chosen to demonstrate that the delete rows behavior is not related to data being (re)loaded from the data source file or database. The 'Stored data' option means that Spotfire will only reload data from the file or database if you manually decide to do so.
In the image below, you can see that the data from the data source has been configured to be stored in the analysis file. Keeping the data linked to the source, but setting data loading to 'Stored data' is, in most cases, the preferred way of storing data in your analysis. With this setting, you allow editing in the source view in multiple ways. You can also eaily switch to the other data loading options: 'Always new data' or 'New data when possible'.
In the image below, the final data table has instead been embedded using the 'Embedded in analysis' setting in the Data Table Properties dialog.
Notice how embedding the final data table locks the source view from being editable, a state you probably want to avoid.
Note: To prevent users from embedding final data tables by accident, there is no longer an option to do so in the Save dialog. Instead, an overview of all your data tables is provided, to make it easy to see your current data loading configuration for all data sources:
See also the topic 'Embedded or Linked Data' in the Spotfire Analyst help, for more information.
Two ways of refreshing stored data
Stored data per data source
If you have selected to use stored data for many data sources, you can refresh each source individually from the data canvas, using the two-arrow button seen in the image below. Refreshing individual data sources is very useful if you only need to refresh a smaller data source and not a larger data source, which both contribute to a single data table.
If you use the reload options in the menu to refresh data, the final data table is refreshed for either data provided by all linked data sources only, or by both linked and stored data sources.
The 'Reload linked data' option will not refresh data from a file or database data source configured to use 'Stored data'. You can reload individual stored data sources by clicking the reload button for that specific data source in the data canvas.
The 'Reload all data' option will refresh any data that is possible to reload, including linked data with Data loading set to Stored data and embedded data from sources that are possible to reload (provided that the source is available).
Embedded data in the final data table - should in most cases be avoided
If you have embedded data in the final data table this overrides all individual data source settings and all data that can be reloaded will be reloaded when clicking on the Reload all data option. A refresh of an embedded data table will therefore bring back your deleted rows, see next section.
How do these two ways of storing and refreshing data impact the delete rows operation?
The row filter transformation is basically not affected. If a data source having a row filter transformation is reloaded, the transformation will be executed and the rows corresponding to the expression will be kept.
The session-based delete rows operation works differently depending on whether the final data table is embedded or whether data has been stored for each data source.
If the data is stored for each data source and not embedded in the final data table, the session-based delete rows function is only available during a session. In your next session, Spotfire doesn't have a record of which rows were removed and when the final data table is recreated from the data stored in the data sources upon loading, the remove rows operation won't execute. You will see that the number of removed rows is 0 in the source view:
If the data is embedded in the final data table, data is not refreshed towards the data sources or from the file or database and thus the final data table is not changed when opening the analysis. However, if you refresh the final data table, the delete rows operation will be executed, but since Spotfire doesn't have information about the deleted rows stored, all rows will now be available again. As long as the final data table is not refreshed, the deleted rows remain deleted, for example, when you save and open the file.
Recommended Comments
There are no comments to display.