    Spotfire Statistica® Extract, Transform, and Load


    This article describes the capabilities of the Statistica ETL module.

    Overview

    This module provides analytics for:

    • aligning time-stamped process data with other data sources, such as process data collected at different time intervals, or only collected once per part, ID, batch, etc.
    • aligning ID-based data with other data sources
    • processing batch-time data to achieve equal batch "lengths," and unstacking such data to make them available for subsequent analyses and process monitoring of the maturation process (the Multivariate Statistical Process Control module helps with the computations)
    • aggregating and/or smoothing data, so that meaningful subsequent process monitoring methods (e.g., for change-point or trend detection) can be applied to robust or smoothed estimates of process averages within aggregated time intervals

    Spotfire Statistica® Server needs to be added to this product to provide the following benefits:

    • configuring complex data alignment tasks of multiple diverse data sources into a single ETL object, which can be deployed into a metadata store, to be applied ad-hoc or as a scheduled task, to support a dedicated data warehouse that maintains validated and aligned data for comprehensive process monitoring and optimization
    • a secure platform for efficiently managing multiple database connections to various types of databases, including process databases (e.g., via the specialized Statistica PI Connector). The product stores metadata describing the nature of the tables that are queried, such as control limits, specification limits, valid data ranges, etc.

    Extract Data

    Statistica Extract, Transform, & Load (ETL) combines the capabilities of the Statistica system for efficient processing of data from standard databases (Microsoft SQL Server, Oracle) as well as specialized process databases such as OSIsoft PI (via the Statistica PI Connector), with Statistica's data processing capabilities for data filtering, aggregation, and analyses. As mentioned above, Statistica ETL can be combined with the capabilities of Statistica Server for an advanced statistical process monitoring solution. This solution can support highly specialized data warehouses that integrate time-stamped parameter data for multiple process steps with quality, rework, and outcome data.

    Transform Data

    The Statistica ETL module provides unique capabilities for processing and merging data, in particular process data that are difficult to manage using standard database tools.


    Aggregation, alignment, and replication of time-stamped data

    In order to monitor ongoing continuous processes, such as chemical or pharmaceutical manufacturing, power generation, refining, and so on, critical process parameters must be recorded into a process "historian" at regular time intervals. Dedicated high-performance databases, such as OSIsoft's PI database, are typically deployed to provide efficient high-frequency data recording. However, to make such data available for useful analyses, e.g., for root-cause analyses or process monitoring, they must be aggregated and aligned, for example, with outcome data.

    Statistica ETL provides simple tools to automate the process of aligning time-stamped process data with other data sources, such as process data collected at different time intervals, or only collected once per part, ID, batch, etc.
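    Statistica itself performs this alignment through its own configuration tools; purely as an illustration of the underlying idea, the following sketch aligns high-frequency sensor readings with per-part outcome data using pandas. All table and column names here are hypothetical.

    ```python
    import pandas as pd

    # Hypothetical process data recorded every minute by a sensor
    process = pd.DataFrame({
        "timestamp": pd.date_range("2024-01-01 08:00", periods=6, freq="1min"),
        "temperature": [70.1, 70.4, 71.0, 70.8, 70.5, 70.9],
    })

    # Hypothetical outcome data, recorded once per part at irregular times
    outcomes = pd.DataFrame({
        "timestamp": pd.to_datetime(["2024-01-01 08:02:30", "2024-01-01 08:04:45"]),
        "part_id": ["A-101", "A-102"],
        "passed": [True, False],
    })

    # Attach to each outcome the most recent process reading at or before it;
    # both frames must be sorted by the key column for merge_asof
    aligned = pd.merge_asof(outcomes, process, on="timestamp", direction="backward")
    print(aligned[["part_id", "passed", "temperature"]])
    ```

    Here each part is paired with the last temperature reading taken before its timestamp; real deployments would also need to handle time-zone differences, clock skew between sources, and missing readings.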


    Automatic stacking and unstacking, and normalizing of batch-time data, for batch processes

    The manufacture of pharmaceuticals and chemicals often involves the processing of batches of materials through multiple steps, where in each step some maturation of the batch is recorded. The resulting data, recorded in a laboratory information management system (LIMS), consist of time-stamped process data organized by batch ID. In order to make such data available for useful analyses, it is necessary to transform the time stamps into elapsed-within-process-step times and to normalize the data so that, for each batch, a comparable number of elapsed-time recordings are available for analysis.

    Statistica ETL provides efficient tools for processing batch-time data, achieving equal batch "lengths," and unstacking such data to make them available for subsequent analyses and process monitoring of the maturation process (see also Statistica MSPC for details).
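    The two steps described above, converting absolute time stamps to elapsed-within-batch times and normalizing every batch to an equal "length", can be sketched as follows. This is not Statistica's implementation, only a minimal pandas/NumPy illustration with hypothetical batch data.

    ```python
    import numpy as np
    import pandas as pd

    # Stacked batch data: one row per (batch, timestamp) reading (hypothetical)
    stacked = pd.DataFrame({
        "batch_id": ["B1"] * 4 + ["B2"] * 3,
        "timestamp": pd.to_datetime([
            "2024-01-01 00:00", "2024-01-01 00:10",
            "2024-01-01 00:20", "2024-01-01 00:30",
            "2024-01-02 06:00", "2024-01-02 06:15", "2024-01-02 06:30",
        ]),
        "ph": [4.0, 4.5, 5.1, 5.6, 4.1, 4.9, 5.5],
    })

    # Step 1: convert absolute timestamps to elapsed minutes within each batch
    stacked["elapsed_min"] = (
        stacked.groupby("batch_id")["timestamp"]
        .transform(lambda t: (t - t.iloc[0]).dt.total_seconds() / 60.0)
    )

    # Step 2: interpolate each batch onto a common elapsed-time grid, so every
    # batch ends up with the same number of readings, then unstack so that each
    # batch becomes one row with one column per elapsed-time point
    grid = np.array([0.0, 10.0, 20.0, 30.0])

    def to_grid(g):
        return pd.Series(np.interp(grid, g["elapsed_min"], g["ph"]), index=grid)

    unstacked = stacked.groupby("batch_id")[["elapsed_min", "ph"]].apply(to_grid)
    print(unstacked)
    ```

    Batches recorded at different sampling intervals (B1 every 10 minutes, B2 every 15) are thereby made directly comparable, row by row, for subsequent multivariate analyses.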

    Aggregating data using robust statistics

    The aggregation of real process data (e.g., time-stamped one-minute-interval data to align with hourly data) usually requires the application of aggregation methods that go far beyond the capabilities of standard database tools. For example, time-stamped data may include outliers, or may be very "noisy," thus hiding important trends or changes in trends.

    Statistica ETL provides numerous tools and methods for aggregating and/or smoothing data so that meaningful subsequent process monitoring methods (e.g., for change-point or trend detection) can be applied to robust or smoothed estimates of process averages within aggregated time intervals.
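    To see why robust aggregation matters, consider the following sketch (again an illustration with simulated data, not Statistica's own method): minute-level readings containing a few gross outliers are aggregated to hourly values, once with the ordinary mean and once with the robust median.

    ```python
    import numpy as np
    import pandas as pd

    # Simulated one-minute pressure readings over 3 hours, true level ~30
    rng = np.random.default_rng(0)
    idx = pd.date_range("2024-01-01", periods=180, freq="1min")
    pressure = pd.Series(30.0 + rng.normal(0.0, 0.2, size=180), index=idx)
    pressure.iloc[[15, 70, 130]] = 95.0  # inject gross outliers (sensor glitches)

    # Naive hourly means are pulled away from the true level by the outliers...
    hourly_mean = pressure.resample("1h").mean()
    # ...while hourly medians remain robust estimates of the process level
    hourly_median = pressure.resample("1h").median()

    print(hourly_mean.round(2))
    print(hourly_median.round(2))
    ```

    A single glitch per hour shifts each hourly mean by roughly one unit, while the medians stay within the noise band around 30, so downstream change-point or trend detection operates on the actual process level rather than on sensor artifacts.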

    Aggregation and alignment of multiple varied sources

    Complex processes, such as semiconductor or pharmaceutical manufacturing, require complex data storage suited to the specific nature of the process that is to be recorded and monitored. Therefore, it is common that multiple separate databases or data sources, such as CSV files created automatically from gages, data from OSIsoft PI, assay data from a LIMS, etc., must be aggregated and aligned to enable meaningful root-cause analyses of problems, or comprehensive process monitoring.

    Statistica ETL provides tools for configuring complex data alignment tasks of multiple diverse data sources into a single ETL object, which can be deployed into Statistica Enterprise, to be applied ad-hoc or as scheduled ETL tasks, to support a dedicated data warehouse that maintains validated and aligned data for comprehensive process monitoring and optimization.

    The transformation capabilities of Statistica ETL go far beyond those available in standard database or querying tools, and allow you to build dedicated, specialized data warehouses to optimize your processes without the need to program custom applications in-house. Statistica ETL is a one-stop solution for creating data warehouses with automated simple and sophisticated analytic capabilities that will allow you to derive the full value from the data you are collecting!

    Load Data

    The Statistica ETL solution will automate the process of validating and aligning multiple diverse data sources into data tables suitable for ad-hoc or automated analyses. When deployed inside the Statistica Enterprise framework, data can be written back to dedicated database tables, or to Statistica data tables, to provide analysts or process engineers convenient access to real-time performance data, without the need to perform tedious data preprocessing or cleaning before any actionable information can be extracted.

