Time Series Analytics for Spotfire


    The article discusses the creation of functions and templates for handling time series data in Spotfire, covering preprocessing, smoothing, forecasting, decomposition, pattern exploration, and anomaly detection to facilitate analysis and improve data quality.

Handling time series data is no trivial task. Often messy, time series can be challenging to deal with and analyze correctly. To make this easier, we created a series of functions and templates for Spotfire, designed with developers, data scientists, and analysts in mind. We identified a number of common time series tasks and broke them down into the required practical steps. The individual functions can be mixed and matched like Lego pieces according to your requirements, and we continue to add new functionality based on what we observe in the community. We welcome feedback from any and all users. The areas of time series analytics covered are preprocessing, smoothing, forecasting, decomposition, pattern exploration, and anomaly detection.

    You can find our work available as part of the spotfire-dsml package via PyPI.
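The package can be installed with the standard pip invocation:

```
pip install spotfire-dsml
```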

[Figure: overview of time series preprocessing]

    Preprocessing

You are unlikely to run into a perfectly clean time series that needs no preprocessing. The data can be unevenly spaced, contain different time and measurement scales, and be littered with missing data. To combat these challenges, we created four functions that simplify cleaning up time series data: missing value imputation, resampling, min/max normalization, and index normalization, sketched briefly after the list below.

    • Missing Value Imputation: Imputes both numeric and non-numeric data gaps. Use when the dataset has missing values.
    • Resampling: Adjusts the frequency of the time series data. Use when needing to increase or decrease data granularity.
    • Min/Max Normalization: Scales numeric columns between specified minimum and maximum values. Ideal for standardizing data range.
    • Index Normalization: Standardizes the time index to fit between specified start and end datetimes. Useful when you want a consistent time index range.
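As an illustration of these four steps, here is a minimal sketch using pandas on hypothetical data; the actual spotfire-dsml function names and signatures may differ.

```python
# Minimal preprocessing sketch with pandas (illustrative, not the toolkit API).
import pandas as pd

# Example series with an uneven index and a gap (hypothetical data).
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-07"])
s = pd.Series([10.0, None, 14.0, 20.0], index=idx)

# Missing value imputation: fill numeric gaps by linear interpolation.
s = s.interpolate(method="linear")

# Resampling: move to a regular daily frequency, interpolating the new points.
s = s.resample("D").mean().interpolate(method="linear")

# Min/max normalization: rescale values into a target range, e.g. [0, 1].
lo, hi = 0.0, 1.0
s_norm = lo + (s - s.min()) * (hi - lo) / (s.max() - s.min())

# Index normalization: rescale the time index onto a fixed start/end window.
start, end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-31")
frac = (s_norm.index - s_norm.index[0]) / (s_norm.index[-1] - s_norm.index[0])
s_norm.index = start + frac * (end - start)
```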

    Smoothing

Time series are not always easy to interpret. Noise, along with several other factors, can inhibit our understanding of the data. Smoothing is a great way to make sense of the trend, even when it is not obvious. We have written six functions that perform different smoothing techniques: moving average, simple exponential, supersmoother, LOESS, Fourier, and kernel smoothing. A brief sketch of several of these follows the list below.

    • Moving Average: Averages data over a specified window. Helps reveal underlying trends in the data.
    • Simple Exponential: Applies exponential weighting to observations. Useful for giving more importance to recent data.
    • Supersmoother: Uses adaptive linear filtering to smooth time series. Great for handling data with varying noise levels.
    • LOESS (LOcally EStimated Scatterplot Smoothing): Combines multiple regression models for local smoothing. Ideal for capturing non-linear trends in data.
    • Fourier Smoothing: Uses Fourier transformation to filter out high-frequency noise. Best suited for time series with periodic patterns.
    • Kernel Smoothing: Utilizes kernel functions to smooth data. Useful for non-parametric smoothing needs.
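Here is a minimal sketch of four of these techniques on a synthetic noisy series, using pandas, NumPy, and statsmodels; the toolkit's own functions may expose different parameters.

```python
# Minimal smoothing sketch on a synthetic noisy sine series.
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
t = pd.date_range("2024-01-01", periods=200, freq="D")
s = pd.Series(np.sin(np.linspace(0, 8, 200)) + rng.normal(0, 0.3, 200), index=t)

# Moving average: average over a fixed window (here 7 observations).
ma = s.rolling(window=7, center=True).mean()

# Simple exponential smoothing: exponentially decaying weights on the past.
ses = s.ewm(alpha=0.3).mean()

# LOESS: local regression; frac controls the share of points in each local fit.
lo = lowess(s.values, np.arange(len(s)), frac=0.1, return_sorted=False)

# Fourier smoothing: zero out high-frequency coefficients, then invert.
coeffs = np.fft.rfft(s.values)
coeffs[20:] = 0  # keep only the 20 lowest frequencies
fourier = np.fft.irfft(coeffs, n=len(s))
```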

    Forecasting

Forecasting is a crucial step in time series analysis, allowing us to make informed predictions about future data points based on historical data. Time series often exhibit intricate patterns, trends, and seasonality that are not immediately apparent, so it is essential to apply modeling techniques that can decipher these characteristics. Using methods such as ARIMA, Holt-Winters, and Long Short-Term Memory (LSTM) networks, we have developed a suite of forecasting models tailored to the diverse and complex characteristics of time series data. A brief sketch of two of these approaches follows the list below.

    • ARIMA: Well-suited for forecasting data where trends and patterns emerge over time, even when these don't follow a seasonal cycle. Incorporates autoregression (AR), differencing to achieve stationarity (I), and moving average (MA) components to capture the autocorrelation within the series.
    • Holt-Winters: Ideal for univariate time series with a clear seasonal pattern and trend, leveraging smoothing techniques to adjust for seasonality, trend, and level in the data.
    • LSTM: Well-adapted for a wide range of univariate time series, including those with complex patterns that may not be well-captured by traditional methods. LSTM excels in learning from long-term dependencies, making it effective for data with large seasonal patterns or when capturing subtler trends and relationships is crucial.
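The sketch below shows ARIMA and Holt-Winters forecasts on a synthetic monthly series using statsmodels (an assumption for illustration; our toolkit's wrappers may differ). An LSTM example would require a deep learning framework and is omitted here.

```python
# Minimal forecasting sketch with statsmodels (illustrative, not the toolkit API).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with a trend and a 12-month seasonal cycle.
t = pd.date_range("2020-01-01", periods=48, freq="MS")
y = pd.Series(np.linspace(10, 30, 48)
              + 5 * np.sin(np.arange(48) * 2 * np.pi / 12), index=t)

# ARIMA(p, d, q): autoregression, differencing, and moving-average terms.
arima_fc = ARIMA(y, order=(1, 1, 1)).fit().forecast(steps=12)

# Holt-Winters: additive trend and additive 12-period seasonality.
hw = ExponentialSmoothing(y, trend="add", seasonal="add",
                          seasonal_periods=12).fit()
hw_fc = hw.forecast(12)
```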

    Decomposition

Another important step in time series analysis is decomposition. This process breaks down the original time series into several components, such as trend, seasonal, and residual parts. For example, analyzing daily precipitation patterns may call for measuring long-term trends along with seasonal variations in the measurements. Decomposing a time series allows each component to be analyzed separately, rather than attempting to interpret the raw, combined data. We created three functions for this, sketched briefly after the figures below:

• Data Preparation Function: Prepares the input data frame to be processed by either convolution or Fourier decomposition, creating a datetime index and a data column.
• Decomposing using Fourier Series: Transforms the time series from the time domain to the frequency domain; the output is a frequency range and amplitude, as shown in the following visuals. Source here.
[Figure: phase of the FFT]

[Figure: magnitude of the FFT]

• Decomposing using Convolution: Uses convolution to decompose the time series into three components: seasonality, trend, and residual. The time series is assumed to be additive unless the user changes this to multiplicative. The output of the function is visualized in the following plot. Source here.
[Figure: seasonality, trend, and residual components]
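The sketch below illustrates both views with NumPy and statsmodels; seasonal_decompose stands in here for our convolution-based function, since it likewise extracts the trend with a moving-average (convolution) filter.

```python
# Minimal decomposition sketch: frequency view via FFT, component view via
# a convolution-based decomposition (statsmodels used for illustration).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic daily series with a slow trend and a weekly cycle.
t = pd.date_range("2023-01-01", periods=365, freq="D")
y = pd.Series(10 + 0.01 * np.arange(365)
              + np.sin(np.arange(365) * 2 * np.pi / 7), index=t)

# Fourier view: frequencies (cycles per day) and amplitudes.
freqs = np.fft.rfftfreq(len(y), d=1.0)
amps = np.abs(np.fft.rfft(y.values)) / len(y)

# Convolution-based decomposition into trend, seasonal, and residual parts.
parts = seasonal_decompose(y, model="additive", period=7)
trend, seasonal, resid = parts.trend, parts.seasonal, parts.resid
```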
     

    Pattern Exploration

    Identifying patterns in time series data is a key mechanism to unlock insights that can drive decision-making and strategy. By exploring and understanding these patterns, we can reveal hidden structures and relationships within the data that are not immediately apparent.

    • SAX Encoding: Symbolic Aggregate approXimation (SAX) is a technique used to convert a time series into a symbolic representation, making it easier to identify and compare patterns. This method reduces the dimensionality of the data by dividing the time series into equal-sized segments and then representing each segment with a discrete symbol. 
    • Matrix Profiling: Matrix profiling is a powerful tool for time series analysis that helps in identifying patterns, motifs, and anomalies within the data. It involves creating a matrix profile, which is a vector that stores the z-normalized Euclidean distances between subsequences of a time series and their nearest neighbors.
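As a concrete illustration of matrix profiling, the sketch below uses the open-source stumpy library (an assumption for illustration; not necessarily what our functions use internally). Low profile values mark recurring motifs; high values mark potential anomalies (discords).

```python
# Matrix profile sketch using the stumpy library (assumed for illustration).
import numpy as np
import stumpy

ts = np.random.default_rng(1).normal(size=500)
ts[250:270] += 5  # inject an unusual subsequence

m = 20                    # subsequence (window) length
mp = stumpy.stump(ts, m)  # column 0 holds the matrix profile distances

# The subsequence farthest from its nearest neighbor is a discord.
discord_idx = int(np.argmax(mp[:, 0]))
```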

    In this example, SAX encoding is used to compress a time series representing daily weather data. The data is divided into equal time segments, and a summary statistic (such as the average) is calculated for each segment. Each segment is then assigned a "letter-grade" based on its statistic. By combining these letter-grades, the entire time series is represented as a single "word."

[Figure: SAX encoding of a daily weather time series]

    SAX not only significantly compresses the data but also aids in anomaly detection. By generating multiple SAX strings, we can isolate unique strings that indicate patterns not seen elsewhere in the data. This makes SAX an effective tool for identifying unusual events or trends in time series data.
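To make the letter-grade idea concrete, here is a simplified SAX-style sketch. Canonical SAX uses fixed Gaussian breakpoints on a z-normalized series; this version uses empirical quantiles instead.

```python
# Simplified SAX-style encoding: segment means binned into letter grades.
import numpy as np

def sax_word(ts, n_segments=8, alphabet="abcd"):
    ts = (ts - ts.mean()) / ts.std()                    # z-normalize
    segments = np.array_split(ts, n_segments)           # equal-sized segments
    means = np.array([seg.mean() for seg in segments])  # per-segment average
    # Quantile breakpoints, so each letter is roughly equally likely.
    bins = np.quantile(ts, np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[np.searchsorted(bins, m)] for m in means)

word = sax_word(np.random.default_rng(2).normal(size=64))
```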

    Anomaly Detection

Anomaly detection identifies abnormal behavior: data points that do not conform to an expected pattern relative to the other items in the data set, or that come from a different distribution altogether. Anomalies in data translate to significant (and often critical) actionable information in a wide variety of application domains. This work stream has grown into a significant piece of development in our multivariate time series work. Please see this page for a comprehensive overview of the topic, our approach, and downloadable assets.
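As a simple univariate illustration of that definition (the multivariate approach covered on the linked page is more sophisticated), a rolling z-score flags points that deviate strongly from their local expectation:

```python
# Rolling z-score anomaly flagging: a simple univariate illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
s = pd.Series(rng.normal(0, 1, 300))
s.iloc[150] = 8  # inject an anomaly

mean = s.rolling(30, min_periods=10).mean()
std = s.rolling(30, min_periods=10).std()
z = (s - mean) / std
anomalies = s[z.abs() > 4]  # points far from their local expectation
```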

In conclusion, the Time Series Toolkit for Spotfire is a comprehensive suite of functions designed to streamline and enhance the analysis of time series data. From preprocessing to decomposition and forecasting, our toolkit allows data scientists and analysts to efficiently handle time series data, revealing deeper insights and aiding in data-driven decision making. Python users can also install our time series work via the spotfire-dsml package on PyPI. If you have any suggestions or feedback, please feel free to reach out to us. We're always looking to improve and expand our toolkit.

     

    More details about this important release can be found in this community article.

