  • Python toolkit for data science and machine learning in Spotfire


    We introduce spotfire-dsml and explain our motivation for creating a Python toolkit. The goal is to enhance data science and machine learning capabilities in Spotfire and its components.

    Introducing spotfire-dsml: Enhancing Data Science and Machine Learning in Spotfire

    If you ever create data functions in Spotfire, our Python package will be a new, valuable resource for you. In the ever-evolving landscape of data science and analytics, ease of access and efficiency are of paramount importance. That's precisely where spotfire-dsml comes into play. This Python package is set to augment the way we approach data science within the Spotfire platform. With a robust vision, spotfire-dsml seeks to empower data scientists and analysts by significantly reducing the time to value for creating analytics-rich applications within Spotfire.

    The vision behind spotfire-dsml is to create reproducible machine learning pipelines seamlessly integrated with Spotfire. This integration enhances the analytic capabilities within Spotfire, making it a powerhouse for data scientists and analysts across several industries like Pharmaceuticals, High-tech manufacturing, Energy, and others. 

    By providing ready-to-use Python functions spanning various data science, analytics, and data manipulation use cases, spotfire-dsml aims to democratize data science within Spotfire. The package aims to evolve continuously, ensuring it stays on top of the latest and greatest in data science.  

     

    What is inside spotfire-dsml?

    The spotfire-dsml package includes the following modules:

    1. ML Modeling (ml_modeling): Dive into pipeline-centric model training and evaluation. Whether you're a seasoned data scientist or just starting, this module equips you with the tools to build robust machine learning models effortlessly.

    2. DS Module (time_series): Time series can be messy and challenging to work with. This sub-module contains functions that specialize in time-series preprocessing, smoothing, decomposition and forecasting, helping keep your analyses fast, accurate, and reliable.

    3. DS Module (nlp_preprocessing): For those delving into the world of text analytics, this sub-module offers pipeline-centric preprocessing solutions. It simplifies text data preparation, a critical step in natural language processing tasks.

    4. Explainability Module (ml_explain): Uncover the mysteries of model explainability using the XWIN methodology. Gain insights into your models, making your predictions more transparent and trustworthy.

    5. Monitoring Module (ml_drift): Detect and measure drift in your models with ease. Keeping your models up-to-date and accurate is crucial. This module simplifies the process by enabling you to decide when to trigger a new rebasing or retraining process.

    6. Distribution Fitting (distribution_fitting): Distribution fitting and normality testing are useful, and at times critical, processes across numerous industries. This sub-module aims to simplify both with functions that can be applied to full datasets, rather than working with one column at a time.

    Let's delve deeper into each module, providing detailed insights into how spotfire-dsml can transform your data science workflows. 

     

    ML Modeling

    This module addresses the need to train reproducible machine learning models. The functions within this module help users create general-purpose Python pipelines for regression and binary classification on tabular data (to begin with). In most cases, the data must be preprocessed before training. Pipelines that bundle preprocessing with model training are therefore unavoidable, because every step, including learned preprocessing steps such as imputation or encoding, must also be executed at scoring time.
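
    To make the point about learned preprocessing concrete, here is a minimal sketch built with scikit-learn rather than with spotfire-dsml itself, since the package's own function names are not shown in this article. The column names and toy data are assumptions made up for illustration. The imputation median and the one-hot categories are learned during fit and then reused automatically at scoring time.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data: one numeric and one categorical predictor.
train = pd.DataFrame({
    "temperature": [20.0, 25.0, np.nan, 30.0, 35.0, 40.0],
    "machine": ["A", "B", "A", "B", "A", "B"],
    "failed": [0, 0, 0, 1, 1, 1],
})

# Learned preprocessing for numeric columns: impute, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Learned preprocessing for categorical columns; unseen categories are ignored.
categorical = OneHotEncoder(handle_unknown="ignore")

preprocess = ColumnTransformer([
    ("num", numeric, ["temperature"]),
    ("cat", categorical, ["machine"]),
])

# One object holds preprocessing and model, so both travel together.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])
model.fit(train[["temperature", "machine"]], train["failed"])

# At scoring time the same fitted steps handle a missing value and a new category.
new_rows = pd.DataFrame({"temperature": [np.nan, 38.0], "machine": ["C", "A"]})
predictions = model.predict(new_rows)

# The median learned from the training data is stored inside the pipeline.
learned_median = (
    model.named_steps["preprocess"]
    .named_transformers_["num"]
    .named_steps["impute"]
    .statistics_[0]
)
```

    Because the fitted pipeline is a single object, saving and reloading it reproduces exactly the same preprocessing at scoring time, which is the reproducibility property described above.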

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from this spotfire-dsml package

     

    DS Module - Time Series

    Time series can be messy and challenging to work with. This sub-module makes time series analytics more accessible by providing functions for preprocessing, smoothing, decomposition and forecasting that mitigate the usual issues. Functionality includes normalization on both the time and measurement axes, resampling, and missing value imputation, along with a handful of smoothing techniques. With these functions, understanding your time series will be both easier and faster.
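
    As a rough illustration of the preprocessing steps listed above, the following sketch uses plain pandas, not the time_series sub-module itself. The timestamps, values, one-minute grid and three-point window are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings: one missing value and one missing timestamp (00:03).
ts = pd.Series(
    [1.0, 2.0, np.nan, 5.0, 6.0, 10.0],
    index=pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:02",
        "2024-01-01 00:04", "2024-01-01 00:05", "2024-01-01 00:06",
    ]),
)

# Resampling: put the series on a regular one-minute grid (gaps become NaN).
regular = ts.resample("1min").mean()

# Missing value imputation: linear interpolation weighted by elapsed time.
filled = regular.interpolate(method="time")

# Smoothing: centered moving average over three points.
smoothed = filled.rolling(window=3, center=True, min_periods=1).mean()

# Normalization on the measurement axis: rescale to the range [0, 1].
normalized = (smoothed - smoothed.min()) / (smoothed.max() - smoothed.min())
```

    The order of the steps matters: resampling first exposes the gaps, so that imputation and smoothing operate on an evenly spaced series.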

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from this spotfire-dsml package

     

    DS Module - NLP preprocessing

    Natural Language Processing (NLP) is a branch of Artificial Intelligence that focuses on the ability of machines to understand text. This sub-module performs a range of exploratory text analytics and text classification tasks on any text. This pipeline-centric solution preprocesses (cleans) the text and engineers it into n-grams. It includes options to remove numbers and special characters (defined with respect to the English language) and to normalize text using stemming or lemmatization, and it supports supervised classification.
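
    The cleaning and n-gram steps can be sketched with the Python standard library alone. This is an illustration of the general idea, not the nlp_preprocessing API: the function names below are hypothetical, and stemming or lemmatization is left out because it would require an additional NLP library.

```python
import re

def clean_text(text):
    """Lowercase the text, drop numbers and special characters, and tokenize."""
    text = text.lower()
    # Keep only English letters and whitespace; everything else becomes a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

def make_ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = clean_text("Pump #3 failed after 120 hours!")
bigrams = make_ngrams(tokens, 2)
# tokens  -> ['pump', 'failed', 'after', 'hours']
# bigrams -> ['pump failed', 'failed after', 'after hours']
```

    The resulting n-grams can then be counted and used as features for exploratory analysis or for a supervised classifier.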

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from this spotfire-dsml package

     

    Explainability Module

    It is time to deploy the model and make predictions! Machine learning models are increasingly used to make decisions that directly affect humans, for example in financial, legal and medical settings. This module introduces a novel model-agnostic technique developed by us, named XWIN (eXplanations from the impact of Withholding INformation), which assesses the importance of an input feature for a particular model outcome and works with both numeric and non-numeric features.
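
    The XWIN algorithm itself is not described in this article, so the sketch below is not XWIN. It only illustrates the general idea the name points to: measure how a single prediction changes when the information carried by one feature is withheld. Here "withholding" is approximated by swapping the feature's value for values drawn from a background dataset and averaging the predictions. The model, the data and the function names are hypothetical, and the example handles numeric features only.

```python
import numpy as np

def model(X):
    # Hypothetical stand-in for any trained model's predict function.
    # The second feature (index 1) is deliberately unused.
    return 3.0 * X[:, 0] + 0.5 * X[:, 2]

def withholding_impact(predict, x, background):
    """Per-feature change in one prediction when that feature is withheld."""
    baseline = predict(x[np.newaxis, :])[0]
    impacts = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        # Copy the instance once per background row, then overwrite feature j.
        perturbed = np.tile(x, (background.shape[0], 1))
        perturbed[:, j] = background[:, j]
        impacts[j] = baseline - predict(perturbed).mean()
    return impacts

background = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
])
x = np.array([2.0, 5.0, 3.0])

impacts = withholding_impact(model, x, background)
# impacts -> [3.0, 0.0, 1.0]: the unused feature has no impact on this prediction.
```

    The function only calls predict, so it works with any model that exposes a prediction function, which is what model-agnostic means in practice.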

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from this spotfire-dsml package

     

    Monitoring Module

    Supervised machine-learning models derive their predictive power from exploiting statistical patterns in the space of the predictor and target variables. Models operate under the implicit assumption that the patterns found during training also exist in the prediction data, or (in more technical terms) that the training and prediction data are drawn from the same distribution. This module evaluates potential drift when a trained model is applied to data where either the data or the underlying rules are likely to change with time.
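
    One simple way to check whether training and prediction data still come from the same distribution is to compare binned frequencies of a single variable. The sketch below uses the population stability index (PSI) with NumPy. PSI is our choice for illustration and is not necessarily the measure used by ml_drift. The function name, bin count and simulated data are assumptions.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Compare two samples of one variable using quantile bins of the reference."""
    # Inner bin edges from the reference data; the outer bins are open-ended.
    inner_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(inner_edges, reference), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(inner_edges, current), minlength=n_bins)
    # Convert to fractions and avoid log(0) for empty bins.
    ref_frac = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_frac = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
training = rng.normal(0.0, 1.0, 5000)   # data the model was trained on
same = rng.normal(0.0, 1.0, 5000)       # new data, same distribution
shifted = rng.normal(1.0, 1.0, 5000)    # new data, mean has drifted

psi_same = population_stability_index(training, same)
psi_shifted = population_stability_index(training, shifted)
```

    A value near zero indicates that the two samples are distributed alike, while larger values indicate drift. Tracking such a score over time is one way to decide when retraining is worth triggering.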

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from this spotfire-dsml package

     

    Distribution Fitting

    Distribution fitting and normality testing are useful, and at times critical, processes across numerous industries. However, they can be tedious if you have several columns and fit one column at a time. This sub-module aims to simplify both processes with functions that can be applied to full datasets, rather than just individual columns. With this sub-module, you can perform visual and statistical methods of distribution fitting and normality testing, parameter estimation, distribution prediction, and probability prediction.
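
    The idea of working on a full dataset instead of one column at a time can be sketched with pandas and SciPy. This is not the distribution_fitting API: the function name, the column names and the simulated data are assumptions, and only the normal distribution and the Shapiro-Wilk test are covered.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical dataset: one roughly normal column and one clearly skewed column.
data = pd.DataFrame({
    "thickness": rng.normal(loc=5.0, scale=2.0, size=300),
    "wait_time": rng.exponential(scale=3.0, size=300),
})

def fit_normal_all_columns(df):
    """Fit a normal distribution and run a normality test on every numeric column."""
    rows = []
    for name in df.select_dtypes(include="number").columns:
        values = df[name].dropna().to_numpy()
        mu, sigma = stats.norm.fit(values)           # parameter estimation
        statistic, p_value = stats.shapiro(values)   # Shapiro-Wilk normality test
        rows.append({
            "column": name,
            "mean": mu,
            "std": sigma,
            "shapiro_stat": statistic,
            "shapiro_p": p_value,
        })
    return pd.DataFrame(rows).set_index("column")

summary = fit_normal_all_columns(data)
```

    The result is a single table with one row per column, which is convenient to return from a data function and display in a Spotfire table visualization.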

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from this spotfire-dsml package

