  • Python toolkit for data science and machine learning in Spotfire


    This article introduces spotfire-dsml and explains our motivation for creating a Python toolkit. The goal is to enhance the data science and machine learning capabilities of Spotfire and its components.

    Introducing spotfire-dsml: Enhancing Data Science and Machine Learning in Spotfire

    If you ever create data functions in Spotfire, our Python package will be a new, valuable resource for you. In the ever-evolving landscape of data science and analytics, ease of access and efficiency are of paramount importance. That's precisely where spotfire-dsml comes into play. This Python package is set to augment the way we approach data science within the Spotfire platform. With a robust vision, spotfire-dsml seeks to empower data scientists and analysts by significantly reducing the time to value for creating analytics-rich applications within Spotfire.

    The vision behind spotfire-dsml is to create reproducible machine learning pipelines seamlessly integrated with Spotfire. This integration enhances the analytic capabilities within Spotfire, making it a powerhouse for data scientists and analysts across several industries like Pharmaceuticals, High-tech manufacturing, Energy, and others. 

    By providing ready-to-use Python functions spanning various data science, analytics, and data manipulation use cases, spotfire-dsml aims to democratize data science within Spotfire. The package aims to evolve continuously, ensuring it stays on top of the latest and greatest in data science.  

     

    What is inside spotfire-dsml?

    The spotfire-dsml package includes the following modules:

    1. ML Modeling (ml_modeling): Dive into pipeline-centric model training and evaluation. Whether you're a seasoned data scientist or just starting, this module equips you with the tools to build robust machine learning models effortlessly.

    2. Time Series (time_series): Time series can be messy and challenging to work with. This module contains functions for time-series preprocessing, smoothing, decomposition, pattern exploration, and forecasting, helping keep your analyses fast, accurate, and reliable.

    3. NLP (nlp_preprocessing): For those delving into the world of text analytics, this module offers pipeline-centric preprocessing solutions. It simplifies text data preparation, a critical step in natural language processing tasks.

    4. Explainability Module (ml_explain): Uncover the mysteries of model explainability using the XWIN methodology. Gain insights into your models, making your predictions more transparent and trustworthy.

    5. Monitoring Module (ml_drift): Detect and measure drift in your models with ease. Keeping your models up-to-date and accurate is crucial. This module simplifies the process by enabling you to decide when to trigger a new rebasing or retraining process.

    6. Distribution Fitting (distribution_fitting): Distribution fitting and normality testing are useful and, at times, critical processes across numerous industries. This module aims to simplify them with functions that can be applied to full datasets, rather than working with one column at a time.

    7. Missing Data (missing_data): Across industries, handling missing data is a crucial step in any project. Without properly handling missing data, models can become biased, and results can be inaccurate. This module aims to simplify the process by summarizing, removing, and imputing missing values for tabular data.

    8. Geoanalytics (geo_analytics): Geospatial analytics involves analysing data with a focus on location awareness, managing relationships between different locations, and measuring quantities at various sites. Location data poses additional challenges: the Earth is not flat, and it is not a sphere or even a perfect ellipsoid. This module aims to simplify the handling of geospatial data in Spotfire by integrating diverse coordinate reference systems, creating and transforming shapes, performing spatial joins and proximity analysis, and exporting geographic datasets as Shapefiles or GeoJSON.

     

    Let's delve deeper into each module, providing detailed insights into how spotfire-dsml can transform your data science workflows. 

     

    ML Modeling

    This module addresses the need to train reproducible machine learning models. Its functions help users create general-purpose Python pipelines for regression and binary classification on tabular data (to begin with). In most cases, data must be preprocessed before modeling. Pipelines that include both preprocessing and model training are necessary because every step, including learned preprocessing steps such as imputation or encoding, must also be executed at scoring time.
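
    To illustrate why learned preprocessing has to travel with the model, here is a minimal sketch that uses only the Python standard library. It does not show the spotfire-dsml API: the classes MeanImputer, Standardizer, NearestCentroid, and Pipeline are hypothetical names invented for this example, and the toy classifier stands in for a real model.

```python
from statistics import fmean, pstdev


class MeanImputer:
    """Learns per-column means at fit time and reuses them at scoring time."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means_ = [fmean(v for v in col if v is not None) for col in columns]
        return self

    def transform(self, rows):
        return [
            [self.means_[j] if v is None else v for j, v in enumerate(row)]
            for row in rows
        ]


class Standardizer:
    """Learns per-column mean and standard deviation at fit time."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means_ = [fmean(col) for col in columns]
        self.stds_ = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
        return self

    def transform(self, rows):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means_, self.stds_)]
            for row in rows
        ]


class NearestCentroid:
    """Toy classifier: predicts the label of the closest class centroid."""

    def fit(self, rows, labels):
        self.centroids_ = {}
        for label in set(labels):
            members = [r for r, y in zip(rows, labels) if y == label]
            self.centroids_[label] = [fmean(col) for col in zip(*members)]
        return self

    def predict(self, rows):
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids_, key=lambda lab: sq_dist(row, self.centroids_[lab]))
            for row in rows
        ]


class Pipeline:
    """Chains preprocessing steps and a model into one fit/predict object."""

    def __init__(self, steps, model):
        self.steps, self.model = steps, model

    def fit(self, rows, labels):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        self.model.fit(rows, labels)
        return self

    def predict(self, rows):
        for step in self.steps:
            rows = step.transform(rows)  # reuse training statistics, no refitting
        return self.model.predict(rows)


train_X = [[1.0, 10.0], [2.0, None], [8.0, 30.0], [9.0, 32.0]]
train_y = [0, 0, 1, 1]
pipeline = Pipeline([MeanImputer(), Standardizer()], NearestCentroid())
pipeline.fit(train_X, train_y)

# New rows are imputed and scaled with the values learned from the training data.
predictions = pipeline.predict([[1.5, None], [8.5, 31.0]])
```

    Because the imputation means and scaling parameters live inside the pipeline object, scoring new rows applies exactly the transformations that were learned during training.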

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

    Time Series

    Time series can be messy and challenging to work with. This module makes time series analytics more accessible by providing functions for preprocessing, smoothing, decomposition, pattern exploration, and forecasting. Functionality includes normalization on both the time and measurement axes, resampling, and missing value imputation, along with a handful of smoothing techniques. With these functions, understanding your time series becomes both easier and faster.
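
    As a rough sketch of three of the preprocessing steps named above (missing value imputation, smoothing, and normalization on the measurement axis), the following standard-library example may help. The function names are hypothetical and are not the spotfire-dsml API, and linear interpolation and a centered moving average are just one possible choice for each step.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known points."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # gap at the start: carry the first value back
            filled[i] = values[right]
        elif right is None:         # gap at the end: carry the last value forward
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def min_max_normalize(values):
    """Rescale measurements to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant series
    return [(v - lo) / span for v in values]


raw = [2.0, None, 6.0, 8.0, None, 4.0, 2.0]   # evenly spaced readings with gaps
filled = interpolate_missing(raw)
smoothed = moving_average(filled, window=3)
normalized = min_max_normalize(smoothed)
```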

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

    NLP preprocessing

    Natural Language Processing (NLP) is a branch of Artificial Intelligence that focuses on the ability of machines to understand text. This module performs a range of exploratory text analytics and text classification on any text. This pipeline-centric solution preprocesses (cleans) the text and engineers it into n-grams. It includes options to remove numbers and special characters (defined with respect to the English language) and to normalize text using stemming or lemmatization, and it supports supervised classification. The module also supports topic modeling and unsupervised clustering.
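
    The cleaning and n-gram steps can be pictured with a small standard-library sketch. It is only an illustration under simple assumptions (English text, whitespace tokenization, no stemming or lemmatization); clean_text and ngrams are hypothetical helper names, not functions from spotfire-dsml.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase, then replace digits and special characters with spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only a-z and whitespace
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


documents = [
    "Pump #3 failed: bearing temperature exceeded 90 degrees!",
    "Bearing temperature normal; pump 3 restarted.",
]

bigram_counts = Counter()
for doc in documents:
    tokens = clean_text(doc).split()
    bigram_counts.update(ngrams(tokens, 2))

most_common_bigram = bigram_counts.most_common(1)[0][0]
```

    Counts like these are a typical input for the exploratory analytics, classification, and topic modeling steps mentioned above.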

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

    Explainability Module

    It is time to deploy the model and make predictions! Machine learning models are increasingly being used to make decisions in situations that directly affect humans, for example in financial, legal, and medical settings. This module introduces a novel, model-agnostic technique developed by us, named XWIN (eXplanations from the impact of Withholding INformation), which assesses the importance of an input feature for a particular model outcome and works with both numeric and non-numeric features.
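
    The general idea of explaining a prediction by withholding information can be sketched as follows. This is not the XWIN algorithm, whose details are not given here. It is a generic approximation that assumes withholding a feature can be simulated by substituting values observed in a background dataset and averaging the predictions. The function withholding_impact, the toy model, and the data are all hypothetical.

```python
from statistics import fmean


def withholding_impact(predict, instance, background, feature):
    """Change in the prediction for `instance` when `feature` is withheld.

    Withholding is approximated by substituting each value of `feature`
    seen in `background` and averaging the resulting predictions.
    """
    original = predict(instance)
    withheld = fmean(
        predict({**instance, feature: row[feature]}) for row in background
    )
    return original - withheld


def toy_model(row):
    """Hypothetical risk score mixing a numeric and a non-numeric feature."""
    score = 0.1 + 0.01 * row["age"]
    if row["smoker"] == "yes":
        score += 0.3
    return score


background = [
    {"age": 30, "smoker": "no"},
    {"age": 50, "smoker": "yes"},
    {"age": 40, "smoker": "no"},
    {"age": 40, "smoker": "no"},
]
patient = {"age": 40, "smoker": "yes"}

# One impact value per feature, for this particular prediction only.
impacts = {
    feature: withholding_impact(toy_model, patient, background, feature)
    for feature in patient
}
```

    Because the substitution draws on observed values rather than arithmetic on the feature itself, the same approach applies to categorical inputs such as "smoker" and to numeric ones such as "age".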

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

    Monitoring Module

    Supervised machine-learning models derive their predictive power from exploiting statistical patterns in the space of the predictor and target variables. Models operate under the implicit assumption that the patterns found during training also exist in the prediction data, or (in more technical terms) that the training and prediction data are drawn from the same distribution. This module evaluates potential drift when a trained model is applied to data where either the data or the underlying rules are likely to change with time.
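
    One common way to quantify whether training and prediction data still come from the same distribution is the Population Stability Index (PSI). The sketch below is an assumption about how such a check might look and is not taken from the ml_drift module; the function name, the equal-width binning, and the floor for empty bins are choices made for this example.

```python
import math


def population_stability_index(reference, current, bins=5):
    """PSI between two numeric samples, using equal-width bins from `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(sample):
        counts = [0] * bins
        for value in sample:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


training = [float(x) for x in range(100)]
unchanged = [float(x) for x in range(100)]
shifted = [x + 40.0 for x in training]   # same shape, but moved upward

psi_unchanged = population_stability_index(training, unchanged)
psi_shifted = population_stability_index(training, shifted)
```

    A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a sign of substantial drift, but the threshold that should trigger retraining depends on the application.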

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

    Distribution Fitting

    Distribution fitting and normality testing are useful and, at times, critical processes across numerous industries. However, fitting one column at a time becomes tedious when a dataset has many columns. This module aims to simplify distribution fitting and normality testing with functions that can be applied to full datasets, rather than just individual columns. With this module, you can perform visual and statistical methods of distribution fitting and normality testing, parameter estimation, distribution prediction, and probability prediction.
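
    To show what working on a full dataset rather than one column at a time can look like, here is a small standard-library sketch. It fits only a normal distribution and uses the Jarque-Bera statistic as the normality indicator; both are choices made for this example, and fit_normal_all_columns is a hypothetical name, not a spotfire-dsml function.

```python
from statistics import NormalDist, fmean


def fit_normal_all_columns(table):
    """Fit a normal distribution to every column and score its normality.

    Assumes each column is numeric and not constant.
    """
    results = {}
    for name, values in table.items():
        n = len(values)
        mean = fmean(values)
        centered = [v - mean for v in values]
        m2 = fmean(c ** 2 for c in centered)   # central moments
        m3 = fmean(c ** 3 for c in centered)
        m4 = fmean(c ** 4 for c in centered)
        skewness = m3 / m2 ** 1.5
        excess_kurtosis = m4 / m2 ** 2 - 3.0
        results[name] = {
            # Parameter estimation: sample mean and standard deviation.
            "dist": NormalDist.from_samples(values),
            # Larger values indicate a stronger departure from normality.
            "jarque_bera": n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0),
        }
    return results


table = {
    "thickness": [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1],
    "wait_time": [1, 1, 1, 2, 2, 3, 5, 20],
}
fits = fit_normal_all_columns(table)

# Probability prediction from the fitted distribution: P(thickness < 10.0).
p_below_target = fits["thickness"]["dist"].cdf(10.0)
```

    In this toy table the heavily skewed wait_time column receives a much larger Jarque-Bera statistic than the roughly symmetric thickness column, which suggests a normal fit is a poor choice for it.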

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

    Missing Data

    Across industries, handling missing data is a crucial step in any project. Without properly handling missing data, models can become biased, and results can be inaccurate. This module aims to expedite and simplify the process of handling missing data, and currently contains functions to explore, remove, and impute missing values, as well as run comparison analyses before and after missing data is handled.
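
    The explore, remove, impute, and compare steps can be illustrated on a tiny table with the standard library alone. The helper names below are hypothetical and do not reflect the missing_data module's API. Mean imputation for numeric columns and most-frequent-value imputation for the rest are simply the most basic strategies to show.

```python
from statistics import fmean, mode


def summarize_missing(rows):
    """Count missing (None) values per column."""
    return {c: sum(r[c] is None for r in rows) for c in rows[0]}


def drop_incomplete(rows):
    """Keep only rows without any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute(rows):
    """Fill gaps with the column mean (numeric) or most frequent value (other)."""
    fill = {}
    for c in rows[0]:
        observed = [r[c] for r in rows if r[c] is not None]
        numeric = all(isinstance(v, (int, float)) for v in observed)
        fill[c] = fmean(observed) if numeric else mode(observed)
    return [{c: fill[c] if v is None else v for c, v in r.items()} for r in rows]


rows = [
    {"temp": 20.0, "site": "A"},
    {"temp": None, "site": "A"},
    {"temp": 24.0, "site": None},
    {"temp": 22.0, "site": "B"},
]

summary = summarize_missing(rows)
complete_rows = drop_incomplete(rows)
imputed_rows = impute(rows)

# A simple before/after comparison: the column mean with and without imputation.
mean_before = fmean(r["temp"] for r in rows if r["temp"] is not None)
mean_after = fmean(r["temp"] for r in imputed_rows)
```

    Mean imputation leaves the column mean unchanged but shrinks its variance, which is one reason a before-and-after comparison is worth running.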

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

    Geoanalytics

    This module is a valuable tool for analysts dealing with location-aware data in various industries. You can conduct advanced geospatial analyses directly within Spotfire, such as integrating diverse coordinate reference systems, creating and transforming shapes, performing spatial joins and proximity analysis, and exporting geographic datasets as Shapefiles or GeoJSON.
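
    As a simple picture of proximity analysis and GeoJSON export, consider the following standard-library sketch. It is not the geo_analytics API, and the function names and site list are hypothetical. It also uses the haversine formula, which treats the Earth as a sphere, so it is an approximation of the kind that proper coordinate reference system handling is meant to improve on.

```python
import json
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(origin, sites, radius_km):
    """Names of all sites no further than `radius_km` from `origin`."""
    return [
        name for name, (lat, lon) in sites.items()
        if haversine_km(*origin, lat, lon) <= radius_km
    ]


def to_geojson(sites):
    """Serialize sites as a GeoJSON FeatureCollection of points."""
    features = [
        {
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        }
        for name, (lat, lon) in sites.items()
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


london = (51.5074, -0.1278)                       # (latitude, longitude)
sites = {"Paris": (48.8566, 2.3522), "Berlin": (52.52, 13.405)}

nearby = within_radius(london, sites, radius_km=500)
geojson_text = to_geojson(sites)
```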

    Learn More! - for Python developers using this package

    Example Usage - download an example Spotfire application (dxp) whose data functions call functions from the spotfire-dsml package

     

