Assembly Line Failure Detection



    Use Case Overview

    This article demonstrates how to build a classification model in Spotfire Data Science - Team Studio for early detection of faulty parts in an assembly line. Early detection increases productivity and reduces component and operational costs by removing affected parts from the assembly line before they reach later processing stages.

    Data Requirements

    We use simulated process and test data to illustrate the concepts in this Playbook, encompassing 12 months of operations, two workstations, and two million manufactured parts. Features include a unique part identifier, date and time, test measurements, and the outcome for the part (whether it passed the tests and whether it was scrapped). The raw data are contained in XML and TXT documents.

    Machine Logs Extraction


    First template workflow for text extraction.

    Our first step is to extract and standardize the data from the two workstations. Workstation 10's data are stored in XML documents, while Workstation 20's data are stored in a custom log format. This flow extracts both formats into a tabular representation for further processing, using the Text Extractor operator together with Variable operators whose regex patterns isolate the individual features.

    In some cases, the incoming data can be in a binary or unstructured format. In that case, developers can build custom connectors with the Extensions SDK to convert the incoming data into a structured form.
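    To make the extraction step concrete, here is a minimal Python sketch of the same idea outside Team Studio: parse a Workstation 10 XML record and a Workstation 20 log line into flat records. The tag names, log layout, and field names (part_id, timestamp, pressure, result) are hypothetical illustrations, not the Playbook's actual formats.

        import re
        import xml.etree.ElementTree as ET

        # Hypothetical Workstation 20 log layout; the real Playbook uses
        # Team Studio's Text Extractor and Variable operators instead.
        LOG_PATTERN = re.compile(
            r"PART=(?P<part_id>\w+)\s+TS=(?P<timestamp>\S+)\s+"
            r"PRESSURE=(?P<pressure>[\d.]+)\s+RESULT=(?P<result>PASS|FAIL)"
        )

        def parse_log_line(line):
            """Turn one Workstation 20-style log line into a flat record."""
            match = LOG_PATTERN.search(line)
            return match.groupdict() if match else None

        def parse_xml_record(doc):
            """Turn one Workstation 10-style XML record into a flat record."""
            root = ET.fromstring(doc)
            return {child.tag: child.text for child in root}

        print(parse_log_line("PART=A123 TS=2017-01-05T09:30 PRESSURE=4.7 RESULT=PASS"))
        print(parse_xml_record("<part><part_id>A123</part_id><pressure>4.7</pressure></part>"))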

    Data Transformation and Modeling


    Second template for data transformation and modeling. 

    The second step is to join part data across both stations into a final combined feature set, then use a range of classification algorithms to predict whether a part will be scrapped. The main transformation we apply is a series of aggregations by part ID: instead of tracking each individual pressure reading, for example, we capture the minimum and maximum pressure each part was exposed to.
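    As a rough illustration of this aggregation step, the following pandas sketch collapses many sensor readings per part into one feature row. The column names are assumed for illustration; in the Playbook the equivalent work is done by Team Studio aggregation operators.

        import pandas as pd

        # Toy readings table: several pressure measurements per part.
        readings = pd.DataFrame({
            "part_id":  ["A1", "A1", "A2", "A2", "A2"],
            "pressure": [4.2, 5.1, 3.9, 4.0, 6.3],
            "scrapped": [0, 0, 1, 1, 1],
        })

        # One row per part: min/max pressure plus the scrap outcome.
        features = readings.groupby("part_id").agg(
            pressure_min=("pressure", "min"),
            pressure_max=("pressure", "max"),
            scrapped=("scrapped", "max"),
        ).reset_index()
        print(features)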

    We evaluate a range of classification models: logistic regression, naive Bayes, and random forest, using the first 11 months of operational data as a training set and the final month as a hold-out validation set. Logistic regression is the best-performing algorithm, with a scrap prediction accuracy of 98%, resulting in a ~3% cost savings from pulling scrap parts off the assembly line earlier in the process.
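    A minimal scikit-learn sketch of the same evaluation scheme, continuing from the feature table above and assuming it also carries a month column (1-12): train on the first 11 months, validate on the last. The feature columns are placeholders; the Playbook itself performs this comparison with Team Studio's built-in classifiers.

        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score

        # `features` is the per-part table from the previous sketch,
        # assumed here to also carry a `month` column.
        train = features[features["month"] <= 11]    # first 11 months
        holdout = features[features["month"] == 12]  # hold-out month

        X_cols = ["pressure_min", "pressure_max"]
        model = LogisticRegression().fit(train[X_cols], train["scrapped"])
        preds = model.predict(holdout[X_cols])
        print("hold-out accuracy:", accuracy_score(holdout["scrapped"], preds))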

    Key Technique - Export to Spotfire® Data Science Model Format

    By exporting our logistic regression model to Spotfire® Data Science Model Format, we enable its use in other workflows, and even in other data sources. Models trained in Hadoop, like this one, can be used to score on a database, and vice versa. Other export options include PMML and PFA, which enable scoring in external execution engines such as jPMML and Hadrian.
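    Team Studio handles these exports through its own operators, but as a point of reference, the PMML route can also be exercised from Python with the sklearn2pmml package (which requires a local Java runtime at conversion time); the pipeline and file names below are illustrative, continuing the sketch above.

        from sklearn.linear_model import LogisticRegression
        from sklearn2pmml import sklearn2pmml
        from sklearn2pmml.pipeline import PMMLPipeline

        # Wrap the classifier in a PMML-aware pipeline, fit, and export.
        pipeline = PMMLPipeline([("classifier", LogisticRegression())])
        pipeline.fit(train[X_cols], train["scrapped"])
        sklearn2pmml(pipeline, "scrap_model.pmml")  # scorable in jPMML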

    Accuracy Monitoring


    Third template for monitoring accuracy of trained models on new data.

    Using the Load Model operator, this flow applies the most recently trained classification model to new data samples. If this fresh model performs better than the model running in the production scoring engine, it is pushed out to take its place.

    Key Technique - Flow Control

    The Flow Control operator allows you to halt execution of a workflow when appropriate. In this case, we don't want to push a new model to the scoring engine if its error rate on recent data is high. The Flow Control operator lets us make this determination automatically, which is helpful for scheduled workflows where manual intervention is not possible.
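    The decision logic reduces to a simple gate, sketched below in Python; the accuracy threshold and the promote/halt actions are hypothetical stand-ins for what the Flow Control operator does inside the scheduled workflow.

        def should_promote(candidate_acc, production_acc, min_acc=0.95):
            """Promote the fresh model only if it beats production and
            clears a minimum accuracy bar on recent data."""
            return candidate_acc >= min_acc and candidate_acc > production_acc

        # Hypothetical accuracies measured on the newest data samples.
        if should_promote(candidate_acc=0.981, production_acc=0.974):
            print("push the fresh model to the production scoring engine")
        else:
            print("halt the workflow; keep the current production model")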

    Self-Serve Reporting with Touchpoints


    Touchpoint interface for parameterizing failure rate query.

    The Touchpoint included with this Playbook demonstrates how to wrap a data visualization flow in a simple interface for business users. Users select which stations and time ranges they would like data from, and as output, they get a series of bar charts that summarize the observed failure rates.
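    Behind such a Touchpoint sits a parameterized query; a rough pandas equivalent is sketched below, with the station/date parameters and column names assumed for illustration rather than taken from the Playbook.

        import pandas as pd

        def failure_rates(parts, stations, start, end):
            """Failure rate per selected station over a date window,
            ready to feed a bar chart."""
            window = parts[
                parts["station"].isin(stations)
                & parts["date"].between(start, end)
            ]
            return window.groupby("station")["scrapped"].mean()

        # Example: rates for both workstations over the final month.
        # rates = failure_rates(parts, ["WS10", "WS20"], "2017-12-01", "2017-12-31")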

    Check It Out!

    For access to this Playbook, including its workflows, sample data, a PowerPoint summary, and expert support from Spotfire® Data Science data scientists, contact your Spotfire® Data Science sales representative.

