Machine learning is one of the biggest trends in data analytics because of its ability to make predictions and calculations from large amounts of data. The rapid growth of machine learning and data visualization creates great opportunity, but many customers struggle to find a clean way to integrate the two.
To overcome this challenge, TIBCO's Data Science field team has released a new Spotfire extension, Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire®. This extension enables users to execute a workflow in the Team Studio Data Science (TSDS) platform from Spotfire and bring the workflow results back into Spotfire as tables.
Customers can now easily gather data from complex models built and deployed in the TIBCO Data Science platform, then quickly visualize and act on it. Speed is key, and Spotfire data visualization aids understanding of this data by representing it visually. This allows customers to make faster, more informed decisions.
In this blog, we are going to explore a manufacturing demo about digital twins for yield using the Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire®. The use case involves analyzing manufacturing sensor and processing data at very large scale to understand the causes of semiconductor product yield loss. Digital twins are virtual representations of physical systems. The recent intense interest in them is fueled by the convergence of IoT, machine learning, and big data technology directed at the growing volumes of data available from sensors on process equipment. As process complexity increases, these digital twins are becoming key to efficient operations and high product yields. This demo is an example of a semiconductor manufacturing digital twin for yield that detects associations between product quality metrics and up to millions of predictor process parameters.
There are three components in this demo:
1. Team Studio Data Science workspace - an advanced analytics workflow that computes the sensor importance metric.
This workflow consists of three main steps:
First, the "Time Series SAX Encoder" operator reduces the size and complexity of the sensor time series data by using a method known as SAX (Symbolic Aggregate approXimation) encoding. This reduces the input to thousands of variables, as opposed to hundreds of thousands before the encoding.
Secondly, with this reduced size, we can easily compute correlations between the outcome of the manufacturing process (pass vs. fail) and thousands of input variables in the form of time series data using the "Wide Data Variable Selector" operator.
Finally, we can filter for the variables that are most highly correlated with the outcome and feed this reduced and filtered set of variables to an "Alpine Forest Regression" operator, which generates the final variable importance metrics.
At each stage of the workflow, all of the computations, including the three main algorithms (SAX encoding, Wide Data Variable Selection, and Alpine Forest Regression), run in Spark. In other words, we're pushing all of these computations down to Spark, running directly on the input data in Hadoop. Because no data has to be moved, the workflow performs and scales well: even with millions of input variables, this flow runs in just a few minutes.
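To make the SAX encoding step more concrete, here is a minimal single-machine sketch in plain Python. It is not the Team Studio operator (which runs in Spark) and its internals are not described in this blog; the function name `sax_encode`, the four-letter alphabet, and the segment count are assumptions chosen for illustration. The sketch follows the textbook SAX recipe: z-normalize the series, average it over equal-width segments, then map each segment mean to a letter using breakpoints that split the standard normal distribution into equally likely regions.

```python
from statistics import NormalDist, mean, pstdev


def sax_encode(series, n_segments, alphabet="abcd"):
    """Encode a numeric series as a short SAX word (illustrative sketch only)."""
    n = len(series)
    if n_segments <= 0 or n_segments > n:
        raise ValueError("n_segments must be between 1 and len(series)")

    # 1. Z-normalize so that series on different scales become comparable.
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        normalized = [0.0] * n  # a flat series has no shape to encode
    else:
        normalized = [(x - mu) / sigma for x in series]

    # 2. Piecewise aggregate approximation: one mean per segment.
    paa = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        paa.append(mean(normalized[start:end]))

    # 3. Breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # 4. Map each segment mean to the letter of the region it falls into.
    return "".join(alphabet[sum(value > b for b in breakpoints)] for value in paa)


# A step-shaped trace of 8 readings collapses to a 2-letter word.
word = sax_encode([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2)
print(word)
```

The point of the encoding is the size reduction: a long sensor trace becomes a handful of symbols, which is what makes the downstream correlation step tractable across very wide data.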
In addition, I am creating the Workflow Variables "@min" and "@max" in my workspace and substituting them for the default workflow parameters in the "Filter out Poor Correlations" operator. By doing this, I can pass input values from the Spotfire dashboard through the data function and override these Workflow Variables at run time.
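As a rough illustration of what a correlation filter with a lower and upper bound could look like, here is a small plain-Python sketch. The exact semantics of the "Filter out Poor Correlations" operator are not described in this blog, so treating "@min" and "@max" as bounds on the absolute Pearson correlation is an assumption, and the names `pearson`, `filter_by_correlation`, and the sample variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(vx * vy)


def filter_by_correlation(variables, outcome, min_corr, max_corr):
    """Keep variables whose |correlation| with the outcome is in [min_corr, max_corr].

    Assumption: this mirrors the role of the "@min" / "@max" Workflow Variables.
    """
    kept = {}
    for name, values in variables.items():
        r = abs(pearson(values, outcome))
        if min_corr <= r <= max_corr:
            kept[name] = r
    return kept


# Hypothetical data: outcome is 1 for pass, 0 for fail.
outcome = [1, 1, 0, 0]
variables = {
    "temp_step3": [10, 9, 2, 1],     # tracks the outcome closely
    "pressure_step1": [5, 1, 5, 1],  # unrelated to the outcome
    "flat_sensor": [3, 3, 3, 3],     # constant reading
}
kept = filter_by_correlation(variables, outcome, min_corr=0.5, max_corr=1.0)
print(kept)
```

Exposing the two bounds as variables rather than hard-coding them is what lets a dashboard user tighten or loosen the filter without editing the workflow.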
2. Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire® - executes the TSDS workspace from the Spotfire dashboard.
Please refer to the link for detailed information on how to create a "Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire®".
In this step, we are creating a data function to execute our Team Studio Data Science workspace from the Spotfire dashboard.
Here, I am entering my Team Studio credentials and selecting the Team Studio workflow that is to be executed via the data function.
Next, I am declaring my Input Parameters "min" and "max", which must have the same names as the Workflow Variables declared previously, without the "@" prefix. Upon executing the data function, the default values of the Team Studio Workflow Variables will be overridden with the values of the Input Parameters.
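The naming rule above (Input Parameter "min" maps to Workflow Variable "@min") can be pictured with a tiny plain-Python sketch. This is only a conceptual model, not the data function's actual implementation; the function name `apply_overrides`, the default values, and the choice to raise an error on an unmatched name are all assumptions.

```python
def apply_overrides(workflow_defaults, input_parameters):
    """Return workflow variables with defaults overridden by input parameters.

    An input parameter named "min" overrides the workflow variable "@min".
    Raising on an unmatched name is an assumption made for this sketch.
    """
    resolved = dict(workflow_defaults)  # leave the defaults untouched
    for name, value in input_parameters.items():
        variable = "@" + name
        if variable not in resolved:
            raise KeyError(f"no workflow variable named {variable}")
        resolved[variable] = value
    return resolved


# Hypothetical defaults; only "min" is supplied from the dashboard.
defaults = {"@min": 0.1, "@max": 1.0}
resolved = apply_overrides(defaults, {"min": 0.3})
print(resolved)
```

Any variable that is not supplied keeps its workflow default, which matches the override behavior described above.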
In the final step, I am declaring the outputs. The outputs are the results of the Team Studio workflow operators that will be returned to Spotfire as tables once the data function has run successfully. Here, I am bringing back the "Variable Importance" table from the "Alpine Forest Regression" operator and the "ProcessTime.sbdf" file exported by the "Export to SBDF" operator, which is stored in the workfiles under your workspace.
3. Spotfire dashboard - The goal is to understand what is driving product yield loss.
In the above diagram, we see a manufacturing process flow that is composed of a sequence of process steps. At each process step there may be sensor trace data from multiple equipment sensors. For example, pressure and temperature sensor readings will evolve throughout the course of a process step and can be visualized with sensor traces.
Once the processing is complete, the product is tested to determine whether it passes all tests or fails. If you aggregate that pass / fail data over a large number of parts, you can calculate the percentage of failing parts, which is called the Yield Loss percentage.
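The Yield Loss percentage described above is simple arithmetic: failing parts divided by total parts, times 100. A minimal plain-Python sketch follows; the function name `yield_loss_percent` and the sample results are hypothetical.

```python
def yield_loss_percent(results):
    """Percentage of failing parts, given one boolean per part (True = pass)."""
    results = list(results)
    if not results:
        raise ValueError("no test results to aggregate")
    failed = sum(1 for passed in results if not passed)
    return 100.0 * failed / len(results)


# Hypothetical lot of four parts, one of which fails.
loss = yield_loss_percent([True, True, False, True])
print(f"Yield loss: {loss:.1f}%")
```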
But often it's not good enough just to look at the pass / fail data. You need some information about how the parts fail (their failure modes and patterns) in order to have the best chance of later determining why they fail. So if we take this product test data and combine it with all of the process data, we can create a model for each yield loss mode as a function of all the process parameters. That is the digital twin yield model. It is worth noting that there can be hundreds of steps and millions of individual process parameters, so big data technology is going to be needed.
These digital twin yield models can be used in different ways. Since they identify the process variables with the greatest impact on each yield mode, they are a guide to improving the process. If the system is architected for high performance, it's possible to automate and run these digital twins continuously to capture new relationships in real time, just as they start to emerge. So a new problem at a process step that is impacting product yield will reveal itself as soon as material with that problem is tested.
Now we go ahead and run the model for the most recent set of lots. When we do that, we're re-running the Team Studio big data digital twin yield model and populating the bar chart with the results.
In the dashboard above, we're doing the big data analysis comparing the Yield Loss for one failure mode to the sensor readings for all of the different processes. The bar chart shows the digital twin results. Each row represents one sensor and process time combination. So, the bar chart shows which sensors and process times we should examine to understand what is driving cluster 2 yield loss.
So let's go ahead and mark one of the bars. The actual traces for that sensor are now visible in the chart on the right. We've colored by anomalous versus normal wafers, and we see a dramatic difference between lots in the yield loss trend. Some have a significant yield loss while others don't. We're also seeing that the good material has elevated readings in this portion of the recipe, while the bad material is just flat there.
So this is very significant. We're able to go directly from product yield loss data to assess which are the most important sensor readings that correlate to those losses and we can actually look at differences between good vs bad sensor traces. This may provide an important clue to understanding the cause of the yield loss.
Please feel free to ask any questions on the Spotfire community with a link to this blog.
Author - Vinoth Manamala
Co Authors - Nico Rode, Mike Alperin
Special Mention - Dan Rope, Andreas Laestadius
TIBCO Data Science Team.