Part 1: Using Python for Building a LightGBM Model in TIBCO Data Science
The main steps for using Python in TIBCO Data Science are:
- Create a Jupyter Notebook in the workspace
- Open the Jupyter Notebook and fill in the Python code
- Create a workflow in the workspace and add a Python Execute operator
Let's go through the details of each step.
Step 1: Create a Jupyter Notebook in the workspace
After creating a workspace, we can easily add a Jupyter Notebook in the work files section.
Step 2: Open the Jupyter Notebook and fill in the Python code
From the notebook content section, we can start by importing Python packages or installing any packages we need (e.g., typing "!pip install lightgbm" in a cell and running it to install the LightGBM package).
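For reference, the opening cells might look like the following minimal sketch (the exact cell layout and package versions will vary):

```python
# Run once in a notebook cell to install LightGBM,
# then import the packages used in the rest of the notebook
!pip install lightgbm

import lightgbm as lgb
import pandas as pd
```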
Then we can import the dataset for analysis. Here we use the iris data table from a PostgreSQL database, so we need to specify the data source name, database name, schema name, and table name to locate and import the data.
Note that we set "use_input_substitution" to true and "execution_label" to 1; these settings let the notebook read its input data from the workflow later.
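The import code itself is generated by the platform (see the no-code trick below). As a rough illustrative stand-in, not the platform's actual helper, reading the same table directly with pandas and SQLAlchemy would look like this (the connection string, schema, and table name are placeholders):

```python
# Illustrative stand-in only: the notebook's auto-generated import uses
# TIBCO's own helper (with parameters such as use_input_substitution and
# execution_label); this sketch shows an equivalent direct read from PostgreSQL
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection details
engine = create_engine("postgresql://user:password@host:5432/database")
iris_df = pd.read_sql_table("iris", engine, schema="public")
```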
There is a useful trick for importing the dataset without typing any code. First, find the required dataset under "Data Sources" (accessible via the menu icon at the top left) and associate it with our workspace.
After making this association, we can go to the notebook and find the function "Import Dataset into Notebook" under the data menu.
Clicking the "Import" button automatically generates the data-import code in the notebook with default settings, which means you may need to adjust parameters such as "use_input_substitution" and "execution_label" manually.
Then we can split the dataset into training and testing sets and build a LightGBM classifier. For predicting the iris species, the flower's sepal length, sepal width, petal length, and petal width all have strong explanatory power, so a classifier with all default settings already performs well. The evaluation shows that the testing data can be predicted accurately.
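A minimal sketch of this step, assuming the data were read into a data frame iris_df and that the feature and target columns are named as below (the column names are assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import lightgbm as lgb

# Assumed column names for the four features and the target
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X, y = iris_df[features], iris_df["species"]

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# All default settings, as described above
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, pred))
```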
At the end of the notebook, we can define the output and save it as a data table in our database. Here we show an example of saving the testing data together with the prediction results generated by our LightGBM model.
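Continuing the sketch above (the output table name "iris_predictions" is an assumption):

```python
# Combine the test rows with their actual and predicted labels,
# then write the result back to PostgreSQL as a new table
results = X_test.copy()
results["actual"] = y_test.values
results["predicted"] = pred
results.to_sql("iris_predictions", engine, schema="public",
               if_exists="replace", index=False)
```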
So far we have learned how to create a Jupyter Notebook and build a model in Python. In the next step, we will integrate the notebook with our workflow.
Step 3: Create a workflow in the workspace and add a Python Execute operator
Here we have created a simple workflow that imports the iris dataset, standardizes the column types (this step only changes the data column types, so you may omit it in your own project), and passes the data to the Python Execute operator.
When editing the Python Execute operator, we need to select the desired notebook as well as the substitute input. Since we set "execution_label" to 1 earlier, we configure the input in the "Substitute Input 1" section.
Now we can run the workflow and see the output of the Python Jupyter Notebook. We can also confirm that the results have been saved to the new data table we defined.
Part 2: Using R for Building a Decision Tree Model in TIBCO Data Science
We can also run R code in a workflow by using the R Execute operator.
Here we have created a simple workflow of importing the iris dataset and then connecting it with the R Execute operator.
Then we can start writing R code by clicking the "Define Clause" button on the editing page.
Since we have already seen that the iris data is well suited to classification, here we build a decision tree model on the full dataset as our training data, which gives us a clearer view of variable importance.
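As a minimal standalone sketch of what the R code might look like (it uses R's built-in iris dataset and the rpart package; inside the R Execute operator the training data would instead come from the connected input):

```r
# Minimal standalone sketch using R's built-in iris dataset;
# in the workflow, the data frame would come from the connected operator
library(rpart)

# Fit a classification tree on the full dataset
fit <- rpart(Species ~ ., data = iris, method = "class")

# Variable importance from the fitted tree
print(fit$variable.importance)
```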
After running the entire workflow, we can view the detailed modeling results, including variable importance, from the R Execute operator.
Conclusion
Through these simple examples of building machine learning models on the iris dataset with Python and R in TIBCO Data Science, we hope you can see how convenient it is to integrate code into your visual workflows. In this way, you can extend the analytics capabilities of your workflows and combine highly scalable data prep operators with advanced open-source functions in Python and R.