  • Fitting a model using Python and TIBCO® Enterprise Runtime for R


    In this article, we install the tools and packages in TIBCO Enterprise Runtime for R that are required to pass data to Python, send a script to run in Python, and then get the fitted model back from Python. We compare the results with a model fitted using the TIBCO Enterprise Runtime for R function lm.

    Note: As of Spotfire 10.7, there is native support for Python data functions, which is the preferred method. Read more about this capability here.

    You might find yourself in an organization where Python programmers write Python scripts and R programmers write R scripts, but results need to be shared across both groups. Using TIBCO Enterprise Runtime for R, Python, and a set of available packages, you can bridge the two languages. Optionally, you can create a data function in Spotfire to call this code, and then use the results returned from Python to create a visualization.

    Requirements

    For this solution, we work in Windows, because Spotfire Analyst is a Windows desktop application. We need to make sure that our system meets the requirements and that we have the software applications and packages to run the code.

    System

    We are running 64-bit Windows 10 on a modern system with adequate hard disk space, CPU power, and memory.

    Software

    The software for our solution includes the following.

    • Spotfire Analyst 7.7, which includes TIBCO Enterprise Runtime for R version 4.2.

      Optionally, we can run the TIBCO Enterprise Runtime for R version 4.2 Developer Edition from our installation of RStudio.

    • Anaconda Python 4.1.1 or later, which includes Python 3.5 (64 bit). The installation requires 631 MB.

      Download from https://www.continuum.io/downloads.

      We recommend using the Anaconda installation because it includes many packages for data science including numpy, scipy, pandas, and statsmodels.  

      Note:  You must put Python in your path because the code you need to run this example looks only in the directories listed in the environment variable PATH to find the Python executable and DLL files.  If you see the following message, check to see that Python is in your PATH. 
       

       INFO: Could not find files for the given pattern(s).
       

      Note: To see system requirements for installing the software, see their individual Help topics or Support information.
       

    • Spotfire system requirements: <docs.tibco.com/pub/spotfire/general/sr/sr/topics/tibco_cloud_spotfire.html>

    • Anaconda system requirements: <https://docs.continuum.io/navigator/>
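
The PATH requirement in the note above can be checked before you start. The following is a minimal sketch, not part of the original article: it uses the standard-library function shutil.which to search the directories listed in PATH for an executable. The helper names find_on_path and describe are hypothetical.

```python
import shutil


def find_on_path(executable="python"):
    """Return the full path of `executable` if a PATH directory contains it, else None."""
    return shutil.which(executable)


def describe(executable="python"):
    """Build a one-line, human-readable report of the PATH lookup."""
    location = find_on_path(executable)
    if location is None:
        return f"{executable}: not found in PATH"
    return f"{executable}: found at {location}"


if __name__ == "__main__":
    print(describe("python"))
```

If the report says that python was not found, add the Anaconda installation directory to PATH and open a new console before continuing.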

    Packages

      Both TIBCO Enterprise Runtime for R and Python use packages that contain specialized functions to solve specific programming and industry problems. In this case, the packages enable the two systems to connect and to communicate, exchanging data frames.

    • TIBCO Enterprise Runtime for R uses the following packages (plus all their dependencies) from the Comprehensive R Archive Network (CRAN).

      • PythonInR
      • feather
    • Anaconda manages finding, installing, and building binary Windows packages from available Python package resource sites. Python uses the following packages (plus all their dependencies).

      • feather-format (binary package provided with this exercise: scroll to the bottom of the page).
      • statsmodels (provided in the Anaconda installation).
    Example

      Our solution demonstrates calling Python from TIBCO Enterprise Runtime for R to fit a linear model in Python using the ols function from the statsmodels package. Fitted values from Python are passed back to TIBCO Enterprise Runtime for R and compared with the fitted values from the lm function in TIBCO Enterprise Runtime for R.

      Note: The complete script demonstrated in this example is attached to this article, for your convenience. To download and review the script, scroll to the bottom of the article and select TERRandPython.R.txt.

      The data and data translation packages

      For analysis in TIBCO Enterprise Runtime for R, statisticians use the  data.frame object to contain the data. For analysis in Python, programmers use pandas, a powerful Python data analysis toolkit, which contains the data structure DataFrame.  

      These two object types are not compatible. We can use the CRAN package feather and the Python package feather-format to provide the means to translate the data between the two programming languages while maintaining the structure and integrity of the data.

      We use the CRAN package feather to send the data.frame object from TIBCO Enterprise Runtime for R to Python. We use the feather-format package on the Python side. Python reads in the data as a DataFrame, adding a column needed by that data structure. After running the script to process the data (fitting the model, in our example), we perform the reverse process, using feather-format in Python to send the data back to TIBCO Enterprise Runtime for R, which reads in the data, with the help of the feather package, as a new data.frame (with an additional column).

      Procedure

    1. Download the attached .zip archive, feather_format.zip, included with this article. This zip archive contains the feather-format package. Copy the zip archive to the site library for your Anaconda Python installation, and then extract the .whl file it contains. For this example, we provide the .whl archive feather_format-0.3.0-cp35-cp35m-win_amd64.whl.

    2. From a Windows command prompt, install the feather-format package.

       pip install feather-format
       
    3. From the Spotfire menu, click Tools > TERR Tools, and then open the TIBCO Enterprise Runtime for R console.

      Note: Optionally, you can use RStudio, specifying TIBCO Enterprise Runtime for R as the engine.

    4. At the command prompt, install the required packages.
       install.packages("feather")
       install.packages("PythonInR")
       
    5. Load the packages.
       library("feather")
       library("PythonInR")
       
    6. Connect to Python by calling the pyConnect function from the PythonInR package. You should not need to specify a path.
       PythonInR::pyConnect()  # only needed on Windows
       
    7. Assign the data set (in this case, fuel.frame) to the name ff.
       ff <- Sdatasets::fuel.frame
       
    8. Create a temporary path for the data set, and then write a data.frame to a feather file, passing in the data set and the temporary path.
       tempff <- tempfile("ff")
       write_feather(ff, path=tempff)
       
    9. Set the name of the feather file in Python by calling the function PythonInR::pyExecp. Note the r before the opening quote of the file name: it makes the Python literal a raw string, so the backslashes in a Windows path are not treated as escape characters.
       PythonInR::pyExecp(paste0("fthrfile = r'", tempff, "'"))
       
    10. Using PythonInR::pyExec, create and pass to Python the following script.
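      PythonInR::pyExec('
      # Load the feather reader and writer and the formula interface for OLS.
      import feather
      from statsmodels.formula.api import ols
      # Read the data frame that was written from TIBCO Enterprise Runtime for R.
      df = feather.read_dataframe(fthrfile)
      # Fit the linear model and compute fitted values for the same rows.
      linmod = ols(formula="Fuel ~ Weight + Type", data=df).fit()
      pred = linmod.predict(df)
      # Store the fitted values in a new column and overwrite the feather file.
      df["Fitted"] = pred
      feather.write_dataframe(df, fthrfile)
      ')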
       

      The script performs the following tasks.

      1. Reads the data (fthrfile).
      2. Fits the model.
      3. Adds fitted values as a new column in the data (Fitted).
      4. Writes the data, including the new column, back to the same feather file (fthrfile).
         
    11. Read the feather file written by Python, passing in the path created earlier to contain it. The file contains the new column with the fitted values.
       ff2 <- read_feather(tempff)
       
    12. Fit the same model with the TIBCO Enterprise Runtime for R function lm and extract the fitted values with the predict function.
       m1 <- lm(Fuel ~ Weight + Type, data = ff)
       p1 <- predict(m1, ff)
       
    13. Compare to the fitted values that Python computed.
       all.equal(unname(p1), ff2$Fitted)
       

      The returned value should be TRUE, which indicates that the fitted values computed by TIBCO Enterprise Runtime for R and those computed by Python are equal within the numeric tolerance that all.equal applies. This agreement gives us confidence that the data passed between the two systems intact and that both fit the same model.
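
The comparison in the last step checks that two independent implementations produce the same fitted values up to rounding. The following is a minimal sketch of that idea in plain Python, not the article's script: it fits a one-predictor least-squares line in two different ways and compares the results elementwise. The data values are hypothetical (they are not taken from fuel.frame), the function names are made up for illustration, and the default rel_tol of 1.5e-8 is an assumption chosen to be close to the default tolerance documented for all.equal in R. The elementwise check is a simplification of what all.equal does.

```python
import math


def fit_centered(x, y):
    """Simple least squares via centered sums: slope = Sxy / Sxx."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * xi for xi in x]


def fit_normal_equations(x, y):
    """The same model, solved from the 2x2 normal equations with Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return [intercept + slope * xi for xi in x]


def fits_agree(a, b, rel_tol=1.5e-8):
    """True if every pair of fitted values matches within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(ai, bi, rel_tol=rel_tol) for ai, bi in zip(a, b)
    )


# Hypothetical toy data (not the fuel.frame values): weight and fuel use.
weight = [2000.0, 2500.0, 3000.0, 3500.0]
fuel = [3.3, 4.0, 4.6, 5.4]

fitted_a = fit_centered(weight, fuel)
fitted_b = fit_normal_equations(weight, fuel)
print(fits_agree(fitted_a, fitted_b))
```

The two routines round differently along the way, so their results are not guaranteed to match bit for bit. A tolerance-based comparison is the appropriate test, and the same reasoning applies when comparing lm with ols.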

     
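
Step 9 wraps the temporary file path in a Python string literal and relies on the r prefix. The sketch below, which is not from the original article, shows in plain Python why the prefix matters for Windows paths. The path is a hypothetical example of the shape that a temporary file name can take on Windows.

```python
# A hypothetical Windows temp path containing backslashes.
# Without the r prefix, Python interprets \t as a tab and \f as a form feed.
plain = 'C:\temp\ff123'

# With the r prefix, every backslash is kept as a literal character.
raw = r'C:\temp\ff123'

# The plain literal is shorter because two escape pairs collapsed to one character each.
print(len(plain), len(raw))
print("\t" in plain, "\\" in raw)
```

A raw string still ends at the first unescaped quote character that matches its delimiter, so the approach in step 9 assumes that the temporary path contains no single quote.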

    feather_format.zip

    terrandpython.r.txt

     

     

