    Effective R package management using Packrat with Spotfire® Statistics Services


    Introduction

    If you've ever written a substantial Data Function you'll have come across the idea of R packages. Packages provide additional functionality to your R scripts, e.g. someone may have written a specific statistical model you need, or maybe you're using a package to call RESTful Web Services. There is a huge range of R packages available for use, often available through CRAN - there are over 12,000 such packages at the time of writing.

    If you've used a package you will also have come across the question of how to get your packages to work with Spotfire. There are documented Spotfire practices for installing packages onto Spotfire® Statistics Services, but those approaches share one major flaw - everyone using Spotfire shares the same version of each package. Packages are regularly updated, and any substantial package tends to rely on other packages. This dependency hierarchy introduces a very ugly problem - some packages may rely on versions of packages that are not compatible with other packages that are in use, or that are not compatible with data functions written by other users. Because of these problems, it is very common to assign an individual as the R Package Manager and then ask that individual to handle the work of tracking dependencies, working with teams to update their code, dealing with upgrades etc. - a very large job in any substantial organization.

    There is a simpler automated solution to package dependency problems - Packrat. Packrat can be used within a "project" to maintain packages and ensure they are isolated from other projects, while also easily restoring the specific versions that were used by a developer.

    A Packrat Wrapper for Spotfire® Statistics Services

    To set up Spotfire® Statistics Services to work with Packrat you'll need to follow these steps (or some variant if you feel like working it out - the basic approach should work if you need to tweak to your environment):

    • Create a shared folder on your network that can contain the wrapper (the wrapper folder). Ensure that the Spotfire® Statistics Services servers have read access to this folder.
    • Download the 'RunDataFunction.R' script from this page and save in your wrapper folder.
    • Find a machine that's running a copy of R that is binary compatible with your Spotfire® Statistics Services (e.g. if your stats services are running 3.4.1 on Windows, then you need 3.4.1 on a Windows machine). Install the Packrat package on this machine and then copy the Packrat folder from the R library to the wrapper folder.
    • Install the R dev tools on your Spotfire® Statistics Services servers - this step will only be necessary if Packrat has to install versions of packages from their source code, which is only likely if you're running a version of R on Spotfire® Statistics Services that is significantly different from the one used to develop the Packrat based projects.

    You should now have a wrapper folder that contains a copy of the RunDataFunction script and the Packrat package itself.
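
The copy step above can also be scripted from the R session in which Packrat was installed. The following is a minimal sketch using only base R; it is not part of the published wrapper. `copy_package_to` is a hypothetical helper name, and the wrapper folder path is a placeholder to replace with your own share.

```r
# Hypothetical helper: copy an installed package's folder into another folder
# (for example, the wrapper folder). Uses only base R.
copy_package_to <- function(pkg, destDir, lib.loc = NULL) {
  src <- find.package(pkg, lib.loc = lib.loc)  # errors if the package is not installed
  if (!dir.exists(destDir)) dir.create(destDir, recursive = TRUE)
  ok <- file.copy(src, destDir, recursive = TRUE)
  if (!isTRUE(ok)) stop("Could not copy '", pkg, "' to ", destDir)
  invisible(file.path(destDir, basename(src)))
}

# Assumed usage (the path is a placeholder, so the call is left commented out):
# copy_package_to("packrat", "<wrapper folder>")

# Print the local R version and platform so they can be compared with the server.
cat(R.version.string, "on", R.version$platform, "\n")
```

A package copied this way can later be loaded from that folder with `library(packrat, lib.loc = "<wrapper folder>")`.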

    To test, you can follow these steps:

    • Download the 'Packrat Demo.zip' file on this page and unzip into a subfolder of your wrapper folder.
    • Within Spotfire, create a very simple Data Function that calls out to the Packrat based project. The data function has a single output parameter named 'output' that returns a table containing the list of packages loaded by the R engine. The code is shown below - simply change the path in the source call to point to your wrapper script.
    • If everything works, you should see the list of packages loaded into a data table. If not, take a look at the debug output from the script (enable Data Function debugging if you haven't already) to see what's happening - you should see messages from the wrapper and from Packrat itself as it sets up your project.
    projectDir='Packrat Demo'
    source('<wrapper folder>\\RunDataFunction.R')
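
The script inside the demo project is not reproduced on this page. As a rough illustration only, a script that produces the described 'output' table could look like the sketch below. The column name `package` and the use of `loadedNamespaces()` are assumptions; the demo may list packages differently.

```r
# Sketch of a Data Function body that returns the loaded packages as a table.
# 'output' is the output parameter name used by the demo; the column name is assumed.
output <- data.frame(
  package = sort(loadedNamespaces()),
  stringsAsFactors = FALSE
)
```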
     

    Developing Data Functions using Packrat

    The fundamental difference between developing a Data Function within Spotfire and developing one with Packrat is that development with Packrat takes place outside Spotfire, using tools such as RStudio. Packrat relies on having the source code for your R scripts stored on the file system, whereas Spotfire Data Functions are typically stored within Spotfire itself and may never be saved to the file system. It is possible to call Packrat from within a Spotfire Data Function (just like you would any other R package), but it would be very difficult to effectively manage a set of package dependencies in that way.

    The basic process is:

    • Create a Packrat based project on your local machine.
    • Develop and test a script locally - perhaps using RStudio. The script should look much the same as a Spotfire Data Function - i.e. you should expect that parameters are passed into the script using variables that are pre-assigned and that results are passed out using variables.
      • You can name your script whatever you like, but 'DataFunction.R' is the default used by our wrapper script.
      • You may find it easier to test by creating a second script that sets up a set of test parameters, then uses the source function to call your data function itself. You could go one step further and incorporate automatic test checks in this second script, or create multiple such scripts with additional test cases.
    • Once ready to test in Spotfire, copy the entire Packrat project to a network shared folder.
      • The 'packrat' folder within your project folder must be writable by the Spotfire® Statistics Services. You'll need to work with the administrator for your Spotfire® Statistics Services to work out the best method to achieve this (at IQVIA we have an Active Directory group which contains the machine accounts for all our Spotfire servers and we ask teams to grant access for that group).
    • Create a Data Function within Spotfire that calls your new Packrat based project. The function should be similar to the one shown below.
      • You will need to define the parameters (input and output) that your script expects. Spotfire will warn you that the script doesn't reference the parameters, but that's OK - Spotfire can only check the source code of the Data Function itself, so it doesn't see references in the scripts that are called via source.
      projectDir='<the path to your project folder>'
      scriptFile='<the name of your script - defaults to "DataFunction.R" if not specified>'
      source('<the path to your instance of the Packrat wrapper described above>')
       
    • Test and ensure everything is working as expected. It can be useful to enable Data Function debugging within Spotfire, then use the cat function to output messages from your script so you can review them in Spotfire. The Packrat wrapper outputs messages in this way, so you can see if it's loading new versions of packages or having any sort of problem.
    • Handle updates to your script, or to the referenced packages, in the same way - develop locally and then publish to the network share. Note that it's also perfectly valid to keep your project under version control (e.g. Git), and this may be helpful given Packrat's native support for such tools.
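
The local testing approach described in the process above (a second script that assigns test parameters, sources the Data Function and checks the results) might look like the following sketch. The parameter names `inputTable` and `threshold` are invented for illustration, and a stand-in 'DataFunction.R' is written to a temporary folder only so that the example is self-contained.

```r
# Stand-in for a real DataFunction.R (hypothetical content, for illustration only).
projectDir <- tempfile("packrat_project_")
dir.create(projectDir)
scriptFile <- file.path(projectDir, "DataFunction.R")
writeLines(c(
  "# Expects 'inputTable' (data frame) and 'threshold' (numeric) to be pre-assigned.",
  "output <- inputTable[inputTable$value > threshold, , drop = FALSE]"
), scriptFile)

# Test harness: assign the inputs the way Spotfire would, then source the script.
inputTable <- data.frame(id = 1:4, value = c(5, 12, 7, 20))
threshold <- 10
source(scriptFile, local = TRUE)

# Automatic checks on the output variable.
stopifnot(is.data.frame(output))
stopifnot(nrow(output) == 2)
stopifnot(all(output$value > threshold))
cat("All checks passed\n")
```

Several such harness scripts, each with its own inputs and checks, can sit alongside the Data Function in the project folder.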

    Technical details

    The wrapper script carries out the following steps:

    • Load Packrat from the wrapper folder.
    • Find the absolute path of the project folder (the function supports both absolute paths and paths relative to the wrapper folder).
    • Create a temporary folder that is a symbolic link to the project folder. This is done because R doesn't support installing packages to a library located on a network folder using UNC paths - so we 'cheat' by fooling R into thinking that the packages are being installed on a drive with a drive letter. This could be handled by mapping a Windows drive to the folder, but that creates problems if there are different projects being accessed concurrently on the Spotfire® Statistics Services.
    • Set the current folder to be the project folder, using the symbolic link.
    • Initialize Packrat (packrat::init()) - this will install any new versions of packages that are required (e.g. if the Packrat folder contains Mac binaries but Spotfire® Statistics Services is running on Windows). Note that we force Packrat to do this work without any prompting.
    • Turn Packrat on (packrat::on()). Note that we force Packrat to leave any pre-loaded packages loaded; this is because Spotfire® Statistics Services relies on loading various Spotfire specific packages when marshalling data to and from Data Functions.
    • Call the Data Function script using the source function.
    • Remove the temporary symbolic link.
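
The link, working-folder and clean-up steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of 'RunDataFunction.R': `run_via_link` is a hypothetical name, the Packrat calls are only marked by a comment, and the path resolution relative to the wrapper folder is omitted.

```r
# Illustrative sketch only; this is not the code of RunDataFunction.R.
run_via_link <- function(projectDir, scriptFile = "DataFunction.R",
                         envir = parent.frame()) {
  force(envir)
  target <- normalizePath(projectDir, mustWork = TRUE)
  link <- tempfile("project_link_")  # local path that will point at the project
  oldwd <- getwd()

  # Directory links: junctions on Windows, symbolic links elsewhere.
  linked <- if (.Platform$OS.type == "windows") {
    Sys.junction(target, link)
  } else {
    file.symlink(target, link)
  }
  if (!isTRUE(linked)) stop("Could not create link to ", target)

  # Always restore the working folder and remove the link, even on error.
  on.exit({
    setwd(oldwd)
    unlink(link)  # on Unix-alikes this removes the link, not the project folder
  }, add = TRUE)

  setwd(link)
  # The real wrapper would call packrat::init() and packrat::on() at this point.
  source(scriptFile, local = envir)
  invisible(NULL)
}

# Demo with a throwaway project folder and a one-line script.
demoProject <- tempfile("demo_project_")
dir.create(demoProject)
writeLines("output <- inputValue * 2", file.path(demoProject, "DataFunction.R"))
inputValue <- 21
run_via_link(demoProject)
print(output)
```

Registering the clean-up with on.exit means the temporary link is removed even when the sourced script fails, so failed runs do not leave stale links behind.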

    At some point in the future it will hopefully be possible to 'bootstrap' Packrat without the need to load it first. Packrat has a mechanism for this (the init.R script), but it doesn't currently support leaving pre-loaded packages loaded. The advantage of bootstrapping is that it would no longer be necessary to keep a binary compatible version of the Packrat folder stored in the wrapper folder.

    A quick note about TERR

    The wrapper currently only works with Open Source R. The wrapper script itself uses a function that isn't implemented in TERR (getSrcDirectory), but this isn't worth fixing as Packrat itself throws errors when running within TERR.

    Attachments

    Download attachments from resources.

    packrat_demo.zip

    rundatafunction.zip

