    TERR (now called Spotfire Enterprise Runtime for R): fast writeback to SQL Server 2016

    So, the other day we got SQL Server 2016 installed, and what is it good for? We use TERR (now called Spotfire Enterprise Runtime for R) for a lot of complex batch jobs, but writing large result sets back to SQL Server in particular is very slow. We were able to speed things up by a factor of 100 using the new OPENJSON feature in SQL Server 2016.

    Here is the trick. We first implement a read procedure that uses the new OPENJSON reader in SQL Server to parse a JSON string argument:

    CREATE PROCEDURE [pa].[readJSON] @json varchar(max)
    AS
    BEGIN
        SET NOCOUNT ON

        SELECT *
        FROM OPENJSON(@json)
        WITH (Column_A float '$.cola',
              Column_B float '$.colb')
    END
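    For clarity, jsonlite::toJSON serializes a data.frame as a JSON array of row objects, and that array-of-objects shape is exactly what OPENJSON iterates over, one result row per object. A quick sketch of the payload (hypothetical values, using only Python's standard json module):

    ```python
    import json

    # Rows serialized the way jsonlite::toJSON renders a data.frame:
    # a JSON array with one object per row, keyed by column name.
    rows = [
        {"cola": 0.12, "colb": -1.30},
        {"cola": 0.98, "colb": 2.71},
    ]
    payload = json.dumps(rows)
    print(payload)
    # OPENJSON(@json) walks this array, and the WITH clause pulls each
    # column out by its JSON path ('$.cola', '$.colb').
    ```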
     

    In TERR, we can use the jsonlite package to serialize a data.frame to JSON. The sqlQuery function in the RODBC package can then be used to execute the readJSON procedure with the JSON string as an argument. Assuming connection_string is given, the code can look something like this:

    dbcon <- RODBC::odbcDriverConnect(connection_string)

    # create a dummy data.frame
    dataf <- data.frame(cola = rnorm(1000), colb = rnorm(1000))

    # create the SQL string to execute
    sql_str <- gettextf("execute [pa].[readJSON] '%s';", jsonlite::toJSON(dataf))

    # execute
    a <- RODBC::sqlQuery(dbcon, sql_str)
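    One caveat worth noting: the JSON is spliced into a T-SQL string literal, so any single quote inside the payload would break the statement. The data here is purely numeric, so it is safe, but for text columns the quotes should be doubled first. A small hypothetical Python helper (mirroring the gettextf call above; the function name is ours) sketches the idea:

    ```python
    import json

    def build_readjson_call(records):
        """Build the EXEC statement for [pa].[readJSON] from a list of
        row dicts, doubling single quotes so the JSON survives being
        embedded in a T-SQL string literal. Hypothetical helper."""
        payload = json.dumps(records).replace("'", "''")
        return "execute [pa].[readJSON] '%s';" % payload

    sql = build_readjson_call([{"cola": 1.0, "colb": 2.0}])
    print(sql)
    ```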
     

    Nothing to it. How about performance?

    system.time({
        sql_str <- gettextf("execute [pa].[readJSON] '%s';", jsonlite::toJSON(dataf))
        a <- RODBC::sqlQuery(dbcon, sql_str)
    })
     

    user  system elapsed
    0.45    0.05    0.23

    The more classic INSERT-based approach, by contrast, gives:

    system.time({
        RODBC::sqlSave(dbcon, dat = dataf, "pa.sjknu_terr_temp")
        RODBC::sqlQuery(dbcon, "drop table pa.sjknu_terr_temp;")
    })
     

    user  system elapsed
    0.72    0.11   24.93

    It's clear that the speedup is on the database side. Moreover, the jsonlite toJSON writer is dead simple, and it does not slow things down the way other serializers in R, such as XML, tend to do. Further, the JSON format combined with a small Python script in Spotfire can do magic. And finally, some of our computations take hours and hours on a huge machine, and we no longer waste expensive in-database processing time (making it a good business case and a happy DBA).

