So, the other day we got SQL Server 2016 installed, and what is it good for? We use TERR (now called Spotfire Enterprise Runtime for R) for a lot of complex batch jobs, but writing large result sets back to SQL Server in particular is very slow. We were able to speed things up by a factor of 100 using the new OPENJSON feature in SQL Server 2016.
Here is the trick. We first implement a read procedure that uses the new OPENJSON reader in SQL Server to parse a JSON string argument:
CREATE PROCEDURE [pa].[readJSON] @json varchar(max)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT *
    FROM OPENJSON(@json)
    WITH (
        Column_A float '$.cola',
        Column_B float '$.colb'
    );
END
In TERR, we can use the jsonlite package to serialize a data.frame to JSON. The sqlQuery function in the RODBC package can then be used to execute the readJSON procedure with the JSON string as an argument. Assuming connection_string is given, the code can look something like this:
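To see what actually travels over the wire, here is a small sketch of the serialization step on its own. By default jsonlite::toJSON serializes a data.frame row-wise as an array of objects, which is exactly the shape OPENJSON expects with the '$.cola' / '$.colb' paths used in the procedure above:

```r
library(jsonlite)

# a tiny two-row example frame with the same column names
# as the readJSON procedure expects
df <- data.frame(cola = c(1.5, 2.5), colb = c(3.5, 4.5))

# data.frames are serialized row-wise to an array of objects
toJSON(df)
# [{"cola":1.5,"colb":3.5},{"cola":2.5,"colb":4.5}]
```

Each object in the array becomes one row on the SQL Server side.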
dbcon <- RODBC::odbcDriverConnect(connection_string)

# create some dummy data
dataf <- data.frame(cola = rnorm(1000), colb = rnorm(1000))

# create the SQL string to execute
sql_str <- gettextf("EXECUTE [pa].[readJSON] '%s';", jsonlite::toJSON(dataf))

# execute
a <- RODBC::sqlQuery(dbcon, sql_str)
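One caveat worth flagging: the JSON payload is embedded in a T-SQL string literal, so if the data.frame contains character columns, any single quote in the data will break the statement. Standard T-SQL escaping doubles the quote; a minimal sketch (escape_sql is a hypothetical helper, not part of RODBC or jsonlite):

```r
# hypothetical helper: double single quotes so the JSON payload
# survives being embedded in a T-SQL string literal
escape_sql <- function(s) gsub("'", "''", s, fixed = TRUE)

json <- jsonlite::toJSON(data.frame(name = "O'Brien", val = 1))
sql_str <- gettextf("EXECUTE [pa].[readJSON] '%s';", escape_sql(json))
```

For purely numeric frames, as in the timing examples here, this is not an issue.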
Nothing to it. How about performance?
system.time({
  sql_str <- gettextf("EXECUTE [pa].[readJSON] '%s';", jsonlite::toJSON(dataf))
  a <- RODBC::sqlQuery(dbcon, sql_str)
})
user system elapsed
0.45 0.05 0.23
whereas the more classic INSERT-based call gives
system.time({
  RODBC::sqlSave(dbcon, dat = dataf, "pa.sjknu_terr_temp")
  RODBC::sqlQuery(dbcon, "drop table pa.sjknu_terr_temp;")
})
user system elapsed
0.72 0.11 24.93
It's clear that the speedup is on the database side. Still, the awesome jsonlite toJSON writer is dead simple, and it does not slow things down the way other serializers in R, such as XML, tend to do. Further, the JSON format plus a small Python script in Spotfire can do magic. And finally, some of our computations take hours and hours on a huge machine, so we don't waste expensive in-database processing time (making it a good business case and a happy DBA).