Jump to content

How to write a DataTable from a list of DataSets


Alexandre Verri

Recommended Posts

Thanks, but I think the problem is clear enough, and you have given a solution that I haven't asked. I have asked about creating a DataTable from existing list of DataSets, not creating a DataTable from existing DataTable! 

 

This is the context using code examples:

 

from System.Data import DataSet
from System.Data.Odbc import OdbcConnection, OdbcDataAdapter

# Get a dataset for the given SQL
def get_dataset(sql, conn_str):
   connection = OdbcConnection(conn_str)
   connection.Open()
   dataSet = DataSet()
   adaptor = OdbcDataAdapter(sql, connection)
   adaptor.Fill(dataSet)
   connection.Close()
   return dataSet

# The connection string to the target database.
conn_str = 'DSN=mydatabase;Pooling=true;Max Pool Size=20;Enlist=true'

# A list of different queries.
sqls = ['sql1', 'sql2', 'sql3']

# This is the list of data sets, one for each query.
data_sets = [get_dataset(sql, conn_str) for sql in sqls]

# Now, having the data sets, how to create a table with this list in the most efficient way

 

 

Link to comment
Share on other sites

Not sure if you would be able to use the above DataSets as a datasource as they are.

Something below may just help:

from Spotfire.Dxp.Data.Import import DatabaseDataSource

from Spotfire.Dxp.Data.Import import DatabaseDataSourceSettings

from System import *

 

sqlCommand = "SELECT * from MYTABLE"

 

dbsettings = DatabaseDataSourceSettings( "System.Data.OracleClient",

"Server=localhost;UID=test;PWD=test",sqlCommand)

 

ds = DatabaseDataSource(dbsettings)

 

Document.Data.Tables.Add("Test",ds)Again, I am sorry if this does not really help much.

Link to comment
Share on other sites

Hi Rayees, I appreciate your time trying to answer the question. I had found the example you provided, but it does not solve our issue. We need to convert from DataSet to DataTable, the main reason is performance. There is a big performance impact in our use case. We have to run multiple SQL queries and build a DataTable with the results. This is taking more than a minute, which is too much for the user to wait. One way we have found to improve the performance is to run the queries in parallel, using multithreading. The IronPython multithreading works fine in Spotfire if we don't use any Spotfire API in a different thread. The code you provided would work if we could run it in parallel, like this:

from Spotfire.Dxp.Data.Import import DatabaseDataSource

from Spotfire.Dxp.Data.Import import DatabaseDataSourceSettings

from System import *

from multiprocessing.pool import ThreadPool # Create a thread pool with 3 threads.

pool = ThreadPool(3)

def build_table(sql):

  dbsettings = DatabaseDataSourceSettings( "System.Data.OracleClient", "Server=localhost;UID=test;PWD=test", sql)

  ds = DatabaseDataSource(dbsettings)

  Document.Data.Tables.Add("Test",ds)

def build_tables_in_parallel(sqls):

  pool.map(build_table, sqls)

sqls = ['select * from t1', 'select * from t2', 'select * from t3'] build_tables_in_parallel(sqls)

 
 
... but it does not work, Spotfire can't run the function build_table in a different thread. That's why we need to run the SQL and get DataSet instead, because there is no Spotfire code involved in this, so we can run in different threads.Maybe you can give a hint about how to run multiple queries in parallel Using DataFunctions maybeThank you.
Link to comment
Share on other sites

  • 3 weeks later...

Here is the logic we are currently using to write the datatable but this is extremely slow. 

 

This function gets data from generator function that iterates over the dataset. 

 

def createDataTable(textData):
# Turn it into an in-memory text data source.
   dataTableName = 'Data Table'
   columnNames=orderedColumns
   stream = MemoryStream()
   writer = StreamWriter(stream)
   writer.WriteLine(';'.join(columnNames) + 'rn')
   writer.Flush()
   for line in textData:
       writer.WriteLine(line)
   writer.Flush()
   settings = TextDataReaderSettings()
   settings.Separator = ";"
   settings.AddColumnNameRow(0)
   settings.ClearDataTypes(False)
   for i in range(len(columnNames)):
       if columnNames[i] == 'MEASUREVALUE':
           settings.SetDataType(i, DataType.Real)
       elif columnNames[i] == 'DATETIME_ID':
           settings.SetDataType(i, DataType.DateTime)
       elif columnNames[i] == 'DATE_ID':
           settings.SetDataType(i, DataType.Date)
   stream.Seek(0, SeekOrigin.Begin)
   fs = TextFileDataSource(stream, settings)
   if Document.Data.Tables.Contains(dataTableName):
       dataTable = Document.Data.Tables[dataTableName]
       settings = AddRowsSettings(dataTable, fs)
       dataTable.AddRows(fs, settings)
   else:
       # Create Table if not already present
       print "Adding Data Table"
       dataTable = Document.Data.Tables.Add(dataTableName, fs)

def generate_text_data(dataSet,columnNames):
   currRow = ""
   global orderedColumns
   for row in dataSet.Tables[0].Rows:
       currRow = []
       for col in orderedColumns:
           if col in columnNames:
               currRow.append(str(row[col]))
           else:
               currRow.append('')
       yield "%srn" % (';'.join(currRow))

 

 

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...