Jump to content

Spotfire interprets field types from CSV incorrectly & deletes information in fields


Maria Nepomnyashchikh

Recommended Posts

Hello. I need to update one table on a regular basis, based on which I build a dashboard. The source file happens to be in CSV.

Problem: when file is read some field types are not recognized correctly (e.g., I have a field that contains real numbers, but also happens to containa lot of integers, Spotfire somehow recognizes only integers and sets integer as a field type). It would have not been a big issue if not for the following: when data gets imported all the information from the field is deleted if it doesn't match the field type Spotfire selected. So, I cannot even change field type post-factum and get correct information, since it's wiped on import.

So, when I reload table I have to manually go through field types to make sure they are correct. And I have like 30-40 fields, so, it is annoying as you can imagine.

Is there a way to optimize this process

Link to comment
Share on other sites

While importing data from csv, you will see option in bottomsection to insert transformation. You can select Change Data Type and select all 30-40 relevantcolumns whose data type needs to be changed to real. This way imported columns data type will be real and part of data table's source information. So everytime you reload data, data type will be changed to real for those columns.
Link to comment
Share on other sites

UPDATE: What I discovered is that Transformation you suggested deletes all the data that doesn't fit automaticly identified field type and then transforms the field type. Same way as if I imported the information with automatically incorrectly identified fields and only then changed types from dashboard. 

 

So the only way to correctly change field types without loosing data is manual without using transformation.

 

What happened to me applying transformations (I used 'Chage data types') was a complete mess...

 

Real fields were identified as integers, so all real numbers were deleted even though I 'transformed' the field into real

String fields were identified as integers and also lost all text data despite being 'transformed' into strings

The only wrong transformations that didn't result into data loss were numbers and dates that were identified as strings

 

How is it possible to be so bad D:

Link to comment
Share on other sites

May be later when I have time, but I will just comment the file. It has ~500k lines, where first 200k lines are from one DB, the rest from another, same columns, but formatting is different:

 

First 200k lines in dataset have revenues rounded up to an integer, but the rest are real numbers. That is why Spotfire identifies integers since I think it looks only at first N lines versus whole dataset, which causes deletion of over a half of remaining data.

Also employee IDs in the beginning are integers, then they contain letters -> so this is id'd as integers and all text containing ones are deleted.

Also, I noticed that if a few first lines are empty, it will identify any non-string field as a string (dates and numbers are wrongly id'd in my case, but at least it doesn't delete anything for strings and you can convert back correctly).

 

So, the probem lies with Spotire looking only at the first rows to identify data types and deleting everything that doesn't fit as a consequence.

 

From my experience with other data-related softwares: all csv fields are read as sting by default and you can change later, or random sample from the column (or whole column) is taken to identify the field type. This helps avoid data loss and conversion errors.

Link to comment
Share on other sites

  • 10 months later...

I have got the same problem. I have tried an ironpython script, which looked very promising, but it did not work either. It reads all data transformations in Spotfire, but does not read the file import parameter, which are determined before, when replacing the file ... (https://community.spotfire.com/wiki/how-replace-file-datasource-data-table-tibco-spotfirer-using-ironpython-scripting). I would really appreciate, if someone finds a solution here. The only workarounds that i know are the following one:

 

prepare the file so that Spotfire does not generate wrong data types. In your case, add always a string 2nd row, so that eyerything is imported as string (with a Makro). You can delete these rows later in Spotfire. 

You can also replace this file outside Spotfire with an file that contains the Identical name

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...