Spotfire interprets field types from CSV incorrectly & deletes information in fields

Maria Nepomnyashchikh · March 14, 2018

Hello. I need to update one table on a regular basis, based on which I build a dashboard. The source file happens to be in CSV.

Problem: when file is read some field types are not recognized correctly (e.g., I have a field that contains real numbers, but also happens to containa lot of integers, Spotfire somehow recognizes only integers and sets integer as a field type). It would have not been a big issue if not for the following: when data gets imported all the information from the field is deleted if it doesn't match the field type Spotfire selected. So, I cannot even change field type post-factum and get correct information, since it's wiped on import.

So, when I reload table I have to manually go through field types to make sure they are correct. And I have like 30-40 fields, so, it is annoying as you can imagine.

Is there a way to optimize this process

Khushboo Rabadia · March 20, 2018

While importing data from csv, you will see option in bottomsection to insert transformation. You can select Change Data Type and select all 30-40 relevantcolumns whose data type needs to be changed to real. This way imported columns data type will be real and part of data table's source information. So everytime you reload data, data type will be changed to real for those columns.

Maria Nepomnyashchikh · March 26, 2018

That broke my dashboard. :D All columns that were renamed weren't recognized anymore if i uased this method, also, all calculated fields disappeared.

Maria Nepomnyashchikh · March 26, 2018

UPDATE: What I discovered is that Transformation you suggested deletes all the data that doesn't fit automaticly identified field type and then transforms the field type. Same way as if I imported the information with automatically incorrectly identified fields and only then changed types from dashboard.

So the only way to correctly change field types without loosing data is manual without using transformation.

What happened to me applying transformations (I used 'Chage data types') was a complete mess...

Real fields were identified as integers, so all real numbers were deleted even though I 'transformed' the field into real

String fields were identified as integers and also lost all text data despite being 'transformed' into strings

The only wrong transformations that didn't result into data loss were numbers and dates that were identified as strings

How is it possible to be so bad D:

Khushboo Rabadia · March 26, 2018

Can you pleaseprovide the sample csv showing the behavior.

Maria Nepomnyashchikh · March 27, 2018

May be later when I have time, but I will just comment the file. It has ~500k lines, where first 200k lines are from one DB, the rest from another, same columns, but formatting is different:

First 200k lines in dataset have revenues rounded up to an integer, but the rest are real numbers. That is why Spotfire identifies integers since I think it looks only at first N lines versus whole dataset, which causes deletion of over a half of remaining data.

Also employee IDs in the beginning are integers, then they contain letters -> so this is id'd as integers and all text containing ones are deleted.

Also, I noticed that if a few first lines are empty, it will identify any non-string field as a string (dates and numbers are wrongly id'd in my case, but at least it doesn't delete anything for strings and you can convert back correctly).

So, the probem lies with Spotire looking only at the first rows to identify data types and deleting everything that doesn't fit as a consequence.

From my experience with other data-related softwares: all csv fields are read as sting by default and you can change later, or random sample from the column (or whole column) is taken to identify the field type. This helps avoid data loss and conversion errors.

Claude Fleischmann · February 23, 2019

I have got the same problem. I have tried an ironpython script, which looked very promising, but it did not work either. It reads all data transformations in Spotfire, but does not read the file import parameter, which are determined before, when replacing the file ... (https://community.spotfire.com/wiki/how-replace-file-datasource-data-table-tibco-spotfirer-using-ironpython-scripting). I would really appreciate, if someone finds a solution here. The only workarounds that i know are the following one:

prepare the file so that Spotfire does not generate wrong data types. In your case, add always a string 2nd row, so that eyerything is imported as string (with a Makro). You can delete these rows later in Spotfire.

You can also replace this file outside Spotfire with an file that contains the Identical name

Sign In

Spotfire interprets field types from CSV incorrectly & deletes information in fields

Recommended Posts

Maria Nepomnyashchikh

Link to comment

Share on other sites

Khushboo Rabadia

Link to comment

Share on other sites

Maria Nepomnyashchikh

Link to comment

Share on other sites

Maria Nepomnyashchikh

Link to comment

Share on other sites

Khushboo Rabadia

Link to comment

Share on other sites

Maria Nepomnyashchikh

Link to comment

Share on other sites

Claude Fleischmann

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Industries