how to do an association study with a huge dataset

spotfire newbie_2018

I usually run "data relationship" to assess data associations for a small dataset. But I now need to run an association on a large dataset and so far have failed. Below is a dummy example.

My data are provided in a csv file in the following format (see attachment in case alignment is off)

Class             Subj1  Subj2  Subj3
Foreign language     76     88     54
Math                 80     65     99
Biology              77     90     43
Physics              85     76     99
IQ                  120    104    165


The task is to see which class score is best associated with IQ. For "data relationship" to work, I need to pivot the table so each class is one column. I usually pivot the table in Excel and paste the new table in Spotfire to do the association. But the real example contains 60,000 rows of different "classes" for a total of 800 subjects, which Excel cannot handle.

Because of the large number of rows and columns, Spotfire takes forever to unpivot and then pivot the table in order to transpose it (I don't have the "Spotfire package" to transpose directly). Is there a better way to run an association study between one feature and ALL OTHER features? In the example above, that would mean associating the IQ score with every class and determining which class score shows the best association with IQ.
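To make the question concrete, here is a minimal sketch of the calculation done outside Spotfire, in plain Python. It is an illustration under assumptions, not a Spotfire feature: it assumes the csv is comma-separated with a header line, the class name in the first column, one numeric score per subject after that, and no missing values. It also uses Pearson correlation as the association measure, which is only one possible choice. The names `pearson` and `rank_by_association` are made up for this example. Because each class already sits on its own row, the file can be read row by row and compared against the IQ row, so no transpose is needed.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant row has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def rank_by_association(csv_file, target="IQ"):
    """Return [(class_name, r), ...] sorted by |r|, strongest first.

    csv_file is a seekable text file object; layout is the assumed one
    described above (header line, class name first, then scores).
    """
    # Pass 1: find the target row (e.g. IQ) and keep only that in memory.
    reader = csv.reader(csv_file)
    next(reader)  # skip the header line
    target_scores = None
    for rec in reader:
        if rec and rec[0] == target:
            target_scores = [float(v) for v in rec[1:]]
            break
    if target_scores is None:
        raise ValueError("target row %r not found" % target)

    # Pass 2: stream every other row and correlate it with the target.
    csv_file.seek(0)
    reader = csv.reader(csv_file)
    next(reader)
    results = []
    for rec in reader:
        if not rec or rec[0] == target:
            continue
        r = pearson([float(v) for v in rec[1:]], target_scores)
        if not math.isnan(r):
            results.append((rec[0], r))

    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results


# The dummy table from this post, written as comma-separated text.
sample = io.StringIO(
    "Class,Subj1,Subj2,Subj3\n"
    "Foreign language,76,88,54\n"
    "Math,80,65,99\n"
    "Biology,77,90,43\n"
    "Physics,85,76,99\n"
    "IQ,120,104,165\n"
)
ranking = rank_by_association(sample)
for name, r in ranking:
    print(name, round(r, 3))
```

For the real file, the same function could be called with `open("scores.csv", newline="")`, where `scores.csv` is a placeholder file name. Only one row is held in memory at a time besides the IQ row, so the 60,000 x 800 size should not be a problem in itself. The ranking uses the absolute value of r, so strong negative associations rank alongside strong positive ones.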
