Jump to content

How to identify/remove columns which contain only one distinct value


Adee Ran

Recommended Posts

New Spotfire user, so this may be a dumb question.

I need to analyze data in CSV files that are generated programmatically. The files contain thousands of columns and often fewer rows than columns.

Many of the columns contain only one distinct value per column, so they don't really contain information, but they consume a lot of space in the list of columns and make it hard to get around - so I'd like to hide them, remove them, convert them to table property, etc. - but with the huge number of columns, doing that manually is not practical.

I cannot affect the creation of the input files.

These "single-value" columns are typically the same ones for all input files so a one-time analysis to create a program is possible. However, I would prefer a more generic solution that goes over all columns, identifies the single-value ones, and handles them automatically.

Is there a simple built-in way to do that in Spotify (Desktop 11.3)

If not, I assume this can be done with some script, but I'm not at that level yet, so would welcome a solution.

Link to comment
Share on other sites

I'm still using 7.11 and I'm not aware of a built-in method to do this.

Yes, you could do this with a data function where your full table is the input. Then you would check every column if you only have one unique value. And then remove all columns where this is the case.

This gets very simple when you use dplyr. Your data function would look like this, where source.table is the input and new.table is the output. Both variable are of type TABLE:

library(dplyr)

new.table % select_if(~n_distinct(.)>1)If you have never used a data function before, you can check a little tutorial that I posted here:

https://community.spotfire.com/questions/self-referential-calculated-column

Link to comment
Share on other sites

You could try also an Iron Python script (check which solution works better for you):

 

check indentation when you copy and paste

substitute 'yourtablename' with the name of your table

limitation: if a column has nulls, these are counted as separate values in this script and the column will not be removed

 

#modified from

#https://community.spotfire.com/questions/how-delete-columns-effciently-spotfire-iron-python-script

from System.Collections.Generic import List

from Spotfire.Dxp.Data import *

 

#the data table object from name provided

dt = Document.Data.Tables['yourtablename']

 

rowCount = dt.RowCount

rowsToInclude = IndexSet(rowCount,True)

 

#start with empty list

columnsToDelete = List[DataColumn]()

 

for column in dt.Columns:

values = column.RowValues.ValidRows

cursor = DataValueCursor.CreateFormatted(column)

values=list()

for row in dt.GetRows(rowsToInclude,cursor):

value = cursor.CurrentValue

values.append(value)

uniqueValues = set(values)

#print uniqueValues.Count

if (uniqueValues.Count==1):

#print column.Name + ' ' + str(uniqueValues.Count)

columnsToDelete.Add(column)

 

#print 'there are ' + str(columnsToDelete.Count) + ' columns to delete'

if columnsToDelete.Count != 0:

dt.Columns.Remove(columnsToDelete)

#print ('columns removed')

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...