Dense Ranking with multiple conditions

Antonio Volpini · August 11, 2021

I am working with a dataset in Spotfire that has several replicates of multiple test articles being run through several sets of conditions, with a single final readout column. See example table below

Sample

Temp

Conc

Time

Lum

1A

30

1

15 Hrs

832982

1B

30

1

30 Hrs

374

1C

30

2

15 Hrs

2

1D

30

2

30 Hrs

12

1E

20

1

15 Hrs

415

1F

20

1

30 Hrs

15

1G

20

2

15 Hrs

6

1H

20

2

30 Hrs

34449

In this example I am trying to rank my results to make the end graphs more digestible. I started with limiting the data set to only the conditions that I want through the "Limit Data through a custom expression" panel. Example here:

([Temp]= "30") And ([Time] = "15 Hrs") And ([Conc] = 1)

Then set up a denserank function in a custom expression on the x axis to sort the values into a smooth curve.

The Xaxis expression I am trying is:DenseRank(Avg([Lum/OD]) OVER ([MOLECULE_ID]), "asc")

The Y axis expression is:Avg([LUM]) OVER ([MOLECULE_ID])

This produces something similarto the attached graph. The points dont rank correctly.My questions are:

- Could anyone explain why the displayed data doesn't rank in an ascending curve What is wrong withthe custom expression

- Is there a better way to get points to rank To get a smooth curve

- The value of the x and y points dont match (see 2nd attachment) Why does this happen The equations are the same, so the value should also be the same

Any help would be much appreciated!

Fabian Duerr · August 12, 2021

You rank your data by molecule_id but you plot your data over sec system. How are those two columns related Can you make a screenshot when you replace sec system with molecule_id on the x-axis

Aside: If you use a bar chart you have the option to sort your x-axis by y values. This could solve your problem without using a rank function.

Antonio Volpini · August 13, 2021

Molecule ID and Sec System are the same. Sec System is included only for a name reference, as the molecule IDs are jsut long numerical strings. Removing it does not change the ordering.

A bar char is not the preferred method for this data as sometimes we need to view all the replicates, which makes the bar chart unwieldy.

Additionally, when switching to a bar chart, spotfire says that my expression is not valid. to solve this I removed the OVER statement from the X and Y Axis, the graph loads, but does not order correctly. Removing the over clause also presents a problem as I need to calculate the averages for subgroups within the data. I cannot always control the columns as exactly as in this example.

Fabian Duerr · August 13, 2021

Can you provide a sample of your data set (with embedded data)

Antonio Volpini · August 16, 2021

Unfortunately, I can only give a fake data table. The information will not match the graph, but is the similar to the real data. in terms of the type of data in each column and what is in the dataset overall.

Additionally, I found that the data limiting equation might be affecting this. I removed the whole equation and added parts back one at a time, and after adding more than on "and " clause the sorting appears to break. i.e. I could keep a filter on the Pellet column, but not add on the time, conc, or temp columns. Any addition of those immediately breaks the sorting. Unfortunately, I am new to spotfire. Is there another way to separate out subsets of data Or a way to know what columns you are able to filter on while preserving the function of the sorting method

Example table below

MOLECULE_ID	LUM	Conc	Temp	Time	Pellet vs Sup	sec system
488151	710	0.6	25	6 Hrs	Sup	YebF
492366	61382	0.6	25	6 Hrs	Sup	pelB
488856	4540	0.6	25	6 Hrs	Sup	espP
488074	97	0.6	25	6 Hrs	Sup	tdK
491590	49899	0.6	25	6 Hrs	Sup	pelB
488307	628	0.6	25	6 Hrs	Sup	Pet
482001	58442	0.6	25	6 Hrs	Sup	YebF
488528	3813	0.6	25	6 Hrs	Sup	pelB
488706	9580	0.6	25	6 Hrs	Sup	tdK
488455	5203	0.6	25	6 Hrs	Sup	OsmY
489467	18127	0.6	25	6 Hrs	Sup	espP
488735	7038	0.6	25	6 Hrs	Sup	Pet
488870	103	0.6	25	6 Hrs	Sup	pelB
489523	51167	0.6	25	6 Hrs	Sup	celCD
489327	17971	0.6	25	6 Hrs	Sup	fhaB
488707	8475	0.6	25	6 Hrs	Sup	ompA
488550	5407	0.6	25	6 Hrs	Sup	ChiA
489065	35353	0.6	25	6 Hrs	Sup	pelB
488906	14321	0.6	25	6 Hrs	Sup	OsmY
488823	1672	0.6	25	6 Hrs	Sup	ompA
488887	11420	0.6	25	6 Hrs	Sup	celCD
488159	454	0.6	25	6 Hrs	Sup	fhaB
480246	30917	0.6	25	6 Hrs	Sup	ctxB
488308	555	0.6	25	6 Hrs	Sup	pelB
488100	13780	0.6	25	6 Hrs	Sup	zot
489084	34412	0.6	25	6 Hrs	Sup	ChiA
488299	456	0.6	25	6 Hrs	Sup	pelB
488288	1541	0.6	25	6 Hrs	Sup	ctxB
489048	28169	0.6	25	6 Hrs	Sup	pelB
488820	590	0.6	25	6 Hrs	Sup	zot

Fabian Duerr · August 16, 2021

Everything works perfectly fine with your sample data. Even when I include a data limiting expression (matching the current data).

But molecule_id and sec system are not the same! Some molecule_ids share the same sec system.

Also, you can remove your OVER expression on the y-axis. Just use Avg([LUM])

You already have your data separated on the X-axis. No further breakdown needed.

Moreover you can change the settings for your x-axis expression to just evaluate on the current filtering. Do this in the properties.

Antonio Volpini · August 17, 2021

First off, thank you for trying that out, and for the help in general. Unfortunatley many of those fixes dont work in my case.

- I accidentally misrepresented my dataset in my last comment. The example I just gave did not include multiple subsets as I indicated in my initial post. I have edited it to be more accurate. Now the data filtering should acutally have an impact. There are two larger groups with multiple replicate values for subgroups within that. I have added the amended data, and an attempt I made at plotting below. The plot shows the same issues as previous. For instance I might want to filter down to the values for: Pellet vs Sup = Sup; conc = 0.6; Temp = 25. Data filtering used for that plot was: ([Pellet vs Sup]='Sup') and ([Conc]=0.6) And ([Temp]=25)

MOLECULE_ID	Conc	Temp	Pellet vs Sup	sec system	LUM
488151	0.6	25	Sup	YebF	710
488151	0.6	25	Sup	YebF	1632
492366	0.6	25	Sup	pelB	61382
492366	0.6	25	Sup	pelB	61481
489251	0.6	25	Sup	OsmY	62304
489251	0.6	25	Sup	OsmY	4540
487432	0.6	25	Sup	GFP	5462
487432	0.6	25	Sup	GFP	5983
488151	0.6	40	Sup	YebF	97
488151	0.6	40	Sup	YebF	1019
492366	0.6	40	Sup	pelB	49899
492366	0.6	40	Sup	pelB	50102
489251	0.6	40	Sup	OsmY	50821
489251	0.6	40	Sup	OsmY	628
487432	0.6	40	Sup	GFP	1550
487432	0.6	40	Sup	GFP	1650
488151	0.6	50	Sup	YebF	58442
488151	0.6	50	Sup	YebF	59364
492366	0.6	50	Sup	pelB	3813
492366	0.6	50	Sup	pelB	3813
489251	0.6	50	Sup	OsmY	4735
489251	0.6	50	Sup	OsmY	9580
487432	0.6	50	Sup	GFP	10502
487432	0.6	50	Sup	GFP	11267
488151	0.8	25	Sup	YebF	710
488151	0.8	25	Sup	YebF	1632
492366	0.8	25	Sup	pelB	61382
492366	0.8	25	Sup	pelB	61481
489251	0.8	25	Sup	OsmY	62304
489251	0.8	25	Sup	OsmY	4540
487432	0.8	25	Sup	GFP	5462
487432	0.8	25	Sup	GFP	5983
488151	0.8	40	Sup	YebF	97
488151	0.8	40	Sup	YebF	1019
492366	0.8	40	Sup	pelB	49899
492366	0.8	40	Sup	pelB	50102
489251	0.8	40	Sup	OsmY	50821
489251	0.8	40	Sup	OsmY	628
487432	0.8	40	Sup	GFP	1550
487432	0.8	40	Sup	GFP	1650
488151	0.8	50	Sup	YebF	58442
488151	0.8	50	Sup	YebF	59364
492366	0.8	50	Sup	pelB	3813
492366	0.8	50	Sup	pelB	3813
489251	0.8	50	Sup	OsmY	4735
489251	0.8	50	Sup	OsmY	9580
487432	0.8	50	Sup	GFP	10502
487432	0.8	50	Sup	GFP	11267
488528	0.8	50	Pel	pelB	35353
488528	0.8	50	Pel	pelB	35353
488528	0.8	50	Pel	pelB	35353
488528	0.8	50	Pel	pelB	35353
488528	0.8	50	Pel	pelB	35353

- Youre right that Sec system has duplicates, but it isnt involved beyond being displayed on the x-axis as a secondary item. Removing it doesnt appear to change the plotting at all.

- I need to have the over clause on the y axis, otherwise spotfire averages across all the molecules, so all the values are the same. I can only remove that equation if I want to see all the replicates for each molecule.

- I will experiment further with limiting by current filtering.

Antonio Volpini · August 17, 2021

Additonal Update: I think I figured it out. I had to include all the categories being used to limit the data in the OVER statements for both the X + Y axis. That allowed the plot to sort correctly. The dense rank numbers are a little crazy, but that isnt really a big problem. This worked for both the demo data and real data. For viewing the individual replicates, I can remove the Y axis equation too, and the sorting still holds.

Antonio Volpini · August 18, 2021

Unfortunately, figuring out the data filtering issue was not the full solution. It appears to work for single plots, but breaks when Trellising data. The dark mode solo plot from my last post is the same as the red highlighted one in the below picture.

I think the trellising action is causing a bunch of x axis duplications, a couple of them are circled in blue. I dont understand how the Trellis function interacts with the axis equations, but I suspect that spotfire is showing all the x axis values possible on every graph.

When I remove the trellis columns from the x axis statement, the duplicates disappear, but the sorting also disappears (The second plot in this post). I cant change the Y axis equation, or the value is not calculated properly across all the conditions.

Sign In

Dense Ranking with multiple conditions

Recommended Posts

Antonio Volpini

Link to comment

Share on other sites

Fabian Duerr

Link to comment

Share on other sites

Antonio Volpini

Link to comment

Share on other sites

Fabian Duerr

Link to comment

Share on other sites

Antonio Volpini

Link to comment

Share on other sites

Fabian Duerr

Link to comment

Share on other sites

Antonio Volpini

Link to comment

Share on other sites

Antonio Volpini

Link to comment

Share on other sites

Antonio Volpini

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Industries