Jump to content

Dense Ranking with multiple conditions


Antonio Volpini

Recommended Posts

I am working with a dataset in Spotfire that has several replicates of multiple test articles being run through several sets of conditions, with a single final readout column. See example table below

 

 

 

Sample

Temp

Conc

Time

Lum

 

 

1A

30

1

15 Hrs

832982

 

 

1B

30

1

30 Hrs

374

 

 

1C

30

2

15 Hrs

2

 

 

1D

30

2

30 Hrs

12

 

 

1E

20

1

15 Hrs

415

 

 

1F

20

1

30 Hrs

15

 

 

1G

20

2

15 Hrs

6

 

 

1H

20

2

30 Hrs

34449

 

 

 

In this example I am trying to rank my results to make the end graphs more digestible. I started with limiting the data set to only the conditions that I want through the "Limit Data through a custom expression" panel. Example here:

([Temp]= "30") And ([Time] = "15 Hrs") And ([Conc] = 1)

Then set up a denserank function in a custom expression on the x axis to sort the values into a smooth curve.

The Xaxis expression I am trying is:DenseRank(Avg([Lum/OD]) OVER ([MOLECULE_ID]), "asc")

The Y axis expression is:Avg([LUM]) OVER ([MOLECULE_ID])

This produces something similarto the attached graph. The points dont rank correctly.My questions are:

- Could anyone explain why the displayed data doesn't rank in an ascending curve What is wrong withthe custom expression

- Is there a better way to get points to rank To get a smooth curve

- The value of the x and y points dont match (see 2nd attachment) Why does this happen The equations are the same, so the value should also be the same

Any help would be much appreciated!

Link to comment
Share on other sites

You rank your data by molecule_id but you plot your data over sec system. How are those two columns related Can you make a screenshot when you replace sec system with molecule_id on the x-axis

 

Aside: If you use a bar chart you have the option to sort your x-axis by y values. This could solve your problem without using a rank function.

Link to comment
Share on other sites

Molecule ID and Sec System are the same. Sec System is included only for a name reference, as the molecule IDs are jsut long numerical strings. Removing it does not change the ordering. 

A bar char is not the preferred method for this data as sometimes we need to view all the replicates, which makes the bar chart unwieldy.

Additionally, when switching to a bar chart, spotfire says that my expression is not valid. to solve this I removed the OVER statement from the X and Y Axis, the graph loads, but does not order correctly. Removing the over clause also presents a problem as I need to calculate the averages for subgroups within the data. I cannot always control the columns as exactly as in this example.

Link to comment
Share on other sites

Unfortunately, I can only give a fake data table. The information will not match the graph, but is the similar to the real data. in terms of the type of data in each column and what is in the dataset overall. 

 

Additionally, I found that the data limiting equation might be affecting this. I removed the whole equation and added parts back one at a time, and after adding more than on "and " clause the sorting appears to break. i.e. I could keep a filter on the Pellet column, but not add on the time, conc, or temp columns. Any addition of those  immediately breaks the sorting. Unfortunately, I am new to spotfire. Is there another way to separate out subsets of data Or a way to know what columns you are able to filter on while preserving the function of the sorting method

 

Example table below

 

MOLECULE_ID LUM Conc Temp Time Pellet vs Sup sec system
488151 710 0.6 25 6 Hrs Sup YebF
492366 61382 0.6 25 6 Hrs Sup pelB
488856 4540 0.6 25 6 Hrs Sup espP
488074 97 0.6 25 6 Hrs Sup tdK
491590 49899 0.6 25 6 Hrs Sup pelB
488307 628 0.6 25 6 Hrs Sup Pet
482001 58442 0.6 25 6 Hrs Sup YebF
488528 3813 0.6 25 6 Hrs Sup pelB
488706 9580 0.6 25 6 Hrs Sup tdK
488455 5203 0.6 25 6 Hrs Sup OsmY
489467 18127 0.6 25 6 Hrs Sup espP
488735 7038 0.6 25 6 Hrs Sup Pet
488870 103 0.6 25 6 Hrs Sup pelB
489523 51167 0.6 25 6 Hrs Sup celCD
489327 17971 0.6 25 6 Hrs Sup fhaB
488707 8475 0.6 25 6 Hrs Sup ompA
488550 5407 0.6 25 6 Hrs Sup ChiA
489065 35353 0.6 25 6 Hrs Sup pelB
488906 14321 0.6 25 6 Hrs Sup OsmY
488823 1672 0.6 25 6 Hrs Sup ompA
488887 11420 0.6 25 6 Hrs Sup celCD
488159 454 0.6 25 6 Hrs Sup fhaB
480246 30917 0.6 25 6 Hrs Sup ctxB
488308 555 0.6 25 6 Hrs Sup pelB
488100 13780 0.6 25 6 Hrs Sup zot
489084 34412 0.6 25 6 Hrs Sup ChiA
488299 456 0.6 25 6 Hrs Sup pelB
488288 1541 0.6 25 6 Hrs Sup ctxB
489048 28169 0.6 25 6 Hrs Sup pelB
488820 590 0.6 25 6 Hrs Sup zot

 

 

 

 

Link to comment
Share on other sites

Everything works perfectly fine with your sample data. Even when I include a data limiting expression (matching the current data).

 

But molecule_id and sec system are not the same! Some molecule_ids share the same sec system. 

 

rank_mol.png

 

 

 

Also, you can remove your OVER expression on the y-axis. Just use Avg([LUM])

 

You already have your data separated on the X-axis. No further breakdown needed.

 

Moreover you can change the settings for your x-axis expression to just evaluate on the current filtering. Do this in the properties.

Link to comment
Share on other sites

First off, thank you for trying that out, and for the help in general. Unfortunatley many of those fixes dont work in my case. 

 

- I accidentally misrepresented my dataset in my last comment. The example I just gave did not include multiple subsets as I indicated in my initial post. I have edited it to be more accurate. Now the data filtering should acutally have an impact. There are two larger groups with multiple replicate values for subgroups within that. I have added the amended data, and an attempt I made at plotting below. The plot shows the same issues as previous. For instance I might want to filter down to the values for: Pellet vs Sup = Sup; conc = 0.6; Temp = 25. Data filtering used for that plot was: ([Pellet vs Sup]='Sup') and ([Conc]=0.6) And ([Temp]=25)

 

MOLECULE_ID Conc Temp Pellet vs Sup sec system LUM
488151 0.6 25 Sup YebF 710
488151 0.6 25 Sup YebF 1632
492366 0.6 25 Sup pelB 61382
492366 0.6 25 Sup pelB 61481
489251 0.6 25 Sup OsmY 62304
489251 0.6 25 Sup OsmY 4540
487432 0.6 25 Sup GFP 5462
487432 0.6 25 Sup GFP 5983
488151 0.6 40 Sup YebF 97
488151 0.6 40 Sup YebF 1019
492366 0.6 40 Sup pelB 49899
492366 0.6 40 Sup pelB 50102
489251 0.6 40 Sup OsmY 50821
489251 0.6 40 Sup OsmY 628
487432 0.6 40 Sup GFP 1550
487432 0.6 40 Sup GFP 1650
488151 0.6 50 Sup YebF 58442
488151 0.6 50 Sup YebF 59364
492366 0.6 50 Sup pelB 3813
492366 0.6 50 Sup pelB 3813
489251 0.6 50 Sup OsmY 4735
489251 0.6 50 Sup OsmY 9580
487432 0.6 50 Sup GFP 10502
487432 0.6 50 Sup GFP 11267
488151 0.8 25 Sup YebF 710
488151 0.8 25 Sup YebF 1632
492366 0.8 25 Sup pelB 61382
492366 0.8 25 Sup pelB 61481
489251 0.8 25 Sup OsmY 62304
489251 0.8 25 Sup OsmY 4540
487432 0.8 25 Sup GFP 5462
487432 0.8 25 Sup GFP 5983
488151 0.8 40 Sup YebF 97
488151 0.8 40 Sup YebF 1019
492366 0.8 40 Sup pelB 49899
492366 0.8 40 Sup pelB 50102
489251 0.8 40 Sup OsmY 50821
489251 0.8 40 Sup OsmY 628
487432 0.8 40 Sup GFP 1550
487432 0.8 40 Sup GFP 1650
488151 0.8 50 Sup YebF 58442
488151 0.8 50 Sup YebF 59364
492366 0.8 50 Sup pelB 3813
492366 0.8 50 Sup pelB 3813
489251 0.8 50 Sup OsmY 4735
489251 0.8 50 Sup OsmY 9580
487432 0.8 50 Sup GFP 10502
487432 0.8 50 Sup GFP 11267
488528 0.8 50 Pel pelB 35353
488528 0.8 50 Pel pelB 35353
488528 0.8 50 Pel pelB 35353
488528 0.8 50 Pel pelB 35353
488528 0.8 50 Pel pelB 35353

 

 

 

screenshot_2021-08-17_085906.png

 

 

 

- Youre right that Sec system has duplicates, but it isnt involved beyond being displayed on the x-axis as a secondary item. Removing it doesnt appear to change the plotting at all.

 

- I need to have the over clause on the y axis, otherwise spotfire averages across all the molecules, so all the values are the same. I can only remove that equation if I want to see all the replicates for each molecule. 

 

- I will experiment further with limiting by current filtering.

Link to comment
Share on other sites

Additonal Update: I think I figured it out. I had to include all the categories being used to limit the data in the OVER statements for both the X + Y axis. That allowed the plot to sort correctly. The dense rank numbers are a little crazy, but that isnt really a big problem. This worked for both the demo data and real data. For viewing the individual replicates, I can remove the Y axis equation too, and the sorting still holds. 

 

screenshot_2021-08-17_091901.png

 

screenshot_2021-08-17_092110.png

Link to comment
Share on other sites

Unfortunately, figuring out the data filtering issue was not the full solution. It appears to work for single plots, but breaks when Trellising data. The dark mode solo plot from my last post is the same as the red highlighted one in the below picture.

 

I think the trellising action is causing a bunch of x axis duplications, a couple of them are circled in blue. I dont understand how the Trellis function interacts with the axis equations, but I suspect that spotfire is showing all the x axis values possible on every graph.

 

When I remove the trellis columns from the x axis statement, the duplicates disappear, but the sorting also disappears (The second plot in this post). I cant change the Y axis equation, or the value is not calculated properly across all the conditions. 

 

Attempt tp trellis and sort values

 

screenshot_2021-08-18_163151.png

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...