How to pass grouped rows and full data set to expression function


Manoj Chaurasia


Hi All,

I am working on creating a dynamic aggregation function using an expression function, where I need to pass both the list of values for the row (e.g., from a cross table) and the list of values for the entire visualization. I need this because I am log-transforming the aggregated values to deskew the amounts, then normalizing the results between 0 and 1 for each row, which requires the min and max values across all rows in the visual. The visual is dynamic (the rows can change based on a drop-down), so the min/max values have to be recomputed each time rather than calculated once and reused. I had planned to do this by evaluating the full (or filtered) data set that the visual is using.

I was able to calculate each piece of the formula individually using OVER functions in a custom expression (e.g., Sum([Amount]) THEN Min([Value]) OVER ([Amount])), but you can't nest the OVER expressions together to do a dynamic normalization such as (X - min) / (max - min), where the max and min are based on the row groupings. If you try to use more than one OVER expression you get errors such as "expected ) found THEN" or similar messages.

I had thought I could pass the row-based values and the data-table-based values to the function, e.g., FunctionName([Amount], [DataTable].[Amount]), where the second parameter would pass ALL values and the first parameter would pass only those for the row.

Is there any way to accomplish this? I have also researched accessing the data table directly from the R/TERR code, but it seems that's not possible: everything has to be passed in as a parameter or read directly from a data file or connection. It seems like such an easy approach, but I haven't been able to figure it out.

Any suggestions extremely appreciated!

**Edit to include additional detail**

Please see the attached example Excel file, which demonstrates what I'm trying to accomplish:

1. I have a set of transaction-level details with values (e.g., Amount) which need to be transformed and normalized

2. These transactions need to be grouped and calculated at the Name level

3. These transactions need to be grouped and calculated at the Group level

Ideally I would perform this calculation in a custom expression, but I have been unable to nest the appropriate min/max values at the aggregation level chosen by the user (i.e., via a document property/drop-down menu). The crux of the issue is that I need to obtain the min and max values after I transform the data, in order to re-normalize it. So in this simple example: first group by the selected aggregation level (e.g., Name), log-transform all of the aggregated values, then normalize the resulting log values across the groups.
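For reference, a minimal TERR sketch of that calculation, assuming the transactions arrive as a data frame input parameter; the names txns, level, and result are hypothetical, with level standing in for the document property that selects the aggregation column (e.g., "Name" or "Group"):

# Hypothetical TERR data function: group, log-transform, then normalize.
# txns  - input data frame with the grouping columns and Amount
# level - string naming the grouping column, e.g. "Name" or "Group"
grouped <- aggregate(txns$Amount, by = list(key = txns[[level]]), FUN = sum)
grouped$logAmount <- log10(grouped$x)

# Normalize the log values to [0, 1] using the min/max across all groups.
rng <- range(grouped$logAmount)
grouped$normalized <- (grouped$logAmount - rng[1]) / (rng[2] - rng[1])

result <- grouped[, c("key", "logAmount", "normalized")]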


Here are some hints.

 

You could use...

 

Log10(Sum([Amount]))

Log10(Sum([Amount])) THEN Min([Value]) OVER (All([Axis.Rows]))

Log10(Sum([Amount])) THEN Max([Value]) OVER (All([Axis.Rows]))

 

... in your cross table. But this is the point where I got stuck.

Instead, I recommend writing a very simple data function that takes your filtered table as input and returns that input unchanged. Then you can use calculated columns in the new table. Whenever you filter the input table, the output table will refresh automatically (set the 'Refresh function automatically' checkmark) and the calculated columns will stay correct. In the calculated columns you can use simple OVER statements. Finally, use a cross table to display the final result.
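A minimal sketch of such a pass-through TERR data function; the parameter names input and output are assumptions for whatever you register as the function's input and output tables:

# TERR data function body: return the (filtered) input table unchanged.
# input  - data table parameter, mapped with 'Filtered rows' as the row selection
# output - output parameter registered as a new data table
output <- input

With automatic refresh enabled, the output table tracks the filtering, and calculated columns built on it (such as the OVER expressions above) recompute on every refresh.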


Thanks fabd - I was also able to compute the individual min/max pieces using THEN and OVER, but it will not let you nest them together.

 

For the data function, the tables I'm working with are quite large (hundreds of thousands of records), so running the data function bound to a drop-down menu and copying the table into memory would likely be very compute-intensive and slow (if it were to work at all).

 

I did solve the problem using calculated fields in the source data that are filtered using a document property set from a drop-down menu, but that approach is also quite slow and not effective.

 

Is there any way to reference an OVER clause without nesting it, or to pass the 'full' data set to a function from a custom expression?


I can think of a rather plain (but not too slow) solution in which you build the amount, log, and normalized log by name or group as calculated columns (the column to group by, [Name] in this example, could be selected through a document property):

GroupedAmount: Sum(Real([Amount])) over ([Name])

LogGroupedAmount: Log10(Real([GroupedAmount]))

Normalized: ([LogGroupedAmount] - Min([LogGroupedAmount])) / (Max([LogGroupedAmount]) - Min([LogGroupedAmount]))

I suspect you have already tried it.

Otherwise, there may be a more involved solution if you can use IronPython. I would go with the simple solution, though.

You can build your cross table where the rows are the column you want to group by, and calculate the axes as:

Sum([Amount]) as [Amount], Log10(Sum(Real([Amount]))) as [Group Log Amount]

At this point, your cross table is hopefully much smaller than your original table, so you could write an IronPython script that scrapes this cross table, puts it into a new table, and sends it to a TERR data function that calculates the normalization.
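The normalizing TERR data function itself can stay tiny; a sketch, assuming the aggregated log amounts arrive as a numeric column parameter (the names logAmounts and normalized are hypothetical):

# Hypothetical TERR data function: rescale aggregated log amounts to [0, 1].
# logAmounts - input numeric vector, one value per cross table row
# normalized - output numeric vector of the same length
rng <- range(logAmounts, na.rm = TRUE)
normalized <- (logAmounts - rng[1]) / (rng[2] - rng[1])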

Some ideas can be found in this dxp (in that case the cross table is turned into a JSON document property, but if the cross table has the same axis names you could simply write to another data table):

https://community.spotfire.com/modules/natural-language-generation-example-...

