Jump to content

How to compute any Percentile in Team Studio


Recommended Posts

Hi. I would like to compute in Team Studio e.g. 90percentile pergroup (group is defined by some categorical variable) but when I looked at the operators Summary and Aggregations, it seems only quartiles are here as default options. Is there a way to compute any percentile Thanks
Link to comment
Share on other sites

You can compute an approximationby chaining a few operators. In my example I want to calculate quantiles of Age grouped by Gender.

Firstly you use Window-Functions Rank: this will sort and divide the data into regular buckets or bins. In this operator the term quantiles means the actual buckets. You need to decide the number of buckets based on the quantiles you wish to compute. In this example I decide to use 10 buckets, that will divide my data into ten bins and give me the 10th, 20th , 30th,90th and 100th quantile. In order to be able to change this quickly (as I am using this information in two operators) I created a workflow variable called @Ncuts. I gave it 10 as the value.

The main input parameters you will need to decide are:

(A) Partition By: the column you want to group against (Gender)

(B) Order By: the column you want the quantile of (Age)

Order: Ascending

Calculate Quantiles: Yes

Number of quantiles:

10 (or @Ncuts if you have defined it)

Columns to keep: the two columns in (A) and (B)

 

In order to translate my quantiles into something more user-friendly I used a Variable operator, but this is optional. I defined quantileRegion as a double:

quantile*100/10 (or quantile*100/@Ncuts if you have defined @Ncuts)

 

Finally the Aggregation operator, to calculate my quantile values for Age by Gender:

Define Aggregations. For each bucket, the corresponding value of the Age quantile is approximately the maximum value.

Group By 1: your grouping column (Gender)

Group By 2: quantileRegion (or quantile if you did not define quantileRegion)

Aggregation Type: MAX of your column of interest (Age)

The result is one row per quantile per group. You can filter the resulting data table for the quantile(s) you wanted to calculate.

Note that if you had fewer than @Ncuts values in your variable of interest for some values of the grouping variable, you would not see all the quantiles, so best to check beforehand.

Gaia

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...