Jump to content

Aggregating 3 levels deep


Joseph Hemmann

Recommended Posts

Sorry if this appears to ramble, but I'm just trying to be as specific as possible with my issue.

So I found an answer to aggregating an aggregate in a cross table which works for the first level of my analysis. However, I need to meta aggregate THAT value and "then Sum([Value])" gives me an error in this context.

My cross table has 3 grouper levels, User, Code Category, and Code. For each User, I need to compare the bell curve of the individual Codes they use per Code Category against the entire population's bell curve of the same. I have gotten as far as for each code, I have the user's percentage in the category,the overall percentage in the category, and the variance of each code (user against population). However, for my next step, I want to show the worst offending users as far as deviance from the population, but in order to do that I need to sum their deviance at each code level into a total deviance for the entire code category.

The attached JPEG is an example of the concept I'm trying to get at. 2 sellers (Brian and John) sell apples and beans (the "code categories"), bothing having available to them Gala, Fuji, and Granny Smith apples, and Pinto, Black, and Kidney beans (the "codes"). I have figured out how to derive the percentages of each seller's sales among each category and the overall percentages of each item within each category, ignoring seller. I can even, at the code level, see their variance to the overall code percentage. The code snippet below covers all of these scenarios.

I cannot, though, sum the absolute values of those variances (the Total Variance) to see who's the worstperformer as far as staying within the desired bell curve. The line highlighted in yellow is also an issue I've run into, I can't seem to show when a seller has no sales of a particular code, but the overall sales do have representation, I want it to show for each seller with the variance of the entire percentage that code represents to the overall. Oh, and the code below shows Count instead of Sum because the background data doesn't have a count field, it's one line per sale, but the cross table looks like my example data in the screenshot.

 

Count([sale ID]) as [Count],

Count([sale ID]) / Count([sale ID]) over (intersect([user],[Category],All([Axis.Rows]))) as [user %],

Count([sale ID]) over (Intersect(

,All([Axis.Rows]))) as [Overall Count], (not shown in example, but works)

Count([sale ID]) over (Intersect([Code],All([Axis.Rows]))) / Count([sale ID]) over (intersect([Category],All([Axis.Rows]))) as [Overall %],

Abs(Count([sale ID]) / Count([sale ID]) over (intersect([user],All([Axis.Rows])))

- (Count([sale ID]) over (Intersect([Code],All([Axis.Rows]))) / Count([sale ID]) over (intersect([Category],All([Axis.Rows]))))) as [Variance], (looks convoluted, but it works)

Sum(Abs(Count([sale ID]) / Count([sale ID]) over (intersect([user],All([Axis.Rows])))

- (Count([sale ID]) over (Intersect([Code],All([Axis.Rows]))) / Count([sale ID]) over (intersect([Category],All([Axis.Rows])))))) over [user]

 

The bold section is what I want in psuedocode, but this doesn't work. Neither does leaving the Abs statement standalone and using "then Sum([Value])", that fails as well.

Anyone run into this particular analytical need before

Link to comment
Share on other sites

  • 2 months later...

Below expression should work for showing Total variance for each user across all categories, codes for that user. FYI- there is no image attached and it would be helpful to tally the data if sample data would have been provided in excel

Sum(Abs(Count([sale ID]) over(Intersect([Axis.Rows])) / Count([sale ID]) over (intersect([user],All([Axis.Rows])))

- (Count([sale ID]) over (Intersect(

,All([Axis.Rows]))) / Count([sale ID]) over (intersect([Category],All([Axis.Rows])))))) over (intersect([user],All([Axis.Rows])))
Link to comment
Share on other sites

Khushboo,

 

Thanks for your attempt. And sorry about the missing image, I thought I had uploaded it but if I remember correctly, I was getting some errors that day and decided to give up. There should now be an Excel spreadsheet attached which illustrates my issue (using the scenario I described in the original post).

 

As far as your solution, I think you've gotten me close but something is still incorrect. The logic does not throw an error like my attempts always did and it repeats the value for each seller on the lines for each code as would be expected (their total variance in a category is the same for each code in the category). However, for a seller who's individual variances are approximately 6, 3, and 2% for each of the codes, it says their total variance is over 8000%, which should obviously be 11%.

 

I have thrown a couple attempts at fixing your code, but to no avail. I either come up with code that is slightly different than yours but evaluates to the exact same number, or it actually makes the situation worse and gives me numbers in the millions of percent. If you don't have time to look again I understand, the kick start was appreciated either way.

 

Edit: Just reread your response and noticed you said "across all codes/categories for each user". I'm trying to get the total across all codes within each category (not across all categories) for each user. The performance metric is concerned with evaluating each category, so if someone is a terrible performer in Beans and the analyst is currently evaluating Apples, they don't want the bad Beans seller to appear if they are spot on the bell curve for Apples. The new Excel attachement should make that more clear.

Link to comment
Share on other sites

Hello Joseph,

One difference I found betweenexcel and description provided above is, you wanted total variance to be per user per category whereas above expressions were only for each user.

New custom expression would be on the similar lines just added [Category] forthe final sum clause over.

Sum(Abs((Sum([sale ID]) OVER (Intersect([Axis.Rows])) / Sum([sale ID]) over (intersect([user],[Category],All([Axis.Rows])))) - (Sum([sale ID]) over (Intersect(

,All([Axis.Rows]))) / Sum([sale ID]) over (intersect([Category],All([Axis.Rows])))))) over (intersect([user],[Category],All([Axis.Rows]))) as [Total Variance]Attached is the dxp for reference.
Link to comment
Share on other sites

Ok, I can see that your code works in your example. I can follow the logic. However, when I copy the code into my DXP it fails. The only thing I change is the field names and the internal sums have to become counts because the data source is line item and I don't have a field that just outputs the number 1 that would be sumable. I could do that if there is actually a problem with "Count() over", but I doubt that's really the issue.

 

I don't know if it's maybe a problem of scale (my data source has over 5 million rows, over 400k in the filtered data set that I grabbed the screenshot from). Some other factors that I can't see how they would be relevant, but might be, are that the users almost certainly have a lot of line items with no category applied (the category only applies to certain codes and there are a lot of codes that don't fall into a category) and there are a lot of users in the data set who will have no records within any given category.

 

I can't attach my actual DXP because it has information that I would get in a lot of trouble if I shared, but I doubt it will cause any issue to share these user masked numbers. The Billing Provider is the "user", E&M Category is the "category", and CPT Code is the "code". You can see in the screenshot I have users with very small variances evaluating to giant overall variances. A screenshot from my DXP and the actual code for the Overall Variance field are below. Thanks for being so responsive!spotfire_example_2.jpg

 

Sum(Abs((Count([HSP Account]) OVER (Intersect([Axis.Rows])) / Count([HSP Account]) over (intersect([billing Provider],[E&M Category],All([Axis.Rows])))) - (Count([HSP Account]) over (Intersect([CPT Code],All([Axis.Rows]))) / Count([HSP Account]) over (intersect([E&M Category],All([Axis.Rows])))))) over (intersect([billing Provider],[E&M Category],All([Axis.Rows]))) as [Overall Variance]

 

 

Link to comment
Share on other sites

Okay. With multiple lines for same code/category/user, inner expression within sum would be calculated for each row and then summed up. So if you have 10 lines for apple gala, the 10 times variance would be summed up.

I have altered the expression to divide the variance by the count of sale id before summing it up

sum(Abs(((Count([sale ID]) OVER (Intersect([Axis.Rows])) / Count([sale ID]) over (intersect([user],[Category],All([Axis.Rows])))) - (Count([sale ID]) over (Intersect(

,All([Axis.Rows]))) / Count([sale ID]) over (intersect([Category],All([Axis.Rows]))))) / Count([sale ID]) over (intersect([Axis.Rows])))) over (intersect([user],[Category],All([Axis.Rows]))) as [Total Variance]Attached is the dxp for reference.

Hope this helps !!

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...