### Use Case Overview

This article demonstrates the detection of possible healthcare fraud, waste, and abuse via an outlier analysis on Medicare beneficiary data. If the analysis shows that particular providers, on average, generate abnormally high costs for Medicare beneficiaries, controlling for other factors, then those providers can be flagged for follow-up and potential investigation.

### Data Requirements

Our sample data consist of demographic, health indicator, and reimbursement data from Medicare claims. We have fine detail on reimbursement, breaking cost out by inpatient, outpatient, and carrier providers, and showing the Medicare, primary payer, and beneficiary responsibility in each of those categories of care.

### Data Filtering

*First template workflow for filtering the source dataset down to the population of interest.*

Our first flow filters the dataset down to our target population: males over 64 living in Florida with diabetes and without end-stage renal disease (ESRD). We exclude beneficiaries with ESRD because it is an exceptionally complex and costly condition, and is therefore often excluded from medical analyses involving cost. We then transform the cost data by taking the log base 10 of each cost column. Because cost data are positively skewed (costs have a lower bound of zero), applying a log transform makes their relationships to other features more linear.
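Outside the workflow tool, the same filter and transform can be sketched in a few lines of Python. This is only an illustration: the field names (`sex`, `age`, `state`, `has_diabetes`, `has_esrd`, and the three cost columns) are hypothetical, not the actual column names in the sample data. The article does not say how zero costs are handled, so the sketch assumes they are left as `None`, since the log of zero is undefined.

```python
import math

# Hypothetical column names; the real dataset's schema may differ.
COST_COLUMNS = ("inpatient_cost", "outpatient_cost", "carrier_cost")


def in_target_population(row):
    """Males over 64 living in Florida, with diabetes and without ESRD."""
    return (
        row["sex"] == "M"
        and row["age"] > 64
        and row["state"] == "FL"
        and row["has_diabetes"]
        and not row["has_esrd"]
    )


def log10_costs(row):
    """Return a copy of the row with a log10 version of each cost column."""
    out = dict(row)
    for col in COST_COLUMNS:
        cost = row[col]
        # Assumption: zero costs are left as None because log10(0) is undefined.
        out["log10_" + col] = math.log10(cost) if cost > 0 else None
    return out


def make_row(bene_id, sex, age, state, diabetes, esrd, inp, outp, carrier):
    """Build one illustrative beneficiary record."""
    return {
        "bene_id": bene_id, "sex": sex, "age": age, "state": state,
        "has_diabetes": diabetes, "has_esrd": esrd,
        "inpatient_cost": inp, "outpatient_cost": outp, "carrier_cost": carrier,
    }


# Made-up records, for illustration only.
beneficiaries = [
    make_row("A1", "M", 70, "FL", True, False, 1000.0, 0.0, 100.0),
    make_row("A2", "F", 72, "FL", True, False, 500.0, 50.0, 20.0),
    make_row("A3", "M", 64, "FL", True, False, 300.0, 10.0, 5.0),
    make_row("A4", "M", 80, "FL", True, True, 9000.0, 900.0, 90.0),
    make_row("A5", "M", 75, "GA", True, False, 700.0, 70.0, 7.0),
]

target = [log10_costs(r) for r in beneficiaries if in_target_population(r)]
```

Only beneficiary `A1` passes every condition here; the others fail on sex, age, ESRD status, or state respectively.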

### Cost Model

*Second template for modeling reasonable cost.*

For our model, we apply a linear regression to predict the beneficiary's total cost. Our goal is to predict the expected annual cost for a beneficiary with diabetes given the other beneficiary characteristics. When a patient's annual spending exceeds expected cost by more than some threshold (e.g., 3 or 6 standard deviations), that beneficiary (and particularly his or her providers) can be flagged for investigation.
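To make the flagging rule concrete, here is a deliberately simplified sketch. It fits an ordinary least squares line with a single predictor (age) as a stand-in for the full multi-feature regression, then flags beneficiaries whose residual exceeds a chosen number of residual standard deviations. The function names, the single predictor, and the synthetic data are all assumptions made for illustration; the Playbook's actual model uses more beneficiary characteristics.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def flag_high_cost(ids, xs, ys, n_sigma=3.0):
    """Return ids whose actual value exceeds the fitted value by more than
    n_sigma standard deviations of the residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = stdev(residuals)
    return [i for i, r in zip(ids, residuals) if r > n_sigma * sigma]


# Synthetic data: log10 annual cost rises slowly with age, plus small noise.
ages = list(range(65, 95))
log_costs = [
    3.0 + 0.01 * (a - 65) + (0.02 if a % 2 else -0.02) for a in ages
]
log_costs[15] += 2.0  # one beneficiary with roughly 100x the expected cost
ids = ["bene_%02d" % i for i in range(len(ages))]

flagged = flag_high_cost(ids, ages, log_costs)
```

One design point worth noting: in a small sample a single extreme value inflates the residual standard deviation itself, so a very high threshold such as six standard deviations can be unreachable. With 30 observations, as here, no residual can exceed about 5.3 sample standard deviations, which is why the sketch defaults to 3.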

### Outlier Detection

*Third template for finding outliers based on the cost model.*

In the final workflow, we take the statistically significant coefficients (at the 95% confidence level) estimated for each provider in the model above and calculate the mean and standard deviation of their distribution. We then identify all providers whose coefficients lie more than six standard deviations above the mean, and report each such provider along with its coefficient in a table.
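The selection logic described above can be sketched as follows. The input format (a mapping from provider ID to a coefficient and p-value pair), the function name, and the synthetic estimates are hypothetical; the sketch also treats "significant at the 95% confidence level" as a p-value below 0.05, which is an assumption about how the workflow applies the test.

```python
from statistics import mean, stdev


def flag_outlier_providers(estimates, alpha=0.05, n_sigma=6.0):
    """Return (provider_id, coefficient) pairs whose coefficient is more than
    n_sigma standard deviations above the mean of the significant coefficients.

    estimates maps provider_id -> (coefficient, p_value).
    """
    # Keep only coefficients that are statistically significant.
    significant = {
        pid: coef for pid, (coef, p_value) in estimates.items() if p_value < alpha
    }
    if len(significant) < 2:
        return []  # not enough data to estimate a standard deviation

    coefs = list(significant.values())
    mu = mean(coefs)
    sigma = stdev(coefs)
    if sigma == 0:
        return []  # all coefficients identical, so nothing stands out

    cutoff = mu + n_sigma * sigma
    flagged = [(pid, coef) for pid, coef in significant.items() if coef > cutoff]
    # Largest coefficients first, as they would appear in a report table.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Synthetic estimates: 99 unremarkable providers, one extreme but significant
# provider, and one extreme provider whose coefficient is not significant.
estimates = {
    "prov_%03d" % i: (0.01 if i % 2 == 0 else -0.01, 0.01) for i in range(99)
}
estimates["prov_high"] = (1.0, 0.001)
estimates["prov_noisy"] = (5.0, 0.50)

flagged = flag_outlier_providers(estimates)
```

In this example `prov_noisy` has the largest coefficient but is dropped by the significance filter before the mean and standard deviation are computed, so it neither gets flagged nor distorts the cutoff.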

### Check It Out!

For access to this Playbook, including its workflows, sample data, a PowerPoint summary, and expert support from Spotfire® Data Science data scientists, contact your Spotfire® Data Science sales representative.
