Gains chart
Typically called a Cumulative Gains Chart, it can be explained simply with the following example:
For simplicity, let's assume we have 1000 customers. If we run an advertising campaign for all our customers, we might find that 30% (300 out of 1000) will respond and buy our new product.
Marketing to all our customers could be one strategy for running a campaign. But this is not the optimum use of our marketing dollars, especially for large customer bases. Therefore we would like a better way of running this advertising campaign: instead of targeting our whole customer base, we target only those customers with a high probability of responding positively to the campaign. This will, firstly, lower the cost of the campaign and, secondly (and perhaps more importantly), avoid disturbing customers who have no interest in our new product with advertising.
This is where predictive classification models come in. There are many different models, but no matter which one we use, we can evaluate its results with a Cumulative Gains Chart. If we have historical data on the reactions of customers to past campaigns, we can use it to build a model that predicts whether a particular customer will respond by buying the product or not. The output of such a model is typically, for each customer, the probability of a positive and of a negative reaction. We can sort customers by their probability of a positive reaction to the campaign and run the campaign only for a percentage of customers with the highest probability.
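As a minimal sketch of this sorting step (the customer names and probabilities below are made-up example values, not from the article's dataset):

```python
# Hypothetical predicted probabilities of a positive response, one per customer.
customers = {"anna": 0.92, "ben": 0.15, "carl": 0.78, "dana": 0.40, "eva": 0.61}

# Sort customer IDs by predicted probability, highest first.
ranked = sorted(customers, key=customers.get, reverse=True)

# Target only the top 40% (here, 2 of 5 customers).
target_fraction = 0.4
n_target = int(len(ranked) * target_fraction)
targeted = ranked[:n_target]
print(targeted)  # ['anna', 'carl']
```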
The Gains chart is the visualization of that principle. On the X-axis we have the percentage of the customer base we want to target with the campaign. The Y-axis shows the percentage of all positive-response customers found in the targeted sample. In the picture below you can see an example of a Gains chart. (The gains chart associated with the model is the red curve.)
What can we read from the graph? What happens if we only target 10% of our customer base? According to the results of our model, if we take the 10% of customers with the highest probability of a positive response, we will get 28% of all the possible positive responses. This means we will find 84 customers with positive responses among the 100 customers reached by the campaign (84 is 28% of the 300 positive-response customers in our customer base).
If we increase the targeted customers to 50%, we already have more than 80% of those who will, in a real situation, give a positive response. If this is our selected strategy for the real campaign (reaching 50% of our customers according to the model), then we will have reached 80% of all the positive responses and saved 50% of the cost of running the campaign (we do not want to run the campaign for customers who are unlikely to respond positively).
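The cumulative gains values behind such a chart can be computed directly from the actual outcomes and the model's scores. The sketch below (the function name `cumulative_gains` is ours, not from any library) sorts customers by score and accumulates the share of positives captured:

```python
def cumulative_gains(y_true, scores):
    """Return (x, y) points of the cumulative gains curve as percentages.

    y_true: 1 for an actual positive response, 0 otherwise.
    scores: the model's predicted probability of a positive response.
    """
    # Rank customers by predicted probability, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    xs, ys = [0.0], [0.0]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += y_true[i]
        xs.append(100.0 * rank / len(order))     # % of customers targeted
        ys.append(100.0 * captured / total_pos)  # % of positives captured
    return xs, ys
```

Reading off the y value at x = 10 or x = 50 then gives the percentage of positive responses captured at that targeting level, exactly as in the example above.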
The choice of the percentage to target in the campaign depends on the concrete costs of the campaign and the profit from the expected positive responses. The Gains chart displays the expected results for each choice of targeted percentage. Our final strategy, therefore, consists of the model and the targeted percentage (instead of a percentage we can define a cutoff value for the probabilities: if the probability is above this value/threshold, we include the customer in the campaign).
As already mentioned, the red curve represents the proposed model. The blue curve represents the gains chart of a random model: picking customers randomly, without any selection criteria, so we expect the same proportion of positive responses in any sample as in the whole customer base. In other words, if we target 10% of all customers, we will have 10% of all the positive responses within our 10% sample. The curves meet at (0, 0) and (100, 100); the second point means we run the campaign for all customers, so the output (all those who responded positively) is the same regardless of the model. When we use a predictive model, picking customers according to sorted probabilities, it brings no advantage once we include all customers.
The green curve is the optimal model, the best possible order for picking customers: we first target all customers with a positive response and only then those with a negative response. The slope of the first part of the green curve is 100/(percentage of all positive responses).
Confusion matrix
To test our strategy (defined by the model and the targeted percentage, or equivalently the cutoff value), we need to compare the output of the model with the actual results in the real world. This is done by comparing the results and creating a contingency table of misclassification errors (terminology as used in hypothesis testing: TP means true positive, FN false negative, FP false positive, and TN true negative):
|              | Prediction YES                      | Prediction NO                        |
|--------------|-------------------------------------|--------------------------------------|
| Observed YES | Count TP (right decision)           | Count FN (error of the second kind)  |
| Observed NO  | Count FP (error of the first kind)  | Count TN (right decision)            |
Ideally, we want to have the right decisions made with high frequency. Such a table (usually called a confusion matrix) is a very important decision tool when we evaluate the quality of the model.
For better orientation, it is common practice to display the confusion matrix in the form of the following graph. From this graph we see how many times the model predicts correctly (true negatives and true positives) and how many times it predicts incorrectly (false positives and false negatives). The better the model, the larger the TP and TN bars in comparison to FN and FP.
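Counting the four cells for a given cutoff is straightforward; here is a minimal Python sketch (the function name and the toy data are illustrative):

```python
def confusion_matrix(y_true, scores, cutoff):
    """Count TP, FN, FP, TN for a given probability cutoff."""
    tp = fn = fp = tn = 0
    for actual, p in zip(y_true, scores):
        predicted = 1 if p >= cutoff else 0
        if actual and predicted:
            tp += 1          # observed YES, predicted YES
        elif actual and not predicted:
            fn += 1          # observed YES, predicted NO
        elif predicted:
            fp += 1          # observed NO, predicted YES
        else:
            tn += 1          # observed NO, predicted NO
    return tp, fn, fp, tn

tp, fn, fp, tn = confusion_matrix([1, 1, 0, 0], [0.9, 0.3, 0.7, 0.1], cutoff=0.5)
print(tp, fn, fp, tn)  # 1 1 1 1
```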
A point on the gains chart is equivalent to the pair (TP / (TP + FN), (TP + FP) / (TP + FN + FP + TN)). The second term is on the X-axis: it is the fraction of targeted customers (all positive predictions out of all customers). The first term is on the Y-axis: the fraction of all positive responses that were captured.
The curves discussed here (ROC, Gains, and Lift) are all computed from confusion matrices. It is important to realize that each curve is created from a large number of such confusion matrices, one for each targeted percentage/cutoff value.
ROC curve
Other terms connected with a confusion matrix are Sensitivity and Specificity. They are computed in the following way:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
The ROC curve (Receiver Operating Characteristic curve) displays sensitivity and specificity for different probability cutoff values (if the probability of a positive response is above the cutoff, we predict a positive outcome; otherwise we predict a negative one). Each cutoff value defines one point on the ROC curve; varying the cutoff over the range 0 to 1 draws the whole curve. The red curve in the ROC diagram below is the same model as in the Gains chart example:
The Y-axis measures the rate (as a percentage) of correctly predicted customers with an actual positive response (sensitivity). The X-axis measures the rate of customers with an actual negative response who are incorrectly predicted as positive (1 - specificity).
The optimal model would look as follows: sensitivity rises to its maximum while specificity stays at 1 the whole time (the optimal model is shown in green). The goal is to have the ROC curve of the developed model as close as possible to that of the optimal model.
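One ROC point per cutoff can be computed from the sensitivity and specificity definitions above. The sketch below (the function name `roc_points` is illustrative) sweeps a list of cutoffs:

```python
def roc_points(y_true, scores, cutoffs):
    """One (x, y) ROC point per cutoff: x = 1 - specificity, y = sensitivity."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for c in cutoffs:
        tp = sum(1 for a, p in zip(y_true, scores) if a == 1 and p >= c)
        fp = sum(1 for a, p in zip(y_true, scores) if a == 0 and p >= c)
        sensitivity = tp / pos
        specificity = (neg - fp) / neg
        points.append((1 - specificity, sensitivity))
    return points

# Cutoff 0.0 predicts everyone positive -> point (1.0, 1.0);
# a cutoff above 1 predicts everyone negative -> point (0.0, 0.0).
points = roc_points([1, 1, 0, 0], [0.9, 0.3, 0.7, 0.1], cutoffs=[0.0, 0.5, 1.1])
```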
Usage
The Gains and ROC curves are visualizations of the overall performance of a model. The shape of the curves tells us a lot about the model's behavior: how much better our model is than one assigning categories randomly, and how far we are from the optimal model, which is unachievable in practice. These curves can help in setting the final cutoff point that decides which probabilities mean a positive and which a negative prediction. The model together with the cutoff point defines our strategy of who should be targeted by the campaign and who should not (the typically chosen default value of 0.5 might meet neither the requirements of the use case nor be the best cutoff). While building the predictive model we can have many interim models: candidates for the final best model. Displaying the Gains (ROC) charts of several models in one graph makes it possible to compare the models.
It is very important to mention that a ROC, Gains, or Lift chart is tied to only one predicted category! In our example, we were interested in finding customers with positive responses, because that was the main task of our use case. There are analogous Gains and ROC charts for the negative customer response as well. If the main goal of the prediction were finding the customers with a negative response, the criterion for the quality of the model would rather be the Gains or ROC curve for the negative response category.
So, what is the difference?
Both curves display how the correctly predicted share of the category in question (positive response in our example) depends on the cutoff for assignment to that category. The difference is the scale of the X-axis, whereas the Y-axis is the same for the Gains and the ROC chart. If you love formulas, have a look at the following table:

| Chart | Y-axis         | X-axis                            |
|-------|----------------|-----------------------------------|
| Gains | TP / (TP + FN) | (TP + FP) / (TP + FN + FP + TN)   |
| ROC   | TP / (TP + FN) | FP / (FP + TN)                    |
The graphical representation of the results as a confusion matrix is below; the colors on the graph correspond to the color markings in the table above:
The whole principle connecting Gains and ROC charts with confusion matrices (tables of good and bad classifications) is shown below. The main goal of the graphs below is to highlight the fact that a single confusion matrix (as well as derived measures like the misclassification rate) corresponds to only one point on the Gains, ROC, or Lift chart!
Lift chart
We have mentioned the Lift chart several times but have not explained it. A Lift chart is derived directly from a Gains chart: the X-axis is the same, but the Y-axis is the ratio of the Gains value of the model to the Gains value of a model choosing customers randomly (the red and blue curves in the Gains chart above). In other words, it shows how many times better the model is than a random choice of cases. The value of the lift chart at X = 100 is 1, because if we choose all customers there is no lift; both models pick the same customers.
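Given the gains-curve points, the lift values are just the element-wise ratio to the random baseline (the diagonal y = x); a small sketch, with an illustrative function name and toy numbers:

```python
def lift_curve(gains_x, gains_y):
    """Lift = gains of the model divided by gains of the random model (y = x)."""
    # The point x = 0 is skipped because the ratio is undefined there.
    return [(x, y / x) for x, y in zip(gains_x, gains_y) if x > 0]

# Toy gains points (percent targeted, percent of positives captured):
print(lift_curve([0, 50, 100], [0, 80, 100]))  # [(50, 1.6), (100, 1.0)]
```

Note that the last point is always (100, 1.0), matching the observation above that there is no lift when all customers are targeted.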
We hope you enjoyed this article, and we wish you many good predictive models.
Useful links
- This topic has been discussed in a Dr. Data Science session; see this YouTube video. And if you enjoy these topics, please sign up for the next Dr. Data Science series.
- We have created a Spotfire application that calculates all the above-mentioned charts; we recommend it if you would like to educate colleagues or students on this topic.