Introduction
This Customer Analytics Template for Spotfire uses Spotfire and Python. It is divided into five parts, which we will go through step by step (Figure 1).
Figure 1. Cover page. Shows the different steps in this template.
Open Source Libraries Used
The required Python libraries are pandas, NumPy, spaCy, and mlxtend. Install them with Spotfire's built-in Python Tools, available from the Tools menu (Figure 2).
See Using Python packages in Spotfire for more details on Python setup.
Figure 2. The installation dialog box in Spotfire.
Data Functions
TIBCO Spotfire Data Functions are the Spotfire way to add pre-built Python and R scripts to Spotfire analyses. They can perform almost any type of calculation and return the results to a Spotfire analysis.
This application uses five data functions, all built in Python (Figure 3):
- Transaction Data Preparation: This data function performs some of the data preparation steps, such as variable creation and transformation. The rest of the preparation steps, such as cleaning, merging data sets, etc., are performed in the Data Canvas. Data Canvas is where users can review and author the data pipeline for each data table.
- MBA: The Market Basket Analysis is calculated in this data function. We use document properties to define the parameters needed to apply the Apriori algorithm and generate the association rules. The mlxtend library is used in this step.
- RFM: This data function uses the customer transaction data to get the recency, frequency, and monetary score per customer.
- CLTV: Given the number of periods ahead to predict, this Python data function runs the Customer Lifetime Value analysis. Customers are segmented by first purchase amount, a linear model is fitted, and the area under the resulting curve is taken as a measure of segment lifetime value.
- Segment Recommendations: This data function generates the top three categories purchased per customer and also produces the product category recommendations for the selected customers.
Figure 3. Data Functions in Spotfire.
These five data functions are already built and embedded in the tool, ready for you to use. You can modify existing data functions or build your own to answer other questions not covered in this template. If you want to learn more about Data Functions in Spotfire you can visit this page in the product documentation as well as this community article Spotfire Data Function Library: Python and R Scripts for Data Analysis.
Data
This first part covers access to the data. The tool was built using a large Kaggle dataset of customer-level sales data from an online retailer selling various household products between 01/12/2009 and 09/12/2011. To build the final dataset, the invoice dates were advanced to near the present time and product category variables were defined manually. The default data in the Spotfire template is a public dataset that may be used for demo purposes. It contains information about customer transactions: the product purchased, the quantity, its category and price, and the date of purchase.
The Prepare Data button triggers a Spotfire Data Function to create a new cleaned and combined dataset ready for exploration and for further customer analytics steps in Spotfire.
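As a rough illustration of the kind of variable creation such a preparation step might perform, the sketch below derives a line total and an invoice month from raw transaction rows. The field names (`customer_id`, `invoice_date`, `category`, `quantity`, `price`) and the derived variables are assumptions made for this example, not the template's actual schema or logic, and the code uses only the Python standard library rather than pandas.

```python
from datetime import date

# Hypothetical raw transaction rows; field names are assumed for illustration.
raw_rows = [
    {"customer_id": "C1", "invoice_date": "2011-11-03", "category": "Kitchen", "quantity": 2, "price": 3.50},
    {"customer_id": "C1", "invoice_date": "2011-12-01", "category": "Garden", "quantity": 1, "price": 12.00},
    {"customer_id": "C2", "invoice_date": "2011-12-05", "category": "Kitchen", "quantity": 4, "price": 1.25},
]


def prepare(rows):
    """Return new rows with a parsed date, a line total, and an invoice month."""
    prepared = []
    for row in rows:
        invoice_date = date.fromisoformat(row["invoice_date"])
        prepared.append({
            **row,
            "invoice_date": invoice_date,
            # Derived variable: value of this transaction line.
            "line_total": round(row["quantity"] * row["price"], 2),
            # Derived variable: month bucket for time-based charts.
            "invoice_month": invoice_date.strftime("%Y-%m"),
        })
    return prepared


prepared_rows = prepare(raw_rows)
for row in prepared_rows:
    print(row["customer_id"], row["invoice_month"], row["line_total"])
```

Derived columns like these are what make it possible to aggregate by period or by spend later in the analysis.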
You can easily navigate through the data: if you select (mark) any items in the treemap on the right, the table on the left is updated. A product hierarchy is also available, so you can drill down to finer detail (Figure 4).
Figure 4. Data overview.
Data Exploratory Analysis
This is a critical step in the Data Science lifecycle because we need to understand patterns and data limitations before doing any more advanced analytics.
You can analyze best-selling and least-selling categories, seasonal behaviors, and other trends by time and product. The filters allow interactive analysis of the data and let you delve into deeper detail. You can make selections to look at different years or a certain period of time, and you can look at individual categories or select a group of categories.
The Spider Chart Mod for Spotfire® enables the visualization of multivariate data in the form of a two-dimensional chart. Spotfire Mods are an extension to Spotfire allowing custom visualizations to be developed using a cloud-enabled framework that makes it easy for anyone to build, share, and use.
The dashboards can be built with different metrics: from the drop-down list you can select 'Total Value', 'Volume of Sales', 'Average Value', 'Number of customers', or 'Number of Transactions'. The right choice depends on the type of questions the business users ask, and you can develop other measures as well (Figure 5).
Figure 5. Best and least selling categories, sales over time, and customers per year and category.
Market Basket Analysis
We are interested in recommending new products to our customers, and to do that we run a Market Basket Analysis (MBA).
If you have new data and want to update the analysis, click the Update MBA button to run the MBA Data Function. The results are populated in Spotfire automatically. You can also choose settings for the analysis: a minimum lift, a minimum support, and a maximum group size, to ensure the outcome is of appropriate quality. The data scientist who designs and maintains this tool can customize these settings to meet the specific needs of the business users.
MBA is used to identify combinations of products that are frequently bought together. We look at all of the purchases in the dataset and generate measures that indicate the strength of the relationships between the different products. If you would like to learn more about MBA you can read this article.
The analysis produces three metrics: lift, support, and confidence, which can be visualized in different ways. There is a table with the highest lift at the top. A lift higher than 1 means the two products are bought together more often than would be expected if they were purchased independently, so if people are buying one product (the antecedent) you can recommend the other (the consequent). Support is the share of transactions in the dataset that contain the combined purchase, and confidence is the share of transactions containing the antecedent that also contain the consequent.
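To make the three metrics concrete, the following minimal sketch computes support, confidence, and lift by hand for single-item rules (one antecedent, one consequent). It is an illustration only: the template relies on the Apriori algorithm via mlxtend, whereas this version uses just the Python standard library and does not handle larger groups. The baskets are made-up data, and the function names and default thresholds are assumptions.

```python
from itertools import permutations

# Hypothetical baskets: each set holds the categories on one invoice.
baskets = [
    {"Kitchen", "Garden"},
    {"Kitchen", "Garden", "Lighting"},
    {"Kitchen", "Lighting"},
    {"Garden"},
    {"Kitchen", "Garden"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(baskets, min_support=0.2, min_lift=1.0):
    """Score every single-item rule A -> B and keep those passing both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for antecedent, consequent in permutations(items, 2):
        joint = support({antecedent, consequent}, baskets)
        if joint < min_support:
            continue
        # Confidence: of the baskets with the antecedent, how many also hold the consequent.
        confidence = joint / support({antecedent}, baskets)
        # Lift: confidence relative to how common the consequent is overall.
        lift = confidence / support({consequent}, baskets)
        if lift >= min_lift:
            rules.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": joint,
                "confidence": confidence,
                "lift": lift,
            })
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


rules = pair_rules(baskets)
for rule in rules:
    print(rule)
```

With these baskets, only the Kitchen and Lighting pair passes the lift threshold (in both directions). Kitchen and Garden appear together often, but each is so common on its own that their lift falls below 1.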
You can also look at the scatter plots to see the relationship between the measures. The sliders for the scatter plots give you another nice way of exploring the data.
There is also a Network Chart Mod for Spotfire®. This visual is very useful because it displays the relationships between all the data points in the dataset, making it a good way to explore where the strong relationships are. The darkest color indicates the strongest relationship, and the size of each node reflects support (the popularity of the item).
On the left side, you will find different filters that you can use to filter the data. (Figure 6)
Figure 6. MBA, relationships between the different measures, and network chart.
We want to give product recommendations to a specific segment of customers, so the next step is segmentation. To do that we will use two techniques: RFM and CLTV.
RFM Analysis
RFM categorizes customers according to how long ago they made a purchase (Recency), how often they purchase (Frequency), and how much they spent (Monetary). Based on how customers rank on each of these metrics, they are assigned to meaningful business segments.
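The sketch below shows one way the three RFM values could be derived from transaction data and turned into scores. It is a simplified illustration, not the template's data function: the transactions and the snapshot date are invented, and the rank-based scoring (1 is worst, n is best, ties broken by input order) is an assumption. Quantile-based bins are a common alternative in practice.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, invoice_date, amount).
transactions = [
    ("C1", date(2024, 1, 10), 40.0),
    ("C1", date(2024, 3, 2), 25.0),
    ("C2", date(2023, 11, 20), 300.0),
    ("C3", date(2024, 2, 14), 15.0),
    ("C3", date(2024, 2, 28), 20.0),
    ("C3", date(2024, 3, 9), 35.0),
]
snapshot = date(2024, 3, 10)  # assumed "today" for the recency calculation


def rfm_table(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: ((snapshot - last_purchase[customer]).days, frequency[customer], monetary[customer])
        for customer in last_purchase
    }


def rank_scores(values, higher_is_better=True):
    """Map each customer to a rank score from 1 (worst) to n (best)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {customer: rank for rank, customer in enumerate(ordered, start=1)}


rfm = rfm_table(transactions, snapshot)
# A low recency (a recent purchase) is good, so the ranking is inverted for R.
r_score = rank_scores({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
f_score = rank_scores({c: v[1] for c, v in rfm.items()})
m_score = rank_scores({c: v[2] for c, v in rfm.items()})
scores = {c: (r_score[c], f_score[c], m_score[c]) for c in rfm}
print(scores)
```

Business segments such as 'At-risk' would then be defined as rules over these score combinations, for example a low recency score combined with a high monetary score.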
When you click Calculate RFM, the RFM Data Function is triggered. The bar chart shows the distribution of each segment, and the color scheme helps you see which customers are the most promising and which are at highest risk. When you select a segment in the visualization, the graphs underneath update automatically.
From here you can identify the next best actions to take, for instance:
- Increase sales: For 'Potential loyal customers', recommend a product to increase how much they spend.
- Prevent churn: For 'At-risk' customers, recommend a promotion or campaign to re-engage them and keep them from churning.
Filters by category and time are also available, if preferred (Figure 7).
Figure 7. RFM analysis for customer segmentation.
CLTV Analysis
Customer Lifetime Value (CLTV) is another segmentation technique we used. It predicts how much customers will spend in the future based on their historical purchase behavior.
If you enter the number of periods (in months) that you wish to predict and click Calculate CLTV, you can see the distribution of how much money the customers are predicted to spend over time, which is their Customer Lifetime Value. The boxplot displays the distribution of values within the different segments.
As always, you can use the filters on the left to look at different categories or certain periods of time.
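One possible reading of the CLTV approach (a linear model per segment, with the area under the curve as the lifetime value) is sketched below. Everything specific here is an assumption for illustration: the segment names, the monthly spend histories, the choice to fit spend against month number, the floor at zero, and the use of the trapezoid rule over the forecast horizon. The template's own data function may differ in any of these respects.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_value(monthly_spend, periods_ahead):
    """Fit a line to past monthly spend and integrate it over the future months."""
    months = list(range(1, len(monthly_spend) + 1))
    slope, intercept = fit_line(months, monthly_spend)
    start = len(monthly_spend)
    # Predicted spend at each month boundary, floored at zero so a
    # declining trend cannot produce negative spend.
    predicted = [
        max(0.0, slope * month + intercept)
        for month in range(start, start + periods_ahead + 1)
    ]
    # Trapezoid rule: area under the predicted curve across the horizon.
    return sum((a + b) / 2 for a, b in zip(predicted, predicted[1:]))


# Hypothetical average monthly spend per segment, where segments are
# assumed to be defined by first purchase amount.
segment_history = {
    "low first purchase": [10.0, 12.0, 14.0, 16.0],
    "high first purchase": [100.0, 90.0, 80.0, 70.0],
}
cltv = {name: projected_value(history, periods_ahead=3) for name, history in segment_history.items()}
print(cltv)
```

In this example the high first purchase segment still has the larger projected value over three months, even though its spend is trending down while the other segment's is rising.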
To create recommendations, select the segment that you want to build recommendations for by marking the relevant bars in the RFM and CLTV charts, then click the Generate Recommendations button. This automatically runs the last data function, Segment Recommendations, and you can then navigate to the final part of the recommender tool using the navigation menu (Figure 8).
Figure 8. CLTV analysis for customer segmentation.
Product Recommendation
On this page, you can see the recommended product categories to offer to the previously selected customers, based on the Market Basket Analysis for the selected segment. You can change the basis to 'Total Value Adjusted', which gives more weight to categories where customers in the segment spend more money so that they are recommended sooner, or to 'Average Value Adjusted', which is based on the average spend per purchase.
The two bubble graphs show how the selected customers' purchases are distributed across the different categories. The bubble size represents the number of customers and the color indicates the value that they have spent.
There are three different recommenders and, depending on the marketing or sales strategy, you may choose any one of these or develop further recommendations.
- Top 3 Recommended Product Categories: This recommender uses the weighted lift score, and the color indicates the strength of the recommendation. If the customer buys product A, you would recommend product B when the rule from A to B has a high lift.
- You may also like / Other customers also bought: This recommender uses support, which emphasizes the popularity of the item in the segment (high support).
- Try something different: These are products with a lift score higher than 1, so there is a positive relationship between the products, but they come from different categories. If your customers are buying products from the same categories, you might want to introduce them to different ones. (Figure 9)
Figure 9. Product recommendations and purchase behavior for selected customers.
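The three recommenders can be thought of as different ways of ranking the same association rules. The sketch below illustrates that idea with made-up rules and spend figures. The weighting scheme (lift multiplied by the antecedent category's share of segment spend) and the definition of "different" (a consequent category the segment has not bought yet) are assumptions chosen for this example, not the template's documented formulas.

```python
# Hypothetical association rules for a segment; all values are invented.
rules = [
    {"antecedent": "Kitchen", "consequent": "Lighting", "lift": 1.8, "support": 0.10},
    {"antecedent": "Kitchen", "consequent": "Garden", "lift": 1.2, "support": 0.30},
    {"antecedent": "Garden", "consequent": "Storage", "lift": 2.0, "support": 0.05},
    {"antecedent": "Garden", "consequent": "Kitchen", "lift": 0.9, "support": 0.25},
    {"antecedent": "Kitchen", "consequent": "Toys", "lift": 1.5, "support": 0.08},
]
# Assumed total spend of the selected segment in each category it already buys.
segment_spend = {"Kitchen": 800.0, "Garden": 200.0}
total_spend = sum(segment_spend.values())


def rank_consequents(rules, score, n=3):
    """Return the n consequents with the highest score, using each consequent's best rule."""
    best = {}
    for rule in rules:
        value = score(rule)
        if value > best.get(rule["consequent"], float("-inf")):
            best[rule["consequent"]] = value
    return sorted(best, key=best.get, reverse=True)[:n]


def weighted_lift(rule):
    """Lift scaled by the antecedent category's share of segment spend (assumed weighting)."""
    share = segment_spend.get(rule["antecedent"], 0.0) / total_spend
    return rule["lift"] * share


# Top 3 by weighted lift.
top_weighted = rank_consequents(rules, weighted_lift)
# "Other customers also bought": rank by support (popularity).
also_bought = rank_consequents(rules, lambda rule: rule["support"])
# "Try something different": positive lift, category new to the segment.
new_to_segment = [r for r in rules if r["lift"] > 1 and r["consequent"] not in segment_spend]
something_different = rank_consequents(new_to_segment, lambda rule: rule["lift"])

print(top_weighted, also_bought, something_different)
```

Note how the same rule set yields three different lists: spend weighting favors rules that start from the segment's biggest category, support favors popular categories, and the last filter surfaces categories the segment has not explored.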
In this application we have looked at customer segments, which is appropriate for household goods. However, these recommendations can be turned into personalized recommendations by, for example, weighting each recommendation by that customer's purchasing history. This might be preferred for high-value items such as cars and holidays.
In addition, the process can be automated and built as an end-to-end solution, so that the data you generate in this recommender is written to a database or service to which the decision tool has access. For example, if you want to create automated emails or website notifications, you can bring up these recommendations automatically.
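As a minimal sketch of that hand-off, the example below writes a few recommendation rows to a database and reads the top pick back, as a downstream email or notification service might. It uses an in-memory SQLite database from the Python standard library purely for illustration. The table name, column names, and sample rows are hypothetical, and a production setup would target whatever database or service the decision tool actually uses.

```python
import sqlite3

# Hypothetical recommender output: (segment, rank within segment, category).
recommendations = [
    ("At-risk", 1, "Lighting"),
    ("At-risk", 2, "Toys"),
    ("At-risk", 3, "Garden"),
]


def save_recommendations(connection, rows):
    """Create the table if needed and upsert each row, keyed by segment and rank."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "CREATE TABLE IF NOT EXISTS segment_recommendations ("
            "segment TEXT NOT NULL, "
            "rec_rank INTEGER NOT NULL, "
            "category TEXT NOT NULL, "
            "PRIMARY KEY (segment, rec_rank))"
        )
        connection.executemany(
            "INSERT OR REPLACE INTO segment_recommendations VALUES (?, ?, ?)",
            rows,
        )


connection = sqlite3.connect(":memory:")  # stand-in for a real database
save_recommendations(connection, recommendations)

# A downstream consumer could then look up the best category for a segment.
top_pick = connection.execute(
    "SELECT category FROM segment_recommendations "
    "WHERE segment = ? ORDER BY rec_rank LIMIT 1",
    ("At-risk",),
).fetchone()[0]
row_count = connection.execute("SELECT COUNT(*) FROM segment_recommendations").fetchone()[0]
print(top_pick, row_count)
```

Keying the table on segment and rank means that rerunning the recommender overwrites the previous picks instead of accumulating duplicates.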
Finally, the last page of this template guides you through replacing the data provided in this template with your own.
Summary
The Customer Analytics Template allows non-experts from any industry selling products or services to segment their customers using segmentation techniques such as RFM and CLTV, and generate product recommendations using MBA.
The tool uses Spotfire and Python to produce actionable consumer insights using intuitive, interactive sliders, filters, and visualizations. It is a starting point for an application and can be further customized to meet your specific recommendation needs.