Jump to content
  • Customer Targeting and Spatial Analysis


    This Playbook demonstrates how to perform spatial analysis on your dataset within a Spotfire Data Science team Studio Workflow. Specifically, we are going to look at how a retailer can assess how many potential high value customers are within a certain trade area for a given latitude and longitude.

     

    Data Requirements

    We used a randomized data from a set of 300,000 Southern California properties with 9 unique features: id, value, ltv_ratio, zip, city, county_id, state, latitude and longitude. We also used this dataset from Github to add zip code centroid latitude and longitude and used Google Maps to pick a retailer in Southern California and use its latitude and longitude. Your dataset should contain at least value, ltv_ratio, zip, latitude and longitude in its main property dataset.

    Data Exploration

    In this Playbook, we assume that high value customers are those who are in the top quintile for both Property Value (value) and Loan-to-Value Ratio (ltv_ratio). To understand what we should make our cutoffs to be, we must first explore our data so that we can understand our population. We will perform the following tasks separately for both Property Value and Loan-to-Value Ratio, I will refer to each as the "target feature":

    1. Row Filter to remove null values and zeros from the target feature
    2. Variable operator to create 5 equal bins of the target feature by using Quantile Variables and "Average Ascend" for Quantile Type
    3. Summary Statistics on our target feature using the bins we created in the previous step for the Group by parameter

    We can then analyze the output of our Summary Statistics operators to see the ranges for our top quintiles of Property Value and Loan-to-Value Ratio - these values will be used in a Row Filter to remove anyone who does not meet these thresholds.

    At this point, your flow should look something like this:

    screen_shot_2017-07-07_at_3_29.46_pm.png.8b945cd5ab472c372525953dc4ac6407.png

    Data Enrichment and Feature Engineering

    In this section we will want to perform the following tasks:

    1. Join in our zip code centroids on zip
    2. Perform a side join of our retailer location by joining on 'dummy variables'
      1. For this step, we have created two new columns in each dataset called 'dummy' that has a value of 1 for every record
    3. Calculate the great circle distances of each zip centroid and property from the retailer location
      1. In this example we are using Postgres database so we can leverage the earthdistance functions - these return distance in meters so we also divided the output by 1609.34 to convert to miles. If you are using another database or Hadoop, please refer to their documentation or calculate the Haversine Distance using the Haversine Formula
    4. Row Filter to select zip codes whose centroids are within 10 miles of the retailer - we are considering this radius to be the trade area
    5. Variable operator and use CASE statements (or IF statements in other languages) to create booleans for subsets of radiuses
      1. We used 2.5 miles and 7.5 miles for ours - this a nice indication of how close these high value customers actually are
    6. Aggregate by zip to sum the counts of each radius and the total counts in the trade area

    Your final flow should look something like this:

    screen_shot_2017-07-07_at_4_53.09_pm.thumb.png.a3ec2f4d5053ddaba1b5593bb96b6d0b.png

    Other Playbook Assets

    Homeowner Segmentation Touchpoint

    screen_shot_2017-07-07_at_5_06.13_pm.png.c9d56c5bca8eaa41f483152b1ba956ea.png

     

    screen_shot_2017-07-07_at_5_06.37_pm.thumb.png.f3467cda4923d95837a9817ffab786c8.png

     

    screen_shot_2017-07-07_at_5_07.18_pm.thumb.png.7959f0ae1b5770613dffc2f6920b7882.png

    In this Touchpoint, we allow users to enter threshold values for both Property Value and Loan-to-Value Ratio and are returned a ranked list of all cities who have properties that satisfy the given constraints. This Touchpoint allows non-analytic users to understand the demographics of the cities in their dataset in a code-free and workflow-free environment. The list of cities can be easily exported as a .csv file to be added to a sales presentation or to populate an email.


    User Feedback

    Recommended Comments

    There are no comments to display.


×
×
  • Create New...