Summary: Over 80% of companies are not yet using advanced analytics. Here’s a step-by-step plan to implement a brand-new predictive analytics program that gets the biggest bang for your buck from the most cost-effective investment.

Are You (Your Company) Part of the 80% or the 20%?

No matter how you add up the figures, Gartner reports that at most 20% of companies are engaged in real data science and predictive analytics. And that figure comes from Gartner survey respondents, who tend to be larger and more analytics-savvy than most. The last definitive figure I saw from Gartner was 1 in 8, or about 12%, with actual advanced analytic programs. That means the odds that your company has no advanced analytics at all are probably north of 80% or 85%.

Are You Big Enough to Benefit?

What’s the smallest company that can benefit from advanced analytics? We illustrated in a recent article that even a company with only a few thousand customers could benefit from some types of predictive analytics for only a few thousand dollars. Realistically, though, the cost threshold for a real program starts at around $750,000 to $1 Million. That’s enough to fund a team of four or five, including data scientists and supporting IT staff and analysts.

You make the call. Our estimate is that you need revenue north of $50 Million and probably more like $150 Million to make this commitment. However, we also estimate that you’ll be making multiples of your $1 Million investment in revenue and margin within two years.

Where and How to Get Started

The greatest barrier for the 80% of non-adopters is simple: they don’t know where or how to get started. There is so much unfamiliar jargon, so much press about exciting new opportunities, and so much perceived risk of starting in the wrong place or taking the wrong direction that many companies never even start.

So to make this as simple as possible for the largest number of companies, we present this Step-by-Step Plan for implementing advanced analytics, to get the biggest return with the smallest reasonable investment.

Here’s the scenario. You are a reasonably experienced Data Scientist and have been hired by XYZ Corp to get them started with advanced analytics. You’ve looked around, and they’re right: they don’t have any. What they have is a decent transactional and BI system with the ability to drill down into historical data, and a forecasting and problem-solving approach based on MSPITR (most senior person in the room).

Of course, where to start will differ if you’re in a highly specialized business like insurance underwriting or aerospace manufacturing, or in an all-digital company with a large web user base, but to make this as general as possible we’ll assume XYZ has:

  1. $500 Million in sales and 1,000 employees in mostly the same line of business.
  2. Multiple channels including their own brick and mortar stores, a wholesale channel to retailers, and some B2C digital ecommerce (typical light manufacturing or wholesaler/retailer profile).
  3. Some of the product is manufactured in the US but mostly overseas with longer lead times, and they have about 1,000 SKUs.
  4. Their products encourage repeat or additional purchases so they (want to) have an on-going relationship with their customers.
  5. At least some of their products are sufficiently complex that they have a pretty good-sized call center for customer service that can double for inbound and outbound sales when not helping current customers.

You might be thinking that your new boss would know where he wants to start, or at least where the company is suffering the greatest pain. But no. He says, “We have so many opportunities and so little understanding of advanced analytics that we need you, Data Scientist, to lay it out for us and get the greatest and quickest returns with the least investment. Make me a hero.”

You briefly consider doing the ‘good consultant’ thing, interviewing LOB and process leaders to build a portfolio of opportunities. That’s not necessarily bad, but you know you’re going to have to staff up and equip a data science team of about five (counting you) right away, and your boss is going to see the $1 Million start-up cost as a gamble anyway. There’s no sense getting the LOB and process leaders all fired up only to find they may have to wait until year 2 or 3 before their projects rise to the top.

You’ve seen enough real-world projects to know at least roughly how you’d help in each of these different areas. The real constraint is how to get quick wins with smallest reasonable investment.

Here’s what I suggest is the first real constraint. You may want an NLP guy, an optimization guy, a big data architect, and a bunch of other specialists on your team, but right now, with a team of only four or five, you need to focus on one set of core skills that will be self-contained and successful.

Instead of going opportunity by opportunity, I am suggesting you prioritize skill by data science skill.

Phase 1: Predictive Analytics

So what is the analytic skill set your team should start with to return that biggest bang for the buck? In my estimation, start with predictive modeling.

Step 1. Predictive Modeling for Basic Scoring and Classification Models:

Predictive Modeling is the ability to build scoring and classification models, including clustering and segmentation. Here we are focusing on ‘response modeling’, though these same skills can be used for anomaly or fraud detection and a wide variety of forecasting and value estimation.
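To make ‘clustering and segmentation’ a little more concrete, here is a minimal sketch of plain k-means using only the Python standard library. It assumes each customer has already been reduced to a few numeric features scaled to similar ranges; the sample points and the choice of two segments are hypothetical, and in practice your modeler would likely use a packaged implementation rather than this hand-rolled loop.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.
    Returns (centroids, labels), where labels[i] is the segment of points[i]."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    labels = []
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty segment's centroid where it was
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical customers described by (scaled recency, scaled spend).
customers = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85), (0.9, 0.1), (0.8, 0.2), (0.85, 0.15)]
centroids, segments = kmeans(customers, k=2)
```

Each label is just a segment ID; the value comes from profiling the segments (average spend, recency, products owned) so marketing can act on them.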

Keep in mind our premise. Start with the smallest viable team and focus on excellent performance in a single analytic skill set. Your initial team will likely start with an ‘analyst’ who knows the current transactional and BI data. The second member will be a junior data scientist who can do the data prep for modeling. The third person will be an experienced predictive modeler. The fourth is you as project lead. And the fifth, by the end of the first year, will be a model manager responsible for guiding implementation in production systems and monitoring models to determine when they need to be refreshed.

Why start here? Cross Sell and Up Sell to Current Customers: XYZ has an on-going relationship with its customers. They are not just one-and-done shoppers. The easiest and least expensive sale you can get is cross sell or up sell to your current customers.

Where to implement? This will vary a lot based on business circumstances, but the call center should be an obvious early target. Using existing transactional data on the caller’s recency, frequency, dollar value, and products purchased, you can build models to determine the next most likely product for upsell and cross sell, and display appropriate scripts for the CSR in real time while the customer is already on the phone. Similarly, you could model the likelihood that existing customers will defect and use the same approach to promote loyalty and reduce churn.
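The recency, frequency, dollar value, and products-purchased inputs described above are usually built straight from transaction history. Here is a minimal sketch, assuming a hypothetical list of (customer_id, purchase_date, amount, sku) rows; the resulting features would then feed whatever cross-sell or churn classifier your team builds, which is not shown here.

```python
from datetime import date

def rfm_features(transactions, as_of):
    """Summarize (customer_id, purchase_date, amount, sku) rows per customer.
    Returns {customer_id: {"recency_days", "frequency", "monetary", "products"}}."""
    features = {}
    for customer_id, purchase_date, amount, sku in transactions:
        f = features.setdefault(
            customer_id,
            {"last_purchase": purchase_date, "frequency": 0, "monetary": 0.0, "products": set()},
        )
        f["last_purchase"] = max(f["last_purchase"], purchase_date)
        f["frequency"] += 1       # number of purchases
        f["monetary"] += amount   # total dollar value
        f["products"].add(sku)    # products the customer already owns
    for f in features.values():
        # Recency: days between the last purchase and the scoring date.
        f["recency_days"] = (as_of - f.pop("last_purchase")).days
    return features

# Hypothetical transaction history.
history = [
    ("C1", date(2015, 9, 1), 120.0, "SKU-A"),
    ("C1", date(2015, 9, 20), 80.0, "SKU-B"),
    ("C2", date(2015, 3, 15), 40.0, "SKU-A"),
]
profile = rfm_features(history, as_of=date(2015, 10, 5))
```

Keeping the set of products already owned makes it easy to exclude those items when choosing the next offer to script for the CSR.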

New Customer Acquisition: Depending on the method of outreach used for the acquisition of new customers, for example catalog mailings, these same techniques will create obvious wins there compared to any other intuitive or non-data science technique.
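The usual way to judge a response model for a mailing is lift: how much higher the response rate is among the top-scored prospects you actually mail than across the whole list. A small sketch, assuming you have model scores and observed 0/1 responses from a past test mailing (the numbers below are made up):

```python
def lift_at(scores, responded, fraction):
    """Response rate in the top `fraction` of the list, ranked by model score,
    divided by the response rate of the whole list (assumes at least one responder)."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top_rate = sum(r for _, r in ranked[:cutoff]) / cutoff
    overall_rate = sum(responded) / len(responded)
    return top_rate / overall_rate

# Hypothetical scores and observed responses (1 = bought) from a test mailing.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
top_fifth = lift_at(scores, responded, 0.2)
top_half = lift_at(scores, responded, 0.5)
```

In this toy list the top 20% respond at roughly 3.3 times the average rate, which is what lets you mail fewer catalogs for the same number of orders.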

Step 2. Predictive Modeling with Appended Demographic and Contextual Data:

By the middle of the first year you should be looking to add demographic append data to the customer records. You can purchase this from any number of data brokers or major information houses like Experian. Demographic append data will make your models significantly more accurate and therefore more profitable. Be sure to consider the cost and effort required to keep this new data accurate.

Context data can describe the method of purchase (in store or online), geography, time of year, time of day, or any variety of other variables. Alongside demographic append data, context data will also increase model accuracy. We could have included clickstream data from the ecommerce site, but we’re intentionally holding off on that one because we would need to add skills to our team or use outside consultants.
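Mechanically, appending demographic data is a left join on the customer key, and context data is mostly derived from the purchase event itself. A minimal sketch, where the vendor field names (income_band, household_size) and the channel labels are hypothetical placeholders rather than any broker’s actual schema:

```python
from datetime import datetime

# Hypothetical demographic fields purchased from a data broker.
DEMOGRAPHIC_FIELDS = ("income_band", "household_size")

def append_demographics(customers, demographics):
    """Left-join vendor fields onto customer records by customer_id.
    Customers the vendor could not match get None, so the match rate stays visible."""
    enriched = []
    for record in customers:
        vendor_row = demographics.get(record["customer_id"], {})
        row = dict(record)  # copy so the source records are untouched
        for field in DEMOGRAPHIC_FIELDS:
            row[field] = vendor_row.get(field)
        enriched.append(row)
    return enriched

def context_features(purchase_time, channel):
    """Derive simple context variables from a single purchase event."""
    return {
        "channel": channel,                       # e.g. "store" or "online"
        "month": purchase_time.month,             # time of year
        "hour": purchase_time.hour,               # time of day
        "weekend": purchase_time.weekday() >= 5,  # Saturday or Sunday
    }
```

Tracking how many customers come back with None is one cheap way to keep an eye on the maintenance cost of the appended data.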

When you first start out and don’t have this append data you will necessarily be building ‘good enough’ models. But model accuracy can make big differences in campaign ROI and should be a focus of your activities. See our earlier article “the value of accuracy in predictive modeling”.
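A back-of-the-envelope calculation shows why accuracy matters. The sketch below assumes a hypothetical catalog drop of 100,000 pieces at $0.75 each and $40 of margin per response; none of these figures come from a real campaign.

```python
def campaign_profit(pieces_mailed, response_rate, margin_per_response, cost_per_piece):
    """Net campaign profit: margin from responders minus the cost of every piece mailed."""
    return pieces_mailed * (response_rate * margin_per_response - cost_per_piece)

# Hypothetical campaign: same list size and costs, two models of different accuracy.
good_enough = campaign_profit(100_000, 0.025, 40.0, 0.75)  # about $25,000
improved = campaign_profit(100_000, 0.030, 40.0, 0.75)     # about $45,000
```

Under these assumptions, raising the response rate of the mailed group by half a point nearly doubles profit, because the mailing cost stays fixed while margin scales with responses.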

Step 3. Uplift Modeling:

This variation on straight response modeling is only valuable if you have a sufficiently large number of customers, since it requires modeling both the target and control groups. It prevents you from spending promotional dollars on those who would have purchased even if they had not been contacted. The technique is easy to implement if your team firmly understands predictive response modeling, but it would not be economical or useful unless you had more than about 100,000 customers. It is useful for single-item promotions but not for broadcast, multi-offer, or catalog mailings. See more about uplift modeling in our article “There is Something New Under the Sun”. However, if your circumstances allow, it can increase promotional ROI 4X to 10X.
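The core idea can be sketched without a full modeling stack. Proper uplift modeling fits separate response models to the contacted (target) and uncontacted (control) groups and subtracts their predictions; the simplified stand-in below does the same subtraction using observed purchase rates per segment instead of fitted models. The segment names and counts are hypothetical.

```python
from collections import defaultdict

def segment_uplift(records):
    """records: (segment, contacted, purchased) tuples.
    Returns {segment: purchase rate if contacted - purchase rate if not contacted}
    for segments that have both a target and a control group."""
    tallies = defaultdict(lambda: {True: [0, 0], False: [0, 0]})  # group -> [buyers, size]
    for segment, contacted, purchased in records:
        tally = tallies[segment][contacted]
        tally[0] += int(purchased)
        tally[1] += 1
    uplift = {}
    for segment, groups in tallies.items():
        (t_buy, t_n), (c_buy, c_n) = groups[True], groups[False]
        if t_n and c_n:
            uplift[segment] = t_buy / t_n - c_buy / c_n
    return uplift

# Hypothetical test: 10 contacted and 10 held-out customers in each segment.
records = (
    [("loyal", True, i < 6) for i in range(10)] + [("loyal", False, i < 6) for i in range(10)]
    + [("lapsed", True, i < 4) for i in range(10)] + [("lapsed", False, i < 1) for i in range(10)]
)
uplift = segment_uplift(records)
```

Here the hypothetical ‘loyal’ segment buys at the same rate whether contacted or not (uplift of zero), so promotional dollars spent on it are wasted, while the ‘lapsed’ segment shows a 30-point lift from contact.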

Note that between steps 1, 2, and 3 you have probably already saved enough money to cover the start-up expense of your advanced analytics group.

Step 4. Market Basket or Affinity Analysis:

This technique can be performed easily by a team already grounded in predictive analytics. In brief, these techniques seek to identify pairs (or triads or baskets) of specific products where the purchase of one is strongly associated with the purchase of the other. Read more about this technique in our article “Affinity Analysis”.
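The three numbers usually reported for a product pair are support (how often the pair appears together), confidence (how often buyers of one also buy the other), and lift (how much more often they appear together than chance would predict). Below is a minimal pairs-only sketch in standard-library Python with made-up baskets; real projects typically use an Apriori or FP-growth implementation to scale to triads and larger baskets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets):
    """Return (antecedent, consequent, support, confidence, lift) for every
    product pair seen together, strongest lift first."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore quantities within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]  # share of x-buyers who also bought y
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)

# Hypothetical baskets.
baskets = [["tent", "stakes"], ["tent", "stakes", "lantern"], ["lantern"], ["stakes", "tent"], ["cooler"]]
rules = pair_rules(baskets)
```

A lift above 1 (tent and stakes here) marks a pair worth shelving together or excluding from double discounts; pairs with lift below 1 are bought together less often than chance would suggest.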

Where to implement?

Brick and mortar stores: This technique is used to guide merchandise placement so that products with strong affinity are typically placed together. Similarly, the strongest pull-through products are often located in such a way that the shopper has to pass by other types of merchandise to reach them.

Web sites: Similar to brick and mortar stores, market basket and affinity analysis guide the positioning of product on related pages to increase click through and sales.

Pricing and Discounting: Knowing that two product sets are already strongly associated reduces the need to discount both. This prevents unnecessary discounting and increases gross margin.


The Rest of the Plan

There are 5 Phases to our Step-by-Step Plan. In our continuation article we will walk you through:

Phase 2: Geospatial Analytics

Phase 3: NoSQL (Hadoop) Based Analysis of Varied Data Types

Phase 4: Forecasting, Optimization, Simulation

Phase 5: All the Rest


October 5, 2015

Bill Vorhies, President & Chief Data Scientist – Data-Magnum – © 2015, all rights reserved.

About the author: Bill Vorhies is President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. Bill is also Editorial Director for Data Science Central. He can be reached at: or