Recently I’ve been making my way through the literature about “real time analytics”, “in-stream analytics”, and “in-memory analytics”. These are all category names created by Gartner and readily adopted by any number of developers. The thing that gives me heartburn about this developer-driven hyperbole is the implication that suddenly the creation of brand new insights using advanced analytics has become “real time”. Humbug.
It’s time to apply a little rigor to this widely held claim that ‘fast data’ leads to ‘fast action’, as if data could go in one ear and out the other without a little brain in between.
So let’s differentiate between time-to-action and time-to-insight.
First of all this is frequently and incorrectly shown as a single action with data in one side and action out the other. There are really two completely different tasks here with very different time frames.
Time-to-Action is about your relationship with your customer. Typically the customer-generated trigger comes in one side and enters your transactional platform, where it may be scored, next best offers formulated, recommenders updated, or any number of other automated tasks carried out. Based on these predetermined routines the desired and pre-planned action comes out, often as a response that is fed straight back to the customer. Time-to-Action can indeed be real time, right down to milliseconds and microseconds.
However, pay careful attention to what the previous paragraph said. These actions and the algorithms that created them were preplanned. That is, some team of smart data scientists spent considerable time and effort to explore the data, clean, transform, and perhaps normalize it, select features, and then build a number of models until one proved to their satisfaction that it was sufficiently robust to be implemented in the transactional system. That is Time-to-Insight, and that is decidedly not real time.
The Gray Area
There are some tasks in time-to-insight that have been very significantly sped up. Not the original discovery and creation of patterns, but the update activities like data cleaning, transforms, and scoring data. As far back as “in-database analytics” (sounds like ancient history doesn’t it?), we’ve been pushing these repetitive activities back toward the database so that users could consume the data directly without it having to pass continuously through our analytic platform.
That was a minor revolution. Today we have a major revolution, “in-memory analytics”. Thanks to rapidly falling costs of SSD and DRAM there is a huge movement toward in-memory platforms that encompass not only the analytic tools but also pretty much the whole database. These are really amazing pieces of technology with all the analytics and all the ‘hot data’ held in either core DRAM or SSD. It wasn’t that many years ago that I would leave my machine on overnight while my predictive analytics routines ran, with the results ready for me the next morning. Not much more recently, I could at least go out for lunch and come back to find them finished. Now I can’t even get up for a cup of coffee before my millions of rows and thousands of features have been processed. It’s very, very fast, but it’s not millisecond fast. Nor could you derive these models without me, the data scientist in the picture, and I am surely not millisecond fast.
Some Repetitive Tasks Can be “Nearly” Automated
So we learned to write routines that cleaned and transformed data, and we’ve always had scoring algorithms that could be included in operational systems. Once the required cleaning and transforms were well understood, they could, along with scoring, be made fast enough for real-time processing. That meant that scores could be updated on the fly with the new transactional data as it arrived.
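As a rough illustration of scoring on the fly, here is a minimal Python sketch. Everything in it is an assumption made for the example: the event fields (amount, prior_orders), the cleaning rule, and the coefficient values are hypothetical stand-ins for whatever the offline modeling work actually produced. The point is only that the real-time part applies decisions that were already made.

```python
import math

# Hypothetical coefficients handed over by the offline modeling work
# (the Time-to-Insight step). Names and values are illustrative only.
COEFFICIENTS = {"intercept": -2.0, "log_amount": 0.8, "is_repeat_customer": 1.1}


def clean_and_transform(raw_event):
    """Apply the pre-agreed cleaning and transform rules to one event."""
    amount = raw_event.get("amount")
    # Cleaning rule (assumed): treat missing or negative amounts as zero.
    if amount is None or amount < 0:
        amount = 0.0
    return {
        # Transform rule (assumed): log-scale the purchase amount.
        "log_amount": math.log1p(amount),
        "is_repeat_customer": 1.0 if raw_event.get("prior_orders", 0) > 0 else 0.0,
    }


def score(raw_event):
    """Return a probability-like score from a fixed logistic model."""
    features = clean_and_transform(raw_event)
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Each incoming transaction is scored as it arrives.
    print(score({"amount": 120.0, "prior_orders": 3}))
```

Nothing in this path explores data or chooses features, which is why it can run in milliseconds.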
It remains an open question whether this level of complexity and effort is worth the gain. The vast majority of users are still updating their data, scores, and even recommenders overnight or even less frequently and don’t seem to be suffering for it. But power users are going for the new tech.
Another type of semi-discovery activity that can be made near real time is model refresh. Once the model is established, the data well understood, and the rate at which the data ‘drifts’ defined, refreshing means little more than rerunning the models with the new data to update the coefficients.
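A model refresh can be sketched in a few lines of Python. This is a deliberately tiny example under stated assumptions: the "model" is a one-feature least-squares line, and the function names fit_line and refresh are hypothetical. The form of the model stays fixed and only the coefficients are recomputed on the newest window of data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def refresh(window_xs, window_ys):
    """Refit the same model form on the latest data window.

    No feature selection or model comparison happens here; those choices
    were made once, by a person, when the model was first built.
    """
    return fit_line(window_xs, window_ys)


if __name__ == "__main__":
    original = fit_line([0, 1, 2], [1, 3, 5])   # coefficients at deployment
    refreshed = refresh([0, 1, 2], [1, 4, 7])   # data has drifted since then
    print(original, refreshed)
```

The refresh is mechanical, which is what makes it a candidate for automation across hundreds of models.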
There are indeed advanced analytic power users in financial services, telecom, and ecommerce that have implemented hundreds or even thousands of predictive models all of which need periodic refresh. In-Memory platforms with the new incoming transactional data and the fully enabled analytic routines can in fact refresh models and rescore data in real time. Some vendors like SAS have given these utilities names like ‘Model Factories’ which seems fitting. Once again however, it doesn’t include the original time-to-insight.
In the spirit of full disclosure we also ought to talk about simple unsupervised learning techniques like segmentation, association, or some recommenders based on association rules. Can’t these be fully automated and therefore run in real time? The full answer is yes, but only after the data has been cleaned, transformed, normalized, and the features selected. So the first time around, which is Time-to-Insight, can never be real time, since that human data scientist is still required.
Read our earlier articles on Stream Processing, What Is It and Who Needs It, and How it Works. It’s an amazing technology since it performs very significant logical processes on data before it even gets to memory, which is the very definition of real time. Take action on data before there is even any I/O at all. And Stream Analytics can house advanced analytic algorithms, any of them, that score or recommend or segment, or even judge and categorize the sentiment of passing social media posts. That is, provided the insight has first been discovered and loaded into the Event Stream Processing Platform.
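To make that last condition concrete, here is a minimal Python sketch of sentiment tagging on a passing stream. It is not how any particular Event Stream Processing product works; the word lists, the classify rule, and the function names are all hypothetical. What it shows is that the rule is loaded before the first post arrives, and the stream only applies it.

```python
# Hypothetical word lists standing in for an insight discovered offline
# and loaded into the stream processor ahead of time.
POSITIVE_WORDS = {"love", "great", "thanks"}
NEGATIVE_WORDS = {"broken", "refund", "terrible"}


def classify(post):
    """Tag one post using the preloaded word lists."""
    words = set(post.lower().split())
    balance = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def process_stream(posts):
    """Yield (post, label) pairs one at a time as posts pass by."""
    for post in posts:
        yield post, classify(post)


if __name__ == "__main__":
    incoming = [
        "I love this great phone",
        "terrible service, want a refund",
        "arrived today",
    ]
    for post, label in process_stream(incoming):
        print(label, "|", post)
```

The generator never stores or revisits a post, which mirrors the act-before-I/O idea, but it also cannot learn a new word list by itself.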
So in pursuit of clarity and to tamp down the hype a little bit, be sure to differentiate between Time-to-Action and Time-to-Insight because these are quite different activities. Time-to-Action can indeed be real time, even without some of these new technologies. Time-to-Insight still requires a smart and curious human to make and test the discovery before it can be put to use. Will Time-to-Insight ever be real time? Not likely.
November 11, 2015
Bill Vorhies, President & Chief Data Scientist – Data-Magnum – © 2015, all rights reserved.
About the author: Bill Vorhies is President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. Bill is also Editorial Director for Data Science Central. He can be reached at: