Summary: Driven by the need for speed in IoT and Stream Processing, the future is arriving a little ahead of schedule. The marriage of RDBMS and NoSQL, or at least the erasing of the distinction between them, is about to commence.
Just a few months ago, in May, we wrote about the coming marriage of NoSQL and RDBMS, which Gartner predicted would occur by 2017, and we treated it as a strange and unexpected thing. If it was true, we saw it as a collision between surprisingly different technologies. After all, RDBMS had never scaled horizontally with the ease of NoSQL, and the shared-nothing MPP architecture was looking pretty attractive. It’s fair to say that the NewSQL camp had occupied this space from the beginning but didn’t seem to be making much headway. Then everything changed.
Perhaps I just wasn’t looking hard enough, but one of the driving factors I’d overlooked was the coming cost breakeven between HDD and SSD.
This 2013 study showed SSD rapidly closing in on HDD for cost. There was some disagreement about when breakeven would occur, but in May SanDisk announced that its next-generation SSDs would break even with HDDs on cost. Even without true breakeven, the advantage of in-memory processing was becoming apparent: it is at least 20X faster and more reliable.
And while SSD is important, it’s really core DRAM that’s the key to this in-memory marriage. DRAM is still more expensive, but like SSD its cost is coming down rapidly.
So even before cost breakeven, over at least the last two or three years, early-adopting developers had begun touting their in-memory DBs as outright replacements for our server farms of disk drives. SAP was one of the first majors to propose outright replacement of transactional databases with its in-memory HANA, and it is having surprising success in getting its clients to migrate.
Analytic platform maker SAS offered an all in-memory analytic platform (Visual Analytics and Visual Statistics) about two years ago. And now we see almost all the NewSQL crew including VoltDB and MemSQL moving to in-memory. So what’s going on?
IoT and Stream Processing
The first driver is the final arrival of IoT. After many months of hype, there are now enough sensors creating time-driven data streams to make a difference. Second is social media. Social media streams have become increasingly relevant in managing customer relationships and are generating volumes of time-driven streaming data. And finally, general commerce, and especially ecommerce, has begun to recognize the need for real-time response with advanced analytics in well under one second.
All of these data streams have driven the adoption of stream processing and in-stream analytics, and guess what: Stream Processing relies on in-memory speed. See our recent articles on Stream Processing Basics and Stream Processing – How it Works.
Stream Processing has been the basis for providing analytic value in sub-second response times, but the truth is that while it needs some in-memory storage, it doesn’t need all that much. Most folks process the data through the Event Stream Processing steps as soon as it arrives, and then let it persist in conventional HDD databases. To capitalize on this, many developers of in-memory DBs are either packaging Stream Processing with their offering or making it very easy to implement.
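To make the "some memory, but not much" point concrete, here is a minimal sketch in Python. It assumes a simple time-based sliding window; the class name SlidingWindowMean, the persist callback, and the sample readings are all hypothetical and only for illustration. Only the last few seconds of readings stay in memory for the in-stream calculation, while every raw event is handed off to a stand-in for the disk-backed store.

```python
from collections import deque


class SlidingWindowMean:
    """Keeps only the last `window_seconds` of readings in memory (hypothetical helper)."""

    def __init__(self, window_seconds, persist):
        self.window_seconds = window_seconds
        self.persist = persist   # callback standing in for the disk-backed store
        self.window = deque()    # (timestamp, value) pairs, oldest first
        self.total = 0.0         # running sum of the values currently in the window

    def on_event(self, timestamp, value):
        # In-stream step: fold the new reading into the rolling window.
        self.window.append((timestamp, value))
        self.total += value
        # Evict readings that have aged out, so memory use stays bounded.
        while self.window[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.window.popleft()
            self.total -= old_value
        # Hand the raw event to long-term storage; it does not stay in memory.
        self.persist((timestamp, value))
        return self.total / len(self.window)


disk_store = []  # assumption: a plain list plays the role of the HDD database
processor = SlidingWindowMean(window_seconds=10, persist=disk_store.append)
means = [processor.on_event(t, v) for t, v in [(0, 20.0), (5, 22.0), (12, 30.0)]]
```

After the third event the reading from time 0 has been evicted from the window, yet all three events have reached the store. That is the division of labor most stream processing deployments rely on today.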
Hybrid Transactional/Analytical Processing (HTAP)
There’s a lot of change going on in our industry in real time, by which I mean the capabilities we’re used to are changing around us at a rapid rate. Certainly the confluence of cost-effective in-memory databases built on DRAM and SSD with real-time stream processing has led visionary architects to wonder: why not just keep it all in memory all the time?
The result is a class of database becoming known as Hybrid Transactional/Analytical Processing (HTAP). The name is pretty new, so we’ll see if it holds. The concept, though, is quite radical: a single DB based on SSD or DRAM that would hold (pretty much) all the data all the time and be able to perform transactional and analytic tasks simultaneously. Yes, this eliminates separate transactional and analytic data stores.
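As a rough illustration of the single-store idea, the sketch below uses Python’s built-in sqlite3 module with an in-memory database. SQLite is only a convenient stand-in here and is not an HTAP product; the orders table and the place_order and revenue_by_customer functions are hypothetical names made up for the example. The point is simply that transactional writes and an analytic aggregate run against the same in-memory data, with no separate warehouse copy in between.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the single HTAP store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)


def place_order(customer, amount):
    """Transactional side: one atomic insert per order."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )


def revenue_by_customer():
    """Analytic side: aggregate over the same live table, no ETL step."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())


place_order("acme", 100.0)
place_order("acme", 50.0)
place_order("globex", 75.0)
report = revenue_by_customer()
```

A real HTAP system adds what this toy leaves out, such as distribution across nodes and storage layouts suited to both workloads, but the programming model it aims for looks much like this: write and analyze in one place.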
All of the majors now offer an HTAP database: IBM, SAP, Oracle, and Microsoft. So do many of the contenders, including much of the NewSQL group: Aerospike, DataStax, Kognitio, MemSQL, Starcounter, Teradata, and VoltDB.
These HTAPs scale horizontally in an MPP / shared-nothing architecture and are fully ACID compliant. So, driven by the need for speed in IoT and Stream Processing, the future is arriving a little ahead of schedule. The marriage of RDBMS and NoSQL, or at least the erasing of the distinction between them, is about to commence.
November 5, 2015
Bill Vorhies, President & Chief Data Scientist – Data-Magnum – © 2015, all rights reserved.
About the author: Bill Vorhies is President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. Bill is also Editorial Director for Data Science Central. He can be reached at: