Bill Vorhies

Summary:  Yes, “Polyglot Persistence” is a real phrase, and it’s the secret to picking the right NoSQL database.

You can drop this phrase at your next Big Data tech meeting:  “Polyglot Persistence”.  Yes, it’s a real thing.  “Polyglot” means speaking many languages, but in Big Data it means picking the right NoSQL DB for the right application.

If you’re already deep into Big Data, you’ve probably figured this out intuitively, but if you’re just getting started you may not yet have realized that there is no ‘one best choice’ for all cases.  In fact, the major takeaway here is that you’re going to want several different types of NoSQL (Key Value, Document, Columnar, and Graph), and in some cases even more than one database of the same type, depending on your actual use case.
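For readers new to the four families, here is a minimal Python sketch of how the same purchase data might be shaped in each model. Plain dictionaries and lists stand in for real databases, the columnar layout is deliberately simplified, and every name in it (the user and product IDs, `total_qty`, `friends_purchases`) is a hypothetical illustration rather than any product’s API.

```python
# Illustrative only: in-memory stand-ins for the four NoSQL data models.
# All keys, IDs, and field names are hypothetical.

# Key-Value: an opaque value looked up by a single key.
key_value = {"session:u1": "cart=p9;last_seen=2015-05-07"}

# Document: a self-contained, nested aggregate retrieved as a unit.
document = {
    "_id": "u1",
    "name": "Ann",
    "orders": [{"product": "p9", "qty": 2}],
}

# Columnar (simplified): values grouped by column, convenient for aggregates.
columnar = {
    "user": ["u1", "u2", "u1"],
    "product": ["p9", "p9", "p3"],
    "qty": [2, 1, 5],
}

# Graph: explicit edges (source, relationship, target) that can be traversed.
graph_edges = [
    ("u1", "FRIEND_OF", "u2"),
    ("u1", "BOUGHT", "p9"),
    ("u2", "BOUGHT", "p9"),
    ("u2", "BOUGHT", "p3"),
]


def total_qty(product_id):
    """Column-style aggregate: sum the qty column for one product."""
    pairs = zip(columnar["product"], columnar["qty"])
    return sum(qty for product, qty in pairs if product == product_id)


def friends_purchases(user_id):
    """Graph-style traversal: products bought by a user's friends."""
    friends = {
        target
        for source, rel, target in graph_edges
        if source == user_id and rel == "FRIEND_OF"
    }
    return sorted(
        {
            target
            for source, rel, target in graph_edges
            if source in friends and rel == "BOUGHT"
        }
    )
```

The point of the sketch is only that each shape makes a different question cheap to ask, which is why one type rarely fits every use case.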

The early adopters already figured this out.  Going back a couple of years, in addition to RDBMS, Disney was using Cassandra, Hadoop, and MongoDB.  Netflix was using Cassandra, HBase, and SimpleDB.  Twitter was using Cassandra, FlockDB, HBase, and MySQL.  Mendeley was using HBase, MongoDB, Solr, and Voldemort.

Using multiple NoSQL DBs is the rule, not the exception.  So how do you begin to focus in on the NoSQL type best suited for your project?  Here’s what Polyglot Persistence might look like today:

| Functionality | Considerations | DB Type |
| --- | --- | --- |
| User Sessions | Rapid access for reads and writes.  No need to be durable. | Key-Value |
| Financial Data | Needs transactional updates.  Tabular structure fits data. | RDBMS |
| POS Data | Depends on size and rate of ingest.  Lots of writes, infrequent reads mostly for analytics. | RDBMS (if modest), Key-Value or Document (if ingest very high), or Column (if analytics is key) |
| Shopping Cart | High availability across multiple locations.  Can merge inconsistent writes. | Document (Key-Value maybe) |
| Recommendations | Rapidly traverse links between friends, product purchases, and ratings. | Graph (Column if simple) |
| Product Catalog | Lots of reads, infrequent writes.  Products make natural aggregates. | Document |
| Reporting | SQL interfaces well with reporting tools. | RDBMS, Column |
| Analytics | Large-scale analytics on a large cluster. | Column |
| User activity logs, CSR logs, social media analysis | High volume of writes on multiple nodes. | Key-Value or Document |
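As a rough illustration of how the table above could feed a planning exercise, the following Python sketch encodes it as a lookup and lists which store types a set of workloads would call for. The dictionary keys and the helper names `candidate_stores` and `stores_needed` are hypothetical, and the ordering of candidates is simply the order given in the table, not a ranking backed by testing.

```python
# Hypothetical lookup mirroring the table above. Workload names and helper
# functions are illustrative and not part of any database product.
RECOMMENDED_STORES = {
    "user sessions": ["Key-Value"],
    "financial data": ["RDBMS"],
    "pos data": ["RDBMS", "Key-Value", "Document", "Column"],
    "shopping cart": ["Document", "Key-Value"],
    "recommendations": ["Graph", "Column"],
    "product catalog": ["Document"],
    "reporting": ["RDBMS", "Column"],
    "analytics": ["Column"],
    "activity logs": ["Key-Value", "Document"],
}


def candidate_stores(workload):
    """Return the store types the table lists for a workload, in table order."""
    key = workload.strip().lower()
    if key not in RECOMMENDED_STORES:
        raise ValueError(f"no recommendation recorded for {workload!r}")
    return RECOMMENDED_STORES[key]


def stores_needed(workloads):
    """Collect the distinct first-listed store types across several workloads."""
    needed = []
    for workload in workloads:
        first_choice = candidate_stores(workload)[0]
        if first_choice not in needed:
            needed.append(first_choice)
    return needed


if __name__ == "__main__":
    project = ["User Sessions", "Financial Data", "Product Catalog", "Recommendations"]
    print(stores_needed(project))
```

Even this small project of four workloads ends up wanting four different store types, which is the polyglot point in miniature.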

 

Where to start?  The answer is as always: test, test, test.

The good news is that acquiring a NoSQL DB is relatively cost-effective if you’re going for a Hadoop distribution.  These are generally priced in the low to mid five figures, so on an annual basis they cost much less than one software engineer.  Of course, you will need qualified staff with these skills, and that may mean one person or many, depending on your needs.

And if you use one of the major Hadoop distributors, you’ll most often get at least Key Value, Column, and Graph as part of the same distribution package, along with SQL-friendly Drill and Spark.

Based roughly on MapR's distribution.


Still, you need to pick a project to start with and that can be whatever has the highest business value.  But as your experience with NoSQL grows, plan on having several NoSQL DB types to facilitate your business objectives.

 

May 7, 2015

Bill Vorhies, President & Chief Data Scientist – Data-Magnum – © 2015, all rights reserved.

 

About the author:  Bill Vorhies is President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001.  He can be reached at:

Bill@Data-Magnum.com

 
