Polyglot Persistence Shapes Big Data Solutions

April 30, 2013 | BY Michael Hausenblas

In 2011 Martin Fowler coined the term Polyglot Persistence, suggesting in a nutshell:

... any decent sized enterprise will have a variety of different data storage technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.

More recently, Mark Madsen and Robin Bloor discussed the topic in a webcast, along with the Bloor Group's Database Revolution white paper. Another good source for lessons learned and examples is Episode 189 of Software Engineering Radio, a podcast hosted by IEEE Software: Eric Lubow on Polyglot Persistence.

Make no mistake, Polyglot Persistence as a meme has a direct impact on how you design and implement solutions for large-scale data processing. Moreover, it will influence the way you think about the tools you deploy. Rather than the one-size-fits-all mantra we've been fed by the Oracles and the like over the past ten or more years, we should now think in terms of a tool belt. And as an architect, it is your responsibility to select the right combination of tools for the tasks at hand. The Hadoop ecosystem offers many options.

What you choose may depend on the type of data you're dealing with (such as a customer's shopping basket vs. a financial transaction) as well as on the sort of workload. Is it a quick key-driven look-up? Do you need to scan and aggregate data over many records? Do you have ad-hoc queries? Or rather timed, repeated one-offs that run in batch mode? Is low latency your primary concern? And of course, as always, all the tooling should not only be available at scale, in the petabytes and beyond, but must also be reliable and perform well.
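To make those questions a bit more tangible, here is a minimal sketch of how one might write them down as a decision helper. Everything in it is an assumption for illustration: the names `Workload`, `Access`, `Schedule` and `suggest_store` are hypothetical, the returned strings are generic store categories rather than products, and the rules are a deliberately crude simplification of what a real architecture decision involves.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    KEY_LOOKUP = "key-driven look-up"
    SCAN_AGGREGATE = "scan and aggregate over many records"


class Schedule(Enum):
    AD_HOC = "ad-hoc queries"
    BATCH = "timed, repeated batch runs"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload, mirroring the questions above."""
    access: Access
    schedule: Schedule
    low_latency: bool


def suggest_store(w: Workload) -> str:
    """Return a generic store category (illustrative labels, not products)."""
    # Quick key-driven look-ups: a key-value style store is the usual first guess.
    if w.access is Access.KEY_LOOKUP:
        return "key-value store"
    # Scans that run on a schedule: batch processing is typically acceptable.
    if w.schedule is Schedule.BATCH:
        return "batch processing over a distributed file system"
    # Ad-hoc scans where latency matters: an interactive query engine.
    if w.low_latency:
        return "interactive query engine"
    # Ad-hoc scans without tight latency needs.
    return "analytical database"


# A shopping basket fetched by customer id, with latency as the primary concern.
basket = Workload(Access.KEY_LOOKUP, Schedule.AD_HOC, low_latency=True)
print(suggest_store(basket))

# A nightly aggregation over all transactions.
nightly = Workload(Access.SCAN_AGGREGATE, Schedule.BATCH, low_latency=False)
print(suggest_store(nightly))
```

The point is not the particular rules, which you would replace with your own, but that the workload questions come first and the choice of technology follows from the answers.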

Look at the following figure and it may get clearer that the data volumes, varieties and velocities we had to deal with so far are really a special (and simple) case of Big Data, rather than the other way round:

Here at MapR we appreciate Polyglot Persistence and have in fact already aligned our Big Data platform accordingly. One day, I suppose, the majority of the platforms out there will be compatible with Polyglot Persistence. Already today, we enable you to benefit from the combination of open-source-based agreement on the interfaces and an enterprise-grade implementation, delivering a reliable and fast solution:

If you are interested in reading more of the collected wisdom around Polyglot Persistence, including examples and support in decision making, see the book NoSQL Distilled by Pramod J. Sadalage and Martin Fowler.