Polyglot Persistence

Six years ago, the term “polyglot programming” was coined to describe the idea that software applications should be written in a mix of different languages—meaning that programmers should use the right language for the specific job at hand. In the big data field, a new term has emerged: “polyglot persistence.”

Today, most large companies use a variety of data storage technologies for different kinds of data. Many still use relational databases for some of it, but the persistence needs of applications are shifting from predominantly relational to a mixture of data stores. The term polyglot persistence describes this hybrid approach. Increasingly, architects approach the data storage problem by first figuring out how they want to manipulate the data, and then choosing the technology that fits. What polyglot persistence boils down to is choice: the ability to use multiple data stores, depending on your use cases.

Your choice of data store really depends on two key factors:

  1. the type of data you’re working with, and
  2. the sort of workload.

If you’re working on an e-commerce solution, you may find it best to use a key-value store for the shopping cart, but then use completely different solutions for storing transactional data, caching session information, and analyzing purchasing behavior. For each workload, ask: Is it a quick key-driven look-up? Do you need to scan and aggregate data over many records? Do you have ad-hoc queries? Do you need to run scheduled, recurring jobs in batch mode? Or is low latency your primary concern?
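As a rough illustration of choosing a store per workload, here is a minimal Python sketch. Only the key-value store for the shopping cart comes from the example above; every other pairing, and all of the names (Workload, STORE_FOR_WORKLOAD, ECOMMERCE_COMPONENTS, choose_store, plan_persistence), are assumptions made up for this sketch, not recommendations for any particular product.

```python
from enum import Enum
from typing import Dict


class Workload(Enum):
    """The workload questions from the text, as hypothetical categories."""
    KEY_LOOKUP = "quick key-driven look-up"
    SCAN_AGGREGATE = "scan and aggregate over many records"
    AD_HOC_QUERY = "ad-hoc queries"
    BATCH = "scheduled batch jobs"
    LOW_LATENCY = "low latency is the primary concern"


# Assumed mapping from workload to a category of data store.
# Only the key-value entry is taken from the article's example;
# the remaining entries are illustrative guesses.
STORE_FOR_WORKLOAD: Dict[Workload, str] = {
    Workload.KEY_LOOKUP: "key-value store",
    Workload.SCAN_AGGREGATE: "columnar analytics store",
    Workload.AD_HOC_QUERY: "relational database",
    Workload.BATCH: "distributed file system",
    Workload.LOW_LATENCY: "in-memory cache",
}

# Assumed dominant workload for each e-commerce component named in the text.
ECOMMERCE_COMPONENTS: Dict[str, Workload] = {
    "shopping cart": Workload.KEY_LOOKUP,
    "transactional data": Workload.AD_HOC_QUERY,
    "session information": Workload.LOW_LATENCY,
    "purchasing behavior analysis": Workload.SCAN_AGGREGATE,
}


def choose_store(workload: Workload) -> str:
    """Return the store category assumed for the given workload."""
    return STORE_FOR_WORKLOAD[workload]


def plan_persistence(components: Dict[str, Workload]) -> Dict[str, str]:
    """Map each application component to a store category."""
    return {name: choose_store(w) for name, w in components.items()}


if __name__ == "__main__":
    for component, store in plan_persistence(ECOMMERCE_COMPONENTS).items():
        print(f"{component}: {store}")
```

In this sketch the decision starts from how each component uses its data rather than from a single default database, which is the order of reasoning the polyglot approach suggests. A real system would also weigh the type of data, not just the workload.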

The following figure illustrates that the data volumes, varieties and velocities we have to deal with are really a special (and simple) case of big data, rather than the other way around:

Here at MapR, we have applied the concept of Polyglot Persistence to our MapR Converged Data Platform. Our platform was designed to handle large volumes and a wide variety of data, including a mixture of unstructured and structured data. One of the distinct advantages of the MapR Platform is that data can be ingested as a real-time stream, analyzed directly, and used to trigger automated responses. With MapR, a broad set of applications can be supported, including mission-critical applications that require a depth and breadth of enterprise-grade Hadoop features.

Using the most suitable storage for each use case leads to a better fit and fewer future problems with data management, scalability and performance. By choosing the MapR Converged Data Platform, your organization can achieve a reliable Hadoop deployment that is easy to manage, monitor and scale throughout its life cycle. And in turn, you’ll be able to analyze larger data sets, get faster analytical results, and save money.

To learn more about Polyglot Persistence, read the book NoSQL Distilled by Pramod J. Sadalage and Martin Fowler.