Data Tiering with MapR

EXECUTIVE SUMMARY

Data tiers on MapR let you store, manage, and analyze data in different tiers based on performance, cost, and capacity trade-offs, regardless of the underlying physical storage infrastructure. Different data has different characteristics and therefore places different requirements on the underlying data platform. With MapR data tiers, you can segregate data and easily balance performance, cost, and capacity requirements.

MAPR DATA TIERS

  • Manage data life cycle in different data tiers
  • Choose between performance, capacity, and cost-optimized tiers
  • Use policies to enforce SLAs and move data across tiers
  • Match your storage media of choice with tiers to maintain hierarchical data management

STORE, MANAGE, AND ANALYZE YOUR DATA INTELLIGENTLY

We are in the exabyte era, where most data has been generated in the last few years or is being generated daily. More and more organizations are embracing a continuous analysis model for their business decisions, which requires them to handle different types of data with varying SLAs. As organizations store more historical data, the characteristics of that data change: data that is active today will be used less and less over time. Data platforms must therefore offer the capabilities to handle the full data life cycle.

MapR data tiers allow you to store, manage, and analyze your ever-growing data, based on different SLAs. MapR introduces a three-tiered approach to placing and managing data:

MapR data storage tiers diagram

KEY BENEFITS

  • One solution for different categories of data
  • Easy management of data across its life cycle, while maintaining SLAs
  • Automated placement and movement of data across tiers
  • TCO benefits from using low-cost object storage for inactive data
  • Monitoring and optimizing space utilization

OPTIMIZE FOR AVAILABILITY AND PERFORMANCE

Use replicas to protect your most active data and spread it across the cluster so that it is always available. Associate the tier with all-flash storage to achieve maximum performance and availability. For example, an organization building a machine learning (ML) pipeline will invariably have large volumes of training data that are frequently accessed and updated for building initial and subsequent models. Depending on the rate at which the training data is updated, replicating and storing it on a high-performance tier will accelerate ML training jobs, so models are built faster.

MAXIMIZE FOR CAPACITY EFFICIENCY

Apply erasure coding to protect data while maximizing capacity efficiency. The cost of storing large volumes of data can vary widely, but with a MapR capacity tier, you can plan on reducing your overall cost of managing data. In the same ML example, large volumes of training data require efficient storage that still allows continuous updates and ingestion of new data.
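
The capacity argument for erasure coding over replication comes down to simple arithmetic, sketched below. The 3-way replication factor and the 4+2 erasure coding scheme are illustrative assumptions rather than figures from this document, and the function names are hypothetical.

```python
def raw_capacity_replication(logical_tb: float, replicas: int) -> float:
    """Raw capacity consumed when every byte is stored `replicas` times."""
    return logical_tb * replicas


def raw_capacity_erasure(logical_tb: float, data_fragments: int,
                         parity_fragments: int) -> float:
    """Raw capacity consumed when data is split into `data_fragments`
    pieces and protected by `parity_fragments` additional pieces."""
    total_fragments = data_fragments + parity_fragments
    return logical_tb * total_fragments / data_fragments


if __name__ == "__main__":
    logical_tb = 100  # hypothetical size of the training data set, in TB

    # Assumed schemes: 3 replicas versus 4 data + 2 parity fragments.
    replicated = raw_capacity_replication(logical_tb, replicas=3)
    erasure_coded = raw_capacity_erasure(logical_tb, data_fragments=4,
                                         parity_fragments=2)

    print(f"3x replication: {replicated} TB raw")
    print(f"4+2 erasure coding: {erasure_coded} TB raw")
```

Under these assumed schemes, the erasure-coded copy needs half the raw capacity of the replicated copy, and each scheme can survive the loss of two copies or fragments.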

LEVERAGE FOR COST AND LONG-TERM ARCHIVING

Use the MapR integration with S3-compatible APIs to store data long-term. Typically used for archival purposes, a MapR archive tier lets you move data to the cloud or to any low-cost S3-compatible store, with the ability to quickly bring the data back into an active, operational mode.

MapR data placement management diagram

MANAGE PLACEMENT AND MOVEMENT OF DATA WITH POLICIES

Easily migrate and manage movement of data across the tiers, based on policies. For example, you can move data if it is older than a certain number of days.
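
An age-based rule of this kind can be modeled in a few lines. The following is a minimal sketch of the decision logic only, not MapR's policy engine or API: the `AgePolicy` class, the `select_tier` function, the tier labels, and the 30-day and 365-day thresholds are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AgePolicy:
    """Hypothetical rule: data older than `max_age_days` belongs on `target_tier`."""
    max_age_days: int
    target_tier: str


def select_tier(last_modified: datetime, now: datetime,
                current_tier: str, policies: list[AgePolicy]) -> str:
    """Return the tier the data should live on, given its age.

    Policies are checked from the smallest to the largest age threshold,
    so the policy with the largest threshold that the data exceeds wins.
    Data younger than every threshold stays on its current tier.
    """
    age = now - last_modified
    chosen = current_tier
    for policy in sorted(policies, key=lambda p: p.max_age_days):
        if age > timedelta(days=policy.max_age_days):
            chosen = policy.target_tier
    return chosen


if __name__ == "__main__":
    # Assumed thresholds for illustration only.
    policies = [AgePolicy(30, "capacity"), AgePolicy(365, "archive")]
    now = datetime(2024, 1, 1)

    for modified in (datetime(2023, 12, 20), datetime(2023, 11, 1),
                     datetime(2022, 6, 1)):
        tier = select_tier(modified, now, "performance", policies)
        print(f"last modified {modified.date()}: {tier}")
```

Evaluating the thresholds in ascending order keeps the rule set easy to extend: adding another tier means adding another `AgePolicy` entry rather than changing the selection logic.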

MATCH YOUR STORAGE WITH THE TIERING LIFE CYCLE

Associate any tier with all-flash storage or spinning disks to form a combination of high-performance or high-density tiers. For long-term archiving, tier data to any S3-compatible object store, whether a public cloud or an on-premises third-party vendor supported by MapR, for a highly efficient configuration. MapR allows you to mix and match these tiers within a single cluster, simplifying provisioning and management of data.
