MapR Distributed File and Object Store

MapR Distributed File and Object Store is a cloud-scale data store, the industry’s only exabyte-scale data store for building intelligent applications with the MapR Data Platform.


We are in the midst of a once-in-30-years infrastructure shift. Organizations must be able to combine legacy operational data and analytical data to create intelligent applications. Yesterday’s storage and data management technologies were not designed to take advantage of distributed computing environments, cloud infrastructures, containers and virtualization, and IoT. Additionally, the exponential growth of data volumes and rigid infrastructures make it difficult to move data and integrate analytics with operational processes, effectively creating data silos. These silos make it challenging to derive meaning and intelligence from the data and can lead to high processing and storage costs, which only increase as data volumes grow. A new approach is required: intelligent applications that automate real-time operational decisions on the basis of deep analytical insights.

MapR Distributed File and Object Store is a high-scale, reliable, globally distributed data store that creates a data fabric for managing files and containers. MapR Distributed File and Object Store supports the most stringent speed, scale, and reliability requirements across multiple edge, on-premises, and cloud environments. MapR Distributed File and Object Store makes it easy to store any data at exabyte scale and supports trillions of files, provides enterprise-grade features to be the system of record for large global enterprises, and uniquely combines analytics and operations into a single platform enabling intelligent application development.

MapR Distributed File and Object Store is software for building intelligent applications with the MapR Data Platform. MapR Distributed File and Object Store includes the MapR multi-temperature Global Namespace and data management in the form of security, compression, snapshots, multi-tenancy, and self-healing. MapR Distributed File and Object Store is delivered via either flash or disk.

Why MapR Distributed File and Object Store?

Learn More

From the Blog - Introducing MapR Distributed File and Object Store: Cloud-Scale Data Store

Today’s storage and data management technologies were not designed to take advantage of distributed computing environments, cloud infrastructures, containers and virtualization, and IoT. Therefore, a need arises for a new kind of data platform. Intelligent applications that automate real-time operational decisions on the basis of deep analytical insights are necessary. This is where MapR Distributed File and Object Store comes in as the solution.

Read the blog post

Case Study - SAP Digital Interconnect

SAP Digital Interconnect chose MapR Distributed File and Object Store because they required a modular, componentized architecture with implicit Hadoop features and enhanced performance that could run on commodity hardware or in the cloud. Cost was an important factor. "MapR Distributed File and Object Store provided us with a huge amount of storage for a fairly low cost, which was ultra-critical for us," says Joe Love, Senior Staff Infrastructure Engineer.

Read the case study

10 Reasons Why Customers Choose MapR Distributed File and Object Store

1. Massive Scalability
2. Global Namespace
3. Support Different Types of Data
4. Automated Data Placement
5. Deploy Anywhere
6. Reliability and High Availability
7. Multi-Tenancy and Security
8. Flexibility to Optimize for Speed and Cost
9. Naturally Analytics Ready
10. High Speed Ingest and Data Processing

Get more details


HDFS vs. MapR-FS – 3 Numbers for a Superior Architecture

Ted Dunning, Chief Application Architect at MapR, talks about the architectural differences between HDFS and MapR Distributed File and Object Store's underlying technology that boil down to three numbers.

Watch the video

Comparing MapR-FS and HDFS NFS and Snapshots

This demo by Bruce Penn, Principal Solution Architect at MapR, compares NFS and snapshots in MapR-FS and HDFS.

Watch the video

Driving Better Business Benefits with Hadoop: A Comprehensive Platform Approach

What if you could get over $3 back for every $1 you invest in big data technology? Research by IDC shows that big data ROI can be huge, at an average of 382% 3-year ROI for the organizations that were studied.

Watch the video

MapR Distributed File and Object Store architecture diagram

Data Management

Global Namespace

The MapR Global Namespace provides a consolidated view of files that reside in different physical locations. It simplifies data access management, especially in large-scale multi-cluster and multi-tenant environments.

Data Replication

MapR Data Replication maintains copies of data distributed across locations to provide data protection. The MapR replication method ensures that hardware failures cause neither data loss nor loss of access.


Topology

Topology is a unique MapR feature that records the location of nodes and racks in the cluster, allowing end users to determine where replicated copies of data are placed. This ensures reliability and efficient data placement across the cluster.
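To make the idea of rack-aware placement concrete, here is a minimal sketch that chooses replica locations so that copies land on different racks where possible. The rack labels, node names, and the `place_replicas` helper are all hypothetical, and this is not a description of MapR's actual placement logic.

```python
from itertools import zip_longest


def place_replicas(nodes_by_rack, copies=3):
    """Pick `copies` nodes, spreading them over as many racks as possible.

    nodes_by_rack maps a rack label to a list of node names.
    Hypothetical helper, for illustration only.
    """
    # Round-robin across racks: first node of every rack,
    # then the second node of every rack, and so on.
    racks = [nodes_by_rack[rack] for rack in sorted(nodes_by_rack)]
    ordered = [
        node
        for tier in zip_longest(*racks)
        for node in tier
        if node is not None
    ]
    if len(ordered) < copies:
        raise ValueError("not enough nodes for the requested replica count")
    return ordered[:copies]


# Hypothetical three-rack cluster.
cluster = {
    "/rack1": ["node-a", "node-b"],
    "/rack2": ["node-c"],
    "/rack3": ["node-d", "node-e"],
}
replicas = place_replicas(cluster)
print(replicas)
```

With three racks available, each of the three copies ends up on a different rack, so losing one rack leaves two copies intact.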


Caching

MapR caches both data and metadata in memory to maximize performance. End users can tune caching to improve performance further.

Auto Tiering

MapR Auto Tiering maintains the lifecycle of data and promotes/demotes the data onto different storage tiers backed by different storage media. This allows for efficient usage of cluster capacity and data management. Proper management and placement of data reduces overall total cost of ownership (TCO).
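A tiering policy of this kind can be sketched as a rule that maps how recently data was accessed to a target tier. The tier names, the 30-day and 180-day thresholds, and the `plan_moves` helper below are assumptions chosen for illustration, not MapR defaults or APIs.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    days_since_access: int
    tier: str  # "hot", "warm", or "cold" (hypothetical tier names)


def target_tier(days_since_access, warm_after=30, cold_after=180):
    """Map data age to a tier; the thresholds are illustrative assumptions."""
    if days_since_access >= cold_after:
        return "cold"
    if days_since_access >= warm_after:
        return "warm"
    return "hot"


def plan_moves(files):
    """Return (path, from_tier, to_tier) for every file whose tier should change."""
    moves = []
    for f in files:
        wanted = target_tier(f.days_since_access)
        if wanted != f.tier:
            moves.append((f.path, f.tier, wanted))
    return moves


files = [
    FileInfo("/logs/2019.csv", 400, "hot"),       # stale: demote to cold
    FileInfo("/reports/q1.parquet", 45, "hot"),   # cooling: demote to warm
    FileInfo("/live/events.json", 0, "cold"),     # active again: promote
    FileInfo("/live/state.json", 1, "hot"),       # already in the right tier
]
moves = plan_moves(files)
for move in moves:
    print(move)
```

Keeping the decision (`target_tier`) separate from the plan (`plan_moves`) makes it easy to swap in a different policy without touching the code that moves data.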

Intelligent Policy Management – Volumes and Mirroring

The volume is the unit of management in MapR. Volumes can be created per user, per department, per project, or as the business requires. MapR offers volume-level snapshots and mirroring for data recovery and protection. Volume management and mirroring provide multi-tenancy and segregation while ensuring data protection. Mirrored volumes can be created between clusters within the same data center, across data centers, or even between on-premises and cloud environments.

Storage Pools and Automatic Load Balancing

The MapR storage architecture is built on storage pools, in which disks are grouped together to form a pool. MapR distributes volumes across the storage pools. Maintaining diverse storage pools backed by different storage media makes it possible to host applications with varying capacity, performance, and data protection requirements.

Enterprise-Grade Reliability

MapR provides enterprise-grade reliability by maintaining three copies of both metadata and data, distributed throughout the cluster. Using topology awareness, MapR ensures that copies are maintained across racks, so a hardware failure does not affect data availability or access.



Encryption

MapR offers a wide variety of encryption options. Data is encrypted on the wire, so it is protected before it reaches its destination. This goes beyond encryption at rest, where data is encrypted only after the raw data has reached the destination. MapR also supports Kerberos, which adds a further layer of security, especially for highly sensitive, confidential data.


Authentication

MapR authentication ensures that the identity of the end user is known reliably in the network. MapR supports two methods of authenticating a user and generating a ticket: a username/password pair and Kerberos. MapR uses the ticket to identify the user and make authorization decisions.


Authorization

MapR authorization restricts an authenticated MapR user’s capabilities on the system. MapR supports ACLs (access control lists) for regulating user privileges on the job queue and cluster. MapR also uses ACLs to control administrative access to volumes.

Access Control Expressions

MapR offers another powerful and unique authorization model in the form of access control expressions (ACEs). ACEs can be used to control access to MapR tables, files, directories, volumes, and streams. ACEs can define whitelists (to grant access) and blacklists (to deny access) for any combination of users, groups, and roles.
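The way an expression can combine grants and denials across users, groups, and roles can be illustrated with a small boolean evaluator. The `u:`, `g:`, and `r:` prefixes and the `&`, `|`, `!` operators are notation assumed here for illustration, as are the `allowed` function and every principal name; consult the MapR documentation for the actual ACE syntax.

```python
import re

# One token: a principal such as u:alice, or one of ( ) & | !
TOKEN = re.compile(r"\s*([ugr]:[\w.-]+|[()&|!])")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected text: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def allowed(expr, user, groups=(), roles=()):
    """Evaluate an access expression for one user (illustrative only)."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "!":                      # negation: a deny rule
            return not atom()
        if tok == "(":
            value = either()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok is None or ":" not in tok:
            raise ValueError(f"unexpected token: {tok!r}")
        kind, name = tok.split(":", 1)
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        return name in roles

    def both():                             # '&' binds tighter than '|'
        value = atom()
        while peek() == "&":
            take()
            rhs = atom()
            value = value and rhs
        return value

    def either():
        value = both()
        while peek() == "|":
            take()
            rhs = both()
            value = value or rhs
        return value

    result = either()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result


# Grant access to the analysts group or to carol, but never to mallory.
ace = "(g:analysts | u:carol) & !u:mallory"
print(allowed(ace, "alice", groups={"analysts"}))
print(allowed(ace, "mallory", groups={"analysts"}))
```

Here the parenthesized part acts as the whitelist and the negated term as the blacklist, so a single expression covers both.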

Data Growth

Globally Distributed Scale

The MapR platform takes a fundamentally distributed approach, with no LUNs or traditional storage volumes to manage, so there are no logical limits on scale. The cluster can therefore grow seamlessly and without disruption. MapR scales to thousands of hosts and clients across multiple racks and can even extend across geographic locations and data centers. Any expansion of an existing cluster can be performed online, with no downtime for ongoing production.

Inline Compression

MapR offers inline compression, in which data is compressed as it arrives and before it is written to disk. This lowers CAPEX, since you purchase storage capacity only for the compressed data rather than for the full uncompressed data.
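The effect of compressing on the write path can be shown with a short sketch. It uses Python's standard `zlib` module as a stand-in codec and an in-memory dictionary as a stand-in for disk; neither reflects the codecs or storage layout MapR actually uses, and the `write_compressed` and `read` helpers are hypothetical.

```python
import zlib


def write_compressed(store, key, data, level=6):
    """Compress `data` before storing it; return (raw size, stored size)."""
    store[key] = zlib.compress(data, level)
    return len(data), len(store[key])


def read(store, key):
    """Decompress transparently on read, so callers see the original bytes."""
    return zlib.decompress(store[key])


store = {}  # stand-in for disk
raw = b"2024-01-01 INFO request served in 12 ms\n" * 1000
raw_size, stored_size = write_compressed(store, "/logs/app.log", raw)

print(f"raw bytes: {raw_size}, stored bytes: {stored_size}")
assert read(store, "/logs/app.log") == raw
```

Only the compressed bytes ever reach the store, which is why the capacity to purchase tracks the compressed size rather than the raw size. Repetitive data such as logs compresses well; already-compressed data gains little.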

Thin Provisioning

MapR offers thin provisioning by default, which means capacity is consumed only as it is needed. No storage has to be pre-allocated.

Quota Provisioning

MapR provides a controlled method of volume creation and management. Quotas limit the space used by a volume or an entity (a user or group) by specifying the amount of space it is allowed to use. This ensures that capacity is shared among all tenants and also provides resource isolation.
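Quota enforcement can be pictured as a check made on every write. In this minimal sketch the `Volume` class, the `QuotaExceeded` exception, and the sizes are hypothetical and do not correspond to MapR interfaces.

```python
class QuotaExceeded(Exception):
    """Raised when a write would push a volume past its quota."""


class Volume:
    def __init__(self, name, quota_bytes):
        self.name = name
        self.quota_bytes = quota_bytes
        self.used_bytes = 0  # nothing is reserved up front

    def write(self, nbytes):
        # Reject the write outright if it would exceed the quota.
        if self.used_bytes + nbytes > self.quota_bytes:
            raise QuotaExceeded(
                f"{self.name}: {self.used_bytes + nbytes} bytes requested, "
                f"quota is {self.quota_bytes}"
            )
        self.used_bytes += nbytes


GIB = 1024 ** 3
marketing = Volume("marketing", quota_bytes=10 * GIB)
marketing.write(6 * GIB)          # fits within the 10 GiB quota

try:
    marketing.write(5 * GIB)      # 6 + 5 > 10, so this is refused
    rejected = False
except QuotaExceeded as err:
    rejected = True
    print(err)
```

Because the refused write leaves `used_bytes` unchanged, one tenant hitting its limit has no effect on the space available to others.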

Data Recovery

Extreme Resiliency

MapR offers extreme resiliency at various levels. The MapR approach of distributing and replicating the metadata across the cluster ensures extreme resiliency since such a layout eliminates the risk of data loss during a failure.

Instant Snapshots

MapR Snapshots are read-only, volume-level snapshots. Snapshots can be taken manually or on an automated schedule. Creating a snapshot is near-instantaneous and takes up no disk space initially, since only the change deltas are stored.

Notable benefits of the MapR implementation of snapshots include:

  • Rollback from errors - Whether from an end-user error or analytical applications manipulating the data, MapR allows for recovery to a well-known state.
  • Hot backups - MapR snapshots can easily and quickly be backed up.
  • Managing real-time analysis - For real-time applications, where data is fed into the MapR cluster continuously, MapR snapshots provide stable points of comparison without affecting the real-time nature of the data.

MapR Snapshots are implemented using the redirect-on-write method, and each snapshot is atomic and consistent.
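The general redirect-on-write technique can be sketched with a toy block store: new data is always written to a fresh physical block and the live block map is redirected to it, while a snapshot is just a frozen copy of the block map. The `Volume` class below is a hypothetical teaching model, not MapR's on-disk design.

```python
class Volume:
    def __init__(self):
        self._blocks = {}      # physical block id -> bytes
        self._live = {}        # logical block number -> physical block id
        self._snapshots = {}   # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, logical, data):
        # Redirect-on-write: never overwrite in place. The new data goes
        # to a fresh physical block and the live map points at it.
        self._blocks[self._next_id] = data
        self._live[logical] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only pointers are copied, so no data blocks are duplicated.
        self._snapshots[name] = dict(self._live)

    def read(self, logical, snapshot=None):
        block_map = self._live if snapshot is None else self._snapshots[snapshot]
        return self._blocks[block_map[logical]]

    def physical_blocks(self):
        return len(self._blocks)


vol = Volume()
vol.write(0, b"v1")
vol.write(1, b"unchanged")

vol.snapshot("monday")
blocks_after_snapshot = vol.physical_blocks()   # still 2: snapshot stored no data

vol.write(0, b"v2")                             # only this delta uses new space

print(vol.read(0))                      # live view sees the new data
print(vol.read(0, snapshot="monday"))   # snapshot still sees the old data
print(vol.physical_blocks())
```

The snapshot costs space only as the live data diverges from it, and the unchanged block is shared between the live view and the snapshot.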


Self-Healing

MapR provides self-healing from multiple simultaneous hardware failures by reconstructing the data from its copies, which keeps the cluster available at all times. As a cluster expands, upgrade and self-healing times go down considerably, since all nodes in the cluster participate in the process.
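A back-of-the-envelope model shows why rebuild time falls as the cluster grows. It assumes the rebuild work is spread evenly over all surviving nodes and that each node contributes a fixed bandwidth; the 100 MB/s figure and the `rebuild_hours` helper are illustrative assumptions, not measured MapR numbers.

```python
def rebuild_hours(lost_tb, nodes, per_node_mb_s=100):
    """Estimate hours to re-create `lost_tb` of data after one node fails.

    Simplified model: every surviving node rebuilds an equal share in
    parallel at `per_node_mb_s` megabytes per second.
    """
    surviving = nodes - 1
    total_mb = lost_tb * 1_000_000          # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (surviving * per_node_mb_s)
    return seconds / 3600


# Same 10 TB of lost data, two different cluster sizes.
small = rebuild_hours(10, nodes=11)     # 10 surviving nodes
large = rebuild_hours(10, nodes=101)    # 100 surviving nodes
print(f"11-node cluster:  {small:.2f} h")
print(f"101-node cluster: {large:.2f} h")
```

Under these assumptions, ten times as many surviving nodes cut the rebuild time by a factor of ten. Real clusters will deviate from this because of network limits and competing workloads.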

Data Insight

Open Interfaces

MapR offers a flexible platform for managing diverse types of data. By supporting the following protocol interfaces, it can host a wide range of applications with diverse characteristics and performance requirements on a single platform. This eliminates data center sprawl and siloed environments and reduces overall TCO.

  • NFS – NFSv3 (support for NFSv4 coming soon) is supported along with the benefits of a distributed file system and a single global namespace.
  • HDFS APIs – Data can be brought in directly through NFS and accessed/analyzed through HDFS using the published HDFS APIs.
  • POSIX – MapR supports POSIX and FUSE-based POSIX clients. This client software allows app servers, web servers, and client nodes to read/write data directly to a MapR cluster.
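Because the cluster is exposed through NFS and POSIX interfaces, an application can use ordinary file I/O with no special client library. The sketch below uses a temporary directory as a stand-in for the cluster mount point so that it runs anywhere; the `DEMO_MOUNT_POINT` environment variable and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

# Assumption: on a real client this would be the directory where the
# cluster is mounted over NFS or POSIX. A temporary directory stands in
# for it here unless the hypothetical DEMO_MOUNT_POINT variable is set.
mount_point = Path(os.environ.get("DEMO_MOUNT_POINT") or tempfile.mkdtemp())

data_file = mount_point / "ingest" / "events.csv"
data_file.parent.mkdir(parents=True, exist_ok=True)

# Plain file writes and reads; nothing here is specific to the storage backend.
data_file.write_text("id,value\n1,42\n")
rows = data_file.read_text().splitlines()

print(rows)
```

The same file could then be read by analytics jobs through the HDFS APIs, which is the point of offering several interfaces over one namespace.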

Flash Performance Optimization

MapR Distributed File and Object Store is flash optimized to get the most out of the core components of the cluster. Multiple instances of the MFS file server are run in the cluster. Each instance runs on a single node as a single process. Running multiple instances maximizes the performance of the underlying storage system, so configuring multiple MFS instances yields the greatest benefit on an all-flash platform.

Deploy and Execute Anywhere

MapR is uniquely hardware agnostic and hypervisor/OS agnostic. Combined with its powerful interfaces, this lets MapR host varied workloads regardless of the underlying hardware or operating system. MapR runs on a broad ecosystem of hardware, hypervisors, and operating systems, giving customers freedom of choice.


What is MapR Distributed File and Object Store?

Fabian Wilckens, EMEA Solutions Architect at MapR, discusses some of the key themes, including "real-time" and "standard interfaces," that...

Watch the video

Whiteboard Walkthrough - Handling Disk Failure in MapR Distributed File and Object Store

Abizer Adenwala, Technical Support Engineer at MapR, walks you through what a storage pool is, why disks are striped, why a disk would be marked as failed, and what happens when a disk is marked failed.

Watch the video

3 Benefits of Multi-Temperature Data Management for Data Analytics

SAP® HANA and SAP® IQ are popular platforms for various analytical and transactional use cases. If you’re an SAP customer, you’ve experienced the benefits of deploying these solutions. However, as data...

Watch the video

Interested in MapR Distributed File and Object Store?

Email us at