MapR Filesystem

Describes the features of the MapR file system and compares it with the Hadoop Distributed File System (HDFS).

The MapR Data Platform provides a unified data solution for structured data (tables) and unstructured data (files).

MapR Filesystem is a random read-write distributed file system that allows applications to concurrently read and write directly to disk. The Hadoop Distributed File System (HDFS), by contrast, supports only append-only writes and can read only from closed files. Because HDFS is layered over the existing Linux file system, a large number of input/output (I/O) operations degrades the cluster's performance. MapR Filesystem also eliminates the NameNode, which is associated with cluster failure in other Hadoop distributions, and enables special features for data management and high availability.
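The difference between random read-write access and append-only writes can be illustrated with ordinary file I/O. The following is a minimal sketch, not MapR-specific code: it runs against a temporary local directory as a stand-in, and the file name `records.dat` and the helper functions are hypothetical names chosen for illustration. On a file system that supports random writes, the in-place overwrite succeeds; an append-only file system would permit only the second kind of operation.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Random write: replace bytes at `offset`, leaving the rest of the file untouched."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def append_only(path: str, data: bytes) -> None:
    """Append-style write: new bytes can only be added at the end of the file."""
    with open(path, "ab") as f:
        f.write(data)


# Stand-in directory for illustration; a mounted cluster path is an assumption
# this sketch does not depend on.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "records.dat")

with open(path, "wb") as f:
    f.write(b"AAAA-BBBB-CCCC")

overwrite_in_place(path, 5, b"XXXX")  # modifies the middle of the file
append_only(path, b"-DDDD")           # extends the file at the end

with open(path, "rb") as f:
    result = f.read()

print(result)  # b'AAAA-XXXX-CCCC-DDDD'
```

The overwrite touches only four bytes in the middle of the file, which is the kind of operation an append-only design cannot express without rewriting the file.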

The storage architecture of MapR Filesystem is written in C/C++, which avoids the performance impact of Java garbage collection, and it is designed to prevent lock contention.

The following table highlights some of the features of the MapR Filesystem:
| Feature | Description |
|---|---|
| Storage pools | A group of disks to which the MapR file system writes data. |
| Containers | An abstract entity that stores files and directories in the MapR file system. A container always belongs to exactly one volume, and can hold namespace information, file chunks, or table chunks for that volume. |
| CLDB | A service that tracks the location of every container. |
| Volumes | A management entity that stores and organizes containers. Used to distribute metadata, set permissions on data in the cluster, and back up data. A volume consists of a single name container and a number of data containers. |
| Direct Access NFS | Enables applications to read and write data directly to the cluster. |
| POSIX Clients | The loopbacknfs and FUSE-based POSIX clients connect to one or more MapR clusters, and allow app servers, web servers, and applications to write data directly and securely to the MapR cluster. |
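
The relationship between volumes, containers, and the CLDB described in the table can be sketched as a toy data model. This is an illustrative sketch only, not the MapR implementation or API: the class names (`Container`, `Volume`, `ToyCLDB`), the container IDs, and the node names are all hypothetical. It captures three statements from the table: a container belongs to exactly one volume, a volume has a single name container plus a number of data containers, and the CLDB tracks where each container is located.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    container_id: int
    volume: str  # a container belongs to exactly one volume
    kind: str    # "name" (namespace information) or "data" (file or table chunks)


@dataclass
class Volume:
    name: str
    name_container: Container  # exactly one name container per volume
    data_containers: List[Container] = field(default_factory=list)

    def add_data_container(self, container_id: int) -> Container:
        container = Container(container_id, self.name, "data")
        self.data_containers.append(container)
        return container


class ToyCLDB:
    """Toy stand-in for a location service: maps container IDs to node names."""

    def __init__(self) -> None:
        self._locations: Dict[int, List[str]] = {}

    def register(self, container: Container, nodes: List[str]) -> None:
        self._locations[container.container_id] = list(nodes)

    def locate(self, container_id: int) -> List[str]:
        return self._locations[container_id]


# Hypothetical volume, container IDs, and node names.
vol = Volume("projects", Container(1, "projects", "name"))
data = vol.add_data_container(2)

cldb = ToyCLDB()
cldb.register(vol.name_container, ["node-a", "node-b"])
cldb.register(data, ["node-b", "node-c"])

print(cldb.locate(2))  # ['node-b', 'node-c']
```

Keeping the location map separate from the volume objects mirrors the division of responsibility in the table: volumes organize containers, while the CLDB only answers where a given container lives.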