Containers and the CLDB

MapR Filesystem stores data in abstract entities called containers that reside on storage pools. Each storage pool can store many containers. Blocks enable full read-write access to MapR file system and efficient snapshots.

An application can write, append, or update more than once in MapR file system, and can also read a file as it is being written. In other Hadoop distributions, an application can only write once, and the application cannot read a file as it is written.

On average, a container size is 10-30 GB. The default container size is 32GB. Large number of containers allow for greater scaling and allocation of space in parallel without bottlenecks.

Described from the physical layer:
  • Files are divided into chunks.
  • The chunks are assigned to containers.
  • The containers are written to storage pools, which are made up of disks on the nodes in the cluster.
The following table compares the MapR Filesystem storage architecture to the HDFS storage architecture:
Storage Architecture HDFS MapR Filesystem
Management layers Files, directories and blocks, managed by Namenode. Volume, which holds files and directories, made up of containers, which manage disk blocks and replication.
Size of file shard 64MB block 256MB chunk
Unit of replication 64MB block 32GB container
Unit of file allocation 64MB block 8KB block

The MapR file system automatically replicates containers across different nodes on the cluster to preserve data. Container replication creates multiple synchronized copies of the data across the cluster for failover. Container replication also helps localize operations and parallelizes read operations. When a disk or node failure brings a container’s replication levels below a specified replication level, MapR file system automatically re-replicates the container elsewhere in the cluster until the desired replication level is achieved. A container only occupies disk space when an application or program writes to it.

The CLDB (Container Location Database) maintains information about the location of every container in the cluster, defines the container precedence in the replication chain, and organizes container content updates across the replication chain. It runs as a system of independent servers, only one of which is a master at any time.

The MapR Filesystem and other services (such as NFS Gateway and POSIX) send heartbeat (HB) messages to the master CLDB. The CLDB is registered with ZooKeeper and the master CLDB to ZooKeeper connection is kept alive by sending a probe message every few seconds. The CLDB service tracks the location of every container and uses the HB messages to determine the state of a container and thereby all containers on that node. The CLDB actively participates in the failover of a node in the event of a node failure.