MapR-XD Cloud-Scale Data Store is the industry’s only exabyte-scale data store for building intelligent applications with the MapR Converged Data Platform.
We are in the midst of a once-in-30-years infrastructure shift. Organizations must be able to combine legacy operational data and analytical data to create intelligent applications. Yesterday’s storage and data management technologies were not designed to take advantage of distributed computing environments, cloud infrastructures, containers and virtualization, and IoT. Additionally, the exponential growth of data volumes and rigid infrastructures make it difficult to move data and integrate analytics with operational processes, effectively creating data silos. These silos make it challenging to derive meaning and intelligence from the data and can lead to high costs of processing and storing data. These costs only increase as data volumes grow. A new approach is required: intelligent applications that automate real-time operational decisions by applying deep analytical insights.
MapR-XD Cloud-Scale Data Store is a high-scale, reliable, globally distributed data store that creates a data fabric for managing files and containers. MapR-XD supports the most stringent speed, scale, and reliability requirements across multiple edge, on-premises, and cloud environments. MapR-XD makes it easy to store any data at exabyte scale and supports trillions of files, provides enterprise-grade features to be the system of record for large global enterprises, and uniquely combines analytics and operations into a single platform enabling intelligent application development.
MapR-XD is software for building intelligent applications with the MapR Converged Data Platform. MapR-XD includes the MapR multi-temperature Global Namespace and data management in the form of security, compression, snapshots, multi-tenancy, and self-healing. MapR-XD can be deployed on either flash or disk.
The MapR Global Namespace provides a consolidated view into files that are in different physical locations. It offers simple data access management, especially in large-scale multi-cluster and multi-tenant environments.
MapR Data Replication maintains copies of data distributed across locations to provide data protection. The MapR method of replication ensures there is no loss of data or access during hardware failures.
Topology is a unique MapR feature that allows end users to determine where to place replicated copies of data by maintaining details on the location of nodes and racks in the cluster. This ensures reliability and efficient data placement across the cluster.
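As a rough illustration of topology-aware placement, the sketch below spreads replicas across racks before reusing a rack. It is not MapR code: the function `place_replicas`, the node names, and the rack labels are all hypothetical, and the round-robin policy is an assumption chosen only to show the idea.

```python
from collections import defaultdict


def place_replicas(nodes, replicas=3):
    """Pick `replicas` nodes, preferring one node per rack.

    `nodes` maps a node name to its rack label (both hypothetical).
    """
    by_rack = defaultdict(list)
    for node, rack in sorted(nodes.items()):
        by_rack[rack].append(node)

    chosen = []
    # Visit racks round-robin so copies land on distinct racks first.
    while len(chosen) < replicas:
        progressed = False
        for rack in sorted(by_rack):
            if by_rack[rack] and len(chosen) < replicas:
                chosen.append(by_rack[rack].pop(0))
                progressed = True
        if not progressed:
            raise ValueError("not enough nodes for the requested replica count")
    return chosen


cluster = {
    "node1": "rack-a", "node2": "rack-a",
    "node3": "rack-b", "node4": "rack-c",
}
print(place_replicas(cluster))  # ['node1', 'node3', 'node4']
```

With three racks available, each copy lands on a different rack, so losing a single rack still leaves two copies reachable.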
MapR uses memory caching for both data and metadata to maximize performance. End users can tune caching to improve performance further.
MapR Auto Tiering maintains the lifecycle of data and promotes/demotes the data onto different storage tiers backed by different storage media. This allows for efficient usage of cluster capacity and data management. Proper management and placement of data reduces overall total cost of ownership (TCO).
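The promote/demote lifecycle can be pictured with a small age-based policy. This is a minimal sketch under stated assumptions: the tier names "hot" and "cold", the 30-day threshold, and the `FileInfo` and `retier` names are invented for illustration and do not describe how MapR Auto Tiering is actually configured.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    days_since_access: int
    tier: str  # "hot" or "cold" (hypothetical tier names)


def retier(files, demote_after_days=30):
    """Move each file to the tier that matches how recently it was accessed.

    Returns a list of (name, old_tier, new_tier) for every file that moved.
    """
    moves = []
    for f in files:
        target = "cold" if f.days_since_access >= demote_after_days else "hot"
        if f.tier != target:
            moves.append((f.name, f.tier, target))
            f.tier = target
    return moves


files = [
    FileInfo("orders-today.csv", days_since_access=0, tier="hot"),
    FileInfo("logs-archive.csv", days_since_access=90, tier="hot"),
    FileInfo("report-reopened.pdf", days_since_access=1, tier="cold"),
]
moves = retier(files)
```

In this example the stale archive is demoted to the cold tier and the recently reopened report is promoted back to the hot tier, while the file already on the right tier is left alone.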
The volume is the unit of management in MapR. Volumes can be created on a user basis, department basis, project basis, or as required by the business. MapR offers volume-level snapshots and mirroring for data recovery and protection. Volume management and mirroring offer multi-tenancy and segregation while ensuring data protection. Mirrored volumes can be created between clusters within the same datacenter, across datacenters, or even across on-premises and the cloud.
MapR storage architecture follows a concept of storage pools, where disks are grouped together to form a pool. MapR distributes volumes across the storage pools. Maintaining diverse storage pools backed by different storage media offers the ability to host applications with varying capacity, performance, and data protection requirements.
MapR provides enterprise-grade reliability by ensuring three copies are maintained for both metadata and data and distributed throughout the cluster. Using topology awareness, MapR ensures that copies are maintained across racks, so a hardware failure does not affect data availability or access.
MapR offers several encryption options. Data is encrypted on the wire, so it is protected even before it reaches its destination. This differs from encryption of data at rest, where raw data is encrypted only after it reaches the destination. MapR also supports Kerberos, which adds a second level of security, especially for highly sensitive, confidential data.
MapR authentication ensures that the identity of the end user is known reliably in the network. MapR supports two methods of authenticating a user and generating a ticket: a username/password pair and Kerberos. MapR uses the ticket to identify the user and make authorization decisions.
MapR authorization restricts an authenticated MapR user’s capabilities on the system. MapR supports access control lists (ACLs) for regulating user privileges to the job queue and cluster. MapR also uses ACLs to control administrative access to volumes.
MapR offers another powerful and unique model of authorization in the form of access control expressions (ACEs). ACEs can be used to control access to MapR tables, files, directories, volumes, and streams. ACEs can be used to define whitelists (to grant access) and blacklists (to deny access) for a combination of users, groups, and roles.
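The whitelist and blacklist idea can be modeled in a few lines. The sketch below is a simplified stand-in, not the MapR ACE syntax or its evaluation rules: the `is_allowed` function, the `u:`/`g:`/`r:` labels used here, and the rule that a deny entry always wins are assumptions made for illustration.

```python
def is_allowed(user, groups, roles, allow, deny):
    """Return True if the user matches the allow list and not the deny list.

    `allow` and `deny` are sets of labels such as "u:alice", "g:analysts",
    or "r:admin" (an illustrative notation). A deny match always wins.
    """
    identities = {f"u:{user}"}
    identities |= {f"g:{g}" for g in groups}
    identities |= {f"r:{r}" for r in roles}

    if identities & deny:
        return False
    return bool(identities & allow)


allow = {"g:analysts", "r:admin"}
deny = {"u:bob"}

alice_ok = is_allowed("alice", {"analysts"}, set(), allow, deny)
bob_ok = is_allowed("bob", {"analysts"}, set(), allow, deny)
carol_ok = is_allowed("carol", set(), set(), allow, deny)
```

Here alice is admitted through her group, bob is blocked by the blacklist even though his group is whitelisted, and carol is refused because nothing grants her access.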
The fundamentally distributed, no-LUN, no-volume approach of the MapR platform ensures that there are no logical limits to scale. This allows for seamless, nondisruptive growth of the cluster. MapR scales to thousands of hosts and clients across multiple racks and can even expand across geographic locations and datacenters. Any expansion to an existing cluster can be achieved online with no downtime to ongoing production.
MapR offers inline compression, where data is reduced as it arrives and before it is stored on disk. This lowers CAPEX, since you need only purchase enough storage capacity to hold the compressed data rather than the raw data as it was before compression.
MapR offers thin provisioning by default, which means capacity is used as and when it is needed. No extra overhead is needed to pre-allocate storage.
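To make the inline idea concrete, the sketch below compresses data on the write path so that only the reduced bytes are ever stored. Python's standard `zlib` module is used purely as a stand-in codec, and the in-memory `store` dictionary and the helper names are hypothetical; nothing here reflects MapR internals.

```python
import zlib


def write_compressed(store, key, data):
    """Compress `data` before it is stored; return (raw size, stored size)."""
    compressed = zlib.compress(data)
    store[key] = compressed  # only the compressed bytes reach "disk"
    return len(data), len(compressed)


def read_back(store, key):
    """Decompress on the read path so callers see the original bytes."""
    return zlib.decompress(store[key])


store = {}
payload = b"sensor-reading," * 1000
raw_size, stored_size = write_compressed(store, "readings", payload)
print(f"raw={raw_size} bytes, stored={stored_size} bytes")
```

Because the payload is highly repetitive, the stored size is far smaller than the raw size, and `read_back` returns the original bytes unchanged.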
MapR allows for a controlled method of volume creation and management. Quotas limit the space used by a volume or an entity (a user or group) by specifying the amount of space it is allowed to use. This ensures that capacity is shared fairly among the various tenants and also offers resource isolation.
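A quota check can be sketched as a guard on the write path. The class and exception names below are hypothetical and the behavior (rejecting any write that would exceed the quota) is an assumption for illustration, not a description of how MapR enforces quotas. Note that usage grows only as data is written, which also mirrors the thin-provisioning idea of consuming capacity on demand.

```python
class QuotaExceeded(Exception):
    """Raised when a write would push a volume past its quota."""


class QuotaVolume:
    def __init__(self, name, quota_bytes):
        self.name = name
        self.quota_bytes = quota_bytes
        self.used_bytes = 0  # nothing is pre-allocated

    def write(self, nbytes):
        """Account for a write of `nbytes`; return the remaining headroom."""
        if self.used_bytes + nbytes > self.quota_bytes:
            raise QuotaExceeded(
                f"{self.name}: {nbytes} bytes would exceed the "
                f"{self.quota_bytes}-byte quota"
            )
        self.used_bytes += nbytes
        return self.quota_bytes - self.used_bytes


project = QuotaVolume("project-x", quota_bytes=100)
remaining = project.write(60)  # 40 bytes of headroom left
try:
    project.write(50)  # would exceed the quota, so it is rejected
except QuotaExceeded as err:
    rejected = str(err)
```

The rejected write leaves the usage counter untouched, so one tenant cannot consume capacity set aside for others.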
MapR offers extreme resiliency at various levels. Distributing and replicating the metadata across the cluster eliminates the risk of data loss during a failure.
MapR Snapshots are read-only, volume-level snapshots. Snapshots can be taken manually or set up on an automated snapshot schedule. Creating a snapshot takes virtually no time and does not take up disk space initially, since only the change deltas are stored.
Notable benefits of the MapR implementation of snapshots include:
MapR Snapshots are implemented using the redirect-on-write method, so each snapshot is atomic and consistent.
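The general redirect-on-write technique can be sketched as follows. This is a toy model, not MapR's implementation: the `SnapshotVolume` class and its fields are hypothetical, and copying the whole block map on snapshot is a simplification that a real system would avoid.

```python
class SnapshotVolume:
    def __init__(self):
        self.blocks = {}      # physical block id -> data
        self.live = {}        # logical offset -> physical block id
        self.snapshots = {}   # snapshot name -> frozen block map
        self._next_id = 0

    def write(self, offset, data):
        # Redirect: new data always goes to a fresh physical block, so any
        # block still referenced by a snapshot is never overwritten.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[offset] = block_id

    def snapshot(self, name):
        # Only the block map is captured; no data blocks are copied.
        self.snapshots[name] = dict(self.live)

    def read(self, offset, snapshot=None):
        block_map = self.snapshots[snapshot] if snapshot else self.live
        return self.blocks[block_map[offset]]


vol = SnapshotVolume()
vol.write(0, b"v1")
vol.snapshot("before-update")
vol.write(0, b"v2")

print(vol.read(0))                            # b'v2'
print(vol.read(0, snapshot="before-update"))  # b'v1'
```

The snapshot costs no extra data blocks when it is taken; space is consumed only as later writes land in new blocks, which matches the idea that only change deltas are stored.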
MapR provides self-healing from multiple simultaneous hardware failures: it reconstructs the data from surviving copies, keeping the cluster available at all times. As a cluster expands, upgrade and self-healing times drop considerably, since all nodes in the cluster participate in the process.
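A minimal sketch of re-replication after a node failure is shown below. It assumes a replication factor of three and a least-loaded-node policy; the `heal` function, the block names, and the node names are hypothetical and only illustrate how spreading repair work across surviving nodes restores the copy count.

```python
from collections import Counter


def heal(placement, failed_node, live_nodes, replicas=3):
    """Restore `replicas` copies of every block after `failed_node` is lost.

    `placement` maps a block name to the set of nodes holding a copy.
    New copies go to the least-loaded live node that lacks the block.
    """
    load = Counter({node: 0 for node in live_nodes})
    for holders in placement.values():
        for node in holders:
            if node in load:
                load[node] += 1

    for block in sorted(placement):
        holders = placement[block]
        holders.discard(failed_node)
        while len(holders) < replicas:
            candidates = [n for n in live_nodes if n not in holders]
            if not candidates:
                break  # not enough nodes left to restore full replication
            target = min(candidates, key=lambda n: (load[n], n))
            holders.add(target)
            load[target] += 1
    return placement


placement = {
    "block-1": {"n1", "n2", "n3"},
    "block-2": {"n1", "n3", "n4"},
}
healed = heal(placement, failed_node="n1", live_nodes=["n2", "n3", "n4", "n5"])
```

After the failed node is removed, each block is back to three copies, and the new copies land on different nodes so that the repair load is shared rather than concentrated on one machine.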
MapR offers a flexible platform to manage diverse types of data. By supporting multiple protocol interfaces, MapR can host a wide range of applications with diverse characteristics and metrics on a single platform. This eliminates datacenter sprawl and siloed environments and reduces overall TCO.
MapR-XD is flash optimized to get the most out of the core components of the cluster. Multiple instances of the MFS file server run in the cluster, and each instance runs on a single node as a single process. Running multiple instances maximizes the performance of the underlying storage system, so the greatest benefit of configuring multiple MFS instances is achieved on an all-flash platform.
MapR has the unique ability to be hardware agnostic and hypervisor/OS agnostic. Combined with its powerful interfaces, MapR can host multiple varied workloads regardless of the hardware on which it is running and regardless of which operating system is hosted. MapR runs on a broad ecosystem of hardware, hypervisors, and operating systems, giving customers freedom of choice on the MapR platform.