Our three platform services (MapR-FS, MapR-DB, and MapR Streams) are unified by common core capabilities built into the underlying platform, such as high availability, real-time access, unified security, multi-tenancy, disaster recovery, a global namespace, self-healing, and management and monitoring.
Enterprise-Grade Platform Services
MapR-FS is the enterprise standard POSIX file system that provides high-performance read/write data storage for the MapR Converged Data Platform. MapR-FS includes important features for production deployments such as fast NFS access, access controls, and transparent data compression at a virtually unlimited scale.
MapR-DB is an enterprise-grade, high-performance NoSQL database management system. It adds real-time, operational analytics capabilities to applications built to handle big data. Because it is part of the MapR Converged Data Platform, it inherits all the advantages of the underlying platform.
MapR Streams is a global publish-subscribe event streaming system for big data. It connects data producers and consumers worldwide in real time, with unlimited scale. MapR Streams is built into the MapR Converged Data Platform, making it the only highly available streaming system to support global event replication.
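To make the publish-subscribe model concrete, here is a minimal in-memory sketch. It is a toy model, not the MapR Streams API: the `Broker` class and its `publish` and `poll` methods are hypothetical names chosen for illustration, and it assumes a log-based design in which each topic is an append-only sequence and each consumer tracks its own read position.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory publish-subscribe broker (illustrative only)."""

    def __init__(self):
        # topic -> append-only list of messages
        self._logs = defaultdict(list)
        # (topic, consumer) -> offset of the next message to read
        self._offsets = defaultdict(int)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset."""
        log = self._logs[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, topic, consumer, max_messages=10):
        """Return the next unread messages for this consumer."""
        start = self._offsets[(topic, consumer)]
        batch = self._logs[topic][start:start + max_messages]
        # Advance only this consumer's position; others are unaffected.
        self._offsets[(topic, consumer)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = Broker()
    for reading in ("t=20.1", "t=20.4", "t=21.0"):
        broker.publish("sensors", reading)
    print(broker.poll("sensors", "dashboard", max_messages=2))
    print(broker.poll("sensors", "archiver"))
```

Because producers only append and each consumer keeps its own offset, producers and consumers stay decoupled: a slow consumer does not hold back a fast one, and a new consumer can read a topic from the beginning.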
The built-in MapR high availability (HA) features eliminate single points of failure at the node, file system metadata, NFS access, resource management (YARN), and job tracking levels. You can benefit from:
- High uptime with zero data loss, despite multiple node failures in the cluster.
- No work loss upon node failure, so jobs do not have to restart from scratch.
- Rolling upgrades that let you upgrade live clusters one node at a time to minimize planned downtime.
- Zero configuration required to get HA, unlike other big data platforms. No complex setup or manual intervention is needed.
Edge Data Aggregation and Analytics
Organizations that need to capture, process, and analyze IoT data close to the source can take advantage of MapR Edge. MapR Edge is a small footprint edition of the MapR Converged Data Platform providing secure local processing, quick aggregation of insights on a global basis, and the ability to push intelligence back to the edge for faster and more significant business impact. Key capabilities include:
- Distributed data aggregation.
- Bandwidth awareness that adjusts throughput, even in occasionally connected environments.
- Global data plane that provides a global view of all distributed clusters in a single namespace.
- Converged analytics combining operational decision-making with real-time analysis of data at the edge.
- Unified security providing end-to-end authentication, authorization, access control, and on-the-wire encryption from the edge to central clusters.
- Standards-based protocols and APIs.
- Enterprise-grade reliability.
MapR gives you real-time capabilities beyond what other platforms can provide in a single cluster. With businesses always seeking to respond faster to new events, MapR provides key real-time capabilities for:
- Immediate access to large data files in MapR-FS, even as they are being loaded into the system.
- Interactive read and write operations for business applications with MapR-DB.
- Self-service exploration of new data with SQL via Apache Drill, without having to first create a formal schema.
- Reliable delivery of global, high-speed streams of event data with MapR Streams.
MapR provides security controls to ensure that sensitive data is accessible only by authorized users. MapR provides:
- Authentication via Kerberos and/or LDAP through Pluggable Authentication Modules (PAM), or via a native username/password authentication system as an alternative to Kerberos.
- Access controls for files, databases, and streams, including Access Control Expressions (ACEs) for fine-grained, Boolean expression-based permissions.
- Performant wire-level encryption that protects data sent between nodes and applications to ensure data privacy.
- Comprehensive auditing on data accesses, authentication, and administrative operations.
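To illustrate what Boolean expression-based permissions mean in practice, the sketch below evaluates a simple access expression against a user. It is an illustration, not MapR's implementation. It assumes a syntax in which `u:`, `g:`, and `r:` prefix a user, group, or role name, `p` means public, and `!`, `&`, and `|` are the operators, with `!` binding tightest and `|` loosest. The `Principal` class and the `allows` function are hypothetical names.

```python
import re
from dataclasses import dataclass

# A token is a parenthesis, an operator, an operand such as u:alice, or p.
_TOKEN = re.compile(r"\s*([()!&|]|[ugr]:[\w.-]+|p)")


@dataclass(frozen=True)
class Principal:
    user: str
    groups: frozenset = frozenset()
    roles: frozenset = frozenset()


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def allows(expr, who):
    """Return True if the expression grants access to the principal."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():  # lowest precedence: a | b
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # parse first so the tokens are consumed
            value = value or rhs
        return value

    def parse_and():  # a & b
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # !a, (a), or a single operand
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok == "p":
            return True
        if tok is not None and tok[1:2] == ":":
            kind, name = tok[0], tok[2:]
            if kind == "u":
                return name == who.user
            if kind == "g":
                return name in who.groups
            return name in who.roles
        raise ValueError(f"unexpected token: {tok!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result


if __name__ == "__main__":
    rule = "(g:dev | r:admin) & !u:mallory"
    print(allows(rule, Principal("alice", frozenset({"dev"}))))
    print(allows(rule, Principal("mallory", frozenset({"dev"}))))
```

With the rule shown, a member of the `dev` group is granted access unless that user is `mallory`, which a plain allow-list of users and groups cannot express in a single entry.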
With multi-tenancy, a capability unique to MapR, you can manage distinct user groups, data sets, and applications in a single cluster while keeping them isolated from each other. You can run different jobs at the same time safely, securely, and efficiently. Several features contribute to the multi-tenancy capability in MapR:
- Volumes - logical partitions of the cluster for creating separate administrative policies such as quotas, permissions, and capacity planning.
- Security - role-based access controls to limit data access to authorized users.
- Data/job placement control - specify on which nodes data resides and jobs run.
- YARN - use the Hadoop 2.X resource scheduler as another level of resource control when running multiple jobs in a cluster.
Ensure fidelity and protection of your critical data with mirroring, replication, and consistent, point-in-time snapshots.
- Scheduled, incremental, block-level mirroring allows you to deploy your mission-critical disaster recovery strategy on large files with low recovery point objectives (RPO) and low recovery time objectives (RTO).
- MapR-DB and MapR Streams deliver immediate updates to remote replicas in real time to enable very low RPOs. Replicas are immediately available for active use upon failover.
- Consistent snapshots protect against data loss or corruption due to user or application errors. Snapshots can also be used for creating consistent, online backups.
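As a rough way to reason about the RPO of scheduled, incremental mirroring, the sketch below estimates the worst-case data-loss window. It is a back-of-the-envelope model under stated assumptions, not a MapR formula: each mirror run is assumed to copy only the data changed since the previous run, and the remote copy is assumed to become usable only once a run completes. All function names and the example figures are hypothetical.

```python
def transfer_seconds(changed_bytes, bandwidth_bytes_per_s):
    """Time to ship one incremental mirror run over the link."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return changed_bytes / bandwidth_bytes_per_s


def worst_case_rpo_seconds(interval_s, changed_bytes, bandwidth_bytes_per_s):
    """Worst-case age of the newest data that is safe on the remote site.

    A write made just after a run starts is not protected until the
    next run finishes: one full interval plus one transfer later.
    """
    return interval_s + transfer_seconds(changed_bytes, bandwidth_bytes_per_s)


def keeps_up(interval_s, changed_bytes, bandwidth_bytes_per_s):
    """True if each run finishes before the next one is scheduled."""
    return transfer_seconds(changed_bytes, bandwidth_bytes_per_s) <= interval_s


if __name__ == "__main__":
    # Assumed figures: hourly mirror, 50 GB changed per hour, 1 Gbit/s link.
    interval = 3600
    changed = 50 * 10**9
    bandwidth = 125 * 10**6  # bytes per second
    print(worst_case_rpo_seconds(interval, changed, bandwidth))
    print(keeps_up(interval, changed, bandwidth))
```

Under this model, shortening the schedule lowers the RPO only while each run still fits within the interval. Real-time replication of the kind described for MapR-DB and MapR Streams avoids the interval term altogether.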
MapR Platform Services support distinct global cluster deployments that run as a single logical, global cluster. With global namespace support, you can:
- Access any data sets (with the appropriate access controls) on any remote cluster as if they were part of the local cluster.
- Perform administrative tasks for any remote cluster worldwide from a single administrative interface.
MapR delivers a powerful node recovery process via patented innovations. It serves big data environments that cannot lose data, must run on a 24x7 basis, and require immediate recovery from node and site failures, all with a smaller data center footprint. MapR supports these capabilities for the broadest set of applications, from batch analytics to interactive querying and real-time streaming.
Management & Monitoring
Manage and monitor your big data cluster with the interface that best suits your workflow: browser-based, REST API, or command line.
- Provision nodes in your cluster with appliance-like simplicity using the browser-based Auto-Provisioning Templates.
- Manage your infrastructure with instant views of cluster health and alerts, through heatmaps and alarms.
- Manage applications by viewing running jobs for troubleshooting or utilization auditing.
- Manage data with volumes, security, mirroring, and snapshots.
Developers who use containers can take advantage of the MapR Persistent Application Client Container (PACC) for access to persistent data in MapR. MapR PACCs are pre-built, certified container images that containerized applications can use to securely connect to and access data from all MapR platform services (MapR-FS, MapR-DB, MapR Streams). Key benefits include:
- Pre-built, certified container image.
- Flexibility in deployment, capable of being deployed on MapR nodes or remote nodes, including nodes in the cloud.
- Secure authentication at the container level and over-the-wire data encryption, allowing for secure connections.
- Availability of Docker image and Dockerfile.
When applications go from idea to reality, MapR provides the only production-ready platform for Hadoop, Spark, and related technologies.
The design of the patented MapR Converged Data Platform speaks directly to Enterprise Architects who know best that architecture matters.
MapR provides developers the widest variety of popular open source projects for developing data applications.