The MapR Converged Community Edition is an integrated platform consisting of Apache Hadoop, an event streaming system, a NoSQL database, and a distributed POSIX file system. It includes the latest innovations from the Hadoop 2.x and open source communities, such as Apache HBase™, Apache Storm™, Apache Pig, Apache Hive™, Apache Mahout™, YARN, Apache Sqoop™, Apache Flume™, and more. It also delivers high-performance, real-time operations with MapR-DB, MapR Streams, and MapR-FS.
MapR Converged Community Edition (MapR CE) is a free edition of the MapR Converged Data Platform, with usage restrictions specified in the MapR End User License Agreement, and with community forum support¹. This free version includes Apache Hadoop, Apache Spark™, MapR-DB (NoSQL database), MapR Streams (event streaming), and MapR-FS (POSIX file system). MapR CE enables distributed processing of large data sets across a cluster of servers. MapR delivers a proven platform that supports a broad set of large-scale, real-time applications.
For enterprise-grade business continuity capabilities, see the Apache Hadoop in the MapR Converged Data Platform data sheet, the MapR-DB data sheet, and the MapR Streams data sheet.
Project choices. MapR supports a broad set of Hadoop projects, including the entire Apache Spark™ stack, YARN, Apache Drill, and more. MapR helps customers select the right tool for their specific requirements.
Monthly certified updates. MapR releases certified updates every month, giving you access to the latest cutting-edge projects on Hadoop.
Backward compatibility. MapR lets you upgrade specific projects without upgrading core Hadoop packages. It also lets you upgrade Hadoop and run your existing applications as is, without rewriting them.
MapR-DB, the high-performance integrated NoSQL database, lets you run analytics on live data without copying it, and deploy multiple use cases and workloads in a single, operationally efficient cluster.
MapR Streams lets you reliably deliver event data streams for real-time processing. It connects data producers and consumers through a high-performance publish/subscribe model.
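As a rough illustration of the publish/subscribe model, the Python sketch below keeps an append-only log per topic and a separate read offset per consumer, so several consumers can read the same events independently. It is not the MapR Streams client API: the `TopicLog` class, its `publish`/`poll` methods, and the topic name are hypothetical, and a production streaming system also has to handle persistence and replication.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for a set of topics (illustration only)."""

    def __init__(self):
        # Each topic is an append-only list of messages.
        self._topics = defaultdict(list)
        # Each (consumer, topic) pair remembers the next offset it will read.
        self._offsets = defaultdict(int)

    def publish(self, topic, message):
        """Producer side: append a message and return its offset."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, consumer, topic, max_messages=10):
        """Consumer side: return unread messages and advance this consumer's offset."""
        start = self._offsets[(consumer, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch


log = TopicLog()
log.publish("sensor-events", {"id": 1, "temp": 21.5})
log.publish("sensor-events", {"id": 2, "temp": 22.0})
print(log.poll("dashboard", "sensor-events"))  # both events
print(log.poll("archiver", "sensor-events"))   # both events again: independent offset
print(log.poll("dashboard", "sensor-events"))  # empty list: nothing new yet
```

Keeping offsets per consumer rather than deleting messages on read is what lets producers and consumers stay decoupled in this model.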
Apache Drill on MapR lets you immediately query complex datasets such as nested data, NoSQL data, and data with rapidly evolving schemas, without requiring schema preparation. ANSI SQL support lets you use your existing business intelligence tools. For more information, please see the Apache Drill data sheet.
Standard Hadoop tool support. MapR supports all Hadoop APIs and Hadoop data processing tools for accessing Hadoop data. You can easily move data from the MapR Distribution into other distributions, and vice versa.
Standards-based file access. Unlike other distributions, MapR provides true Network File System (NFS) capabilities. MapR Direct Access NFS™ lets you access Hadoop like a standard file system through a single NFS mount point, so you can copy data into and out of Hadoop easily at high rates, or access Hadoop data using common command-line tools and desktop applications. The optional MapR POSIX Client add-on provides authenticated NFS access from remote nodes, along with over-the-wire compression and parallel access to boost throughput.
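Because the cluster is exposed through an ordinary NFS mount, standard file APIs are enough to move data in and out. The Python sketch below assumes the cluster is already NFS-mounted on the client; the mount path `/mapr/my.cluster.com` and the helper names are hypothetical placeholders for your own setup, and `cluster_dir` is expected to be a path relative to the mount.

```python
import shutil
from pathlib import Path

# Hypothetical mount point; replace with wherever your cluster is NFS-mounted.
CLUSTER_MOUNT = Path("/mapr/my.cluster.com")


def copy_into_cluster(local_file, cluster_dir, mount_root=CLUSTER_MOUNT):
    """Copy a local file into the cluster using plain file-system calls.

    No Hadoop client is involved: through the NFS mount, the cluster looks
    like any other directory tree. cluster_dir must be a relative path.
    """
    target_dir = Path(mount_root) / cluster_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(local_file).name
    shutil.copy2(local_file, destination)  # copies data and file metadata
    return destination


def list_cluster_dir(cluster_dir, mount_root=CLUSTER_MOUNT):
    """List a cluster directory exactly as you would a local one."""
    return sorted(p.name for p in (Path(mount_root) / cluster_dir).iterdir())


# Example (requires a real mount):
# copy_into_cluster("sales.csv", "user/alice/raw")
```

The same mount also works for shell tools such as `cp` and `ls`, which is the point of exposing the cluster over NFS.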
Industry standards. MapR fully supports additional industry-standard APIs, including ODBC/JDBC, LDAP, Kerberos, HBase, JSON, HDFS, NFS, and more.
Third-party tool ecosystem. The entire ecosystem of third-party tools (BI, ETL, etc.) built for Hadoop works on MapR. Examples of certified tools are available in the MapR App Gallery at mapr.com/appgallery.
Portable applications. Hadoop applications built on MapR run on any other Hadoop distribution, and vice versa, with no code changes or recompilation.
Kerberos and LDAP integration. MapR supports authentication services via Kerberos and/or LDAP.
Native authentication. MapR also offers a standards-based native authentication system as a simpler alternative to Kerberos. It leverages Linux Pluggable Authentication Modules (PAM) to support the widest range of user registries.
Access control. Data is secured using standard Unix file permissions and advanced role-based access control expressions (ACEs).
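To show how an access control expression combines users, groups, and roles, here is a simplified evaluator in Python. The syntax it accepts (`u:`, `g:`, and `r:` prefixes for users, groups, and roles, `p` for public, combined with `!`, `&`, `|`, and parentheses) is modeled on MapR ACEs, but the parser and its precedence rules are a hypothetical sketch, not MapR's implementation; check real ACE semantics against the MapR documentation.

```python
import re

# A token is a parenthesis, an operator, "p" (public), or a u:/g:/r: principal.
TOKEN = re.compile(r"\s*([()!&|]|[ugr]:[\w.-]+|p\b)")


def tokenize(expr):
    """Split an expression into tokens, rejecting anything unrecognized."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input at {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, user, groups=(), roles=()):
    """Return True if user (with the given groups and roles) satisfies expr.

    In this sketch, ! binds tightest, then &, then |.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse, so syntax errors are not skipped
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == "p":
            return True
        if tok.startswith("u:"):
            return tok[2:] == user
        if tok.startswith("g:"):
            return tok[2:] in groups
        if tok.startswith("r:"):
            return tok[2:] in roles
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result


print(evaluate("u:alice | g:admins", "bob", groups=["admins"]))
print(evaluate("g:analysts & !u:mallory", "mallory", groups=["analysts"]))
```

Expressing access as a boolean formula, rather than a flat list of permitted users, is what lets one rule say things like "analysts, except one user" without extra groups.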
Comprehensive auditing. MapR audit logs help you analyze user behavior and meet regulatory compliance requirements. MapR logs accesses in JSON format at the administrative, authentication, database, and file levels.
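Because audit records are JSON, ordinary JSON tooling can summarize them. The Python sketch below counts operations per user in JSON-lines records. The one-record-per-line layout and the field names `uid` and `operation` are assumptions for illustration; check the actual audit record format before adapting it.

```python
import json
from collections import Counter


def summarize_audit_lines(lines):
    """Count (user, operation) pairs in JSON audit records, one record per line.

    The field names "uid" and "operation" are assumed for illustration.
    Blank, malformed, or non-object lines are skipped instead of aborting.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        counts[(record.get("uid"), record.get("operation"))] += 1
    return counts


sample = [
    '{"uid": "1001", "operation": "READ", "path": "/data/sales"}',
    '{"uid": "1001", "operation": "READ", "path": "/data/sales"}',
    '{"uid": "1002", "operation": "DELETE", "path": "/data/tmp"}',
    "not json",
]
for (uid, op), n in summarize_audit_lines(sample).most_common():
    print(uid, op, n)
```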
Performant wire-level encryption. MapR encrypts data sent between nodes and applications to ensure data privacy, using Intel AES-NI capabilities where available.
The MapR Converged Community Edition supports standard multi-tenancy beyond the capabilities of YARN, using volumes and security features to let distinct user groups, data sets, and applications coexist in isolation on the same cluster. More advanced multi-tenancy capabilities, such as control over data and job placement, are available in the enterprise editions of the MapR Distribution.
Volumes. MapR supports volumes: logical groupings of files and directories on which policies (permissions, replication factors, quotas, etc.) can be set.
Security. MapR authentication and authorization controls provide another level of user and data isolation.
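The volume idea can be sketched as a small data structure: each volume owns a directory subtree and carries its own policies, and a path is governed by the deepest volume that contains it. The `Volume` class, its fields, and the quota rule below are hypothetical and only illustrate the concept; they are not MapR's volume management interface.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Volume:
    """Hypothetical model of a volume: a subtree plus the policies set on it."""
    name: str
    mount_path: str          # directory the volume is mounted at
    replication: int = 3     # copies kept of the data (illustrative default)
    quota_bytes: int = 0     # 0 means "no quota" in this sketch
    used_bytes: int = 0

    def contains(self, path):
        """True if path lies at or below this volume's mount point."""
        root = PurePosixPath(self.mount_path)
        p = PurePosixPath(path)
        return p == root or root in p.parents

    def try_write(self, nbytes):
        """Record a write, refusing it if it would exceed the quota."""
        if self.quota_bytes and self.used_bytes + nbytes > self.quota_bytes:
            return False
        self.used_bytes += nbytes
        return True


def volume_for(path, volumes):
    """Pick the deepest volume whose mount point contains path."""
    matches = [v for v in volumes if v.contains(path)]
    return max(matches, key=lambda v: len(PurePosixPath(v.mount_path).parts), default=None)


volumes = [
    Volume("root", "/"),
    Volume("marketing", "/projects/marketing", replication=3, quota_bytes=100),
    Volume("scratch", "/tmp", replication=1),
]
vol = volume_for("/projects/marketing/q3/report.csv", volumes)
print(vol.name, vol.replication)             # marketing 3
print(vol.try_write(80), vol.try_write(30))  # True False
```

Attaching policies to a subtree, rather than to individual files, is what lets one team's quota or replication setting stay isolated from another's on a shared cluster.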
Customers can reduce their data center footprint with the MapR performance advantage, deploying as few as one-third as many servers as other distributions require. Faster file access and a faster, optimized MapReduce shuffle let customers get more work out of their hardware investment. A MapR cluster can scale to thousands of nodes and store trillions of files.
MapR officially set the MinuteSort record by sorting 1.5 TB of data in under a minute on Google Compute Engine. A MapR customer has since exceeded that record by sorting 1.65 TB, using one-seventh as many servers as the top non-MapR record.
MapR-DB, the integrated NoSQL database, is built on the core MapR Data Platform, which set records on both the TeraSort and MinuteSort benchmarks. Recently, MapR-DB ran over 30,000 batch put operations per second per node and showed as much as an elevenfold speed improvement over HBase. With its in-memory feature, MapR-DB can store a database in memory for additional performance gains.
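For a sense of scale, the sort results above imply a minimum aggregate throughput, taking 1 TB as 10^12 bytes and 60 s as an upper bound on the elapsed time (both runs finished in under a minute):

```latex
\text{rate}_{1.5\,\text{TB}} > \frac{1.5 \times 10^{12}\ \text{B}}{60\ \text{s}} = 2.5 \times 10^{10}\ \text{B/s} = 25\ \text{GB/s},
\qquad
\text{rate}_{1.65\,\text{TB}} > \frac{1.65 \times 10^{12}\ \text{B}}{60\ \text{s}} = 2.75 \times 10^{10}\ \text{B/s} = 27.5\ \text{GB/s}
```

The customer run sorted 10% more data (1.65 / 1.5 = 1.1) within the same one-minute budget.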
Auto-tuning and data structure innovations in MapR-DB ensure consistently low latency, even at the 95th and 99th percentiles. On the latency graph, MapR (in red) consistently responds quickly, while the other distribution (in blue) shows many high-latency spikes caused by inefficient disk cleanup activities.
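Percentile latencies summarize the slow tail that averages hide: the 99th percentile is the value at or below which 99% of requests complete. The Python sketch below uses the nearest-rank method, which is one of several percentile definitions (benchmark tools may interpolate instead), on made-up sample latencies rather than any MapR benchmark data.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative latencies in milliseconds (made-up numbers, not benchmark data):
# mostly fast responses with two slow outliers.
latencies_ms = [2] * 95 + [3] * 3 + [40, 120]
print("p50:", percentile(latencies_ms, 50))
print("p95:", percentile(latencies_ms, 95))
print("p99:", percentile(latencies_ms, 99))
```

Here the median and 95th percentile look identical, while the 99th percentile exposes the outliers, which is why latency spikes show up most clearly at the high percentiles.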
MapR Control System. The MapR Control System (MCS) is a browser-based interface for managing, administering, and monitoring your Hadoop cluster. It lets you view the status of your cluster at a glance via heatmaps and drill into specific issues to investigate any problems. Alarms proactively notify you when potential problems arise.
Rolling upgrades. To minimize planned downtime, MapR allows a node-by-node Hadoop upgrade on a live cluster. With MapR backward compatibility, existing applications can still run on an upgraded Hadoop cluster with no modifications.
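A node-by-node upgrade can be thought of as a loop that takes one node out of service at a time. The Python sketch below is hypothetical orchestration logic: `drain`, `upgrade`, `health_check`, and `restore` are placeholder callbacks you would supply, not MapR commands. It halts at the first node that fails its health check, so the remaining nodes stay on the known-good version.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    drain: Callable[[str], None],
    upgrade: Callable[[str], None],
    health_check: Callable[[str], bool],
    restore: Callable[[str], None],
) -> List[str]:
    """Upgrade nodes one at a time and return them in upgrade order.

    All four callbacks are hypothetical hooks, not MapR tooling. Only one node
    is out of service at any moment, so the rest of the cluster keeps serving.
    Raises RuntimeError at the first node that fails its health check.
    """
    upgraded = []
    for node in nodes:
        drain(node)    # stop routing new work to this node
        upgrade(node)  # install the new packages
        if not health_check(node):
            # Leave the node drained for investigation and halt the rollout.
            raise RuntimeError(f"{node} failed its post-upgrade health check")
        restore(node)  # put the node back into service
        upgraded.append(node)
    return upgraded


events = []
done = rolling_upgrade(
    ["node1", "node2", "node3"],
    drain=lambda n: events.append(("drain", n)),
    upgrade=lambda n: events.append(("upgrade", n)),
    health_check=lambda n: True,
    restore=lambda n: events.append(("restore", n)),
)
print(done)
```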