MapR ties YARN and HP Vertica to its MapR Distribution

Matt Aslett, 451 Research Analyst

MapR Technologies has updated its Hadoop distribution with support for the Apache YARN resource management framework, and expanded its SQL-on-Hadoop strategy to embrace HP's Vertica Analytics Platform.

The 451 Take

MapR has been reluctant in the past to reveal details about customer traction, so the disclosure that it has 500 paid licensees is significant. So is its support for YARN: like other Hadoop vendors, MapR is looking toward Hadoop becoming a more flexible platform for multiple workloads. The company can claim an advantage in flexibility given its support for multiple storage approaches, as well as its agnosticism toward the various approaches to enabling SQL-based analysis of data in Hadoop.


MapR has become the latest distributor of Apache Hadoop to update its distribution to the Hadoop 2.x code base, including the Apache YARN resource management framework. The company has also expanded its SQL-on-Hadoop strategy to embrace HP's Vertica Analytics Platform as part of an open approach that will support various SQL-on-Hadoop approaches and projects. MapR has also introduced a new virtualized environment to enable potential customers to get up and running with MapR's distribution.

The introduction of YARN enables Hadoop to become a more flexible platform for supporting multiple concurrent workloads and expanding Hadoop beyond MapReduce applications. The latest version of the MapR Distribution including Apache Hadoop is based on Hadoop 2.2, including YARN, and has the additional benefit of being able to integrate some of MapR's differentiating functionality with YARN, such as its POSIX and NFS-compatible file system.

Other differentiating MapR capabilities include point-in-time snapshots, disaster recovery, mirroring and high availability. Its ExpressLane capability, which is designed to give priority to the processing of small MapReduce jobs with automatic resource allocation, will specifically be able to take advantage of YARN for improved workload management. MapR also enables users to run both Hadoop MapReduce 1.x and YARN schedulers on the same nodes in the cluster at the same time.

Updates to the M3, M5 and M7 editions of the MapR Distribution including Apache Hadoop will be introduced in March, as will support for HP's Vertica Analytics Platform running on MapR. Currently in early access release, the Vertica Analytics Platform on MapR enables HP's Vertica analytic database to run on the same cluster nodes as MapR's Hadoop distribution. It takes advantage of MapR's file system for full ANSI SQL-based querying of data in Vertica and MapR, as well as MapR's high availability, snapshots and mirroring functionality.

Support for HP Vertica is part of an expanded, open strategy for SQL-on-Hadoop that will also support customers using the MapR-initiated Apache Drill project, as well as Cloudera's Impala, the Shark in-memory project, the Presto project, Apache Hive (including the Stinger project enhancements) and Apache Tez. MapR maintains that there are different use cases for all these approaches at this stage in terms of maturity, depth of functionality and breadth of SQL support, and that it will provide customers with the flexibility to use the approach that is appropriate to them.

Finally – in terms of the latest announcements at least – MapR has launched the MapR Sandbox for Hadoop, a virtualized environment containing the MapR Distribution including Apache Hadoop to enable potential customers to start experimenting with the distribution. The MapR Sandbox also includes tutorials for developers, analysts and administrators and is designed for training, development and testing use cases.

The MapR Sandbox for Hadoop is something of a new strategy for MapR. To date the company has focused on letting others evangelize and build the market, then stepping in to target its differentiated functionality at those already in production, with claims of additional performance and reliability. That approach has enabled the company to grow to the extent that it now claims 500 paid licensees, with a large proportion of its bookings coming from Global 2000 companies in areas such as financial services, retail, security, healthcare and telecom.


MapR's primary competition comes from its fellow Hadoop distributors Cloudera, Hortonworks, Pivotal, IBM and Intel, although MapR is significantly differentiated given its additional functionality. It is not alone in extending the core Hadoop distribution, however. IBM offers General Parallel File System as an alternative to Hadoop Distributed File System (HDFS) in its BigInsights Enterprise Edition product, and Pivotal offers optional integration with its HAWQ and GemFire technologies. Cloudera's distribution is fully open source, but its Cloudera Enterprise management functionality is not.

Other potential competitors include Hadoop-as-a-service providers such as Microsoft, Rackspace, Treasure Data, Altiscale and Qubole, as well as Amazon Web Services with Elastic MapReduce, although the latter has a partnership with MapR to enable Amazon EMR users to select MapR's distribution as an option. MapR also has a relationship with Google through which the MapR Platform is available on the Google Compute Engine. MapR is also looking to differentiate itself in 2014 with a focus on operational workloads, potentially extending in the future even to transactional workloads. The latter could take it into competition with Splice Machine, although Splice Machine is also a MapR partner, so initiatives there may well be collaborative.

SWOT Analysis

Strengths: MapR offers a differentiated product that provides alternative capabilities for some of Hadoop's traditional problem areas.

Weaknesses: That differentiation can also be viewed in a negative light as a deviation from the core open source Apache Hadoop project.

Opportunities: There are plenty of enterprises that have put Hadoop through its paces and are now looking to take projects into production, which is where MapR has to date stepped in.

Threats: The Hadoop sector is likely to get more crowded as traditional data management providers move in and consolidate. Some specialists will be left behind.