Gartner Provides Side by Side Hadoop Comparison

It's always interesting to compare products side by side, and Gartner Research Director Svetlana Sicular has done just that in her blog series, The Charging Elephant. Sicular posed the same set of questions to the participants in the recent Hadoop panel at the Gartner Catalyst conference. Jack Norris, MapR CMO, participated; here's his take on the first question.

Q: How specifically are you addressing variety, not merely volume and velocity?

A: MapR has invested heavily in innovations to provide a unified data platform that can be used for a variety of data sources, ranging from clickstreams to real-time applications that leverage sensor data.

The MapR platform for Hadoop also integrates a growing set of functions including MapReduce, file-based applications, interactive SQL, NoSQL databases, search and discovery, and real-time stream processing. With MapR, data does not need to be moved to specialized silos for processing; it can be processed in place.

This full range of applications and data sources benefits from MapR’s enterprise-grade platform and unified architecture for files and tables. The MapR platform provides high availability, data protection and disaster recovery to support mission-critical applications.

MapReduce: MapR provides world-record performance for MapReduce operations on Hadoop. MapR holds the Minute Sort world record, having sorted 1.5 TB of data in one minute. The previous Hadoop record was less than 600 GB. With an advanced architecture that is built in C/C++ and that harnesses distributed metadata and an optimized shuffle process, MapR delivers consistently high performance.

File-Based Applications: MapR is a 100% POSIX-compliant system that fully supports random read-write operations. Because MapR supports industry-standard NFS, users can mount a MapR cluster and execute any file-based application, written in any language, directly on the data residing in the cluster. All standard tools in the enterprise, including browsers, UNIX tools, spreadsheets, and scripts, can access the cluster directly without any modifications.
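To make the random read-write point concrete, here is a minimal Python sketch that updates a few bytes in the middle of a file using nothing but standard file I/O. The mount path in CLUSTER_MOUNT and the helper names are hypothetical, chosen only for illustration; the demo runs against a temporary local directory so it works without a cluster, on the assumption that an NFS-mounted cluster path would behave like any other POSIX directory.

```python
import os
import tempfile

# Hypothetical NFS mount point for a cluster; not taken from the original post.
CLUSTER_MOUNT = "/mapr/example-cluster/user/demo"


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes in place at `offset` without rewriting the rest of the file."""
    with open(path, "r+b") as f:  # r+b: read/write, no truncation
        f.seek(offset)
        f.write(data)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo(base_dir: str) -> bytes:
    """Write a record, patch one field in place, and return the full record."""
    path = os.path.join(base_dir, "events.log")
    record = b"status=PENDING;id=42\n"
    with open(path, "wb") as f:
        f.write(record)
    # Replace the 7-byte value "PENDING" with a 7-byte value of the same width.
    overwrite_at(path, len(b"status="), b"DONE___")
    return read_at(path, 0, len(record))


if __name__ == "__main__":
    # Uses a local temporary directory; pass CLUSTER_MOUNT instead to try
    # the same calls against a mounted cluster (an assumption, not tested here).
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

Nothing in the sketch is specific to Hadoop, which is the point of the NFS claim: code written against ordinary file paths needs no changes.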

SQL: A number of applications support SQL access to data stored in MapR, including Hive, Hadapt and others. MapR is also spearheading the development of Apache Drill, which brings ANSI SQL capabilities to Hadoop. Apache Drill, inspired by Google’s Dremel project, delivers low-latency interactive query capability for large-scale distributed datasets. Apache Drill supports nested/hierarchical data structures and schema discovery, and is capable of working with NoSQL, Hadoop, and traditional RDBMS sources. With ANSI SQL compatibility, Drill supports all of the standard tools that the enterprise uses to build and implement SQL queries.

Database: MapR has removed the trade-offs organizations face when looking to deploy a NoSQL solution. Specifically, MapR delivers ease of use, dependability and performance advantages for HBase applications. MapR provides scale, strong consistency, reliability and continuous low latency with an architecture that does not require compactions or background consistency checks. From a performance standpoint, MapR delivers over a million operations per second from just a 10-node cluster.

Search: MapR is the first Hadoop distribution to integrate enterprise-grade search. On a single platform, customers can now perform predictive analytics, full search and discovery, and advanced database operations. The MapR enterprise-grade search capability works directly on Hadoop data but can also index and search standard files without having to perform any conversion or transformation. All search content and results are protected with enterprise-grade high availability and data protection, including snapshots and mirrors that enable a full restore of search capabilities.

By integrating the search technology of the industry leader, LucidWorks, MapR and its customers benefit from the added value that LucidWorks Search delivers in the areas of security, connectivity and user management for Apache Lucene/Solr.

Stream Processing: MapR provides a dramatically simplified architecture for real-time stream computational engines such as Storm. Streaming data feeds can be written directly to the MapR platform for Hadoop for long-term storage and MapReduce processing. Because data streams can be written directly to the MapR cluster, administrators can eliminate queuing systems such as Kafka or Kestrel and implement publish-subscribe models within the data platform. Storm can then ‘tail’ a file to which it wishes to subscribe, and as soon as new data hits the file system, it is injected into the Storm topology. This allows for strong Storm/Hadoop interoperability, and a unification and simplification of technologies onto one platform.
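The ‘tail’ pattern described here can be sketched in a few lines of plain Python. This is not Storm code and not a MapR API; the function name follow and its parameters are hypothetical, and the sketch only illustrates how a subscriber could pick up lines as a producer appends them to a file. It assumes the file is visible as an ordinary path, for example through an NFS mount.

```python
import time
from typing import Iterator, Optional


def follow(
    path: str,
    poll_interval: float = 0.1,
    max_idle_polls: Optional[int] = None,
    from_start: bool = False,
) -> Iterator[str]:
    """Yield complete lines appended to `path`, similar to `tail -f`.

    The file is opened on the first iteration, not when follow() is called.
    With max_idle_polls=None the generator never stops on its own; otherwise
    it returns after that many consecutive polls with no new data.
    """
    with open(path, "r") as f:
        if not from_start:
            f.seek(0, 2)  # jump to end of file: only new data is delivered
        idle = 0
        buffer = ""
        while True:
            chunk = f.readline()
            if chunk:
                idle = 0
                buffer += chunk
                # Emit only whole lines; a partial line waits for its newline.
                if buffer.endswith("\n"):
                    yield buffer.rstrip("\n")
                    buffer = ""
                continue
            if max_idle_polls is not None and idle >= max_idle_polls:
                return  # any unfinished partial line is discarded
            idle += 1
            time.sleep(poll_interval)
```

A subscriber would loop over follow(path) and hand each line to its processing logic. In a real Storm deployment that role belongs to a spout, and a production version would also need to handle file rotation and restarts, which this sketch leaves out.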

Click here to read all of Jack's responses.

This blog post was published August 27, 2013.
