As I discussed in my keynote presentation at Strata + Hadoop World in New York, there are a lot of myths and misconceptions when it comes to Hadoop. Let’s take a closer look at the architecture and customer use cases that highlight the power of Hadoop and separate the myths from reality.
There are many commercial Hadoop distributions in the marketplace today, but the reality is that we all share the same open source Apache code. Hadoop is one of the first markets that’s actually been created by open source technology, and given its early stage, it’s appropriate, and in many cases required, to combine open source code with innovations to meet customer requirements. The result? An incredibly strong ecosystem: Hadoop is by far the fastest-growing Big Data technology, and is one of the top 10 fastest-growing technologies overall in terms of job growth.
Hadoop is at the center of Big Data, in stark contrast to the NoSQL market, where there is no consensus, no common API, and no ability to seamlessly move workloads across solutions. However, the one NoSQL solution that has an inherent advantage is Apache HBase, which is integrated with Hadoop and included with every commercial distribution. You might think that if HBase is included in every distribution, and every distribution shares the same open source code, then HBase must run the same across all distributions. This is not the case, because the reality is that architecture matters. Take a look at the architecture that supports Apache HBase applications:
On the left you can see that HBase runs on Java and writes its data into the Hadoop Distributed File System (HDFS), which runs on another Java instance, which writes into the Linux file system, which writes to disk. To make matters worse, database operations are trying to write to HDFS, which is a write-once storage layer. On the right-hand side is the MapR M7 architecture, which eliminates the Java dependency, collapses those intermediate data layers, and removes the complexity. As a result, HBase applications experience drastically improved performance.
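To make the write-once constraint concrete, here is a minimal Python sketch of the general pattern a database can use to accept updates on top of storage whose files cannot be modified once written: buffer writes in memory, flush them as new immutable files, and merge on read. It is a simplified illustration under those assumptions, not HBase or MapR code; the class `WriteOnceStore` and all of its methods are hypothetical names invented for this example.

```python
class WriteOnceStore:
    """Toy key-value store over immutable 'files' (hypothetical, for illustration only)."""

    def __init__(self, flush_threshold=2):
        self.buffer = {}   # in-memory writes not yet flushed
        self.files = []    # immutable snapshots, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A flush never edits an existing file; it only adds a new one.
        if self.buffer:
            self.files.append(dict(sorted(self.buffer.items())))
            self.buffer = {}

    def get(self, key):
        # Newest data wins: check the buffer, then files from newest to oldest.
        if key in self.buffer:
            return self.buffer[key]
        for snapshot in reversed(self.files):
            if key in snapshot:
                return snapshot[key]
        return None

    def compact(self):
        # Rewrite all files into one, keeping only the latest value per key.
        merged = {}
        for snapshot in self.files:  # oldest first, so later values overwrite
            merged.update(snapshot)
        self.files = [merged] if merged else []


store = WriteOnceStore(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")   # buffer reaches the threshold and is flushed as file 1
store.put("row1", "c")   # the update goes to the buffer; file 1 is untouched
store.flush()            # written as file 2
files_before = len(store.files)
store.compact()          # the two files are rewritten into one
```

In this toy model, every update eventually becomes another file and a read may have to consult several of them, so the store must periodically rewrite its data to keep reads cheap. That is one way to picture the extra work a database takes on when its storage layer is write-once.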
As the graph below illustrates, the performance results of the MapR M7 architecture are dramatic. In orange, you can see the performance of HBase applications on other distributions, which shows tremendous latency spikes. Imagine trying to build a real-time online application with HBase given those results. That’s in sharp contrast to the blue line, which shows consistently low latency for HBase applications on a 24x7 basis with the MapR M7 distribution.
The reality is that a significant number of companies are already enjoying production success with Hadoop. Here are just a few examples:
You may be thinking, “But these are Web 2.0 companies. Traditional enterprises are still experimenting. Maybe they’re doing some lightweight ETL, but nobody’s really using Hadoop seriously in production.”
The reality is that there are a significant number of companies achieving powerful results with Hadoop. Here are just a few:
In addition to the multitude of Hadoop use cases and production success in healthcare, manufacturing, telco, and government agencies, here are a few of the more unusual production use cases for Hadoop:
These examples are just a subset of the more than 500 paying MapR customers that are using Hadoop today. Many have switched seamlessly from another Hadoop distribution to enjoy production success with MapR. Is Hadoop ready for prime time? Absolutely.