Hadoop Adoption – Is the Cluster Half Full?

Gartner just released a comprehensive research report, based on a survey of Hadoop adoption trends, that sheds some light on where and how customers are getting value from Hadoop. Key takeaways from the report include:

  • Despite the success reported by early adopters, over half of respondents (54%) report no plans to invest in Hadoop at this time.
  • Hadoop adoption is driven primarily by technology executives, especially those in the C-suite, including the CFO or COO.
  • Finding enough Hadoop skills continues to be a challenge and a barrier to adoption (57% of respondents).
  • Identifying the use case and how to get value from Hadoop was reported as a challenge by 49% of respondents.
  • Hadoop deployments had a small population of direct users (70% of respondents had 20 or fewer users).

Viewed through the lens of the Gartner Hype Cycle, these results suggest that Hadoop has moved past the peak of inflated expectations and down into the trough of disillusionment on its way toward mainstream adoption. This is the phase in which customers come to understand what adopting the technology really involves and which use cases it suits. Companies that have experimented with Hadoop are seeing its rough edges and learning how the technology can be applied appropriately.

But there’s another side to the story.

There has been tremendous success and value derived from Hadoop by the data-centric companies that have moved into mainstream production use of Hadoop to power their business. This is what MapR customers are experiencing. We invested early and deeply in the underlying data platform of our Hadoop distribution to make it operationally ready for success as a production system, not just for Web 2.0 companies but for mainstream organizations and enterprise IT.

To deploy mission-critical applications on Hadoop, customers need enterprise-grade features. They also require more than batch capabilities, because the market is shifting to real-time applications. MapR has invested heavily from day one to create a trusted, enterprise-grade platform that supports real-time applications. Customers might be slow to adopt a batch-oriented Hadoop distribution that lacks production-grade features, but demand for Hadoop from MapR continues to accelerate.

What are some examples? Customers today are actively preventing fraud by making better use of data while a transaction is still in progress. Companies are optimizing revenue by tailoring online sessions to recommend the right products to their customers. Organizations are reducing costs by responding to maintenance issues before they cause downtime. These are just some of the ways MapR benefits its customers.

Customers Gain Tremendous ROI from Hadoop

Gartner provides a great aggregated view across the Hadoop market. Let’s take a look at some primary research we did with a sample of our 700+ Hadoop customers through TechValidate (http://www.techvalidate.com/product-research/mapr-distribution-for-hadoop):

  • MapR customers experienced payback in less than 12 months and greater than 5X returns on their investment
  • 73% of customers deployed new products or services leveraging the MapR Distribution and 59% lowered costs
  • 47% of customers increased revenue
  • 19% of customers reported reduced security and fraud risk

[Figure: Hadoop reduces costs]

And companies using Hadoop from MapR realize ROI more quickly. One large healthcare customer reports recognizing value from MapR 2-5x faster than with other Hadoop distributions:

[Figure: Hadoop ROI]

Hadoop Applications Touch Millions of People

One stat from the Gartner survey I found somewhat surprising was that 70% of respondents had only between one and 20 users accessing Hadoop, and 4% had ZERO users.

I have two perspectives on this:

  1. This stat shows we are in a young market where early adopters are still dipping their toes in the “data lake.” That is not surprising. Many companies initially use Hadoop for data landing, staging, transformation, and “deep” analytics like machine learning. It appears that many of the survey respondents were not yet in production.
  2. While measuring end users is interesting, it assumes that Hadoop should merely be an analytic platform where data scientists or analysts query data and look for insights. For data-centric organizations whose automated applications are integrated into their core business, the value is not tied to the number of users who access the platform. For example, I see customers driving significant results by building more real-time, operational applications.

In stark contrast to Gartner’s survey, MapR customers have jumped headfirst into the Hadoop data lake and are, on average, much further along. Rather than asking our customers how many users they have on their Hadoop cluster, we focused on the number of use cases they run on a single cluster, some of which power customer-facing applications for thousands or even millions of users, such as the Aadhaar project in India, which protects the unique identity of millions of citizens. Here’s what our customers said:

[Figure: Hadoop use cases]

We see 96% of MapR customers running multiple use cases on a single Hadoop cluster, with 18% running over 50 use cases on a single cluster. We see our segment of the Hadoop market growing extremely well because of the investments we made in the underlying data platform, which allows companies to run multi-tenant, Hadoop-based data lakes/hubs supporting multiple use cases. Perhaps it makes sense, then, that other distributions can only manage to support a small number of users on a given Hadoop cluster.

How Companies Are Overcoming Hadoop Adoption Barriers

Gartner survey respondents identified the following challenges to adopting Hadoop in their organizations:

  • Obtaining skills and capabilities needed (57% of respondents)
  • Determining how to get value from Hadoop (49%)
  • Integrating big data technology with existing infrastructure (44%)
  • Risk and governance issues including security and data privacy (40%)
  • Integrating multiple data sources (40%)
  • Funding for Hadoop-related initiatives (27%)
  • Leadership or organizational issues (20%)
  • Understanding what “Hadoop” is (13%)
  • Other (13%)

Let’s take a look at the first three:

1st Adoption Challenge: “Obtaining skills and capabilities” - this remains the primary barrier to Hadoop adoption in organizations. However, as Gartner points out, “vendors are responding to this challenge by offering a variety of training options,” which is absolutely right. We recognized Hadoop education as a bottleneck because most training was delivered in instructor-led classroom settings, which are expensive and do not scale quickly enough to close the skills gap. MapR developed Hadoop On-Demand Training, which is available online for free. MapR classes are interactive and provide a path to certification for developers, administrators, and analysts, with new courses added each quarter. The response has been tremendous: over 20,000 people have enrolled since the training launched at the end of January.

2nd Adoption Challenge: “Determining how to get value from Hadoop” - 49% of respondents listed this as one of their top three challenges. Hadoop provides broad flexibility to run many types of processing (e.g., declarative languages such as SQL, procedural processing for statistics and machine learning, graph processing). This flexibility can be applied to literally thousands of use cases in any industry, leaving people asking, “Where should I start?”

[Figure: Real-World Hadoop]

Part of my job is to identify the most common use cases we see from customers and package them into repeatable solutions, or “blueprints,” and examples that others can learn from to get a quick start on identifying their use case and delivering a working application on Hadoop. One great place to start for companies new to Hadoop is a recent book from O’Reilly Media called “Real-World Hadoop,” which covers customer examples in detail, including the architecture and use of various projects within the Apache Hadoop ecosystem.

In addition, MapR has created Quick Start Solutions (product, services, and training) for the three common Hadoop use cases our customers deploy:

  1. Data architecture optimization and general self-service data exploration - some people refer to this as a “data lake” or “enterprise data hub.” It provides a single place to store and process any and all data in an organization, including data that was previously thrown away because of its cost and low business value density. Organizations can discover new insights and transform data, turning it into business value either within Hadoop or by loading it into other special-purpose systems such as a CRM system or data warehouse. Roughly 40% of Hadoop use cases with MapR provide analytics on untapped data or offload and augment processing from other systems.

  2. Log analysis - security logs, IP logs, web logs, sensor logs. Each type of log analysis lends itself to different business outcomes, whether it’s better understanding consumer preferences by studying behavior across web, mobile, store, and call center, or better detecting anomalous behavior that might indicate fraud or malicious attacks on systems by bots or malware. These systems generate huge amounts of semi-structured data, requiring cost-effective, flexible storage for dynamic data structures that are hard to maintain in relational databases. Security log analysis is a use case for roughly 20% of our customers, and we see it applied in heavily regulated industries such as financial services, communication service providers (CSPs), and healthcare.

  3. Real-time operational applications - one common example is recommendation engines, which provide a more relevant customer experience with next-best offers in web or mobile applications. Again, we see roughly 20% of Hadoop users running this use case. One common misconception is that Hadoop is only for batch workloads, but more and more companies are using NoSQL databases such as Apache HBase together with search technologies such as Apache Solr or Elasticsearch for low-latency applications in content targeting or recommendations.

3rd Adoption Challenge: “Integrating big data technology with existing infrastructure” - one way to interpret this is that respondents don’t consider Hadoop enterprise-grade: able to provide the business continuity needed to meet SLAs, or to integrate with the standards their organizations already have in place. That is where MapR started on our journey to make Hadoop production-ready. A large percentage of our customers are experienced Hadoop users who bet their business on Hadoop and NEED it to be a production, mission-critical system. That’s what we do. TechValidate research shows that the top three reasons people choose MapR are high availability, performance, and ease of data integration (i.e., “integrating with existing infrastructure”).

[Figure: Why MapR]

It is important that Hadoop can support all the standard import/export projects like Sqoop to connect with other systems, but what about other standards like JDBC/ODBC, or POSIX/NFS? What if your Hadoop cluster could provide a random read/write file system for enterprise storage? That’s a core investment we made in our Hadoop distribution early on, so that customers see more success with their Hadoop deployment when using MapR.
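To make the POSIX/NFS point concrete, here is a minimal Python sketch of random (in-place) read/write using only standard file calls. It assumes the cluster is exported over NFS and mounted at a path named by a hypothetical `CLUSTER_MOUNT` environment variable; when that variable is unset, the sketch falls back to a local temporary directory so it runs anywhere. The helper names `overwrite_in_place` and `read_range` are illustrative only and are not part of any MapR API.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at `offset`, leaving the rest of the file intact."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)  # jump to an arbitrary position (random access)
        f.write(data)   # replace bytes in place; the rest of the file is untouched


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    # CLUSTER_MOUNT is a hypothetical variable naming an NFS mount of the cluster;
    # a local temp directory stands in when it is not set.
    base = os.environ.get("CLUSTER_MOUNT") or tempfile.mkdtemp()
    path = os.path.join(base, "records.dat")

    with open(path, "wb") as f:
        f.write(b"AAAA-BBBB-CCCC")

    overwrite_in_place(path, 5, b"XXXX")
    print(read_range(path, 0, 14))  # b'AAAA-XXXX-CCCC'
```

Nothing in the sketch is Hadoop-specific, which is the point: `seek` followed by `write` on a file opened with mode `"r+b"` updates bytes in the middle of an existing file, something a write-once, append-only file system does not allow without rewriting the file.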

The Road Forward for Hadoop: Real-time, Production Applications

If you want to start getting business value from Hadoop, you need to overcome the initial challenges identified in Gartner’s survey. Adopting emerging technology is a journey, and these challenges are common early in the process. The next phase of the Hadoop journey, as our customers see it, is Hadoop as a real-time, multi-tenant environment: a shared data lake or hub that can easily support multiple business units while isolating resources to run production workloads and meet business-level SLAs. Hadoop must be as reliable and production-ready as NetApp enterprise storage, Oracle databases, or your Teradata data warehouse.

And real time is NOT just about how fast you get a query response. Real time spans multiple dimensions (e.g., data ingest, query response, processing time), enabling companies to complete the data-to-action cycle faster than ever and respond more quickly to risks and opportunities in their business, as this final view of our customer research shows:

[Figure: Real-time Hadoop]

So, wherever you are in the "big data" journey, know that Hadoop is one of the largest and fastest-growing ecosystems, and there are LOTS of resources available to help you on your journey. MapR customers represent the leading edge of enterprise Hadoop adoption, and you can read their stories and experience here.

This blog post was published May 15, 2015.