HP Leverages the Power of MapR in its Big Data Infrastructure

HP’s MapR Big Data platform has become the launching point for dozens of internal solutions and customer offerings. Across use cases, HP is benefiting from its MapR Big Data solutions through improved data availability, scalability, massive storage, cost savings, and flexibility.

The Business

Silicon Valley icon Hewlett-Packard is in the midst of a turnaround aimed at shifting the business to new growth areas. Faced with tough competition, HP’s new strategy is focused on providing integrated solutions in cloud computing, security, Big Data, and mobility.

The Challenge

HP needed the right technology and partner to build an infrastructure for its internal Big Data development projects, one that could also serve as the platform for new client offerings. The company had to ensure high availability across six data centers for mission-critical applications.

MapR Solution

Once HP decided to implement Apache™ Hadoop® in its Big Data infrastructure, it conducted a comprehensive evaluation of Hadoop vendors.

“For everything we measured across the board (performance, high availability, disaster recovery, manageability, knowledge base, future roadmap), MapR was the clear choice,” said Keith Dornbusch, Manager of HP’s Big Data Services IT Team. “The MapR technology was top-notch, particularly their performance and high availability features. Plus, our working relationship with the vendor was excellent. MapR’s responsiveness, service, and support were second to none. And MapR addresses our long-term needs by being a reliable Hadoop distribution geared towards the enterprise.”

Building a ‘Data Lake’ platform for multiple uses
HP has established a new internal architecture and “data lake” environment based on Hadoop and MapR. The data lake provides a common platform that holds all data derived from different sources, such as customers, production, or finance. “The real value is breaking down silos and bringing the data together,” says Paul Westerman, Senior Director of IT Big Data Solutions at HP. “With Hadoop, you can drop files in and the data is there. It’s a different type of development environment.” HP deployed the Hadoop environment on 320 HP DL380e servers with dual Intel Xeon E5-2440 processors. Each server provides approximately 20 terabytes of storage and 128 gigabytes of SDRAM, making the cluster well suited to Big Data storage and processing.
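To put the per-server figures in aggregate terms, the short Python sketch below multiplies them out across the 320 nodes. The node count and per-node storage and RAM come from the paragraph above; the 3-way replication factor is an assumption (a common Hadoop default), not a figure HP reported, and the calculation ignores compression and filesystem overhead.

```python
# Back-of-the-envelope sizing for the data lake cluster described above.
# NODES, STORAGE_TB_PER_NODE and RAM_GB_PER_NODE come from the case study.
# REPLICATION_FACTOR is an ASSUMPTION (common Hadoop default), not an HP figure.

NODES = 320
STORAGE_TB_PER_NODE = 20    # "approximately 20 terabytes" per server
RAM_GB_PER_NODE = 128
REPLICATION_FACTOR = 3      # assumption: 3-way replication


def cluster_capacity(nodes: int, storage_tb: float, ram_gb: float,
                     replication: int) -> dict:
    """Return raw storage, usable storage after replication, and total RAM.

    Compression and filesystem overhead are ignored; RAM is converted to TB
    using 1 TB = 1,024 GB.
    """
    raw_tb = nodes * storage_tb
    return {
        "raw_storage_tb": raw_tb,
        "usable_storage_tb": raw_tb / replication,
        "total_ram_tb": nodes * ram_gb / 1024,
    }


if __name__ == "__main__":
    cap = cluster_capacity(NODES, STORAGE_TB_PER_NODE, RAM_GB_PER_NODE,
                           REPLICATION_FACTOR)
    for name, value in cap.items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the cluster holds about 6,400 TB (roughly 6.4 PB) of raw storage, around 2.1 PB of usable capacity after 3-way replication, and 40 TB of RAM in total.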

By bringing all the data together, MapR is enabling HP to develop new solutions including:

Monitoring product quality through telemetry data
Because it can store and access a larger volume of data than before, HP can offer new services to improve the customer experience and be more proactive about customer service. “The most impressive MapR use case is about telemetry data,” explains Dornbusch. “This is the data that our machines create during the manufacturing process. Once the product is in the customer’s hands, we can monitor how it is performing and that data is sent back. We can determine whether a customer will have a problem before it happens.”

Creating a better 360-degree customer experience
Another MapR solution strives to improve the overall HP customer experience by combining data from across all HP divisions into one dashboard that includes all interactions each customer has with the company. “The most important use case is customer experience. This will be a game changer for HP,” says Westerman. “We will be able to bring together all information about every place and time that HP touches a customer. We will combine silos into one dashboard so we can provide a better experience for our customers.”

Being smarter about selling through the website
HP is also using MapR to get smarter about how it sells to customers through its website. hp.com generates clickstream data covering hundreds of billions of clicks. “We are now able to store and use this data to improve a customer’s web experience. MapR and Hadoop are managing five petabytes of data on dual 46-node clusters set up with 20 terabytes per node,” he says. “For example, having this data allows us to customize an offer based on the person’s online behavior.”

There are also several other MapR solutions being used that collect and analyze data in the areas of printer usage, security, and employee experience.


High availability
MapR’s high availability and disaster recovery features have been critical to HP.

“One of the key reasons we chose MapR was because of its high availability and disaster recovery capabilities,” says Dornbusch. “We have six data centers; if we lose one, it can fail over to another. We require automatic failover between clusters.”

Flexible development environment
MapR’s NFS features are also very important to HP. “We use Hadoop to move data out of flat files to use across platforms. We are able to get our data out of silos. Now we can get to it and use it quickly,” explains Dornbusch. “Hadoop helps us with time-to-market for our internal customers. We can have a platform space for our applications team available almost immediately. We built a farm of servers so we can expand rapidly.”

The MapR platform lets HP plan for the future, confident that it can harness the explosion of data it is managing across its businesses. “We’re seeing the tip of the iceberg with Big Data. For example, our Enterprise Data Warehouse (EDW) has grown 50% per year for the past seven years. However, this past year alone, the amount of data we are managing in just one implementation of HP Vertica Analytics Platform and Hadoop is twice the volume of our EDW,” Dornbusch says.
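To show what the quoted EDW growth rate implies, the small Python sketch below compounds 50% annual growth over seven years. The rate and duration come from the quote above; the starting volume is normalized to 1.0 because the case study gives no absolute EDW size, so the output is a multiplier rather than a data volume.

```python
# Compound growth of the Enterprise Data Warehouse (EDW) using the quoted
# figures: 50% per year for seven years. The starting volume is normalized
# to 1.0 (an assumption) because no absolute size is given.


def compound_growth(start: float, annual_rate: float, years: int) -> float:
    """Volume after `years` of growth at `annual_rate` (0.5 means +50%/year)."""
    return start * (1 + annual_rate) ** years


if __name__ == "__main__":
    for year in range(8):
        print(f"year {year}: {compound_growth(1.0, 0.5, year):.2f}x")
```

Sustained 50% annual growth compounds to about 17x over seven years (1.5^7 ≈ 17.09), which gives a sense of scale for the comparison Dornbusch draws with the Vertica and Hadoop implementation.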

Cost savings
One of the primary benefits of the MapR environment is cost savings. “MapR enables massive storage at low cost,” he says. “We process billions of records per second. The MapR technology enables us to maintain data that we couldn’t manage previously and analyze that data quickly in ways we couldn’t do before.”

“There are specific benefits for each project. For example, in one analysis we conducted on warranty data, there was a $10 million ROI because we did not pay claims that were fraudulent,” explains Dornbusch.