Using MapR XD with Big Data in SAP IQ and HANA



Until recently, most of the data in the telecommunications industry was discarded because there simply was no cost-effective way to store and analyze it. With the rise of big data technologies, that has changed, and so has the telecom competitive landscape. Margins are razor-thin and competition is fierce; the business is no longer about voice but about data. Carriers that can monetize the enormous amounts of data flowing through their networks will not only gain a competitive advantage but may also open up entirely new lines of business. I recently attended a talk given by SAP on its SAP Consumer Insight product, one of a new breed of applications that extract value from the telecom data stream.


SAP Digital Interconnect simplifies complex digital transformations for telecommunications carriers. It connects the “last mile” with enterprises and consumers worldwide for more than 1,000 mobile operators. When data flows between carriers, it most likely flows through SAP Digital Interconnect.

A good example of monetizing the mountains of data that telecom providers have is SAP Consumer Insight 365, a cloud service powered by HANA and MapR XD that gathers and aggregates billions of anonymized data points from mobile network operators. By collecting and analyzing big data from mobile carriers, SAP can offer insights into consumer behavior, answering such questions as: who are my customers (demographics), where are they coming from (mobile location), and what are they doing (clickstream)?

The scale of this system is impressive: it ingests 10-12 TB of data per operator per day, globally. With 1,000 operators, that works out to as much as 12 petabytes per day. That is a big data problem. When you look at it through the lens of a CIO who needs to forecast total system costs, the magnitude of the problem comes into focus. Let's break down the problem for a mid-size telecom provider with 25 million subscribers:

[Table: data volume breakdown for a mid-size operator with 25 million subscribers]

Using a tiered architecture, SAP Consumer Insight 365 keeps hot data in SAP HANA for up to one month and the remainder in the SAP IQ/MapR XD tier. That is a lot of data, and those numbers are only for a single mid-size operator; there are 999 more. On MapR XD, the data occupied about 3.4 PB of raw used space for about 2 PB of actual data. Good compression, plus a number of volumes replicated twice rather than three times, helped keep that footprint down.
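The capacity figures above can be checked with a few lines of arithmetic. This is a minimal Python sketch: the function names are hypothetical, it assumes "actual data" means the logical, uncompressed size, and the replication mix and compression ratio in the last example are made-up values for illustration, not numbers from the talk.

```python
# Back-of-the-envelope capacity math for the figures quoted in the talk.
# Function names are hypothetical; units are decimal (1 PB = 1,000 TB).
TB_PER_PB = 1000


def daily_ingest_pb(tb_per_operator_per_day: float, operators: int) -> float:
    """Total ingest across all operators, in petabytes per day."""
    return tb_per_operator_per_day * operators / TB_PER_PB


def raw_usage_pb(logical_pb: float, frac_triplicate: float,
                 compression_ratio: float) -> float:
    """Raw space used when part of the data is stored 3x and the rest 2x.

    frac_triplicate: share of data on three-way replicated volumes (0..1).
    compression_ratio: logical size / compressed size (1.5 means 1.5:1).
    """
    replication = 3 * frac_triplicate + 2 * (1 - frac_triplicate)
    return logical_pb * replication / compression_ratio


# 10-12 TB per operator per day across ~1,000 operators.
print(f"{daily_ingest_pb(10, 1000):.0f}-{daily_ingest_pb(12, 1000):.0f} PB/day")

# Reported footprint: ~3.4 PB raw for ~2 PB of actual data.
print(f"raw/actual ratio: {3.4 / 2.0:.2f}")

# Hypothetical mix: half the data on 3x volumes, 1.5:1 compression.
print(f"estimated raw usage: {raw_usage_pb(2.0, 0.5, 1.5):.2f} PB")
```

The reported raw-to-actual ratio of 1.7 is below even two-way replication, so, assuming every volume holds at least two copies, compression must account for part of the savings. With the assumed parameters, the estimate lands near 3.33 PB, in the same range as the reported 3.4 PB.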

Why SAP Chose MapR XD for SAP IQ and HANA

I appreciated the honesty in this talk. It is one thing for a vendor to trumpet how great it is, and entirely another for a customer to sing its praises. Some of the reasons SAP chose MapR, straight from the horse's mouth, include:

  • Easy implementation
  • Modular and componentized architecture, in line with SAP requirements
  • Very cooperative vendor
  • Up to 10 times cheaper to operate than high-end storage
  • Easy scale up/down
  • Ability to add/remove components without bringing the cluster down

It Is All About Performance

It is worth mentioning that SAP Consumer Insight has an award-winning GUI (five awards that I am aware of, perhaps more). These kinds of awards can only be won if the user interface is fast and responsive, and in this case, it means fast queries across tiered data sources, so it is no surprise that one of the big reasons SAP chose MapR XD is performance. During the proof of concept, SAP tested write speeds per node, and here are their benchmarks, verbatim from the presentation:

  • 8 node MapR cluster, 256 GB RAM each, 4x10 GB links
  • 1.5 GB/sec sustained write speeds per multiplex node (tested up to 4 IQ nodes), near linear scale per node
  • No observed performance impact during forced MapR node failures

The second point deserves some explanation. Near-linear scaling means that total cluster write throughput is roughly the per-node rate multiplied by the number of IQ nodes: two nodes give you 3 GB/sec, and three nodes give you 4.5 GB/sec. For those interested, the fastest certified speed is 12 GB/sec. By way of comparison, local disks typically top out at about 6 GB/sec. It is also worth mentioning that SAP did this work without the POSIX/FUSE client; everything ran over NFS. There were a few other points from the PoC worth mentioning:
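Here is a minimal Python sketch of the linear scaling model described above. The 1.5 GB/sec per-node rate is the PoC figure; the function name is hypothetical. Because SAP measured scaling only up to four IQ nodes and observed near-linear (not perfectly linear) behavior, treat the model as an idealized upper bound rather than a prediction.

```python
# Idealized linear model: aggregate write throughput = per-node rate x nodes.
# The PoC measured "near linear" scaling up to 4 IQ nodes, so treat results
# as an upper bound, and anything past 4 nodes as extrapolation.
PER_NODE_GB_S = 1.5   # sustained write rate per multiplex node from the PoC
TESTED_MAX_NODES = 4


def aggregate_write_gb_s(nodes: int, per_node: float = PER_NODE_GB_S) -> float:
    """Estimated total write throughput in GB/sec for `nodes` IQ nodes."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes * per_node


for n in range(1, TESTED_MAX_NODES + 1):
    print(f"{n} IQ node(s): {aggregate_write_gb_s(n):.1f} GB/sec")
```

At four nodes, the largest configuration tested, the model gives 6 GB/sec.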

“We failed three nodes within a few minutes of each other and created no errors. Once the fourth node was failed, IQ (the database) errors began to occur. A simple shutdown of IQ and restart of the MapR cluster nodes resulted in complete recovery.”

In the end, MapR XD eliminated the need for a three-tiered architecture. The storage layer was fast enough and cheap enough that SAP simply has two tiers: hot and warm.
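The two-tier layout can be sketched as a simple age-based routing rule: data up to about a month old stays hot in SAP HANA, and everything older lives in the warm SAP IQ/MapR XD tier. This Python sketch is hypothetical. The 30-day cutoff approximates the "up to one month" from the talk, and the function name and tier labels are illustrative, not part of any SAP or MapR API.

```python
from datetime import date, timedelta

# Hypothetical routing rule for a hot/warm layout. The 30-day cutoff
# approximates "up to one month" of hot data in SAP HANA.
HOT_RETENTION = timedelta(days=30)


def tier_for(record_date: date, today: date) -> str:
    """Return which tier a record of the given date belongs in."""
    age = today - record_date
    if age <= HOT_RETENTION:
        return "hot: SAP HANA"
    return "warm: SAP IQ on MapR XD"


today = date(2017, 8, 7)
print(tier_for(date(2017, 7, 20), today))  # 18 days old -> hot: SAP HANA
print(tier_for(date(2017, 5, 1), today))   # ~3 months old -> warm: SAP IQ on MapR XD
```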

The SAP and MapR Journey Continues

Happy with the results of the PoC, the SAP Consumer Insight 365 team put MapR XD into production. The deployment now manages about 50 MapR volumes, with hundreds of thousands of files in some directories and tens of thousands of files open simultaneously, and data growth has significantly exceeded projections. The implementation plan had to be dialed back, not to tune the software, but because hardware could not be procured and provisioned fast enough.

Final Quotes

I’ll conclude with these final quotes from the presentation:

  • "For most things, it just works, and is highly durable."
  • "MapR (the company) is VERY helpful when we have a problem."

The SAP team is publishing its findings as it goes, so if you're building a high-performance, cloud-based file system, stay tuned: best-practice guides will be available soon.


This blog post was published August 07, 2017.
