Until recently, most of the data in the telecommunications industry was discarded because there simply was no cost-effective way to store and analyze it. With the rise of big data technologies, that has changed, and so has the telecom competitive landscape. Margins are razor-thin, and competition has intensified; the business is no longer about voice, it is about data. Carriers that can monetize the enormous amounts of data flowing through their networks will not only gain competitive advantage but also potentially open up entirely new lines of business. I recently attended a talk given by SAP on their SAP Consumer Insight product, one of the new breed of applications that extract value from the telecom data stream.
SAP Digital Interconnect simplifies complex digital transformations for telecommunications carriers. It connects the “last mile” with enterprises and consumers worldwide for more than 1,000 mobile operators. When data flows between carriers, it most likely flows through SAP Digital Interconnect.
A good example of monetizing the mountains of data that telecom providers have is SAP Consumer Insight 365, a cloud service powered by HANA and MapR XD to gather and aggregate billions of anonymized data points from mobile network operators. By collecting and analyzing big data from mobile carriers, SAP can offer insights into consumer behavior, answering such questions as: who are my customers (demographics), where are they coming from (mobile location), and what are they doing (clickstream)?
The scale of this system is impressive, ingesting 10-12 TB of data per day per operator, globally. Across 1,000 operators, that works out to roughly 12 petabytes per day. That is a big data problem. Looking through the lens of a CIO who needs to forecast total system costs, the magnitude of the problem starts to come into focus. Let's break down the problem for a mid-size telecom provider with 25 million subscribers.
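The global ingest arithmetic is simple enough to sketch. A back-of-the-envelope calculation using the figures above (the per-operator rate and operator count come from the talk; nothing else is assumed):

```python
# Back-of-the-envelope ingest forecast using the figures from the talk.
TB_PER_DAY_PER_OPERATOR = 12  # upper bound of the 10-12 TB/day quoted
OPERATORS = 1_000             # mobile operators on SAP Digital Interconnect

daily_ingest_tb = TB_PER_DAY_PER_OPERATOR * OPERATORS
daily_ingest_pb = daily_ingest_tb / 1_000  # 1 PB = 1,000 TB

print(f"Global daily ingest: {daily_ingest_pb:.0f} PB/day")  # → Global daily ingest: 12 PB/day
```

A CIO forecasting total cost would extend this with retention period and replication factor, which is exactly where the tiering decisions below come in.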
Using a tiered architecture, SAP Consumer Insight 365 keeps the hot data in SAP HANA (for up to one month) and the remainder in the IQ/MapR XD tier. That is a lot of data, and those numbers are only for a single mid-sized operator; there are 999 more. The data on MapR XD occupied about 3.4 PB of raw space, representing about 2 PB of actual data. Good compression and a mix of two-way and three-way replicated volumes helped keep that footprint manageable.
I appreciated the honesty in this talk. It is one thing when a vendor trumpets how great they are, but entirely different when a customer gives you praise. Some of the reasons SAP chose MapR, straight from the horse's mouth, include:
It is worth mentioning that SAP Consumer Insight has an award-winning GUI (five awards that I am aware of, perhaps more). These kinds of awards can only be won if the user interface is fast and responsive, and in this case, it means fast queries across tiered data sources, so it is no surprise that one of the big reasons SAP chose MapR XD is performance. During the proof of concept, SAP tested write speeds per node, and here are their benchmarks, verbatim from the presentation:
The emphasis in the second point is mine and worth explaining. It means total cluster write throughput scales linearly: multiply the per-node rate by the number of nodes. With two nodes, you get 3 GB/sec; three nodes get you 4.5 GB/sec. For those interested, the fastest certified speed is 12 GB/sec. By way of comparison, local disks typically go up to only 6 GB/sec. It is also worth mentioning that SAP did this work without the POSIX/FUSE client; this is all NFS. There were a few other points from the PoC worth mentioning:
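The linear scaling model implied by those numbers can be written down directly. A sketch, assuming the per-node rate of 1.5 GB/sec that the 2-node and 3-node figures imply (the talk did not state how many nodes produced the 12 GB/sec certified result; under this model it would correspond to 8 nodes):

```python
# Linear write-throughput scaling implied by the PoC numbers:
# ~1.5 GB/sec per node over NFS, aggregated across the cluster.

PER_NODE_GB_S = 1.5  # implied by 2 nodes -> 3 GB/s and 3 nodes -> 4.5 GB/s

def cluster_write_throughput(nodes: int) -> float:
    """Aggregate cluster write throughput in GB/sec."""
    return PER_NODE_GB_S * nodes

print(cluster_write_throughput(2))  # → 3.0
print(cluster_write_throughput(3))  # → 4.5
print(cluster_write_throughput(8))  # → 12.0 (matches the fastest certified speed)
```

Linear scaling is the property that matters for a capacity planner: doubling ingest does not require exotic hardware, just twice the nodes.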
“We failed three nodes within a few minutes of each other and created no errors. Once the fourth node was failed, IQ (the database) errors began to occur. A simple shutdown of IQ and restart of the MapR cluster nodes resulted in complete recovery.”
In the end, MapR XD eliminated the need for a three-tiered architecture. The storage layer was so fast and so cheap that they simply have hot and warm data tiers.
Happy with the results of the PoC, the SAP Consumer Insight 365 team put MapR XD into production. With about 50 MapR volumes currently under management, hundreds of thousands of files in some directories, and tens of thousands of files open simultaneously, data growth significantly exceeded projections. The implementation plan had to be dialed back, not to tune the software, but because hardware could not be procured and provisioned fast enough.
I’ll conclude with these final quotes from the presentation:
The SAP team is publishing their findings as they go, so if you’re building a high-performance cloud-based file system, stay tuned as the best practice guides will be available soon.