At Strata+Hadoop World in New York last week, MapR CMO Jack Norris talked about the Big Data Dividend – the ongoing, significant profits that are derived from data-driven applications. In his keynote, Jack provided a look at the bigger picture. The key message? We’re in the middle of the biggest change in enterprise computing in decades.
Here are the high-level points that Jack made in his keynote:
The way we handle, process and analyze data is going through a tremendous transition. This transition phase revolves around three major aspects:
1) Continuous analytics – It’s not about weekly reports or periodic analysis anymore. Companies that are getting the biggest results from their data are incorporating continuous analytics into their operations.
A great example of continuous analytics is American Express. This information was shared in a recent Information Management interview with two American Express senior vice presidents. American Express is using MapR in their multi-petabyte platform. The company has been on a multi-year journey to incorporate machine learning into their business. They’re using machine learning for a wide range of applications, from customer service to managing risk.
As an example, they have Amex Offers, which is a machine learning application that provides customized, relevant offers on mobile devices to card members. Their big data application is pulling merchant data, spending history, preferences, and is using an embedded algorithm that continually learns and improves.
American Express also leverages big data to identify potential fraud. Any time an American Express card is used anywhere in the world, a big data platform helps protect $1 trillion in annual charge volume. They make a point-of-sale decision in 2 milliseconds.
Continuous analytics is powerful, but that power is fueled by both the volume and variety of data sources, which brings us to the second aspect of the data transition:
2) Convergence – Organizations are moving from dedicated applications on their own “pools” of data, to many applications running across a single big data platform—a “data ocean at scale.”
We’ve already seen leading companies move in this direction. TechValidate recently did a study and found that 18% of MapR production customers were running 50 or more applications on a single cluster. That’s a testament to the maturity of the platform, the multi-tenancy, the security, and the workload management capabilities that are on that platform. The benefits of speed, efficiency and business impact are hard to overestimate.
3) Operational Agility – Performing continuous analytics and centralizing the data is important, but the third aspect is what EY (formerly Ernst & Young) calls “operational agility,” and it’s the major force for disruption.
In the future, when we look back at today’s business environment, it will be interesting to see which companies, like Kodak, Blockbuster, and BlackBerry, were slow to respond, and which adopted operational agility to zoom ahead of the competition and establish dominance.
When it comes to operational agility, where do you start? The best place to start is with the data itself. Your data can be a big enabler as well as an obstacle to agility. For instance, let’s say you were building a product catalog as part of a web application.
If you chose a relational database, it would be a fairly complex data model. In the keynote example, it takes a 32-line SELECT statement to get a single product.
Contrast that with JSON, whose self-describing documents carry their attributes, and therefore their schema, with them. In the keynote example, you can see the difference between a document that describes a bike vs. a pedal vs. a jersey.
Selecting a single product, as shown on the keynote slide, requires a very simple statement. This is one of the reasons why JSON has become the de facto standard for web applications.
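To make the contrast concrete, here is a minimal Python sketch of a document-style catalog. It is not taken from the keynote slides and does not use any MapR API: the field names, the `_id` values, and the `load_catalog` and `get_product` helpers are all hypothetical, and a plain in-memory dictionary stands in for a document database.

```python
import json

# Hypothetical catalog documents. The field names are illustrative only;
# note that each product type carries a different set of attributes.
CATALOG_JSON = """
[
  {"_id": "bike-001", "type": "bike", "name": "Road Bike",
   "specs": {"frame": "carbon", "gears": 22}},
  {"_id": "pedal-001", "type": "pedal", "name": "Clipless Pedal",
   "specs": {"weight_g": 250}},
  {"_id": "jersey-001", "type": "jersey", "name": "Team Jersey",
   "sizes": ["S", "M", "L"]}
]
"""


def load_catalog(raw):
    """Parse a JSON array of documents and index them by their _id."""
    return {doc["_id"]: doc for doc in json.loads(raw)}


def get_product(catalog, product_id):
    """Fetch one whole product document by id, or None if it is missing."""
    return catalog.get(product_id)


catalog = load_catalog(CATALOG_JSON)
bike = get_product(catalog, "bike-001")
jersey = get_product(catalog, "jersey-001")
```

The single-product lookup is one call that returns the whole document, with no joins across attribute tables. The bike and the jersey also do not share a fixed set of columns, which is the flexibility the self-describing format is meant to provide.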
JSON is also fueling a lot of the data formats for the Internet of Things, such as machine-generated content and log files. IoT is projected to grow to 50 billion Internet-connected devices by 2020. With this amount of data, a database that can store the records and perform real-time operations is a requirement. You can take the JSON documents, flatten them, and put them in a NoSQL key-value store, but a database with native JSON support is a better solution.
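As a rough illustration of the flattening approach mentioned above, the following Python sketch turns a nested document into key-value pairs using dotted key paths. The `flatten` helper, the dotted-path convention, and the sample document are assumptions made for this example; real key-value stores and their client libraries may handle this differently.

```python
def flatten(doc, prefix=""):
    """Flatten a nested dict into a single-level dict with dotted key paths.

    Lists and scalar values are stored as-is under their path.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical product document, for illustration only.
bike = {
    "_id": "bike-001",
    "name": "Road Bike",
    "specs": {"frame": "carbon", "gears": 22},
}

flat_bike = flatten(bike)
```

After flattening, the nesting survives only as a naming convention in the keys (`specs.frame`, `specs.gears`), so the application has to rebuild the document structure itself. That extra transformation step is what native JSON support is meant to avoid.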
That’s why we introduced the first in-Hadoop document database, where you get the advantages of a document database combined with the scale, reliability, and integrated analytics of our enterprise Hadoop platform. Now developers have an easy way to build scalable applications without having to worry about schema, data transformation, and other intermediate steps.
In a way, this qualifies as the fourth major aspect of the transition. We’ll leave it to you to decide—go ahead and download the Developer Preview sample code and experiment for yourself.
However, many of you might not be at that stage—we’re all at different points in our big data journey, so part of our contribution to the community is to provide free, on-demand training. We have separate tracks for developers, data analysts, and administrators. Each track leads to its own certification.
In summary, we’re in the middle of the biggest change in enterprise computing, and we’re committed to helping you make history. Get started today by downloading the MapR Database Document Database Developer Preview.
In addition, be sure to view Jack’s Strata+Hadoop World interview with Mike Hendrickson, O’Reilly Media’s VP for Content Strategy. In the interview, Jack talks about the fact that companies are focused on this emerging trend, and discusses how companies are combining operational production data with analytics, which is impacting their business as it’s happening.