As I discussed in my keynote presentation at Strata + Hadoop World in San Jose last week, companies need to do more than merely report on business results. There are substantial advantages to making decisions fast enough to respond to events in the moment. In fact, real time is at the foundation of many transformational applications. Let’s take a closer look at what real time really means, and why it matters across the entire process.
Real time has to span the entire process. It begins when data is collected and continues until the business action is taken. If we can compress that time-to-action window, we lay the foundation for some truly transformational applications.
Let’s look at some examples. Each of these companies uses high-frequency decisioning applications to make small, automated adjustments in real time.
Altitude Digital is one of the fastest-growing advertising platforms in the industry. Handling nearly seven billion transactions per day, Altitude Digital selects, in real time, the best video advertisement to show the right person at the right time.
A $2 billion hospital chain is moving to an event-based platform that handles real-time patient data, medical histories, and other data to improve patient care.
Another company is improving yield management by leveraging real-time data from machine sensors that measure vibration, heat, and other conditions. With this real-time analysis, the company can detect quality problems and correct issues much faster.
NOV is a $23 billion multinational company based in Houston and a leading worldwide provider of oil equipment, components, and services. NOV uses the MapR Platform to perform real-time analysis that optimizes oil and gas drilling and production.
American Express leverages big data to identify potential fraud whenever an American Express Card is used anywhere in the world. Its platform protects $1 trillion in charge volume every year, determining in less than 2 milliseconds whether a charge is fraudulent.
These are just a few examples of transformational real-time applications across industries. Whether you call them high-frequency decisioning or operational agility is not important. The key question is how you enable these applications, and the answer starts with a new approach.
Traditional Applications Dictate Data
Traditionally, we’ve taken an “application first” approach: you start with the application and determine its data requirements, then prepare the data in specialized schemas to serve that application. Each application has its own dedicated silo, and the result is a proliferation of silos. In fact, the average company has hundreds of data silos throughout its organization. Gartner calls this the biggest challenge for data management in organizations. The promise of big data is to centralize this data into a data lake and bring the processing to the data.
Hadoop enables organizations to collect data into a centralized data lake. However, as big data grows more complex, we’re actually seeing data separated into specialized clusters: one cluster for ingest, one for streaming analytics, one for database operations, and another for deep analytics. We’re recreating the same silo problem, only with different technologies.
In order to eliminate data silos and enable these real-time, transformational applications, you need to focus on two areas.
The first is a Data Platform that eliminates separate clusters and lets applications benefit from all data. Without silos, all of your data is available to your applications for a wide variety of processing. Every piece of data can be treated as a “first-class citizen”: structured, unstructured, data-in-motion, and data-at-rest.
Data-in-motion is the second area to focus on: event-based data flows. Examples include web events, machine sensor readings, biometrics, and mobile events. Operational agility is the ability to quickly analyze and understand this flowing data in context. That context comes from the long-term trends and patterns you see in data-at-rest, combined with newly arriving data. Keep in mind that data-in-motion and data-at-rest need to work closely together; they’re not fundamentally separate.
In fact, we have a tendency to talk about big data when it’s at rest, marveling at its volume and variety, and forget that it was typically created one event at a time, whether that’s social/mobile interactions, machine-generated data, or customer transactions. Harnessing these data flows and understanding their meaning and context are key building blocks for applications.
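The idea of analyzing data-in-motion in the context of data-at-rest can be made more concrete with a minimal Python sketch. It is illustrative only and does not represent any MapR product API. The sensor names, the three-standard-deviation threshold, and the helper names (`RunningStats`, `score_stream`) are hypothetical. Historical readings seed a per-sensor baseline, and each arriving event is scored against that baseline. The event is then folded into the baseline using Welford’s online mean/variance algorithm, so the context keeps absorbing newly arriving data.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Per-sensor mean and variance, updated one value at a time (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; reported as 0.0 when fewer than 2 values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def score_stream(
    history: dict[str, list[float]],
    events: list[tuple[str, float]],
    threshold: float = 3.0,  # hypothetical: flag values > 3 std devs from the mean
) -> list[tuple[str, float]]:
    """Flag events that deviate sharply from their sensor's historical context."""
    # Data-at-rest: build the long-term baseline from stored history.
    context: dict[str, RunningStats] = {}
    for sensor_id, values in history.items():
        stats = RunningStats()
        for v in values:
            stats.update(v)
        context[sensor_id] = stats

    alerts: list[tuple[str, float]] = []
    # Data-in-motion: score each event as it arrives.
    for sensor_id, value in events:
        stats = context.setdefault(sensor_id, RunningStats())
        if stats.std > 0 and abs(value - stats.mean) / stats.std > threshold:
            alerts.append((sensor_id, value))
        # Fold the new reading into the context so later events see it.
        stats.update(value)
    return alerts


history = {"pump-7": [10.0, 10.2, 9.8, 10.1, 9.9]}
events = [("pump-7", 10.05), ("pump-7", 14.0), ("fan-2", 99.0)]
alerts = score_stream(history, events)
print(alerts)  # [('pump-7', 14.0)]
```

A sensor with no history (`fan-2` here) is not flagged until enough readings arrive to establish context. Folding every event into the baseline, including flagged outliers, is a simplification. A real system might exclude anomalies or weight recent data differently.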
Event-based data is what drives applications. Whether it’s collecting machine sensor data to predict and prevent failures, providing key offers to customers, or identifying and preventing fraud before it happens, all of these use cases are enabled by event-based data flows and a converged platform.
One company using event-based data to drive applications is Liaison Technologies, which wanted to simplify and speed the flow of healthcare information for its customers. Its answer was to adopt a converged platform and treat the electronic medical record as a stream. The stream itself is a system of record, and the publish/subscribe paradigm serves Liaison’s “customers”: the hospitals, clinics, patients, physicians, and payers. These customers are informed in real time as updates happen, and their applications consume the data in whatever format makes sense, whether that’s a table for a database operation or an index for a search function. The result is a simpler, much faster process, with real-time applications that feature integrated security to protect privacy and meet HIPAA compliance requirements.
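The “stream as system of record” pattern in Liaison’s design can be sketched in a few lines of Python. This is an in-memory illustration only. It is not Liaison’s implementation or the MapR Streams API, and `EventLog`, `Subscriber`, `TableView`, `SearchIndex`, and the record layout with `patient_id` and `fields` are all hypothetical. Records are only ever appended to the log. Each subscriber tracks its own read offset and materializes the same records into the view it needs: a table of current values in one case, a keyword index in the other.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only stream acting as the system of record (in-memory stand-in)."""
    records: list[dict] = field(default_factory=list)

    def publish(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int) -> list[dict]:
        return self.records[offset:]


class Subscriber:
    """Tracks its own read offset, so each consumer proceeds independently."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0

    def poll(self) -> None:
        # Apply every record published since this subscriber last polled.
        for record in self.log.read_from(self.offset):
            self.apply(record)
            self.offset += 1

    def apply(self, record: dict) -> None:
        raise NotImplementedError


class TableView(Subscriber):
    """Materializes the latest field values per patient, like a database table."""

    def __init__(self, log: EventLog) -> None:
        super().__init__(log)
        self.rows: dict[str, dict] = {}

    def apply(self, record: dict) -> None:
        self.rows.setdefault(record["patient_id"], {}).update(record["fields"])


class SearchIndex(Subscriber):
    """Builds an inverted index from words in free-text notes to patient IDs."""

    def __init__(self, log: EventLog) -> None:
        super().__init__(log)
        self.index: dict[str, set[str]] = defaultdict(set)

    def apply(self, record: dict) -> None:
        for word in record["fields"].get("note", "").lower().split():
            self.index[word].add(record["patient_id"])


log = EventLog()
table = TableView(log)
search = SearchIndex(log)
log.publish({"patient_id": "p1", "fields": {"allergy": "penicillin", "note": "Mild rash observed"}})
log.publish({"patient_id": "p1", "fields": {"note": "Rash resolved"}})

# Each consumer catches up independently from its own offset.
table.poll()
search.poll()
print(table.rows["p1"])              # {'allergy': 'penicillin', 'note': 'Rash resolved'}
print(sorted(search.index["rash"]))  # ['p1']
```

Because the log is never overwritten, a subscriber added later can replay it from offset 0 and arrive at the same state as the existing consumers. This is part of what lets the stream serve as the system of record.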
The winners will not necessarily be the companies with the most data; the true winners will be those that demonstrate the most data agility. Remember that the key to agility is having both converged data and converged processing, including event-based data flows.
When it comes to big data, it truly is time to get real.