Data and Application Portability: Table Stakes for AI and ML

Editor’s Note: The topic of this blog post and the customer use case it describes relate to material covered in a white paper you may find useful, Realizing the Full Potential of Your Cloud Investment.

As the Internet of Things (IoT) becomes ubiquitous and the volume of data it generates grows rapidly, we engage with more and more companies that hear the siren song of AI and ML, a song that promises greater operational efficiency, value, and competitive differentiation from mining that data. To take part, however, those companies must work out how to build an environment in which they can create, test, and deploy models against that ever-increasing mountain of data.

The first issue these companies often face is dealing with legacy applications that use a portion of that huge volume of data to run their current business. Their challenge is to evolve to an infrastructure where existing applications can continue to access the data uninterrupted, while the company harnesses the economies of scale and flexibility of cloud compute and storage and gains new business insights through AI and ML applications built on the new platform.

A recent customer example illustrates this dilemma and how MapR helped them move to a position where they can use their data, with new AI and ML techniques, to gain insights that can drive significant new opportunities for their business. Artificial intelligence for oil and gas is a huge potential market, expected to reach USD 2.85 billion by 2022 (MarketsandMarkets, 2018). Presently, North America is the largest market using AI in oil and gas, followed by Europe and Asia Pacific.

The Challenge

The company is a multi-billion dollar enterprise that owns and operates land-based and offshore drilling rigs around the world. In the oil and gas industry, upstream operations, also known as exploration and production, are where companies analyze data from the environment and determine the locations with the highest probability of containing large, extractable reserves of crude oil or natural gas. Once they drill, they continually monitor the wells and the environment to ensure that their operations are running at peak performance. Since many of these sites are in remote areas, using sensors, automation, and AI to decrease on-site operating costs is very attractive, and this area of the industry is quickly moving to an IoT approach. There are already millions of sensors in the field, with more being added each year. These sensors generate data used to make cheaper, faster, and more accurate decisions on where oil can be found and extracted.

The customer in question currently has hundreds of active rigs deployed in more than twenty countries. Each rig has over 150 sensors, and the customer could foresee growing to over 500 sensors per rig. With each sensor generating 10-20 readings per second, this expansion would push the data flow into the data center up to ten thousand readings per second from each rig (500 sensors × 20 readings per second), which adds up to millions of readings per second across a fleet of hundreds of rigs. This massive growth in data volume was the main driver for finding a platform that could handle that rate and beyond.
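The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it reads the ten-thousand figure as a per-rig rate (500 sensors at 20 readings per second), and the fleet size of 200 rigs is a hypothetical stand-in, since the post says only "hundreds".

```python
# Back-of-the-envelope sensor data rates, using the figures quoted in the post.
# ASSUMED_RIGS is hypothetical: the post says only "hundreds of active rigs".

ASSUMED_RIGS = 200


def readings_per_second(sensors_per_rig: int, readings_per_sensor: int, rigs: int = 1) -> int:
    """Total sensor readings generated per second for the given number of rigs."""
    return sensors_per_rig * readings_per_sensor * rigs


# Today: about 150 sensors per rig, 10-20 readings per sensor per second.
current_low = readings_per_second(150, 10)
current_high = readings_per_second(150, 20)

# Anticipated: about 500 sensors per rig at the upper reading rate.
future_high = readings_per_second(500, 20)

# The same upper bound across the assumed fleet.
fleet_future_high = readings_per_second(500, 20, ASSUMED_RIGS)

if __name__ == "__main__":
    print(f"Current, per rig: {current_low:,} to {current_high:,} readings/s")
    print(f"Future, per rig (upper bound): {future_high:,} readings/s")
    print(f"Future, {ASSUMED_RIGS} rigs (upper bound): {fleet_future_high:,} readings/s")
```

Changing `ASSUMED_RIGS` shows how quickly the aggregate rate moves from thousands to millions of readings per second as the fleet and sensor counts grow.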

Prior to MapR, this oil and gas customer was using a proprietary system to communicate between the wells and the company's on-premises data center. Once data reached the data center, they used a traditional RDBMS to store and analyze the time series sensor data. The collected data provided real-time performance and daily operational drilling statistics to their customers through a proprietary web portal. Unfortunately, as their operations grew and the number of sensors per rig multiplied, they found that the legacy system could not handle the high-frequency data updates without significant cost increases for their existing RDBMS solution. On top of the cost increase, they could not deliver on their real-time drilling information needs. Together, these challenges provided the impetus to explore more modern data platforms and architectures.

The Solution

The company’s research convinced them that they needed to move to a cloud-first strategy to help them contain costs and to give them the flexibility to utilize new technologies in AI and ML. They were relatively new to “big data” technologies, but recognized this approach as a valuable direction for sustained innovation and continued support for their customers. Their vision was to become the leading platform provider delivering real-time operational information in addition to views of the historical data to allow customers to make better-informed strategic decisions on oil-rig usage and deployment. The solution they picked to achieve this was the MapR Data Platform. Here’s why.

After meeting with MapR, the customer could see that by deploying the MapR Data Platform they would be able to:

  • Handily manage the data growth they were anticipating from the edge (rigs) to their data center.
  • Get real-time access to the raw data, enabling just-in-time insights across the entire drilling fleet.
  • Use MapR’s high availability and fault-tolerant data platform to reduce outages and eradicate data loss.
  • Have access to new and evolving ML and AI tools to extract new insights and value from the increasing stream of data being collected from the rigs.

The combination of these features would improve the customer experience and satisfaction by providing more timely and better insights, improve uptime of the rigs, and lower the overall operational costs.

They also were attracted to the MapR Data Platform for its API and programming language extensibility. They were impressed with the comprehensive MapR documentation and examples that illustrate MapR Database (JSON) as having the most API and programming language independence of any available platform. (Examples can be seen here.)

The Outcome

For the initial deployment, the company decided to do a pilot implementation with 25 of their rigs to prove out the architecture. They used MapR Event Store for Apache Kafka to ingest the real-time time-series data from the 150 sensors per rig into MapR-DB JSON on the cloud. The pilot implementation demonstrated that they were able to handle 4.5 million data points per second per node on a 5-node cluster.
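To put the pilot numbers in perspective, here is a rough sizing sketch. It assumes that one sensor reading corresponds to one data point, that sensors emit the 10-20 readings per second quoted earlier, and that throughput scales linearly across nodes; none of these assumptions is confirmed by the post, so treat the result as an order-of-magnitude estimate only.

```python
# Rough comparison of the pilot's ingest rate with the demonstrated throughput.
# Assumptions (not confirmed by the post): one reading == one data point,
# 10-20 readings per sensor per second, and linear scaling across nodes.


def pilot_ingest_range(rigs: int, sensors_per_rig: int, min_rate: int, max_rate: int) -> tuple:
    """Return the (low, high) readings per second produced by the pilot rigs."""
    sensors = rigs * sensors_per_rig
    return sensors * min_rate, sensors * max_rate


def cluster_capacity(points_per_node: int, nodes: int) -> int:
    """Aggregate data points per second, assuming linear scaling across nodes."""
    return points_per_node * nodes


pilot_low, pilot_high = pilot_ingest_range(rigs=25, sensors_per_rig=150, min_rate=10, max_rate=20)
capacity = cluster_capacity(points_per_node=4_500_000, nodes=5)
headroom = capacity / pilot_high

if __name__ == "__main__":
    print(f"Pilot ingest: {pilot_low:,} to {pilot_high:,} readings/s")
    print(f"Demonstrated capacity: {capacity:,} data points/s")
    print(f"Headroom over the pilot's peak rate: {headroom:.0f}x")
```

Under these assumptions the demonstrated throughput sits well above what 25 rigs generate, which is consistent with the stated goal of handling the anticipated volume "and beyond".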

The customer’s legacy system consisted primarily of custom C# code for the applications running on the rigs. After researching the various migration options, they decided to use librdkafka, the C/C++ library for Kafka that MapR supports for MapR Event Store (Streams). This approach let them write all of the various sensor data to one common data destination on MapR while still maintaining their current environment and expertise in C/C++ at their edge configuration.

For the POC, data from the rigs was ingested into a MapR stream topic through the Kafka REST interface. The streams were then consumed into MapR-DB JSON on a MapR Data Platform cluster running on the cloud provider's platform. This replaced the traditional RDBMS that had been running in the customer’s data center. The customer then used the Data Science Refinery and Apache Drill to query and analyze the real-time information, which is then published on their web portal dashboards for their clients.
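As an illustration of what ingestion through a Kafka REST interface can look like, the sketch below builds the URL and JSON body for a batch of sensor readings without sending anything. Everything specific here is an assumption for illustration: the stream path `/apps/rigs`, the topic name `sensors`, the host and port, the field names in each reading, and the `application/vnd.kafka.json.v1+json` content type. The post does not describe the customer's actual payloads or endpoints.

```python
import json
from urllib.parse import quote

# Assumed content type for JSON-embedded records; check the Kafka REST
# documentation for the version actually deployed.
ASSUMED_CONTENT_TYPE = "application/vnd.kafka.json.v1+json"


def topic_url(base_url: str, stream_path: str, topic: str) -> str:
    """Build the produce URL for a topic addressed as '<stream path>:<topic>'.

    The full topic name is percent-encoded so that '/' and ':' survive as a
    single URL path segment.
    """
    full_name = f"{stream_path}:{topic}"
    return f"{base_url}/topics/{quote(full_name, safe='')}"


def build_payload(readings: list) -> str:
    """Wrap each reading as one record in a Kafka REST style request body."""
    return json.dumps({"records": [{"value": reading} for reading in readings]})


# Hypothetical readings; the field names are illustrative only.
sample_readings = [
    {"rig_id": "rig-001", "sensor_id": "pressure-17", "ts": 1557990000000, "value": 101.3},
    {"rig_id": "rig-001", "sensor_id": "torque-04", "ts": 1557990000050, "value": 12.8},
]

url = topic_url("http://localhost:8082", "/apps/rigs", "sensors")
body = build_payload(sample_readings)

if __name__ == "__main__":
    print("POST", url)
    print("Content-Type:", ASSUMED_CONTENT_TYPE)
    print(body)
```

Batching several readings into one request, as `build_payload` does, reduces per-request overhead, which matters when each rig emits thousands of readings per second.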

The POC successfully demonstrated:

  • Implementation of a high-performance database for time series and event data, capable of storing and retrieving millions of sensor data points per second with millisecond response times.
  • Migration of the existing data and database, making them available for integration into new applications.
  • That the MapR Data Platform could sustain reads and writes in milliseconds with average concurrent sessions in the hundreds.

Having finalized the migration of their data to the MapR Data Platform, they have access to a modern dataware layer, and their data is now readily available to a broad set of AI and ML applications and tools. By making use of these tools, they are able to provide their customers with more insights into how best to deploy and manage their oil-rig resources. Whereas in the past their value lay primarily in providing and managing the physical rigs, they can now harness the data coming from the operations of those rigs to provide additional value for themselves and their customers, which in turn strengthens their competitive differentiation in the oil and gas industry.

They also now can more easily move their data between their on-premises data center and the cloud and even between cloud providers. With the MapR Data Platform, they have enabled a trusted and secure, global data fabric. Since this company operates in twenty countries, they have the ability to upload the information from any of their rigs to a local cloud instantiation and still synchronize the view and accessibility to anyone in the world given the right authentication and authorization. This flexibility can help them control costs and better meet the needs of their worldwide customers.

The final outcome is that the customer has migrated to a modern infrastructure with minimal disruption to their existing operations due to the portability offered through the MapR Data Platform. They are now positioned to take advantage of all the new tools and platforms being created for AI and ML that will work on the data they are generating and storing in MapR.

Additional Resources

White paper: Realizing the Full Potential of Your Cloud Investment

Blog post: Practical Tips for Data Access and Machine Learning Tools

eBook: Buyer's Guide to AI and Machine Learning

eBook: AI and Analytics in Production

Free on-demand training: Introduction to Artificial Intelligence and Machine Learning

This blog post was published May 16, 2019.
