Observations on Customer Journeys on Hybrid Cloud Deployments, Kubernetes, AI/ML, and Real-Time Analytics and Security with a Next-Generation Data Platform

We just finished our customer advisory board meetings in the EMEA region. The meetings were highly productive, with in-depth product and technology conversations and great feedback on both the products MapR has already delivered and the innovations lined up for 2019 and beyond. First of all, I would like to thank our amazing customers – Paysafe, Societe Generale, Criteo, Edwards Vacuum, and others – for their continued support.

Let me provide a summary of the trends we are seeing and the feedback we heard.

1. Containerization and Kubernetes are prevalent, not in experimentation but in production, and a data platform is critical for success.

Main drivers for this adoption include:

  • Overall simplification of infrastructure and operations
  • Flexibility for users to run many different types of existing applications and new workloads, such as machine learning/AI on the same data platform, without creating copies of the data and running siloed environments
  • Ability to deploy and use multiple versions of analytic ecosystem frameworks, such as Spark, seamlessly on the same cluster (e.g., Spark 2.2 for production and Spark 2.3 for development)
  • Isolation of dev, test, and production compute environments, again on the same data cluster

MapR invested in containers/Kubernetes early, and we are excited to see that many of our customers are already solving several of these use cases by leveraging the MapR Persistent Application Client Container (PACC) and the Kubernetes volume driver on the POSIX-compliant MapR Distributed File and Object Store, while reaping the advantages of built-in multi-tenancy. As we speak, our engineering teams are actively building deeper integrations between MapR and K8s to open up a new world of containerized big data environments. Stay tuned.

2. Global data fabric to support hybrid environments is not a vision of the future – it’s real, and it is happening now.

The need for a global data fabric is inevitable in mature customer deployments. For IoT/edge scenarios, the philosophy is ‘act locally and learn globally.’ You want to collect information at the ‘edge,’ process the data locally, and then move it to a central environment for deeper analytics and machine learning. The intelligence and insights from ML are often sent back to edge clusters for smart and timely decisions in place. The need to support hybrid environments, spanning on-premises and multi-cloud, arises from capacity and cost considerations. A global data foundation ties all these deployment environments together with a unified namespace, which users leverage to build a broad set of applications that are portable across different infrastructures.

Since day 1, MapR has been uniquely built for hybrid/multi-cloud environments and has provided the ability to transparently synchronize data between on-premises and one or more clouds. The data storage platform lets customers optimize for performance, capacity, and cost. Data sharing and distribution are simplified through built-in event streaming capabilities. The entire fabric is exposed through a global namespace on which customers use a variety of open APIs to build AI/analytics and mission-critical business applications. Many of our customers are running MapR in these hybrid environments, and we got amazing feedback on further data management and cloud experience innovations to keep customers ahead in their journey.

3. Security is make-or-break for next-gen data platforms – yes, this is security at the platform level, not as a bolt-on feature.

With 100s of PBs of data spanning many environments, security cannot be an afterthought. In our customer environments, MapR is not an auxiliary system but a mission-critical part of the business, holding transaction data, customer data, and more. Security in these environments is critical and must apply at scale.

We are excited that MapR customers continue to appreciate our existing capabilities for consistent security controls, such as standards-based authentication, granular access controls with access control expressions, and, most recently released, secure-by-default configuration and audit events as streams. Our further goal for security is to simplify these controls for data at the scale of 100s of PBs, while optimizing data governance for business users. We got very positive feedback on that direction.

4. Machine learning must leverage the data platform and get to production seamlessly – silos of AI/ML cannot achieve this workflow.

Our customers are able to use the full data set in the MapR Data Platform for their machine learning workloads. The NFS/POSIX interfaces allow them to run a broader set of ML frameworks directly on the data in MapR, without limiting their ML to Spark-based workflows and without having to create data silos for AI/ML needs. Being able to run containerized applications with MapR as the persistent data platform (see the first point, above) is an enabler here, ensuring these workloads can run securely on MapR. One key problem we hear is that customers are still struggling with ML logistics, that is, getting models into production. The MapR Data Platform already provides building blocks to simplify ML logistics, and we want to make end-to-end ML workflows simpler by connecting the dots for users with simplified tooling as they move their AI initiatives to production. We are excited to see the urgent need and opportunity for such a solution in customer environments.
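To make the NFS/POSIX point concrete, here is a minimal sketch of how any tool that reads ordinary files could train on data in place, with no Spark dependency and no copy into a separate silo. It uses only the Python standard library. The function names, the CSV layout, and the mount path in the usage note are hypothetical illustrations, not part of any MapR API, and the "model" is a deliberately trivial stand-in for a real ML framework.

```python
import csv
from pathlib import Path
from statistics import mean


def load_feature_column(data_dir, column):
    """Read one numeric column from every CSV file directly under data_dir.

    data_dir is assumed to be a directory reachable through a POSIX/NFS
    mount, so plain file I/O is all that is needed.
    """
    values = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                values.append(float(row[column]))
    return values


def fit_mean_baseline(values):
    """A trivial baseline 'model': always predict the mean of the training values."""
    return mean(values)
```

With a cluster mounted on the local file system, a call might look like `fit_mean_baseline(load_feature_column("/mapr/my.cluster.com/projects/churn", "amount"))`, where both the path and the column name are assumptions for illustration. The same pattern applies to any framework that accepts a file path.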

5. Businesses must move from cold to hot data – real-time analytics and real-time decisions are key for customer-focused innovation.

For decades, companies ran 100s–1000s of non-integrated applications, each on its own database infrastructure, with data typically coming together only at the end of the day in an ODS for downstream processing and in a data warehouse for BI. This rearview-mirror view of the business prevents companies from offering the best customer experiences and making in-the-moment business decisions.

With a read-write storage system as its foundation, MapR has always been used by our customer base as the platform for operational data. Our recent innovations have been on a mission to enable real-time analytics and rich, insight-driven smart applications on our platform. Many of the features we delivered with MapR Database, with its JSON data model (and secondary indexing), and MapR Event Store for Apache Kafka support new patterns of using a system of record and, through deep integrations with processing frameworks such as Hive, Drill, and Spark Streaming, serve analytics needs, all in fulfillment of this vision. Our customers trust the reliability and power of the MapR Data Platform for their operational analytics and mission-critical business processes. Our progress in this area resonates, and the capabilities we built in the past few years are actively being used by our customers.

Overall, the sessions were inspiring, and it’s exciting to be on this journey with our customers, building next-gen data platforms for brave new challenges.

To get started, learn about recent innovations in the new MapR 6.1 Capabilities.

This blog post was published October 15, 2018.
