The MapR Clarity Program Is Your Clear Path to AI, Hybrid and Multi-Cloud, Containers, and Operational Analytics

MapR is excited to announce the Clarity program, comprising our latest product release, MapR 6.1, and our StepUp program. StepUp is a free assessment service that provides a comprehensive understanding of your current data environment and gives you best practices for a clear path to supporting AI, cloud, containers, and IoT deployments.

Why Clarity?

Recently, Cloudera and Hortonworks announced their merger. The merger raised concerns about the viability and future of on-prem Hadoop/big data technologies in the context of the growing usage of cloud. It also left current customers of these two companies unsure what the future will look like once the product stacks are combined, and what tradeoffs will be made.

All these concerns are valid. As MapR CEO John Schroeder said in his blog, from a market standpoint, commodity Hadoop falls short in meeting the demands of advanced analytics and AI/ML. Neither Cloudera nor Hortonworks is built to meet the evolving needs of these workloads or modern deployment requirements, which are largely being defined by cloud paradigms. Beyond these foundational issues, Cloudera and Hortonworks have several redundant competing technologies, for example Ambari and Cloudera Manager, or Sentry and Ranger. The merger announcement says these redundant technologies will be "unified," meaning some will be discontinued. This will cause customers pain and impose expensive switching costs.

In the midst of this confusion, MapR offers clarity in the market. MapR was founded with the vision of building a next-generation data platform that caters to the evolving needs of workloads from Hadoop to Spark to AI/ML and the progression from on-premises to cloud, multi-cloud, edge/IoT, and Kubernetes. For more than nine years, we have been investing heavily in engineering to deliver this vision. Our strong vision and intellectual property have enabled us to separate our offering from the commodity Hadoop distributions, and many customers have chosen to replace Hadoop-only big data platforms with MapR to build a better long-term data strategy. The Clarity program shows how we can help you use data and navigate the journey to AI/ML, cloud, and containerization more effectively.

So what are the areas in which MapR provides clarity?

Data Science/AI
  MapR Clarity:
  • Data science workloads run in the same cluster as traditional analytics. No silos for AI/ML use cases.
  • Users have secure access to all data in place, thanks to open APIs like POSIX and container volume plugins.
  • Robust CI/CD-enabled model deployment pipelines.
  Merger Concerns:
  • On-prem solutions from Cloudera and Hortonworks are limited to Spark-only data science. Gaining access to Python algorithms means using a separate cloud solution. Creating separate clusters means data copies, security issues, increasing cost, and other downstream impact.
  • Competing product investments with Cloudera Data Science Workbench vs. Hortonworks’ strategic partnership for IBM DSX (Data Science Experience).
  • ML logistics support is limited without integration with containerization frameworks like Kubernetes.

SQL, Tools
  MapR Clarity:
  • Offers an open platform for SQL: Hive on MR, Hive on Tez, Spark, Drill, and Impala.
  • Supports traditional analytics use cases, with a strategic focus on enabling schema-less data exploration on all types of data.
  • Offers SQL access across historical, operational, and real-time data.
  Merger Concerns:
  • Competing product investments between Impala vs. Hive LLAP, and Hive on Spark vs. Hive on Tez.
  • No support for newer use cases such as data exploration and SQL analytics on operational/real-time data.

Security
  MapR Clarity:
  • Provides enterprise-grade security at the platform level.
  • Offers unified security across files, tables, and streams.
  • Covers a range of security capabilities:
    • Ubiquitous data protection
      • Encryption for data at rest and in flight
    • Flexible authentication
      • Wire-level authentication across all services, integration with LDAP/AD and other directory services, Kerberos support
    • Granular row and column-level authorization
      • Access Control Expressions at all levels
    • Robust auditing: high-performance audits for data access and administrative actions
  Merger Concerns:
  • Competing product investments in security with Ranger vs. Sentry. Customers will soon be forced to adopt whichever solution the combined company selects.
  • Security is an add-on on top of the platform. This means every new project needs a way to interface with these tools, making it complicated to bring new capabilities onto the platform.
  • Lack of impersonation across key query frameworks, such as Impala, causes compliance concerns. You cannot see or track the actual user's actions directly.

Management
  MapR Clarity:
  • MapR offers a unified control system to install, configure, and manage clusters.
  • With fewer services to administer, built-in HA, and autonomous data management, MapR is easier to administer and reduces TCO as a whole.
  • Advanced platform-level management features such as multi-tenancy, quotas, data placement, snapshots, mirroring, and quality of service make it easy to manage PBs of data, thousands of tenants, and hundreds of thousands of applications without missing SLAs or losing data.
  Merger Concerns:
  • A huge area of overlap in the merger.
  • Two competing management interfaces, Cloudera Manager (proprietary) and Ambari (open source), need to be unified. Customers will have to switch.
  • The platform is hard to administer manually at extreme scale, regardless of which of these tools is used.

Governance
  MapR Clarity:
  • Governance is treated as an enterprise-wide challenge.
  • MapR Enterprise catalog offers a full set of technologies supporting platform-based data security, machine learning discovery, data tagging, data rating, data lineage, catalog searching, data dictionary, and data lifecycle management for all data residing on MapR as well as across the enterprise in other data sources.
  Merger Concerns:
  • Navigator and Atlas are built on a shaky foundation and have major limitations:
    • No governance support across the enterprise, such as for relational data
    • Data is not classified automatically
    • No ability to rate or review data
    • End-to-end lineage cannot be inferred
  • Conflicting investments in governance (Atlas vs. Navigator) mean forced migration for customers.
  • Several projects are interconnected, such as Atlas with Ranger and Navigator with Sentry.

Core Platform
  MapR Clarity:
  • Battle-tested in Fortune 100 enterprises on thousands of nodes, in clusters that have run for years without going down, demonstrating reliability at scale.
  • Unified platform to manage historical, operational, and real-time data with integrated high-performance analytics across the platform.
  • Built-in multi-model NoSQL database for business-critical applications with consistent SLAs without compromising on data consistency.
  • Built-in global publish and subscribe messaging system.
  • End-to-end JSON flexibility.
  Merger Concerns:
  • Platform reliability, scale, and performance/SLA concerns continue even with the merger.
  • No end-to-end solution in either Cloudera or Hortonworks to manage and analyze operational/real-time data.
  • Analytics only; cannot run mission-critical business apps on the same platform.
  • Competing investments, with Hortonworks using NiFi/Druid to address real-time analytics use cases and Cloudera investing in Kudu/Impala.
  • Streaming not built-in; requires separate clusters.

MapR also offers a superior platform for areas in which Cloudera and Hortonworks have made little to no investment.

Hybrid, Multi-Cloud, and Edge
  MapR Clarity:
  • Built for hybrid/multi-cloud environments from the get-go.
  • Transparently synchronizes data between on-prem and one or more clouds. No third-party solutions or manual effort required.
  • Ability to optimize for performance, capacity, and cost with data tiering and S3 API support.
  • Built-in streaming for data sharing and distribution.
  • Low-footprint MapR Edge offering for IoT use cases.
  • Global namespace across deployments.
  • Open and consistent APIs for application portability.
  • Infrastructure agnostic with containerization.
  • Automated movement of data across edge, on-premises, and cloud.
  Merger Concerns:
  • No ability to synchronize data across on-premises and cloud/multi-cloud environments.
  • No global namespace.
  • Competing early-stage cloud investments with Altus, DataPlane Service, and HDInsight, each with different focus points.
  • No edge offering.

Containers and Kubernetes
  MapR Clarity:
  • Fully integrated containers and Kubernetes platform.
  • Container-optimized POSIX client (PACC) to connect containerized applications to the MapR platform.
  • Persistent storage to create and deploy stateful containerized applications on MapR.
  • Hadoop ecosystem and containerized data science tools on the same MapR cluster.
  • Bring your own container to MapR.
  Merger Concerns:
  • No current focus on supporting containerized applications on either Cloudera or Hortonworks.
  • Any new joint investment to enable cloud-native deployment models is likely to be impacted or delayed by the merger.

What Should You Do Next?

Learn more about the Clarity program. We also encourage readers to register for the webinar to hear more about the MapR Data Platform and the StepUp data assessment offering.

To get more information about MapR vs. Cloudera, go to our vendor comparison page here.

This blog post was published November 07, 2018.
