|Data Science and AI||Support for open APIs like POSIX allows AI and ML to run on the same cluster as your analytics.||Access to cutting-edge technology like new Python ML libraries and containerized tools requires a separate cluster, resulting in data copies, security and lineage issues, and increased time to value.
Merger Dilemma: Will they ultimately support Data Science Workbench or IBM Data Science Experience?
|SQL||Open platform for SQL, including Hive on MR, Hive on Tez, Spark, Drill, Impala, JSON, and schema-less queries.||No open approach to SQL. No support for schema-less self-service data discovery or data exploration.
Merger Dilemma: Will they ultimately support Impala or Hive LLAP? Hive on Spark or Hive on Tez?
|Security||Secure by default. Unified, platform-level security. Built-in auditing, enterprise-grade encryption, expressive authorization, and flexible authentication.||No unified platform-level security. No expressive authorization. Not secure by default.
Merger Dilemma: Will they ultimately support Sentry or Ranger?
|Management||Unified, actionable, and intuitive way to manage all data via MapR Control System. Less management overhead with built-in multi-tenancy, disaster recovery, and high availability.||Critical areas like high availability and disaster recovery must be managed manually and explicitly, making it harder to administer the cluster in production scenarios.
Merger Dilemma: Will they ultimately support Cloudera Manager or Ambari?
|Governance||Enterprise Data Catalog powers governance across the enterprise, not just for your big data platform. (Available through MapR today.)||No support for governance across the enterprise. Cannot classify data automatically. No capability to rate or review data.
Merger Dilemma: Will they ultimately support Navigator or Atlas?
|Hybrid and Multi-Cloud||Built for hybrid and multi-cloud. Transparently synchronize all your data across all deployments. Global namespace provides a single view into all data wherever it is.||Cannot seamlessly synchronize data across on-premises, cloud, and edge deployments. No global namespace.
Merger Dilemma: Will they ultimately support SDX or DataPlane?
|Containers||Support for stateful containerized applications. Full platform integration with containers and Kubernetes.||No existing support for stateful containerized applications.
No Dilemma Because No Defined Offering.
|Real-Time Analytics||Unified platform to manage historical, operational, and real-time data with integrated high-performance analytics.||No end-to-end solution to manage or analyze operational/real-time data alongside analytical applications.
No Dilemma Because No Defined Offering.
(HDFS is not built for real-time. Kudu is behind and requires a separate cluster.)
Does the Cloudera + Hortonworks merger leave you feeling unsure about the future of your big data environment? Whose HDFS will win? Well, you have to wait until they create and release their unity offering.
You may be required to perform a completely new and clean installation in order to keep working with their future product release. The unity offering will not be innovative or bring any new features or benefits to your business, because it is the merging of two code bases. Maybe you should consider MapR as a strategic part of your roadmap. Choose the patented and proven unified data platform from MapR.
We are highly differentiated from Cloudera, and because of this we have a number of capabilities that can help you achieve the vision you have for your business.
Cloudera requires a variety of separate software clusters, which forces vast quantities of data movement and duplication, all of which adds latency between phases of the data lifecycle. How about some quick math: Apache Hadoop, Apache Kafka, Apache NiFi, Apache Kudu, a Data Science Cluster, and Apache HBase for good measure, because most people will not run HBase on the same cluster as a production Hadoop system due to service level requirements. That is 5 or 6 different clusters, and that is before counting multiple data centers. The number grows quickly with multiple data centers, because none of those clusters is built for multi-master capabilities. They all act independently of one another.
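The quick math above can be sketched in a few lines of Python. This is only an illustration: the function name `clusters_to_operate` is hypothetical, and the idea that every data center runs its own full, independent set of clusters is an assumption drawn from the statement that none of them are multi-master.

```python
# Illustrative tally of the separate clusters named in the paragraph above.
CORE_SERVICES = [
    "Apache Hadoop",
    "Apache Kafka",
    "Apache NiFi",
    "Apache Kudu",
    "Data Science Cluster",
]
OPTIONAL_SERVICES = ["Apache HBase"]  # usually kept off the production Hadoop cluster


def clusters_to_operate(services, data_centers=1):
    """Count clusters, assuming one independent cluster per service per data center."""
    if data_centers < 1:
        raise ValueError("data_centers must be at least 1")
    return len(services) * data_centers


if __name__ == "__main__":
    with_hbase = CORE_SERVICES + OPTIONAL_SERVICES
    for sites in (1, 2, 3):
        low = clusters_to_operate(CORE_SERVICES, sites)
        high = clusters_to_operate(with_hbase, sites)
        print(f"{sites} data center(s): {low} to {high} clusters")
```

Under that assumption, a single site already means 5 or 6 clusters, and each additional data center adds another full set.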
The MapR platform natively supports both big data applications and data science applications due to many of the aforementioned capabilities. We also support a variety of open APIs on top of our patented enterprise offering to deliver the most value and capability for the enterprise.
MapR supports the full AI software development lifecycle: exploration, training, deployment, and running models in a production environment. This includes full support for real-time event streaming and hot-swappable models.
Our competition doesn’t like to share this dirty little secret, but their data science workbench requires a full copy of the data you want to work with in your data science tools. This might seem OK for a couple hundred gigabytes of data, but what happens when you want to run on many terabytes of data? Consider all the data movement this entails. In addition, data copies lose lineage, because copying moves the data from HDFS to a standard Linux-based POSIX filesystem, which has a different security implementation.
Of course you want to, and probably already do, version your models with something like Git. But would you like to version your data? What about files and event streams? Database tables? Cloudera doesn't support that. MapR does. This is a major benefit of our platform. It is a homogeneous data platform for all these different types of data.
As our industry sector is still largely focused on on-premise solutions for its data, the ability to deliver on-premise, hybrid and cloud implementations that all have an intrinsic and universal security model offers us a seamless future migration as and when this industry sector opens up to that transition.
David Hacker, strategic marketing manager for Edwards