What's New in MEP 3.0

Provides a summary of the new functionality in MEP 3.0.

MEP 3.0 provides a series of stability and security fixes for Spark and improves the speed of ETL and batch processing with a faster version of Hive.

New Features and Additions

MapR Database OJAI Connector for Apache Spark
The MapR Database OJAI Connector for Apache Spark makes it easier to build real-time or batch pipelines between your data and MapR Database and to use Spark within those pipelines. This feature includes:
  • Two new APIs that allow you to load data from a MapR Database JSON table to a Spark RDD or save a Spark RDD to a MapR Database JSON table
  • A custom partitioner that allows you to partition data for better performance
  • Data locality: when the connector reads data from MapR Database, it uses the data locality feature of MapR Database to spawn the Spark executors
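
The partitioning idea can be illustrated with a small, conceptual sketch in plain Python. It does not use the connector's API, and this summary does not describe how the connector's partitioner works internally. The sketch assumes a range-based scheme, and the split points, the `partition_for` and `group_by_partition` helpers, and the sample keys are all hypothetical. Each row key is assigned to a partition by comparing it against a sorted list of split points:

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical split points, sorted ascending.
# Partition i holds keys in the range [SPLITS[i-1], SPLITS[i]).
SPLITS = ["g", "n", "t"]


def partition_for(key, splits=SPLITS):
    """Return the index of the range partition that owns `key`."""
    return bisect_right(splits, key)


def group_by_partition(keys, splits=SPLITS):
    """Group keys by partition index so that each group can be handled together."""
    groups = defaultdict(list)
    for key in keys:
        groups[partition_for(key, splits)].append(key)
    return dict(groups)


if __name__ == "__main__":
    sample = ["alice", "gina", "mike", "nora", "zed"]
    for index, members in sorted(group_by_partition(sample).items()):
        print(index, members)
```

Keeping keys that belong to the same range together is what lets a writer send each group to one place instead of scattering rows across the cluster.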

For more information, see Understanding the MapR Database OJAI Connector for Spark.

MapR Database Binary Connector for Apache Spark
The new MapR Database Binary Connector for Apache Spark allows you to write applications that consume HBase binary tables and use them in Spark. Features include:
  • Writing directly to HBase HFiles for bulk insertion into HBase
  • Using Spark SQL to query tables that are represented in HBase

For more information, see MapR Database Binary Connector for Apache Spark.

MapR Event Store For Apache Kafka C Applications (librdkafka)
As of MapR maintenance release 5.2.1, you can develop C applications for MapR Event Store For Apache Kafka. The MapR Event Store For Apache Kafka C Client is a distribution of librdkafka that integrates with MapR Streams.

For more information, see MapR Event Store For Apache Kafka C Applications.

MapR Event Store For Apache Kafka Python Applications
As of MapR 5.2.1, you can create Python applications for MapR Event Store For Apache Kafka using the MapR Streams Python client. The Streams Python client is a binding for librdkafka and contains support for high-level consumers.

For more information, see MapR Event Store For Apache Kafka Python Applications.

Key Upgrades

Apache Spark 2.1.0
Spark 2.1 in the MapR Converged Data Platform brings enterprise-ready stability and security improvements, including:
  • More than 1200 fixes on the Spark 2.x line
  • MapR-SASL support for encrypted Thrift-server connections
  • Scalable partition handling
  • Stable data-type APIs

For more information, see Spark Feature Support.

Apache Hive 2.1.1
MEP 3.0 provides a faster version of Hive to improve the speed of data-processing tasks, to reduce latency for interactive queries, and to increase throughput for batch queries. Key improvements include:
  • 2x faster ETL through an enhanced cost-based optimizer (CBO), faster type conversions, and dynamic partition pruning
  • A new HiveServer2 web UI with diagnostics and monitoring tools
  • Dynamically partitioned hash joins, which work on unsorted inputs and so eliminate the sorting step
  • Vectorized query execution, which greatly reduces CPU usage for typical query operations such as scans, filters, aggregates, and joins
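
As a rough illustration of what vectorized execution means, the sketch below processes column values in batches instead of one row at a time. It is a conceptual model in plain Python, not Hive code. The batch size of 1024, the column names, and the `batches` and `sum_where_greater` helpers are assumptions made for the example:

```python
BATCH_SIZE = 1024  # illustrative batch size assumed for this sketch


def batches(column, size=BATCH_SIZE):
    """Yield consecutive slices of a column, each at most `size` values long."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def sum_where_greater(prices, quantities, threshold):
    """Sum quantities for rows whose price exceeds threshold, batch by batch."""
    total = 0
    for price_batch, qty_batch in zip(batches(prices), batches(quantities)):
        # Filter step: record the offsets of qualifying rows in this batch.
        selected = [i for i, price in enumerate(price_batch) if price > threshold]
        # Aggregate step: touch only the selected offsets of the other column.
        total += sum(qty_batch[i] for i in selected)
    return total


if __name__ == "__main__":
    prices = list(range(3000))
    quantities = [1] * 3000
    print(sum_where_greater(prices, quantities, 999))
```

In a real engine the gain comes from running tight loops over primitive arrays for each batch, which reduces per-row overhead; plain Python lists only model the control flow.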

For more information, see Hive.

Apache Drill 1.10
Drill 1.10 continues the series of iterative Apache Drill releases. This release adds enhancements for BI tool integration, end-to-end security, performance, and usability. Highlights include:
  • Tableau native connectivity
  • Support for Kerberos and MapR-SASL authentication between the client and Drillbit
  • Support for the CREATE TEMPORARY TABLE AS (CTTAS) command
  • Ability to query data with Hue 3.12 (experimental only)
  • Improved compatibility with Hive/Spark-generated Parquet files

For more information, see the Drill Introduction.