There have been a lot of buzz and high expectations in the big data community around Apache Spark 2.0 and how it will affect the development of data pipelines, streaming applications, machine learning algorithms, and all of the other use cases that Apache Spark enables.
Good news: the wait is over! You can now get your hands dirty with Spark 2.0 on the MapR Data Platform. Whether you’re a data engineer, developer, or data scientist, Spark 2.0 has a broad range of capabilities that you can take advantage of. This release is currently a developer preview, which means that it is not recommended for production use and is supported only through the community forum.
To get the latest and greatest documentation on installing, upgrading, configuring, and using Spark 2.0 Developer Preview with MapR, please check out our Apache Spark documentation. If you’re new to Spark, the MapR Sandbox provides the easiest way to get started with Spark.
What are the new capabilities in Spark 2.0?
Spark SQL Streaming introduces the concept of repeated queries, wherein a particular query is executed repeatedly against every incoming batch of DStreams (RDDs). Specific benefits that Spark SQL Streaming provides include:
So how does this benefit the development of analytical applications? With Spark on the MapR Platform, you can now build external, customer-facing analytic applications. With Spark SQL Streaming, analytics can be pre-computed continuously as the data is generated, and the output can then be persisted in MapR Database and served to web applications.
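To make the repeated-query idea concrete, here is a minimal sketch in plain Python. It does not use the Spark API: the function names, the sample click events, and the dictionary standing in for a MapR Database table are all hypothetical, chosen only to show the shape of the pattern (one query, run on every incoming batch, with results merged into a store that a web application could read).

```python
from collections import Counter


def run_query(batch):
    """The 'query': count events per page within a single micro-batch."""
    return Counter(event["page"] for event in batch)


def process_stream(batches, store):
    """Run the same query on every incoming batch and merge into the store."""
    for batch in batches:
        for page, count in run_query(batch).items():
            store[page] = store.get(page, 0) + count
    return store


# Hypothetical click events arriving in three micro-batches.
batches = [
    [{"page": "/home"}, {"page": "/pricing"}],
    [{"page": "/home"}],
    [{"page": "/docs"}, {"page": "/home"}],
]

serving_store = {}  # stand-in for a table in MapR Database (assumption)
process_stream(batches, serving_store)
print(serving_store)
```

Because the store is updated after every batch, a web application reading from it always sees the latest pre-computed totals without running the aggregation itself.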
Spark now acts as a compiler with whole-stage code generation: once it parses the query, it understands which operations the user wants to perform and generates specialized code for those operations at runtime instead of calling generic, pre-built operator implementations. The two most important things to highlight here are:
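As a rough illustration of whole-stage code generation, the plain-Python sketch below contrasts a generic, operator-by-operator evaluation with a single fused function generated from the query's expressions. This is not Spark's generator (Spark emits JVM code, not Python); the function names, the expression strings, and the sample rows are assumptions made for the example.

```python
def interpret(rows, predicate, projection):
    """Generic path: every row passes through separate callables."""
    total = 0
    for row in rows:
        if predicate(row):
            total += projection(row)
    return total


def generate_fused(filter_expr, project_expr):
    """Build source for one fused loop from the query's expressions, then compile it."""
    source = (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {filter_expr}:\n"
        f"            total += {project_expr}\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated source into a function
    return namespace["fused"], source


# Hypothetical rows for a query like: SELECT SUM(price * qty) WHERE qty > 0
rows = [
    {"price": 10, "qty": 2},
    {"price": 5, "qty": 0},
    {"price": 3, "qty": 4},
]

fused, source = generate_fused("row['qty'] > 0", "row['price'] * row['qty']")
print(source)       # the generated code for this specific query
print(fused(rows))  # same result as the generic path
```

Both paths compute the same answer; the difference is that the fused version inlines the filter and projection into one loop, which is the kind of per-row overhead that generated code is meant to remove.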
We recently announced the MapR Platform including Apache Spark, which makes it easier for customers to start with Spark as their primary compute engine. This gives our customers a converged compute and storage engine for batch, analytics, and real-time processing that helps them build and deploy applications rapidly. Customers such as Terbium Labs have built cutting-edge applications with Spark on MapR. MapR also has Quick Start Solutions for use cases such as security log analytics, time series analytics, and stream processing that allow you to quickly get up and running with Spark as well as combine other components of the MapR Platform.
We would like to congratulate the Spark community on the 2.0 release and look forward to Spark 2.0 going GA on the MapR Platform in the coming weeks. In the meantime, we encourage you to start testing and experimenting with Spark 2.0 on the MapR Data Platform.