Enterprise-Grade Spark: Leveraging Hadoop for Production Success


Apache Spark recently celebrated its five-year anniversary as an open source project. While we are always humbled and excited by the open source success of Spark, it gives us far greater pleasure to know that more and more organizations are deploying Spark in production business applications this year. This is where Hadoop distributions such as MapR play an important role in the value of Spark in the enterprise.

Production Success with Spark on MapR

MapR has been a key partner of Databricks and the Spark community for about a year now, which is yet another anniversary to celebrate. MapR is still the only Hadoop distribution on the market that provides 24x7 enterprise SLA support for the complete Spark stack, including MLlib, GraphX, Spark Streaming, Spark SQL, and Spark Core.

Complete Spark Functionality

Customers can expect the same product quality, reliability, and support for Apache Spark as they receive for their entire MapR Distribution including Hadoop. With support for the complete stack, data pipelining and transformations for complex workflows can be accomplished without switching out of the Spark context. For instance, Razorsight, a MapR customer that provides cloud-based predictive analytics services, relies on the complete Spark stack on MapR for rapid application development and to ease the coding and maintenance of complex ETL workflows. Spark on MapR enables Razorsight to help communications service providers maximize revenue by evaluating customer value, pricing, marketing, retention, and network investment.

Advanced Real-time Operational Use Cases

The MapR Distribution including Hadoop has established itself as the most production-ready Hadoop distribution in the market, and we have noticed that MapR users of Hadoop are far more advanced in their use of Spark in production. For instance, MapR Database (the integrated in-Hadoop NoSQL database in the MapR platform), combined with Spark, provides the fastest and most reliable solution for real-time analytics and data-driven applications. Large healthcare providers and retail analytics firms alike are deploying this combination to their advantage.

Fastest, Most Efficient On-Ramp to Big Data

Spark on MapR delivers the combined benefit of MapR reliability and efficiency along with Spark’s own performance and faster development cycles, allowing for the most efficient ramp-up for new big data projects. In simpler terms, Spark on MapR is the optimum platform to build and grow your Hadoop clusters easily and efficiently. Novartis has shared its story of why it chose Spark on the MapR Distribution.

Spark + Hadoop = Real-time Big Data

Spark provides a unified data processing framework for a range of functionality, and works behind the scenes with many storage platforms, both relational and big data. However, Hadoop is the most prominent storage platform for production deployments of Spark today, because Hadoop has established itself as the preeminent big data platform over the last few years. We now find that Hadoop users are augmenting their production clusters with Spark use cases. For instance, Xactly, a MapR customer that provides sales performance management analytics, transitioned a good percentage of its existing MapReduce jobs to Spark for easier application development and maintenance as well as better overall performance.

We notice that beyond the performance advantage, Spark’s ability to shorten development timelines for complex data pipelines through easy-to-use APIs plays a vital role in Spark adoption on Hadoop.
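To give a feel for what "easy-to-use APIs" can mean in practice, here is a minimal sketch of a small ETL-style pipeline written against Spark's Python RDD API. It is not taken from any customer deployment mentioned in this post. The record format ("region,amount"), the sample data, and the function names parse_sale, revenue_by_region, and run_local_demo are all assumptions made up for illustration. The point is only that parsing, filtering, and aggregation chain together in a few lines, where a hand-written MapReduce job would typically need separate mapper and reducer classes.

```python
from operator import add


def parse_sale(line):
    """Parse a hypothetical 'region,amount' line into (region, amount).

    Returns None for malformed input so the pipeline can filter it out.
    """
    parts = line.split(",")
    if len(parts) != 2:
        return None
    region, amount = parts[0].strip(), parts[1].strip()
    try:
        return (region, float(amount))
    except ValueError:
        return None


def revenue_by_region(lines_rdd):
    """Chain parse -> filter -> aggregate on an RDD of text lines."""
    return (
        lines_rdd
        .map(parse_sale)                      # text line -> (region, amount) or None
        .filter(lambda rec: rec is not None)  # drop malformed records
        .reduceByKey(add)                     # sum amounts per region
    )


def run_local_demo():
    """Run the pipeline on a tiny in-memory dataset using a local Spark context."""
    # Imported here so parse_sale can be exercised without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "revenue-by-region-sketch")
    try:
        # Made-up sample data; a real job would read from cluster storage.
        lines = sc.parallelize(["east,10.5", "west,4", "bad line", "east,2.5"])
        for region, total in sorted(revenue_by_region(lines).collect()):
            print(region, total)
    finally:
        sc.stop()
```

Calling run_local_demo() on a machine with pyspark installed runs the whole pipeline in local mode. On a cluster, the sc.parallelize call would presumably be replaced by a read from distributed storage such as sc.textFile, with the rest of the chain unchanged.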

Download the new whitepaper from Forrester on the promise of Spark.

Click here for more information on Spark on MapR.


This blog post was published May 15, 2015.
