This blog post was jointly written by Cloudera (Alex Gutow), Intel (Weihua Jiang), and MapR (Nitin Bandugula) – all companies that are part of the Hive-on-Spark Team.
As one of the most popular tools in the Apache Hadoop ecosystem, Apache Spark has drawn a lot of attention, and for good reason. It complements the existing Hadoop ecosystem by adding easy-to-use APIs and data-pipelining capabilities for Hadoop data, and support for the project continues to grow. Since its launch in 2010, Spark has seen over 400 contributors from more than 50 different companies.
This true community effort has secured Spark’s place as an open standard within Hadoop. Its strong engineering focus, quality, and popularity have made it portable across distributions, and it is now supported by all the major Hadoop vendors. Its production use has also led leading software companies to develop and certify Spark applications, opening Spark up to more use cases and users.
One of the most exciting projects around Spark is the community effort to improve batch processing by using Spark as the execution backend. As a powerful batch processing engine, Spark will not only improve the performance of several popular projects such as Apache Hive, Apache Pig, and Apache Sqoop, but will also drive standardization on a common execution backend, making management and development more efficient. Back in July, Cloudera, Databricks, IBM, Intel, and MapR announced an industry-wide collaboration to port open source tools built on MapReduce to Spark. Since the initial announcement, there has been a lot of progress toward making this a reality. Here’s a look at what’s been accomplished since:
Also Coming Soon
Spark has come a long way at an impressive rate, thanks to the community rallying behind it as an open standard in Hadoop. With such robust developer support, we expect continued advances in Spark, especially as it matures into a standard execution engine for key workloads.