The rise of mobile technology and Internet of Things (IoT) sensors in cars, industrial equipment, and medical devices has resulted in an abundance of data. These data are increasingly semi-structured and unstructured in nature and are typically stored in Hadoop and NoSQL systems. New actionable insights from these data fuel business growth and competitive advantage for successful companies. However, for big data to deliver on its vast potential, technology must be in place to enable organizations not just to capture and store data but also to quickly gain new insights that can be used to improve business performance. With this joint solution between MapR and Mellanox, data scientists can analyze large and diverse data sets faster and formulate new business hypotheses quickly and efficiently.
“The MapR Converged Data Platform integrates the power of Hadoop, Spark, NoSQL, and streaming for developing and running innovative data applications. This partnership between MapR and Mellanox helps customers get even more output and value from their big data deployment.”
— Steve Wooledge, VP of Product Marketing, MapR Technologies
The MapR Converged Data Platform
As a technology leader in Hadoop, MapR Technologies provides a converged platform that includes a complete distribution of Apache Hadoop. MapR continues to be the fastest and leads in features and capabilities among major Hadoop distributions. The MapR Converged Data Platform integrates Hadoop and Spark with global event streaming, real-time database capabilities, and enterprise storage for developing and running analytics applications. The MapR Platform Services layer, which includes the MapR Distributed File and Object Store and NoSQL database (MapR XD and MapR Database, respectively), provides several enterprise-grade features including high availability, data security, and easy integration with NFS. The platform supports a wide range of applications that process structured and unstructured data stored in files as well as in NoSQL databases. This single data platform for all big data workloads provides the best ROI for hardware use.
The high-performance data platform from MapR enables business analysts to uncover new business value more quickly than before using different big data applications. These workloads could be SQL-based (using Hive, Impala, Spark SQL, etc.), the newer class of NoSQL-based workloads (using MapR Database), or more advanced applications using machine learning, graph, and streaming analytics. Due to the advantages of MapR-FS and the increasing use of flash storage in production-grade Hadoop environments, network bandwidth often becomes the bottleneck during data ingestion and analytics. Investing in faster servers and flash storage for MapR does not make sense if performance is restricted by the network. Mellanox provides the most efficient end-to-end Ethernet network tailored for Big Data applications at 10/25/40/50/56/100GbE. Mellanox Ethernet switches feature consistently low latency and can support a variety of non-blocking, lossless fabric designs. Furthermore, Big Data applications utilizing TCP or UDP over IP transport can achieve the highest throughput and application density using hardware-based stateless offloads and flow steering engines in ConnectX®-3 Pro network adapters. These advanced stateless offloads reduce CPU overhead in IP packet processing, allowing completion of heavier analytic workloads in less time in the Big Data cluster.
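The point above, that the network can become the bottleneck once nodes use flash storage, can be illustrated with a rough back-of-the-envelope calculation. The sketch below is only an illustration: the function name `node_bottleneck`, the drive count, and the per-drive throughput are hypothetical values chosen for the example, not measurements of any MapR or Mellanox configuration, and the model ignores protocol overhead, replication traffic, and CPU limits.

```python
def node_bottleneck(drives, drive_mb_per_s, nic_gbit_per_s):
    """Compare a node's aggregate drive throughput with its NIC line rate.

    All inputs are illustrative assumptions:
      drives          number of drives in the node
      drive_mb_per_s  sequential throughput per drive, in MB/s (decimal)
      nic_gbit_per_s  nominal NIC speed, in Gbit/s

    Returns (storage_gbit_per_s, limiting_component).
    """
    # Convert MB/s to Gbit/s: multiply by 8 bits per byte, divide by 1000.
    storage_gbit_per_s = drives * drive_mb_per_s * 8 / 1000
    if nic_gbit_per_s < storage_gbit_per_s:
        limiting = "network"
    else:
        limiting = "storage"
    return storage_gbit_per_s, limiting


if __name__ == "__main__":
    # Hypothetical node: 4 flash drives at an assumed 500 MB/s each.
    for nic in (10, 25, 40, 100):
        storage, limiting = node_bottleneck(4, 500, nic)
        print(f"{nic:>3} GbE NIC vs {storage:.0f} Gbit/s of storage: "
              f"limited by {limiting}")
```

Under these assumed figures, a 10GbE link cannot carry what the drives can deliver, while 25GbE and faster links leave storage as the limiting component. Real clusters should be sized from measured throughput rather than nominal numbers.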
In large scale-out software ecosystems such as Hadoop and NoSQL, delivering high availability and low downtime for the cluster becomes a crucial challenge for data center administrators when deploying a wider ecosystem of Big Data software such as Spark, Drill, and Sqoop in addition to MapReduce applications. MapR adds to the ease of management with capabilities not found in other Hadoop distributions that simplify business continuity, tiered storage, security, and provisioning. With MapR Control System, the browser-based administration tool, administrators can manage and monitor the MapR deployment to ensure a reliable, high-performance system. Mellanox NEO™ provides the most advanced network orchestration, automation, and monitoring platform, giving IT administrators a 360-degree view of the entire network. Furthermore, with the One-Click feature in NEO, data center administrators can easily configure their large scale-out network using a simple template, allowing them to automate replication of network state whenever new data nodes are added to the cluster.
Although Apache Hadoop offers a powerful tool for analyzing large and diverse data sets, deploying a successful, reliable, and high-performance infrastructure can be a daunting challenge. The MapR Distribution for Hadoop provides critical technology advances to make Hadoop implementation easy, dependable, and fast for production deployments. By taking advantage of Mellanox’s advanced big data offloads, IT organizations no longer have to choose between performance and reliability. With these integrated capabilities, businesses can achieve the competitive advantages of big data analytics faster, with less risk, and with the confidence of an enterprise-grade ecosystem.