For companies that want to quickly extract insights and opportunities from big data - the dramatic growth in corporate and user-generated data from nearly unlimited sources - the first step is choosing the right IT infrastructure.
As a technology leader for Hadoop, MapR provides an enterprise-class, high-performance big data solution that can be deployed quickly and is easy to administer. With significant investments in architectural innovation, the MapR distribution delivers more than a dozen tested and validated Hadoop software modules over a fortified data platform (Figure 1).
The joint solution - MapR Hadoop paired with Cisco® Application Centric Infrastructure (ACI), the Cisco Unified Computing System™ (Cisco UCS®), and Cisco Nexus 9000 Series Switches - can scale capacity instantly and cost-effectively and deliver exceptional performance for the growing demands of big data processing, analytics, and storage workflows. For larger clusters and mixed workloads, Cisco ACI uses intelligent, policy-based flowlet switching and packet prioritization to deliver these benefits.
Adoption of Apache Hadoop for big data workloads has tremendously increased the amount of data that enterprises store in data centers. Hadoop’s promise of inexpensive and scalable storage and its highly scalable computational capabilities have changed the IT industry. Organizations can scale, with relative ease, from a few nodes to a few hundred nodes.
As the number of nodes increases, so does the workload burden on the network fabric that interconnects all the nodes. To avoid bottlenecks, fabric bandwidth and throughput are critical to helping ensure that all network pipes remain clear to facilitate data movement and data and analytic processing. As a result, fast growing scale-out requirements are pushing data centers toward 10- and 40-Gbps network access and aggregation layers.
At the same time, big data clusters are also evolving. The single-process batch-and-store jobs of the past have given way to multiprocessing, in-memory databases. Hadoop became popular so quickly because it allowed organizations to run a job in minutes instead of requiring days as with traditional approaches. Now organizations are asking whether these same jobs can be run in seconds. They are also asking whether growing workloads can still be completed quickly, and whether larger clusters can be set up easily.
Organizations need to look closely at traditional network approaches to massive big data workloads to determine whether they really deliver value at scale. Traditional approaches fall short for modern big data workloads because packet interference and bottlenecks can grow exponentially with cluster size. What organizations need are innovative networking solutions that intelligently manage provisioning, data flow, visibility, and instrumentation.
Within a cluster interconnect fabric, multiple links are available to carry traffic. These links can operate on either a per-flow or a per-packet basis.
Unlike per-flow or per-packet switching, Cisco ACI fabrics use a novel approach defined as flowlet-based switching. Typical TCP flows often have gaps between packets. Cisco designed the ACI fabric to use these gaps and divide a single flow into a number of flowlets, which are smaller portions of the TCP flow. Flowlets then become bursts of packets (from a single flow) routed independently.
To optimize performance, the ACI fabric determines whether splitting a flow and switching its flowlets across separate paths takes less time than switching the original flow intact, large gaps and all. If it does, the independent flowlets are switched onto different paths from point A to point B. Even while making these dynamic switching decisions, the ACI fabric avoids packet reordering.
Thus, flowlet switching, with fabricwide congestion awareness, helps overcome the network bandwidth utilization limits commonly seen with traditional Equal-Cost Multipath (ECMP)-based multilink network designs, which typically use static hashing algorithms to determine link paths (Figures 2 and 3).
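To see why static hashing can underutilize links, consider the following Python sketch. It is purely illustrative (the function and field names are invented, not Cisco code): a simplified ECMP-style hash over a flow's addresses and ports pins every packet of that flow to a single path, regardless of how congested that path is.

```python
import hashlib

def ecmp_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              num_links: int) -> int:
    """Pick an uplink by hashing the flow's 4-tuple (simplified ECMP hash)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links

# Every packet of the same flow hashes to the same link, regardless of
# congestion: two large flows can collide on one link while others sit idle.
link_a = ecmp_path("10.0.0.1", "10.0.0.2", 40000, 50010, 4)
link_b = ecmp_path("10.0.0.1", "10.0.0.2", 40000, 50010, 4)
assert link_a == link_b  # deterministic: the flow is pinned to one path
```

Because the hash ignores real-time congestion, elephant flows that happen to hash to the same link saturate it; flowlet switching avoids this by rebalancing within a flow.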
To achieve load balancing, Cisco ACI uses real-time path congestion metrics. Two dynamic load-balancing (DLB) modes are available, distinguished by the size of the gap required to detect the start of a new flowlet:
- Conservative mode: uses large inter-flowlet gaps so that packet reordering is avoided
- Aggressive mode: uses small inter-flowlet gaps so that flows are split more often for better path utilization
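The flowlet idea can be illustrated with a minimal Python sketch. Everything here is hypothetical (the function name, the trace, and the thresholds are invented for illustration, not ACI internals): a single flow's packet arrival times are split into flowlets wherever the inter-packet gap exceeds a threshold, with a large threshold mimicking the conservative mode and a small one the aggressive mode.

```python
def split_into_flowlets(arrival_times, gap_threshold):
    """Group packet arrival times into flowlets: a new flowlet starts
    whenever the gap between consecutive packets exceeds gap_threshold."""
    flowlets, current = [], [arrival_times[0]]
    for prev, cur in zip(arrival_times, arrival_times[1:]):
        if cur - prev > gap_threshold:
            flowlets.append(current)   # close the current flowlet
            current = []
        current.append(cur)
    flowlets.append(current)
    return flowlets

# One TCP flow with two visible gaps (times in ms, for readability)
trace = [0.0, 0.1, 0.2, 5.0, 5.1, 12.0, 12.1, 12.2]

# "Conservative" mode: a large gap threshold splits less often
print(len(split_into_flowlets(trace, gap_threshold=6.0)))  # 2 flowlets
# "Aggressive" mode: a small gap threshold splits more often
print(len(split_into_flowlets(trace, gap_threshold=2.0)))  # 3 flowlets
```

Each flowlet can then be routed independently onto the least-congested path; a sufficiently large gap guarantees the previous burst has drained, which is why the conservative setting avoids reordering.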
Big data workflows typically are characterized by large to extremely large data sets. However, when you consider the entire data workload environment - from data ingestion, to data protection, to processing of MapReduce jobs, to data analysis - the data mix is a wider cross-section that includes small and medium-sized workloads. Workloads also range from high to medium to low processing urgency.
With traditional fabric interconnects, small and urgent data workloads, such as database queries, may suffer processing latency delays because larger data sets are being sent across the fabric ahead of them. This approach presents a challenge for instances in which database queries require near-real-time results.
Cisco Nexus® 9000 Series Switches with Cisco ACI increase performance by prioritizing small workloads for processing, resulting in lower latency and higher throughput (Figure 4).
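The effect of this prioritization can be sketched with a simple two-class queue in Python. This is an illustrative model, not the switch implementation, and the workload names are invented: small, latency-sensitive queries are served ahead of bulk transfers, while arrival order is preserved within each class.

```python
import heapq

def drain_in_priority_order(packets):
    """Serve packets by priority class (0 = latency-sensitive query traffic,
    1 = bulk transfer), preserving arrival order within each class."""
    heap = [(prio, seq, name) for seq, (prio, name) in enumerate(packets)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Bulk replication traffic arrives first, but the queries jump the queue
arrivals = [(1, "hdfs-replication-1"), (0, "db-query-1"),
            (1, "hdfs-replication-2"), (0, "db-query-2")]
print(drain_in_priority_order(arrivals))
# ['db-query-1', 'db-query-2', 'hdfs-replication-1', 'hdfs-replication-2']
```

Without prioritization, the two queries would wait behind the large replication transfers; with it, their latency is bounded by the queue's own service time rather than by the bulk traffic ahead of them.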
With these ACI capabilities, the result is faster throughput for mixed MapR cluster data workloads, data sets, and urgency levels. Latency-sensitive operations are prioritized over bulk transfers such as file-system replication and batch analytics.
The DLB and packet prioritization capabilities of Cisco ACI complement the big data analytics and storage of MapR-based infrastructure, letting you optimize performance throughout all layers of the joint solution. From data ingestion to data analytics, the combined MapR and Cisco solution lets you deliver with confidence a range of possibilities to meet the needs of business executives, managers, users, and data scientists.