Big data provides an enormous wealth of information to your organization. But to gain the most benefit, you need to manage it efficiently. And you must make sure that all this data is separated and isolated so that each set of users can see and work on only the data that they are authorized to use.
Organizations seek to share IT resources cost-efficiently and securely among multiple applications, data sets, and user groups. Platforms that support this architecture are commonly known as multitenant technologies.
Multi-tenancy is the capability of a single instance of software to serve multiple tenants. A tenant is a group of users that have the same view of the system. Hadoop is an enterprise data hub, and it demands multi-tenancy. Big data platforms are increasingly expected to support multi-tenancy by default. Multi-tenancy requires isolating tenants from one another, both in the data they store on the platform and in the computing resources they use.
To support multi-tenancy, solutions need to:
The Cisco UCS® Integrated Infrastructure for Big Data solution includes computing, storage, connectivity, and unified management capabilities to help companies manage the dramatically increasing data that they must cope with today. It is built on Cisco Unified Computing System™ (Cisco UCS) infrastructure using Cisco UCS 6200 Series Fabric Interconnects, (optional) Cisco Nexus® 2200 platform fabric extenders, and Cisco UCS C-Series Rack Servers. Installed in pairs, the fabric interconnects offer redundant, active-active connectivity and embedded management using Cisco UCS Manager.
MapR is a complete distribution for Apache Hadoop that packages more than a dozen projects from the Hadoop ecosystem to provide you with a broad set of big data capabilities. The MapR platform provides enterprise-class features such as high availability, disaster recovery, security, and full data protection. It also allows Hadoop to be easily accessed as traditional network attached storage (NAS) with read-write capabilities and multitenancy.
The MapR Distribution offers multitenancy from the start. It provides powerful features to logically partition a physical cluster to provide separate administrative control, data placement, job processing, user quotas, and network access. Volumes—a unique feature in MapR—are the foundation of multi-tenancy. Volumes provide a way to organize data and apply different policies to different data sets, applications, and users and groups. A single cluster can have many volumes: up to hundreds of thousands.
Together, Cisco and MapR provide enterprises with transparent, simplified data integration and management integration with an enterprise application ecosystem. They work together transparently to provide a uniquely capable, industry-leading architectural platform for Hadoop-based applications.
The Cisco UCS solution for MapR is based on Cisco UCS Integrated Infrastructure for Big Data, a highly scalable architecture that includes computing, storage, connectivity, and unified management capabilities and is designed to meet a variety of scale-out application demands. It achieves this with transparent data integration and management integration capabilities built using the components described here, shown in Figure 1.
Fabric interconnects establish a single point of connectivity and management for the entire system. They provide high-bandwidth, low-latency connectivity for servers, with integrated, unified management for all connected devices provided by Cisco UCS Manager. Deployed in redundant pairs, the interconnects offer the full active-active redundancy, performance, and exceptional scalability needed to support the large number of nodes that are typical in clusters serving big data applications. The manager enables rapid and consistent server configuration using service profiles, automating ongoing system maintenance activities such as firmware updates across the entire cluster as a single operation. It also offers advanced monitoring with options to raise alarms and send notifications about the health of the entire cluster.
The rack server supports a wide range of computing, I/O, and storage-capacity demands in a compact design. The server is based on the Intel® Xeon® E5 v3 Family Processors and supports 12-Gbps SAS throughput, delivering significant performance and efficiency gains over the previous generation of servers. The server uses dual Intel Xeon processor E5-2600 v3 series CPUs and supports up to 768 GB of main memory (128 or 256 GB is typical for big data applications) and a range of disk drive and SSD options. Twenty-four small-form-factor (SFF) disk drives are supported in the performance-optimized option, and 12 large-form-factor (LFF) disk drives are supported in the capacity-optimized option, along with two 1 Gigabit Ethernet embedded LAN-on-motherboard (LOM) ports. The Cisco UCS Virtual Interface Card (VIC) 1227 is designed for the M4 generation of Cisco UCS C-Series Rack Servers. The VIC is optimized for high-bandwidth and low-latency cluster connectivity, with support for up to 256 virtual devices that are configured on demand through Cisco UCS Manager.
As one of the technology leaders in Hadoop, MapR provides an enterprise-class Hadoop solution that can be quickly developed and easily administered. With significant investment in critical technologies, MapR offers a comprehensive Hadoop platform fully optimized for performance and scalability. The MapR Distribution includes over 20 tested and validated Hadoop software modules on an advanced data platform, offering exceptional ease of use, reliability, and performance for Hadoop deployments (see Figure 2).
The benefits of the MapR distribution include:
Volumes (unique to MapR) form the foundation of multi-tenancy as offered by MapR.
In a typical deployment, the data for each user, group, application, or business unit is placed in a single volume so that it can be managed separately from the data of other users, groups, applications, and business units.
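The one-volume-per-tenant layout described above can be pictured with a small model. This is only a conceptual sketch in Python, not the MapR API: the `Volume` class, its fields, the tenant names, the mount paths, and the quota figures are all hypothetical and chosen for illustration. It shows how attaching a quota and an access list to each volume keeps one tenant's data and limits separate from another's.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical stand-in for a per-tenant volume (not the MapR API)."""
    name: str
    mount_path: str
    quota_gb: int
    allowed_groups: set = field(default_factory=set)
    used_gb: int = 0

    def can_access(self, group: str) -> bool:
        # Access policy is attached to the volume, not to individual files.
        return group in self.allowed_groups

    def write(self, group: str, size_gb: int) -> None:
        # Reject writes from groups outside this tenant.
        if not self.can_access(group):
            raise PermissionError(f"{group} may not write to {self.name}")
        # Enforce the per-volume quota before accepting the data.
        if self.used_gb + size_gb > self.quota_gb:
            raise ValueError(f"quota exceeded on volume {self.name}")
        self.used_gb += size_gb


# One volume per tenant on a single shared cluster (illustrative values).
cluster = {
    "finance": Volume("finance", "/tenants/finance", 500, {"finance"}),
    "marketing": Volume("marketing", "/tenants/marketing", 200, {"marketing"}),
}

cluster["finance"].write("finance", 120)
```

In this model, a write by the `marketing` group to the `finance` volume raises `PermissionError`, and a write that would push a volume past its quota raises `ValueError`, so each tenant's policy is enforced at the volume boundary.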
Other Hadoop distributions do not support volumes, so policies can be defined only at the file or directory level (too detailed) or at the cluster level (not detailed enough). As a workaround, organizations using other Hadoop distributions create separate physical clusters for each tenant, which adds architectural complexity and thus a higher risk of errors and failure. Multi-tenancy in MapR also has significant total cost of ownership (TCO) advantages. It allows organizations to use a single cluster for multiple use cases rather than having to maintain a large number of isolated clusters. This approach reduces overall administrative overhead. It also enables the higher efficiency of a common resource pool.
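The efficiency of a common resource pool can be illustrated with simple arithmetic. The sketch below assumes three hypothetical tenants whose demand (in nodes) peaks at different times; the tenant names and all figures are invented for illustration and are not measurements. Isolated clusters must each be sized for their own tenant's peak, while a shared cluster only needs to cover the highest combined demand at any one time.

```python
# Hypothetical demand, in nodes, for three tenants over four time periods.
demand = {
    "finance":   [10, 4, 2, 2],
    "marketing": [2, 8, 3, 2],
    "research":  [1, 2, 9, 3],
}


def isolated_capacity(demand):
    """Nodes needed when each tenant has its own cluster sized for its peak."""
    return sum(max(series) for series in demand.values())


def shared_capacity(demand):
    """Nodes needed when all tenants share one pool sized for the combined peak."""
    return max(sum(period) for period in zip(*demand.values()))


print(isolated_capacity(demand))  # 27 (10 + 8 + 9)
print(shared_capacity(demand))    # 14 (largest combined demand in any period)
```

With these illustrative numbers, the shared pool needs roughly half the nodes of the isolated clusters. The size of the saving in practice depends on how much the tenants' peaks overlap.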
Here are some of the unique features of multi-tenancy in Cisco UCS Integrated Infrastructure for Big Data with MapR:
The current version of the Cisco UCS Integrated Infrastructure for Big Data offers the configurations listed in Table 1. The configuration used depends on the computing and storage requirements of Hadoop.
For more information about Cisco UCS big data solutions, please visit www.cisco.com/go/bigdata_design.
For more information about Cisco UCS Integrated Infrastructure for Big Data, please visit blogs.cisco.com/datacenter/cpav3/.
For more information about MapR, please visit mapr.com.
For more information about the Cisco® SmartPlay program, please visit www.cisco.com/go/smartplay.
For more information on the Cisco Validated Design (CVD) for the solution, please visit www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/UCS_CVDs/Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_MapR.pdf.