Containers for Big Data: How MapR Expands Container Use to Access Data Directly | Whiteboard Walkthrough


In this week's Whiteboard Walkthrough, Jack Norris, Senior Vice President of Data and Applications at MapR, explains how the MapR Data Platform extends the use of containers to big data environments by letting containerized applications access data directly, so you can take advantage of otherwise underutilized assets.

To find out more about how MapR provides a highly scalable, random read/write layer that exposes industry standards (such as NFS, ODBC, Kafka) all within the same technology, see:

Here is the full video transcription:

Hi. I'm Jack Norris, Senior VP of Data and Applications, and today we're going to talk about big data and containers.

Let's start with big data. Obviously, you can see that this is a representation of Hadoop. Well, maybe not that obvious. There are actually two layers to Hadoop. We tend to focus on this top layer and all of the various projects that make up Hadoop: Hive, Pig, Sqoop, Oozie, Flume, and there are new ones coming day in and day out. The bottom layer, the Hadoop Distributed File System, is an important capability, and actually a limiting factor. What we have here is a Java layer that stores its data in the underlying Linux file system across the distributed cluster.

With MapR, we recognized early on that to improve big data, you start with the data layer, the underlying platform. We extended this platform across this area, eliminated the Java dependency, eliminated storing the data in the underlying Linux file system, and provided a scalable random read/write data layer. Then we exposed HDFS and NFS, along with industry standards for database, document, and streams. Not only does that provide more capability and more flexibility within a big data environment, it also extends support to legacy workloads, whether you have existing file applications or database applications.

Today in our brief video we're going to cover containers. If you look at how containers are used, these are servers: there are hundreds, if not thousands, of servers in the typical large organization's data center. The beauty of containers is that you have more flexibility to take advantage of underutilized assets. I can take a container here and move it anywhere. In theory, that sounds great. In reality, the use of containers is largely limited to ephemeral applications: applications that don't need to access data to maintain state.

Our Data Platform really opens up the use of containers, because each of these servers, even thousands of them, has a consistent mount point into the MapR Data Platform and can use whichever industry standard is appropriate, whether that's NFS, ODBC, or something else. Now you can take a container, move it to any of the available resources, and access the data directly. The result is a very flexible infrastructure, easy access to data regardless of where it sits in the Data Platform, and increased agility for your organization.
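To make the difference between ephemeral and stateful containers concrete, here is a minimal Python sketch of a containerized job that keeps its state on a shared mount point instead of inside the container. It assumes, hypothetically, that the mount path is passed in through a `DATA_ROOT` environment variable; the name `DATA_ROOT`, the file `counter.txt`, and the function `increment_counter` are illustrative only and are not part of any MapR API. The sketch uses nothing but ordinary file operations, which is the point of exposing a standard interface such as NFS.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical: the shared mount point is supplied via an environment variable.
# On a real cluster this would be the NFS mount path; here we fall back to a
# local temporary directory so the sketch runs anywhere.
DATA_ROOT = Path(os.environ.get("DATA_ROOT", tempfile.gettempdir()))


def increment_counter(root: Path, name: str = "counter.txt") -> int:
    """Read a counter from the shared mount, add one, and write it back.

    The state lives on the mount, not inside the container, so any container
    on any host that mounts the same path resumes from the same value.
    """
    path = root / name
    current = int(path.read_text()) if path.exists() else 0
    new_value = current + 1
    # Write to a temporary file, then rename it over the old one. On a POSIX
    # file system a rename within one directory replaces the file atomically,
    # so readers see either the old value or the new one, never a partial write.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(str(new_value))
    os.replace(tmp, path)
    return new_value


if __name__ == "__main__":
    print(increment_counter(DATA_ROOT))
```

Because the counter is stored on the mount, a container rescheduled onto a different server picks up where the previous one left off. The sketch does not coordinate concurrent writers: two containers incrementing at the same moment could lose an update, so a real workload would need locking or a database interface instead.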

Very short whiteboard walkthrough, but that's the beauty of the MapR Data Platform. Thank you.

This blog post was published January 18, 2017.