Big Data and Hadoop

A lot of solutions claim to handle Big Data, but Hadoop is unique. Hadoop does things that aren’t possible with other solutions, or that are possible only at prohibitive cost or complexity. Hadoop is not an open source alternative to something that’s been on the market for years; it is something new. That’s why there’s such a premium on key infrastructure innovations like those from MapR.

One of these MapR innovations is NFS support. This greatly enhances one of the big advantages of Hadoop: the breadth of data sources that can be analyzed. Hadoop isn’t limited to structured data sources. It can be used to analyze unstructured and semi-structured data as well as structured data. Hadoop also doesn’t require an ETL process to get data into a cluster. With standard Hadoop, however, you do need to write special programs to batch load data through a Hadoop API.

MapR spent a lot of time innovating at the storage services layer to eliminate the need for special programs and to make data input much faster and easier. Now, with MapR, data can be streamed directly into a cluster. No special APIs and no special clients are required. Not only does this make loading data into a cluster much easier, it also allows end users to manipulate and analyze the data directly in the cluster. For example, users can mount the cluster like a storage volume and see the resulting data sets. Users can run commands such as sort and grep. File associations are also supported, so desktop analytic or reporting tools can be used on the data as well. There is no need to batch unload results for further processing.
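To make the idea concrete, here is a minimal sketch of what "no special APIs" could look like from a script. It assumes the cluster is already NFS-mounted somewhere on the local machine. The helper names (`load_records`, `grep_sorted`) and the file name are made up for illustration, and a temporary directory stands in for the real mount point, so treat this as an illustration rather than MapR's own tooling.

```python
import os
import tempfile


def load_records(mount_dir, name, lines):
    """Write lines to a file under the mount point using ordinary file I/O."""
    path = os.path.join(mount_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return path


def grep_sorted(path, needle):
    """Return the lines containing needle, sorted (roughly `grep needle path | sort`)."""
    with open(path, encoding="utf-8") as f:
        return sorted(line.rstrip("\n") for line in f if needle in line)


# Assumption: a temporary directory stands in for the NFS mount point.
# On a real system this would be wherever the cluster volume is mounted.
mount_dir = tempfile.mkdtemp()

# Hypothetical log data, written with plain open()/write(), no Hadoop client.
log_path = load_records(
    mount_dir,
    "events.log",
    ["b error disk", "a info ok", "a error net"],
)

# Read the data back in place and filter it, with no batch unload step.
matches = grep_sorted(log_path, "error")
print(matches)
```

Because the script uses only standard file operations, the same code would work against any mounted path, which is the point of exposing the cluster as a storage volume.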

This is definitely an area of the product where the experience is worth a thousand words. Rather than lengthen this blog, I'll just encourage you to download MapR and see what you think.

This blog post was published July 18, 2011.
