Deployment of ANSI SQL on Hadoop: Vertica Integration on MapR using the Sandbox

With our recent announcement of HP Vertica’s deployment onto MapR, we have already been flooded with questions about the integration.

Use Cases

One of the more frequent questions we’ve seen is, “What’s a typical use case?” It’s difficult to give an exact answer because the pilot customers are doing many different things, but I can list some characteristics that the successful deployments share:

  • There needs to be a reason to put a database on top of MapR. Given the nature of Hadoop, the data is typically unstructured, semi-structured, emerging structure, or changing structure.
  • HP Vertica’s extremely fast querying capabilities make it a no-brainer for analytics. In addition, its columnar nature plays perfectly into MapR’s preferred I/O profiles.
  • Use cases tend to involve complex algorithms as well as analytics. They may include temporal analysis (complex algorithms run on incoming data) or dark data analysis (deep analysis of data accumulated over many years).
  • There is a need for an integrated platform; management of multiple clusters causes pain.

The Sandbox

Scrum development necessitates rapid deployments, and we believe that applies here too. With our reciprocal agreement with HP Vertica, both companies are able to offer a pre-integrated virtual machine. To download and get started with the MapR Sandbox for Hadoop and HP Vertica, click here.

You won’t see the full power of either platform, but you will be up and running in a few minutes. What we can’t show in the Sandbox is disk management, horizontal scalability, and database management from the MapR Control System (MCS). What you will get is a fully functional database and Hadoop distribution.

And yes, the Vertica files live completely on the MapR Distributed File and Object Store.

Technical Advantages

With what’s been happening in the ecosystem, it goes without saying that this is an important direction for our products. I’ve spent a lot of time with Vertica and MapR, and even I was surprised at how well these two products work together. Here are a few highlights:

  • ANSI SQL is on Hadoop. Get all the benefits, including access to the ecosystem. I’m eager to get Vertica working with some machine learning.
  • We’ve seen almost zero performance loss for Vertica. Under heavy disk use, there is only a single-digit performance loss; when you are CPU bound, the loss is negligible.
  • Vertica can still take advantage of data locality. MapR has this neat feature of locally pinned volumes.
  • One really cool side effect is that MapR removes the need for RAID. Because MapR replication takes place across different nodes, Vertica can reclaim some of the disk capacity lost to RAID.
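
To put rough numbers on the RAID point, here is a minimal back-of-the-envelope sketch in Python. The function names, the disk counts, and the disk sizes are all hypothetical and chosen only for illustration; the usable-capacity fractions are the standard ones for each RAID level. It looks only at the capacity RAID itself sets aside on a single node and does not account for the space that MapR replication uses across the cluster, which this post does not quantify.

```python
def raid_usable_fraction(level: str, disks: int) -> float:
    """Return the fraction of raw capacity that is usable under a RAID level."""
    if level == "none":
        return 1.0                      # no RAID: all raw capacity is usable
    if level == "raid10":
        return 0.5                      # mirrored pairs: half the raw capacity
    if level == "raid5":
        return (disks - 1) / disks      # one disk's worth of parity
    if level == "raid6":
        return (disks - 2) / disks      # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


def reclaimed_tb(level: str, disks: int, disk_tb: float) -> float:
    """Capacity (in TB) on one node that RAID overhead would otherwise consume."""
    raw_tb = disks * disk_tb
    return raw_tb * (1.0 - raid_usable_fraction(level, disks))


# Hypothetical node: 8 disks of 2 TB each (illustrative values only).
for level in ("raid10", "raid5", "raid6"):
    print(level, reclaimed_tb(level, disks=8, disk_tb=2.0), "TB reclaimed per node")
```

How much of that capacity you actually gain depends on the replication factor you configure, so treat these figures as an upper bound on the per-node savings rather than a sizing guide.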

Going Forward

While we have started the integration of our two products, I can assure you that we are working on many more feature ideas. This is the beginning of the journey, so stay tuned as we continue to enhance the value of putting these two technologies together.

If you have any questions, don’t hesitate to reach out to our sales team, which has been trained on the integration. Good luck with your ventures, and let us know how you are using the integration.

This blog post was published May 14, 2014.
