The MapR CTO/Co-Founder on MapR Database and Project Kudu – Whiteboard Walkthrough

Editor's Note: In this week's Whiteboard Walkthrough, M.C. Srivas, MapR CTO and Co-Founder, explains the innovation and vision behind MapR Database and how Project Kudu stacks up to the MapR Data Platform.

Here's the unedited transcription:

Hello, everyone. I'm M.C. Srivas, CTO and Co-Founder of MapR. I'm here to talk about MapR Database and the vision we had when we went and built this entire platform at MapR. If you look at big data, big data is really two things. There's big storage and then there's big processing. When you say big storage, what are we really looking at? Right? Hadoop has now become the de facto standard for big data, but let's look one level inside that big Hadoop picture.

What we have from MapR is MapR Database, which is a fantastic system for managing tables and files in the same platform. On top of MapR Database you can run Spark, Hadoop, SQL, and your own custom applications. This is really what we were trying to build, or what we have built, I should say. What was the approach we took? The approach we took was: let's try to innovate while we keep the best ideas that others have innovated before us. We want extreme scale. Clearly this is about big data, so extreme scale, meaning trillions of rows, millions of tables, thousands and thousands of columns, is very important.

Beyond columns, the document model is the right way to do things. The document model, from a developer's perspective, is much easier to handle. It's more natural and it's the way people think. Rows and columns are just an approximation to the document model. With MapR Database, we've introduced JSON to Hadoop for the first time. When you talk about data consistency and so on, ACID, which is atomic, consistent, isolated, and durable, covers extremely important features, and MapR Database provides ACID features.

Very importantly, today at this IoT scale of things, you don't work in a single data center anymore. Everything is globally replicated across the world and MapR Database naturally fits there. You can run MapR Database tables worldwide with full multi-master replication. More importantly, like I said, we innovate while keeping the good ideas, and we borrowed a great idea from the cellphone industry, which is zero management.

You just start MapR Database. Start using it and it just works. There's nothing to tune. There's nothing to configure. It just works. It runs on commodity hardware. You don't have to go and make a decision about what hardware you want right now. You can upgrade continuously because MapR Database supports heterogeneous hardware. You can keep your old hardware and make it work with your new hardware. You don't require every node in the cluster to be identical.

While doing all this, we give you 10x the performance of the newest competitor that's out there. Then there are the things that we left behind. When we started building this, we said, "Hey, what are some ideas which we didn't want to take along with us?" If you're doing databases, you're probably used to everything being a transaction. Right? Everything being a transaction is what causes performance problems with Oracle and other databases. With MapR Database, you choose when you want to have a transaction or not have a transaction.

You know, everything doesn't have to be ACID. Rigid schemas are really a thing of the past. I mean, today there is an enormous amount of semi-structured data around. What I mean by semi-structured is loose schemas. For example, email has a schema where you have a from and a to and a date and a subject. The body is loose. The subject is loose. The list of recipients is loose. It's not really a rigid schema. We don't have foreign key constraints and things like that. There's no concept of an inconsistent email. Emails are always consistent even if they don't have rigid schemas.
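The email example can be sketched in a few lines of plain Python. This is only an illustration of a loose schema, not MapR Database code: the addresses, the `REQUIRED` tuple, and the helpers `is_well_formed` and `recipient_count` are hypothetical names invented here, and the documents are ordinary dictionaries serialized with the standard `json` module.

```python
import json

# Two hypothetical email documents. Both carry the fixed header fields
# (from, to, date, subject), but the rest is loose: one has a "cc" list,
# the recipient lists differ in length, and subject/body may be empty.
emails = [
    {
        "from": "alice@example.com",
        "to": ["bob@example.com"],
        "date": "2015-09-29",
        "subject": "Kudu comparison",
        "body": "Notes attached.",
    },
    {
        "from": "carol@example.com",
        "to": ["alice@example.com", "bob@example.com"],
        "cc": ["dave@example.com"],
        "date": "2015-09-30",
        "subject": "",
        "body": "",
    },
]

# Assumption for this sketch: these four header fields are the only
# structure every email is expected to have.
REQUIRED = ("from", "to", "date", "subject")


def is_well_formed(email):
    """Return True if the document has every required header field."""
    return all(field in email for field in REQUIRED)


def recipient_count(email):
    """Count recipients, treating the optional 'cc' list as empty if absent."""
    return len(email.get("to", [])) + len(email.get("cc", []))


if __name__ == "__main__":
    for email in emails:
        # Each document serializes to JSON as-is, with no shared table schema.
        print(json.dumps(email, sort_keys=True))
        print(is_well_formed(email), recipient_count(email))
```

Both documents pass the same check even though their shapes differ, which is the sense in which an email stays consistent without a rigid schema.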

Trying to equate rigid schemas with consistency has been a mistake in the past and we didn't take that forward. Very, very importantly, we've done commodity hardware. The appliance approach, in the belief that it's going to give better performance, is actually a fallacy. What we have found is that when you're buying an appliance, you're really buying yesterday's hardware at today's prices.

With MapR, since we run on commodity hardware, we can give you the latest: you can take advantage of hardware as it improves and not worry about having to do a full upgrade every time.

Having talked about MapR Database, I was asked to actually compare MapR Database with Apache/Cloudera Kudu, which was recently released. If you look at this diagram here, this is a diagram that Cloudera has put up, where they have created this graph showing slow and fast on random IO and then slow and fast on streaming IO. They placed Kudu in the middle between HBase and HDFS. MapR XD and MapR Database have already existed now for the ...

FS has been around for six years and DB has been around for four years. If you look at some of the performance numbers we've published in the last few years, we're about 20x faster than HDFS. In fact, Samsung recently published performance numbers where HDFS does about 500 megabytes per second while MapR XD has a world-record-setting 16 gigabytes a second, the fastest on the planet today. Similarly, with HBase compared to MapR Database, the random IO performance is almost 20x faster. I hope you try out MapR Database. It's free, and it's available now in our M3 distribution. Give it a try and see. You'll really love it. Thank you very much.

This blog post was published September 29, 2015.
