Avoiding Latency Issues When Scaling: Lessons from Coinbase on Blockchain Use


Blockchain technology and its many applications are fascinating, particularly in the digital currency space. Many engineers at MapR have accounts with digital currency broker Coinbase and follow their technical blogs. Last December, Coinbase experienced some outages, and my MapR colleagues noticed the blog posts Coinbase Engineering wrote about the scaling challenges they face, along with their call to action for more people to join Coinbase. In the video below, Anoop Dawar, former head of product management at MapR, illustrates some solutions to the latency and scaling issues Coinbase experienced during peaks in trading volume. If you have any questions, please add them in the comments section below.

Hello. My name is Anoop Dawar, and I run product management at MapR. I've been an avid fan of Coinbase, as well as blockchain and Bitcoin technologies, as are many engineers at MapR, and we follow this space closely. We have some engineers who have accounts at Coinbase, and we recently noticed some outages and blog posts from Coinbase Engineering on the challenges that you guys are facing around scaling up, and your call to action for many people to come join Coinbase.

So, here is a chart from, I think, the December 6th blog post, but there's a similar one in the December 7th post, and it talks about the latency spikes that you guys are observing because legitimate transaction volume is going really high due to changes in the marketplace, right? So, we were talking about this internally and huddled together, and we've observed, of course, some common patterns here.

One is that the latency of the Mongo database is spiking up, and at the same time, there's a correlated spike in the queuing. So we are speculating here, of course, but one speculation is that part of the reason queuing is spiking up is because the database is not keeping up, and that's backing up the queue of the transactions that need to go through. And this is a very common problem in NoSQL databases – not just MongoDB, but many of the NoSQL databases out there.
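The dynamic speculated about above can be sketched with a toy queueing model (this is purely illustrative, not a model of Coinbase's actual system): whenever per-request database latency rises enough that the service rate falls below the arrival rate, the queue in front of the database grows on every tick.

```python
# Hypothetical sketch: requests arrive at a fixed rate per tick, and the
# database can serve roughly 1/latency requests per tick. When latency
# spikes, service rate drops below arrival rate and the queue backs up.
def queue_depth_over_time(arrival_rate, latencies):
    """Return the queue depth after each tick, given per-tick DB latency
    (seconds per request) and a fixed arrival rate (requests per tick)."""
    depth = 0.0
    depths = []
    for latency in latencies:
        served = 1.0 / latency  # requests the DB can finish this tick
        depth = max(0.0, depth + arrival_rate - served)
        depths.append(depth)
    return depths

# At 10 ms latency the DB serves 100 req/tick, so 50 req/tick keeps the
# queue empty; a spike to 50 ms (20 req/tick served) backs it up by 30
# requests every tick the spike lasts.
steady = queue_depth_over_time(50, [0.01] * 5)
spiky = queue_depth_over_time(50, [0.01] * 2 + [0.05] * 3)
```

The point of the sketch is that the queue depth is cumulative: even a short latency spike leaves a backlog that the system then has to drain, which is consistent with the correlated queuing spikes in the charts.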

It's illustrated here on another graph that we produce internally for our performance testing of MapR Database, which is our NoSQL database. In blue are the read latencies for another NoSQL database, and in red are those of MapR Database. What you'll notice is massive spikes on the read latencies of the other database, and this is a very, very common problem.

Sometimes, you will see the spikes go 10x in terms of latency, and there are reasons why this happens and things we have done in engineering to solve it. But before we go into that, let's talk a little bit about the business challenges you guys are surely facing and the technology drivers that come with them, right?

So, for example, in the space that you are in – financial services in a very new and nascent market where the rules are changing all the time – there is the requirement to keep up with regulatory and compliance pressures and audits, which might hit you at any time, and the agility that's required to get through that.

Know your customer. It's extremely important in this space to know who is actually making these trades, so there is a legitimacy established in this marketplace as it comes into the broader open market. Fraud detection. Actually, as of yesterday, Wednesday, there was another article about potential insider trading as Bitcoin Cash was rolled out on Coinbase. And more importantly, not just detection, but prevention: how you can reject transaction requests even before they take place because you know they are fraudulent. And finally, predictable and optimized OPEX.
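The "prevention, not just detection" idea above amounts to a pre-trade gate: screen an order against simple rules before it ever reaches the matching engine. A minimal sketch follows; the field names, thresholds, and the idea of comparing against a historical average are all hypothetical illustrations, not Coinbase's actual checks.

```python
# Toy pre-trade filter: reject obviously anomalous orders up front,
# instead of detecting the problem after the trade has settled.
# Thresholds and fields are illustrative assumptions only.
def pre_trade_check(order, account_avg_size, max_multiple=10):
    """Return (accepted, reason). Reject non-positive sizes and orders
    wildly out of line with the account's historical average size."""
    if order["size"] <= 0:
        return False, "non-positive size"
    if order["size"] > max_multiple * account_avg_size:
        return False, "size exceeds historical pattern"
    return True, "accepted"

ok, reason = pre_trade_check({"size": 2.0}, account_avg_size=1.5)
blocked, why = pre_trade_check({"size": 100.0}, account_avg_size=1.5)
```

Real systems layer many such rules (velocity checks, device fingerprints, KYC status), but the structural point is the same: the check runs synchronously on the request path, so a fraudulent order can be discarded before it takes place.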

Everybody talks about OPEX, but I think the predictability is equally important, especially when you're getting these spikes. These lead to a bunch of technology challenges, including scaling and availability at the spur of the moment: if something happens, you need to be able to scale up and scale down while maintaining availability. Nothing should go down.

Meet business SLAs. Not just average times, which is what these charts, I think, are showing, but 99th and 95th percentile latency SLAs. And then data fidelity and governance, right? Can you really trust the data? Just like we do in relational systems, can you say with guarantees that: A) there is no data loss, B) these transactions were atomic, and C) you can actually bank on this trade? Right?
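Why percentiles rather than averages matter here can be shown in a few lines. In the sketch below (nearest-rank percentile, a standard definition), a workload where 5% of requests are slow still shows a healthy-looking mean, while the p99 exposes the tail that users actually feel.

```python
# Averages hide tail latency; percentile SLAs do not.
def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of
    the sorted samples fall."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[rank]

# 100 requests: 95 fast (10 ms) and 5 slow (500 ms). The mean looks
# fine at 34.5 ms, but the p99 reveals the 500 ms tail.
latencies = [10] * 95 + [500] * 5
avg = sum(latencies) / len(latencies)
p99 = percentile(latencies, 99)
```

This is exactly why an SLA written against the average can be met while a meaningful fraction of customers still see half-second responses; writing the SLA against p95/p99 forces the tail to be managed.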

Based on that, let me share what we see MapR customers do. But before that, a quick view of what MapR is, right, and why MapR would come and have a conversation with you to solve some of these problems. We have built a converged data platform that is a single platform for files, tables, and streams – so your NoSQL database, your event streams, and your files. It's a scalable and proven platform, running petabytes and petabytes with no downtime in some of the largest institutions and enterprises in the world, including some of the largest retailers, banks, financial services customers, and also governments.

So, I'll take the example of Aadhaar, which is a country-wide biometric system for India. It has over a billion users and over 14 trillion transactions to date, and it's growing massively. It's really the foundation of India's next-gen economy, and if Aadhaar goes down, essentially, the country goes down. Right? They are using it for KYC, for authentication, for fraud detection and prevention. As you bring in biometrics, you've got to make sure that these are not fake. As people are authenticating, you've got to be able to respond within a hundred milliseconds end-to-end across the country. And so, the reason they chose us is because we are able to bend the cost curve for these customers.

Let me show you what we mean by that. So, here is an example of what MapR customers are actually experiencing and enjoying. If you look at the red curve, that's typically the revenue curve of a growing product or business, right? You start small, and then you reach an inflection point where you have an exponential curve, if you will.

And then, with open technologies like Mongo or Cassandra and others, what these customers quickly find out is that the cost curve is very, very attractive early on, but just when the business is hitting the inflection point, the scalability problems come in. You need to move to higher-end, massively more expensive instances to scale your business out, and the cost curve starts exceeding the revenue curve, just when you are expecting the opposite. What we provide is a stable, predictable cost curve with MapR and cloud technologies like Amazon Web Services.

Then the question often comes up: "How are you doing that? Is that even possible?" Let me tell you how we do that. Right? So, let's come back to the technology here a little bit. I'm going to focus on the database, but it's really applicable across the streams and file system as well. One is no latency spikes. As you can see from this red line, we are able to provide you much tighter bounds on the 95th and 99th percentile latencies. This is proven to scale. The database, just the database itself, is proven to scale to petabytes in multiple customer environments over multiple years, and the file system is an exabyte-scale file system.

There are no app and analytical side-loads, which are common with NoSQL databases and the sharding technologies used to solve these problems; those just disappear in MapR. There is no data loss. The system is proven, with reliable uptime and no data loss. We have built a distributed system that allows for this, and it's proven. It also allows large document sizes, which are again a constraint in Mongo.

It has built-in SQL with deep integration with the database, so you can run performant operational and analytical queries. It's got built-in machine learning and artificial intelligence, and it's open to any and every AI and ML tool in this massively fast-evolving space. Finally, there is fine-grained access control and auditing technology, so you can have your governance and audits.

We hope to have a deeper conversation with you on this. We will reach out to you, and we would love to sit down and discuss how we can work together to solve some of these challenges for you. Thank you, and have a nice day.

This blog post was published March 21, 2019.
