Sr. Director of Product Management, MapR
Are you thinking about moving to the public cloud to save on costs? To be more agile? To be more innovative? You are not alone. Organizations around the world are migrating to the cloud at a rapid clip, with some estimates suggesting 80% of all IT budgets will soon be committed to cloud solutions.
Many who have started on their cloud journey have realized that it isn’t as easy and inexpensive as they thought. Cloud pricing models can be difficult to decipher and control, applications need to be rewritten, and once rewritten those applications are typically locked to a particular cloud. Thankfully, with the right technology in place, your cloud journey can go from bumpy and expensive to smooth and contained.
In this webinar, you will learn how to:
Tom Fisher: [00:00:01] Thank you very much and thank you everyone for joining. As David said, this is really about the critical mistakes to avoid on your journey to the public cloud. This is not intended to be a criticism of public cloud. As a matter of fact, as a company MapR is very committed to, and working ever more closely with, many of the leading cloud providers. But as an executive or as a practitioner, our goal here is to talk about some of the challenges that you may face and how best to overcome them.
Tom Fisher: [00:00:36] There are a lot of reasons to move to the public cloud. For example, leveraging the latest innovations: technologies like Kubernetes emerge in the cloud very, very quickly. As a former CIO, approximately four times, I know that reducing IT cost is always a challenge for your senior IT management or the CIO organization in general, and the cloud represents a great way for you to do that. But as Will and I will cover later, you've got to be careful about that. You need to know what the rules are around that.
Tom Fisher: [00:01:00] Enable IT scalability. I used to be the CIO for Oracle's managed cloud services and part of the public cloud, and one of the reasons that we heard frequently from customers why they would go there is the elasticity, the ability for you to expand compute and reduce compute based on what you need. The same is true for storage.
Tom Fisher: [00:01:32] Driving business growth. The ability for you to put e-commerce in place with a highly reliable platform, without worrying about whether network interruptions or local power challenges are going to potentially impact your revenue, is one of the reasons that many people choose to go to the public cloud. Greater agility: the ability for you to move across different capabilities or add services very, very quickly through cloud providers.
Tom Fisher: [00:02:03] And last but not least, and probably the thing that practitioners face all the time, is the area of security. In the past in public clouds, the concern was the possibility of commingling data, but in the multi-tenant architectures that are available today, it's not possible to pierce that separation.
Tom Fisher: [00:02:29] Here are some gee-whiz numbers about who's making the move and why they are doing it. The why is what we covered first, this is the who. Many are making the move. These are statistics that actually came from Gartner, and they're a little dated, August 2016, but what we do as an active cloud member is we track this information carefully because we want to see how reliably Gartner and others are predicting this move. More than three quarters of organizations are estimated to have used public cloud services by the end of year 2017. We're waiting for the later release of this report because we believe that number is actually higher.
Tom Fisher: [00:03:13] 23% of organizations that have no current plans to use public cloud will eventually use the public cloud. Even those that say "I'm going to run everything myself" are getting pressure from their CEO and their CFO on the challenge of moving to the cloud. The spend on public cloud services in general is much higher. Now this number, $160 billion, which is the 2018 estimate, comprises all three: platform as a service, infrastructure as a service and software as a service. And you can see the five-year CAGR on this is almost 22%, and that's an IDC report, which is a relatively conservative organization. By 2021, we're looking at $277 billion in spend.
Tom Fisher: [00:04:09] As we talked about, there are surprises as you move through. Here is a customer, and this is a public piece of information that's out there: "On a cash basis we spent $2.6 million on Amazon Web Services and a mere $2.8 million on our own data center. The business impact is profound." I would change that quote if I could to say "can be", because one of the areas that Will and I will talk about is that you need to manage the bill carefully. There's a lot of capacity, and the cloud provider's goal is to provide you with as much capacity as you need, but the difference is they don't understand your business. And from a business perspective, because that capacity seems so cheap and so free and so available, people actually have to be very, very careful about how they manage it.
Tom Fisher: [00:05:03] That includes, in my experience, people who put things on Amazon using their credit card and then expense it. I'm sure those numbers were not included, but when you get done adding up the total, because it's so easy to get access to infrastructure as a service, you find out that these are the real numbers.
Tom Fisher: [00:05:26] I'm a CTO, can't count. I know we started with four, but I'm going to cover five, that's why the plus one. And these are the most prevalent mistakes that are made during the journey to the public cloud. At a high level, I'll just go through each of them.
Tom Fisher: [00:05:43] First, many people assume they're not going to get locked in. Second, there's an assumption that you can easily get to multi cloud later. Third, assuming the cost will always remain low. Fourth, assuming your existing apps will just work in the public cloud. And finally, the one that I most recently added is assuming it'll be really easy to leverage data when it's at the edge.
Tom Fisher: [00:06:11] Mistake number one, and mistake is a harsh word but I hear it all the time at home, you're assuming that you won't get locked in. The cloud vendors are adding services. I believe Amazon adds approximately 10 services a month to their public cloud services, and as a result those services may or may not be available in other clouds. What we're finding is that there are incompatibilities. The one that's listed here is S3 versus the Azure Data Lake Store. Now both Google and Amazon are supporting S3, but Azure for very good reasons has made a decision to go with more of a data lake store, particularly when it comes to infrastructure as a service or their HDInsight capability.
Tom Fisher: [00:06:59] Moving data from one cloud to another or back to on-prem is difficult. That's an experience that our customers have told us about. By design, most of the capacity of cloud providers is focused on the growth of their cloud, not on supporting people moving off. So if you have a lot of data that you're trying to move off that cloud, the network bandwidth is a constraint: it can be very time consuming, it can take a lot longer than you would normally think, and it can be expensive. What you want is to be able to leverage the latest innovations. These cloud providers are outstanding at data center locations and pricing, but you've got to be constantly diligent about making sure that you don't get yourself locked in.
Tom Fisher: [00:07:45] Great example, Bigtable from Google. It runs on the Google file system, which is a very powerful file system. But the reality is you can't take Bigtable and drop it on Azure, and you can't take Bigtable and drop it onto Amazon. That's not a criticism of the cloud providers, that's actually a feature that you'll see that Google presses. The same is true for Redshift from Amazon. It's a great product, we're not downgrading it in any way. But the reality is Redshift will not work anyplace else other than Amazon. And that's what we're talking about relative to lock-in, and lock-in on services.
Tom Fisher: [00:08:24] Number two, the assumption that you can easily get to multi cloud later. The reality is that automatic data movement from one cloud provider to another is not possible today. Well, I'll put a plug in: unless you're using a product like our own MapR, which is trying to be as cloud agnostic as we possibly can. The reason for that has to do with the different security models and the different versions of the Linux operating system that they may be running. Cloud vendors' on-premises offerings are also very limited. We've heard about Amazon dropping their solution behind other people's firewalls, but we have not tracked somebody who has done it successfully. It's probably out there, but they're probably not allowed to talk about it. And the other thing that you've got to be aware of: the cloud providers are very good and they get paid to keep their platforms up. But the reality is they do go down from time to time. I remember last December Amazon had a very significant outage across all of their platforms, and one of the things that Will will talk about is how you protect yourself against that.
Tom Fisher: [00:09:30] What it means is you need the disaster recovery capabilities of a true multi cloud deployment. In other words, if data resides on Google and data resides on Amazon and data resides on Azure, you need to be able to know where that data is backed up to so if there is an issue your disaster recovery processes can kick in.
Tom Fisher: [00:09:48] And finally you need the wider range of data locality. One of the things that GDPR and other new regulations have come out with is that, if you're a multinational company, data has to be physically located in a particular region, and it cannot be accessed by anyone else. That's for their own protection. But whether we like it or not, it's a regulatory requirement that's in place today.
Tom Fisher: [00:10:18] Number three, assuming the cloud cost will remain low. Getting on the cloud is really cheap. I can tell you as an experienced CIO, it is very cheap. But it starts to add up when you start layering charges. "I want all my data in SSD versus spinning disk." That's a significantly more expensive way in which data is stored and managed, and Will will talk about how we can help in managing that better. The reality is that storage and movement is very fast, it's very efficient, and standing up environments is very fast, but you also need to be diligent about making sure that you take those environments down when you stop using them, because the cloud providers don't know what you use or don't use. So the reality is it's to their advantage if you're not doing the tidying up that's required when environments are no longer needed.
Tom Fisher: [00:11:18] The fourth one is assuming that your existing apps will just work on the public cloud. We're all technology professionals. We all know that certain applications have requirements on the operating systems or databases that are out there today. This is a reality. It's very, very difficult. The example that we wrote here was applications written against data in [inaudible 00:11:44], and now all of a sudden you're sitting on a [inaudible 00:11:46] technology. The fundamental differences are significant, and you can't take advantage of the speed associated with [SAN 00:11:52] or the proliferation of data associated with the SAN. But all of them will advertise that they support the open APIs.
Tom Fisher: [00:12:00] Why does this matter to you? Particularly for custom applications, you may face a rewrite or refactoring if you take a particular application to the cloud. And by the way, you can't discount that cost, because rewriting applications entails cost and time and you may or may not have the skills available to you.
Tom Fisher: [00:12:23] Last but not least, I wanted to make sure that we've got this. One of the things that we're seeing, particularly with the advent of the internet of things, or IoT, or edge as it's referred to (in the case of Cisco they refer to it as fog computing), is that you need to be aware that increasingly data is going to be at the edge. Cloud options for analyzing data at the edge are available but they're quite limited, and Will will cover in more detail how we help you manage that better.
Tom Fisher: [00:12:54] Why does it matter? The reality is edge represents a disruption to the cloud, because increasingly data is going to be at the edge and that data is going to become increasingly critical. We have customers who use edge devices on pipelines. If there's seismic activity, with the edge device they may be able to shut down the pipeline through machine learning algorithms before there's a real problem and report that back. Many times these pipelines, for example, aren't in areas where there's 3G, 4G, LTE or even the future 5G capability. They have to run trucks up and down the pipeline for Wi-Fi access to deliver that data back. And with the edge, you now have the capability to do a lot of that machine learning and a lot of that compute at the edge. You want to be able to take advantage of that.
Tom Fisher: [00:13:50] With that, I get the opportunity now to turn this over to Will, and I won't attempt to butcher his last name.
Will: [00:14:00] Thank you Tom, great introduction. From this point forward, I'm going to concentrate on what it is that MapR can do to help companies effectively make the move into public cloud or make the best of your public cloud investments. I'm sure for the many of you that are on this call, you're either in a situation today or you're looking to get into it, and that's kind of what you want to hear.
Will: [00:14:30] For starters, I wanted to just take a step back and go over the main goals that we had when architecting our system in the first place. If you look at the design center for the MapR platform it was really centered around four key use cases that we saw emerging.
Will: [00:14:51] The first was the desire of companies to do advanced analytics, machine learning and AI, not only just to the side in their enterprises, but integrating the machine learning and AI models that they build into their operational systems. What we built with the MapR platform is a system that can very tightly tie your analytics and machine learning in with your operational workflows so that you can have a continuous cycle of learning and delivery.
Will: [00:15:31] Beyond that we knew that multi cloud and hybrid cloud mobility was going to be critical to companies, so we designed the system to do that and that's where I'm going to spend a lot of time today.
Will: [00:15:44] Third is containers. Companies that are embarking on this journey are realizing that a good supporting infrastructure is needed to run those containers, particularly when it comes to data persistence. And that's another key thing that we tackle.
Will: [00:16:10] And the last, and Tom just got done talking about this, is the ability to not only do analytics in a data center or in a public cloud, but also pushing the results of those analytics and those models out to the edge so that you can get the most out of your infrastructure and make business decisions as quickly as possible when it's still relevant.
Will: [00:16:37] Now digging into some of these details. I'm going to address the challenges that Tom mentioned one by one. One of the challenges that Tom highlighted was the issue of lock in or the idea that if you create your application specific to an individual cloud, you may end up in a situation where those applications cannot be either put back on premises or moved to other cloud providers. And this is a key thing that we hear executives wondering about day in and day out.
Will: [00:17:18] The important thing that MapR does in order to help out with this situation is we have a philosophy around supporting only open, standards-based APIs. I'll draw your attention to the diagram, in the middle, where there's a gray box labeled API. You see here a list of APIs that the MapR platform exposes to the applications that run on it. These APIs are a mix. Some are standards-based APIs: NFS and POSIX are standards body backed, have been around forever, and lots and lots of applications out there speak these APIs.
Will: [00:18:02] Several others of these APIs, like HDFS, HBase, and Kafka, aren't standards body based APIs, but they are supported by open source projects, which means they're very safe APIs to write applications to and it's going to be possible to run applications with these APIs in any location. These are the APIs we've chosen to expose from our platform when supporting these applications.
Will: [00:18:32] And this is critical in making sure that when you move your applications into the cloud, the APIs that your applications expect stay there, and if you develop new applications that you have the portability that you need.
Will: [00:18:46] So specifically, artificial intelligence and ML frameworks and technologies often run out of the box directly talking to the [inaudible 00:18:58] APIs. This is because when researchers, oftentimes at universities, write new machine learning frameworks, they write them assuming that they're going to run on someone's laptop or on a single server, which means they're just going to pick up files from a file system. And when you design a framework that way, it becomes very hard to adapt it to a completely different type of storage system like an object store or HDFS-type storage. Those frameworks can leverage our NFS and POSIX interfaces, which make the framework think that the files are local, and scale out those algorithms very effectively.
Will: [00:19:44] We do support the HDFS API for the plethora of analytics frameworks out there that speak it. Real-time applications would connect to our Kafka API, operational applications or more structured applications would speak a combination of HBase or JSON APIs, and we handle the data and application portability.
Will: [00:20:11] On the subject of portability, for companies that are trying to build a multi cloud strategy or they're trying to move applications into the cloud, you need two things. You need application portability and you need data portability. And this is kind of obvious, right? It's not enough to just move your application if the data that supports that application is either not going to be in the location that's needed or it's going to be accessible via a different API than the application has already been speaking. And these are the two key areas that we concentrate on when providing multi cloud and hybrid cloud solutions.
Will: [00:20:59] I guess first things first: realize that MapR is fully integrated with all of the public cloud providers out there, in some cases even with integrated hourly billing, but certainly we've integrated with the cloud technology for provisioning and with the various cloud marketplaces out there.
Will: [00:21:24] Now given that, solving this application and data portability problem, taking first the data problem of how do you get the right data in the right place on the right API. We have core technology built in, such as mirroring and data replication that ensure that data can easily move between MapR clusters no matter where they're located. This can be on-premises, it can be on the various public cloud providers, it can be at the edge.
Will: [00:22:05] And underpinning this technology are all of the hard problems that you have to solve in order to keep data synchronized between locations. This is everything from sending only the minimal amount of data that's needed, by compressing it and by doing incremental block-level transfers instead of moving whole files that have changed. It's very tightly optimized. It's also made extremely reliable, so that even if you have connectivity issues between sites while data is being replicated, the process still continues and your data is always reliably replicated.
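As a rough illustration of the incremental block-level transfer described here, the following Python sketch compares per-block hashes and ships only the changed blocks, compressed. It is not MapR's implementation: the block size, the use of SHA-256 and zlib, and the function names `block_digests`, `build_delta` and `apply_delta` are all assumptions made for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_digests(data: bytes) -> list[str]:
    """Hash each block so two sites can compare content without sending it."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(data)]


def build_delta(source: bytes, replica_digests: list[str]) -> dict[int, bytes]:
    """Return compressed copies of only the blocks the replica is missing or has stale."""
    delta = {}
    for index, block in enumerate(split_blocks(source)):
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(replica_digests) or replica_digests[index] != digest:
            delta[index] = zlib.compress(block)
    return delta


def apply_delta(replica: bytes, delta: dict[int, bytes], new_length: int) -> bytes:
    """Patch the replica with the received blocks and trim it to the source length."""
    blocks = split_blocks(replica)
    for index, payload in sorted(delta.items()):
        block = zlib.decompress(payload)
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


source = b"aaaabbbbccccdddd"
replica = b"aaaabbbbXXXXdddd"  # one stale block at index 2
delta = build_delta(source, block_digests(replica))
replica = apply_delta(replica, delta, len(source))
```

Only the third block crosses the wire in this example; a real replication engine would also need retry and resume logic to survive the connectivity issues mentioned above.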
Will: [00:22:49] Beyond that, we are able to make efficient use of cloud storage, object storage, in order to keep the price points right for storing your data. That's the data portability piece. For application portability, really it's all about containerization. With the rise of Docker, Kubernetes and the other containerization technologies, for the first time companies have the ability to create an application, package that application and run it anywhere that supports the fundamental APIs like Kubernetes and Docker. And at this point, especially as of a couple of days ago with the Amazon announcement of general availability of Elastic Kubernetes Service, this is now reality. If you write your application to run on containers and Kubernetes, you can rest assured that it can run on any cloud. So then it just becomes a matter of the data problem, which we can of course solve.
Will: [00:23:56] A little bit more detail. On the last slide I mentioned that we're able to make use of low-cost object storage in order to achieve really good price points and good price-performance characteristics for data. I wanted to talk a little bit more about how that works. Essentially what we do is we create a tiering strategy under the hood of MapR, meaning the MapR platform will connect directly to an object storage system like Amazon S3 and write actual blocks of data to that storage system as defined by the policies that you put into the system. It's all transparent to the users and transparent to the applications, meaning an application sitting on top of MapR is not necessarily going to know or care which files are stored in blocks local to the MapR cluster and which are in blocks backed by Amazon S3 or another object store. The namespace is going to stay consistent, and the tiering is going to be transparent to applications, which is a huge value prop because it means you can transparently archive data to an object store to save cost and recall it in the event that it's needed for more high-performance analytics or applications. It's all managed under the hood. There's a lot of value in that.
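The idea of policy-driven tiering behind a single namespace can be sketched in a few lines of Python. This is a toy model, not MapR's design: the `TieredStore` class, the "idle for N days" policy and the in-memory dictionaries standing in for local blocks and an S3-like object store are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy store: callers use one path namespace; the tier is an internal detail."""

    cold_after_days: int  # hypothetical policy: tier files idle at least this long
    local: dict[str, bytes] = field(default_factory=dict)
    object_store: dict[str, bytes] = field(default_factory=dict)  # stand-in for S3-like storage
    idle_days: dict[str, int] = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> None:
        """New or rewritten data always lands on the local (hot) tier."""
        self.local[path] = data
        self.object_store.pop(path, None)
        self.idle_days[path] = 0

    def age(self, days: int) -> None:
        """Simulate time passing without any access."""
        for path in self.idle_days:
            self.idle_days[path] += days

    def apply_policy(self) -> list[str]:
        """Move files that have been idle too long to the object tier."""
        moved = []
        for path in list(self.local):
            if self.idle_days[path] >= self.cold_after_days:
                self.object_store[path] = self.local.pop(path)
                moved.append(path)
        return moved

    def read(self, path: str) -> bytes:
        """Same path whichever tier holds the data; cold data is recalled on read."""
        if path not in self.local:
            self.local[path] = self.object_store.pop(path)
        self.idle_days[path] = 0
        return self.local[path]


store = TieredStore(cold_after_days=30)
store.write("/data/old.csv", b"2016 records")
store.age(45)
store.write("/data/new.csv", b"fresh records")
moved = store.apply_policy()            # only the idle file is tiered
recalled = store.read("/data/old.csv")  # the caller never names a tier
```

The point of the sketch is the `read` method: the application asks for a path and gets bytes back, and whether a recall from the object tier happened is invisible to it.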
Will: [00:25:42] Given that you have a platform that enables hybrid cloud and multi cloud, I just wanted to talk through a couple of simple use cases for how people look at using that. The first one is kind of my easy first-step, no-brainer use case, which is to use the public cloud as a disaster recovery site for your on-premises data center. There are lots of companies out there that have multiple data centers, and they have failover and DR strategies and of course the money to maintain those. But there are a lot of companies out there that have a single data center but still want the peace of mind of having all of their data in a disaster recovery site where not only is it backed up, but that data is available for non-stop operations in the event of a disaster.
Will: [00:26:37] It is very easy using the technology that we have to set up a small disaster recovery footprint in a public cloud that accepts a replica of data from an on-premises data center and makes heavy use of low-cost object storage in order to save that data in the event of disaster. And in the event of disaster, you can quickly spin up an infrastructure that resembles what you had on premises, so you can move your apps over and resume your operations on the cloud. So you don't have to maintain the footprint 24/7/365, you can just have a way of provisioning it in the event of a disaster.
Will: [00:27:22] Use case number two assumes that you're starting with disaster recovery in the cloud, and it goes one step further to say, well, if you have all of this data replicated into the cloud anyway, why not use the infinite resources of the cloud to do burst analytics or burst machine learning? Because for the applications that benefit from highly parallel processing, usually you can do the math to say if I have 30 servers on premises that can do this work in 30 hours, I can have ... now I'm going to mess up the math here ... If I have 30 servers that can do this work in one hour, now I can spin up 300 servers to do this in one hour and divide it by ... eight minutes. There we go. Six minutes. Sorry, long day. Again, analytics bursting.
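The arithmetic Will lands on (30 servers for one hour versus 300 servers) can be checked with a short Python sketch. It assumes the workload is perfectly parallel, which real jobs rarely are, and the function name `burst_minutes` is made up for this example.

```python
def burst_minutes(servers_on_prem: int, hours_on_prem: float, servers_in_cloud: int) -> float:
    """Estimate wall-clock minutes for the same job on a larger fleet.

    Assumes total work (server-hours) is constant and splits evenly across
    servers, so ten times the servers means one tenth of the time.
    """
    server_hours = servers_on_prem * hours_on_prem
    return server_hours * 60 / servers_in_cloud


# 30 servers x 1 hour = 30 server-hours; spread over 300 servers that is 6 minutes.
minutes = burst_minutes(servers_on_prem=30, hours_on_prem=1, servers_in_cloud=300)
```

Under the same assumption the total server-hours, and so roughly the compute bill, stay the same; what the burst buys is the shorter elapsed time.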
Will: [00:28:27] Now the last one is that operational applications often benefit from having data replicated in multiple locations. Take a mobile app that has users spread everywhere on the globe: for latency reasons it becomes very advantageous to direct users in various countries to the closest point of presence, which of course means that you have a distributed network of operational systems handling that data, and the data being replicated between all the sites. You might even want to make use of multiple cloud providers. For instance, in China you may have a different preference for a cloud provider than you would in Europe or in the US. MapR can support active-active deployments for those types of applications.
Tom Fisher: [00:29:25] If I can jump in here for a second, Will, this is Tom.
Will: You want to fix my math?
Tom Fisher: What's that?
Will: You going to fix my math?
Tom Fisher: No, no, actually what I was going to do is you hit on a key part. As a CIO, one of the most significant costs that I was running into and drove us to the cloud was the cost of networking and networking bandwidth. And where clouds have these points of presence, it allows you to significantly reduce your transmission cost because they're taking care of the movement, the network traffic, and doing it in the most efficient way.
Tom Fisher: [00:30:05] For example, in areas like Latin America or Southern Africa or even out in Australia, where you may have employees or manufacturing or whatever you may have, and particularly in areas like China, the ability to rely on a cloud provider to provide you with lower-cost network traffic is also something that we started to build into our model when I was a CIO at Oracle, when we would do the presentations to customers. Because it's an enormously costly, very expensive proposition, and most people don't realize it when they start setting up their own data centers in each of these remote locations. They're trying to put their own points of presence there.
Will: [00:30:51] Yep, great point Tom. And that reminds me of another scenario that we were talking with a customer about as well. We were working with a customer that had several small, we'll call them edge sites, where the data was originating, and this customer wanted to adopt a multi-cloud strategy where they did different types of applications and analytics in two different public cloud providers. And the way that they had been looking at designing their environment before was to have all of their edge sites ship data to one of the public cloud providers and then replicate it or copy it to the other public cloud provider.
Will: [00:31:35] And the math that they were doing on that said that the egress network charges for getting the data from one cloud to another were going to be astronomical. What we helped them design was a situation where each of those edge sites, instead of replicating in a chain from one public cloud to the other and incurring those egress fees, would replicate directly to both of those clouds. That effectively avoids the egress bandwidth charges because it's just ingress, and clouds love bringing data in; that's free. They just don't like letting it out. By using our multi-way replication, we helped that customer avoid a big charge.
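The difference between the two replication topologies can be put into numbers with a small Python sketch. The per-gigabyte rates below are placeholders chosen only for illustration, not any provider's actual pricing, and the sketch ignores what the edge site pays for its own uplink when it sends the data twice.

```python
def chained_cost(gb: float, egress_rate: float, ingress_rate: float = 0.0) -> float:
    """Edge -> cloud A -> cloud B: the hop out of cloud A is billed as egress."""
    into_a = gb * ingress_rate
    a_to_b = gb * (egress_rate + ingress_rate)
    return into_a + a_to_b


def fan_out_cost(gb: float, clouds: int = 2, ingress_rate: float = 0.0) -> float:
    """Edge -> each cloud directly: every transfer is ingress only."""
    return gb * ingress_rate * clouds


# Hypothetical figures: 10 TB per month, egress at 0.09 per GB, ingress free.
monthly_gb = 10_000
chained = chained_cost(monthly_gb, egress_rate=0.09)
direct = fan_out_cost(monthly_gb)
```

With free ingress the fan-out design costs nothing in cloud transfer fees, while the chained design pays egress on every gigabyte that leaves the first cloud.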
Tom Fisher: Good point, thank you.
Will: [00:32:30] On the subject of lowering fees and lowering billing, let's talk about some specifics of how, using MapR, you can actually lower your operational cloud costs. I'm going to present a few slides with a cost model and some results. That's the result of a lot of financial modeling that we've been doing internally. All of the information that I present in this section and more is soon going to be made available through a white paper, so be on the lookout for that. I'm sure we will send it out to everyone on this webinar when it's ready.
Will: [00:33:15] When it comes to using MapR to lower the operational cost of running in the public cloud, there are several factors that make this possible. Number one is the feature that I talked about earlier, which is object tiering. By using object tiering you can make use of the most cost-effective storage on the planet, which is object storage: S3 or Azure Blob or Google Cloud Storage. They're all pretty competitive when it comes to cost. Everyone I've talked to that does anything in the cloud loves the cost of these services, maybe not the performance, but that's why a tiering policy makes sense.
Will: [00:34:03] Our advantage here is that we make heavy use of that cloud storage. Now combine that with point number two, which is that all data that you put into MapR is automatically and transparently compressed before it hits any storage.
Will: [00:34:23] Now if you put these two things together, what you end up with is if you compare putting a terabyte of data directly into an object store versus putting a terabyte of data into MapR and letting MapR tier it to an object store, with MapR because we do the compression on the way from you to the object store, you can save depending on what kind of data it is, potentially up to 50%. One of the big cost savings that people see when using MapR in conjunction with a public cloud is we can save significantly on your object storage bill, typically the savings more than pays for the MapR infrastructure in doing that. That's pretty key.
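To make the compress-before-tiering point concrete, here is a minimal Python sketch that measures how much smaller a payload becomes before it would be written to an object store. The talk does not say which codec MapR uses, so `zlib` is a stand-in, and the sample log lines and the `savings_fraction` helper are invented for the example. Actual savings depend entirely on the data, as Will notes.

```python
import zlib


def stored_size(data: bytes, compress: bool) -> int:
    """Bytes that would be billed by the object store for this payload."""
    return len(zlib.compress(data)) if compress else len(data)


def savings_fraction(data: bytes) -> float:
    """Fraction of the storage bill avoided by compressing before upload."""
    raw = stored_size(data, compress=False)
    return 1 - stored_size(data, compress=True) / raw


# Repetitive, text-like data (made-up log lines) compresses well;
# already-compressed media or encrypted data would show little or no saving.
sample = b"2018-06-07 12:00:00 INFO sensor=17 status=ok temperature=21.5\n" * 1000
fraction = savings_fraction(sample)
restored = zlib.decompress(zlib.compress(sample))  # compression is lossless
```

Because object storage is billed per byte stored, the saved fraction maps directly onto the storage line of the bill, which is the comparison Will draws between writing a terabyte directly and writing it through a compressing tier.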
Will: [00:35:13] The industry standard APIs appeal to the other side of the cost equation, which is the cost of migrating applications to the cloud, which often includes a rewrite expense. The fact that we've built database and streaming services into the platform, and can host those on the same footprint or resources that are doing traditional file storage, makes it so that oftentimes you can avoid the expense of going with à la carte data services.
Will: [00:35:44] I don't want to pick on Dynamo and Kinesis specifically, but these database and streaming services exist in many of the cloud providers and they all kind of have the same cost model. That could potentially be avoidable.
Will: One of the key use cases for MapR is data warehouse offloads, so to the extent that you are looking at running data warehousing workloads in the public cloud, you can use MapR to offload some of that processing onto something that's more cost effective. And we have instances where customers have used MapR's differentiated capabilities to simplify their application architecture and make it easier to create applications. In one case, a customer told us that they were able to increase their application creation ability by 3X compared to developing on cloud services directly. That also plays to the application creation side.
Will: [00:36:51] Now we have a couple of slides on given these advantages and some assumptions about environment, what cost savings are you looking at. They're busy slides so I'm going to point out some of the key points. The next two slides are going to go over an individual type of application. For the first type of application we call it a larger sized analytics use case. This assumes that it's a data lake-like use case. So you're putting data in some storage medium and they're doing analytics on that data. Here it's mainly taking into account the object storage offload as well as the compression, but you see here by adding MapR which is the red component of the cost, you can significantly lower the AWS billing, which is the blue component of the cost. So the overall operational cost of that equivalent environment is significantly lower, 286K per year.
Will: [00:37:52] Taking a different type of use case, which is a medium-sized complex use case, complex to us means that you're using the database or you're doing real-time stream processing and maybe avoiding the cost of some of the specific cloud services. It's a similar story, you have the red component and you bring down the component of the cloud cost. Plus we factored in a fair amount of build cost difference, assuming you can more effectively create new applications for that cloud.
Will: [00:38:29] Now if you assume that a company has a roadmap of use cases that they are creating over a five year period, it's a mix of analytic oriented use cases, complex use cases, you can kind of look at the different year one, year two, year three, year four, year five between using cloud alone or cloud plus MapR and you kind of see that the MapR advantage stacks over time for a pretty good long-term advantage.
Will: [00:39:02] Like I mentioned, all of these slides are supported by some work that we're going to put out on our website very soon and also there is a calculator in the back that allows you to change the assumptions. We don't want to make this a black box, we want to be very consultative with this and see where we can offer advantage to our customers.
Will: [00:39:28] Now covering the bonus, the plus one advantage. Like Tom said, there are a lot of use cases out there where there is an enormous amount of data that is created at the edge. The things that we traditionally call the edge are things like oil wells, refineries, mines, drones that are flying over a farm, it could be telco towers, lots and lots of edge sites that generate a lot of data. And what's common about a lot of these that I mentioned is that while the amount of data that's being generated in these locations is going up, the amount of bandwidth that connects them isn't necessarily doing the same. Often there is little to no internet connectivity, satellite or otherwise.
Will: [00:40:24] What companies are looking to do, because they need to have intelligence at the edge, they need to be able to predict failures, detect anomalies, control operations, is push processing to the edge. What we have is a product called MapR Edge, which is a miniature version of the MapR platform that works hand-in-hand with our core MapR data platform, which can run either on-premises or in the cloud or a combination. And through publish-subscribe messaging connecting the two, you can build out what we call an act locally, learn globally infrastructure where you send interesting data to a central site, use modern machine learning and artificial intelligence frameworks to build models and then publish those models back to those streams, have them go to the edge and then use those models with local applications in order to have local intelligence.
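A minimal sketch of the act locally, learn globally loop described above, written as a toy in-memory publish-subscribe exchange. Everything here is an illustrative assumption rather than MapR Edge's actual API: the `Broker`, `CoreSite`, and `EdgeSite` classes, the topic names `edge-data` and `models`, and the "model" itself (a simple threshold of 1.5 times the mean reading) are hypothetical stand-ins for real streams and real machine learning frameworks.

```python
from collections import defaultdict
from statistics import mean


class Broker:
    """Toy in-memory publish-subscribe broker (stand-in for a real stream service)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback registered on this topic.
        for callback in self.subscribers[topic]:
            callback(message)


class CoreSite:
    """Central site: collects edge readings, 'learns globally', publishes a model."""

    def __init__(self, broker):
        self.broker = broker
        self.readings = []
        broker.subscribe("edge-data", self.readings.append)

    def train_and_publish(self):
        # Hypothetical model: anything above 1.5x the mean reading is an anomaly.
        threshold = 1.5 * mean(self.readings)
        self.broker.publish("models", {"threshold": threshold})


class EdgeSite:
    """Edge site: 'acts locally' with the latest model it has received."""

    def __init__(self, broker):
        self.broker = broker
        self.model = None
        broker.subscribe("models", self.on_model)

    def on_model(self, model):
        self.model = model

    def observe(self, reading):
        # Local decision uses only the locally held model, no round trip needed.
        anomaly = self.model is not None and reading > self.model["threshold"]
        # For simplicity every reading is forwarded; a real edge site would
        # forward only the "interesting" data to save bandwidth.
        self.broker.publish("edge-data", reading)
        return anomaly
```

With a `Broker`, one `CoreSite`, and one `EdgeSite`, four readings of 10 followed by `train_and_publish()` give the edge a threshold of 15.0, after which `observe(20)` is flagged locally and `observe(12)` is not.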
Will: [00:41:29] That summed up what MapR does about the various issues that companies might run into in moving to the public cloud. Just as a wrap up I have the summary slide of when MapR wraps your cloud, because in some senses that's what we do. You want to move into the public cloud. We can give you a nice layer around that cloud that gives you a set of APIs consistent with what is on premises and a set of services that would not have been available, had you not used MapR. And as a result of that, business apps are easier to migrate because you don't have to rewrite them because of the APIs. Machine learning and AI frameworks are easier because they have access to more data; this is the whole moving a framework from the laptop approach to a clustered approach. Cloud storage cost drops due to the storage optimizations I mentioned, object storage and compression. Apps can be created more easily because of the unification of services and some other capabilities, and those workloads are multi-cloud and hybrid-cloud ready.
Will: At this point we can transition into Q and A, I think I'll hand it to Mitesh.
Mitesh: [00:42:51] Okay thanks Will and thank you Tom. It was a great deep dive into some of the pitfalls I think organizations may encounter along the way in their journey to the public cloud. We've got quite a few questions here rolling in. If you have any questions please jot them in now, and we'd be happy to get to them. I think the first question here is around managed services. Maybe Tom, this is best directed to you. The question is any plans on offering MapR as a managed service on AWS? For example just like DynamoDB and RDS, they'd like to be able to use MapR and HBase APIs as an interface to the MapR cluster running on AWS as a managed app.
Tom Fisher: [00:43:37] Thanks Mitesh, that's a very good question. Let me start out first with a little bit of education. There's a big difference between a managed service and software as a service. What people typically think about is software as a service, where you're actually running the application and getting the benefit of the application. I used to be the CIO and ran operations for SuccessFactors. With managed services there's less of the tight control associated with it. For instance, one of the biggest challenges we used to face at SuccessFactors as a cloud application provider was that everybody had to upgrade at the same time. And sometimes that was not conducive to the business schedules or the business models or the business planning associated with the organization, but we were gonna do the upgrade anyway. One of my customers used to refer to it as the forced march, and I know as a former customer of Salesforce.com, there were times when those upgrades would occur and they were not convenient to our business plan.
Tom Fisher: [00:44:40] In that case, customers have started and cloud providers have begun to offer what is referred to as a managed service. It essentially allows you to do those upgrades on your own schedule. Now in many cases managed services used to be done by outsourcers, systems integrators, but the reality is that the technology provider needs to be able to better manage, is better positioned to do that type of work like patching, like upgrades, and in our case we actually have a lot of automation in the back plane that we've developed that allows that to happen.
Tom Fisher: The answer to your question is yes we do it for on-prem, we do it in the cloud. As a matter of fact, Will and I worked on the definition of our managed services early on.
Tom Fisher: [00:45:32] In addition, we have a number of partners. Partners like DXC or companies like HackStream that we've partnered with that also offer managed services, because at the end of the day, one of the things that you have to understand about MapR, we're a product company. We have a professional services team, we have a managed services team, but we really try to keep our emphasis and our focus on the delivery of our technology and keeping up with the latest ecosystem releases of products. So as a result, what you'll see is we have a smaller managed services practice than some of our larger strategic partners like Deloitte and DXC and others.
Tom Fisher: [00:46:13] I hope that answers the question, but the answer is yes, we have managed services, we do offer it on Amazon, we offer it on Google, we offer it on Microsoft Azure, we offer it on prem. And for those customers, particularly in the on-prem world who have partnerships with some of the large outsourcing providers, we're already working with those guys to be able to provide managed services through their own or using us as the behind-the-scenes getting stuff done.
Mitesh: [00:46:45] Awesome, thank you Tom. There are a couple of questions that seem fairly technical in nature. Will, I'm going to direct the first one over to you and it's about really mirroring and I guess compatibility between different versions of MapR. The question is if the MapR versions offered by different cloud providers are different, is the MapR communication for mirroring guaranteed?
Will: [00:47:13] Yeah, the way this works is twofold. One is that we, in working with the various public cloud providers, control which versions are available on which cloud and when, because we're the ones responsible for posting those. What that means is we don't run into situations where a different version of MapR is available in one cloud versus another.
Will: [00:47:39] And to the second piece, which is if for whatever reason one cluster is upgraded before another, the way our mirroring works is we maintain forward compatibility, meaning an older version of MapR can always mirror to a newer version. Because we know that typically when people have multiple environments they're not going to upgrade all of them simultaneously, they need to do it one at a time. That's critical.
Tom Fisher: [00:48:14] There is full compatibility for example between our current release 5.2.2 and 6.1, but what you're saying Will is from a data perspective, we automatically manage the synchronization between the different versions if in fact we have customers ... We don't encourage this, we prefer everybody running on the same version, but if you in fact have a requirement or for some business reason you make that determination we can support it.
Will: [00:48:44] Yeah. Yeah. Exactly. We are very careful to ensure that every new version of MapR is going to understand data created by every version of MapR released up to that time. We have a lot of backwards compatibility built into the product in order to support that, yes.
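The mirroring rule described above (an older version can always mirror to a newer one) can be sketched as a simple version comparison. This is only an illustration of the stated rule, not MapR's actual compatibility logic: the function names `parse_version` and `can_mirror` are hypothetical, and a real compatibility matrix may carry exceptions that a plain ordering cannot express.

```python
def parse_version(text):
    """Turn a dotted version string such as "6.1" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def can_mirror(source_version, destination_version):
    """Return True when the source cluster is the same version as, or older
    than, the destination, which is the forward-compatibility rule stated above.
    """
    return parse_version(source_version) <= parse_version(destination_version)
```

Under this sketch, `can_mirror("5.2.2", "6.1")` returns `True` and `can_mirror("6.1", "5.2.2")` returns `False`, which matches the practice of upgrading one environment at a time and mirroring from the cluster that has not yet been upgraded.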
Tom Fisher: [00:49:08] One of the things that we also do a little differently is we don't create new copies in order to move. We actually use change data capture. It also, again, reduces network bandwidth. That's another area that you talked about where we can really drive down costs for cloud providers because with change data capture we're only moving the bits that changed.
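To illustrate why moving only the bits that changed reduces network bandwidth, here is a small sketch that uses block-level hashing as a stand-in technique. It is not how MapR implements change data capture; the block size, the function names, and the hash-comparison approach are all assumptions chosen to keep the example short.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def split_blocks(data, size=BLOCK_SIZE):
    """Split bytes into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old, new, size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks of `new` that differ from `old`."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old, size)]
    delta = {}
    for index, block in enumerate(split_blocks(new, size)):
        is_new_block = index >= len(old_hashes)
        if is_new_block or hashlib.sha256(block).digest() != old_hashes[index]:
            delta[index] = block
    return delta


def apply_delta(old, delta, new_length, size=BLOCK_SIZE):
    """Rebuild the new data on the receiving side from the old copy plus the delta."""
    blocks = split_blocks(old, size)
    for index in sorted(delta):
        if index < len(blocks):
            blocks[index] = delta[index]
        else:
            blocks.append(delta[index])
    # Truncate in case the new data is shorter than the old copy.
    return b"".join(blocks)[:new_length]
```

For `old = b"aaaabbbbcccc"` and `new = b"aaaaXXXXcccc"`, `changed_blocks` returns `{1: b"XXXX"}`, so only 4 of the 12 bytes would need to cross the network, and `apply_delta(old, {1: b"XXXX"}, 12)` reproduces `new` on the other side.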
Mitesh: [00:49:36] Good point. Related to this is actually another question about mirroring and replication capability that is probably worthwhile getting to now. And the question is regarding data portability between cloud platforms: are all physical data persistence subsystems replicated between, for instance, different cloud platforms? I'm just going to follow that up with, basically, the answer is yes. All the data that's housed in MapR, whether it's files, tables or streams, is either replicated or mirrored to other clouds, therefore sitting on these other cloud providers. In the case of files, let's say [inaudible 00:50:14] snapshots and mirroring; in the case of tables and streams, that's handled through real-time replication. That's just a quick follow-up to that.
Will: [00:50:24] And a quick clarification on the word automatically. The answer is yes it can all be automatic, but we certainly give you the control to decide which data should be replicated and which data shouldn't. In many cases there are data sets that are critical and need to be in all places, and other scratch data sets that you wouldn't want to waste bandwidth on, so we give you that control.
Tom Fisher: [00:50:50] That's a really good point Will, because many times if you look at our object tiering capability, the things you'll probably want to replicate, and do that most immediately, are the data that's stored in hot, because that's what users are using all the time and requires more prolific and better management versus some of the data that may be two years old and on spinning disk.
Mitesh: [00:51:18] Perfect. Okay thank you Tom and Will. There's a question here I guess about deployment on containers and our strategy there, and a road map question. Maybe Will, if you wouldn't mind taking this one. Is there a plan to host MapR MEP processes like Oozie for example in containers rather than with Warden?
Will: [00:51:42] The answer to that is yes. An early proof point of that is the MapR Data Science Refinery. With MapR Data Science Refinery, that was the first net open source project where we released it as a container instead of as a yum or apt package. And that has been very highly successful, so we're looking to replicate that model across the board.
Will: [00:52:14] And I'll actually do you one better and say that we're looking at containerization across the board for the MapR platform. Not just MEP components, but also core components. We'll be talking a lot more about that later this year.
Tom Fisher: [00:52:30] And one of the areas that we believe, if you look at us and I get to say this kind of stuff because I'm more strategically focused, ultimately what MapR as a data platform wants to be and will become is essentially a container for data, regardless of where it's persisted or what form it's persisted in. In the way in which applications are contained, we want to provide and are providing that same or comparable capability for data.
Mitesh: [00:53:09] Excellent. Okay, hopefully that answered the question. I see a question here about security which I can field and maybe Will or Tom can add to if you'd like. The question is what data security issues do you see with using cloud and edge deployments? Are you assuming all encryption for data at rest?
Mitesh: [00:53:29] Security is obviously not just about encryption, it's about how you authenticate users, it's about authorization, a very important component of security here, and auditing. We actually handle all of that and we handle it at the volume level, which could be really a collection of files, tables and streams. This could actually be a pitfall or a mistake number six, but we already had five and didn't want to add plus two. It's how you manage security. It could actually be confusing and problematic in these public cloud services, as you have to go in and configure and reconfigure permissions across different services. With MapR it's actually much easier to do really at the volume level, where you can control and authorize permissions on data for files, tables and streams all together.
Tom Fisher: [00:54:19] And just to pick up on what Mitesh was saying, when you think about the proliferation of data across the internet of things, the ability for us to be able to manage through a common security model literally thousands and thousands of edge points into a single solution, it goes way beyond just an encryption of data, whether it's in transit or at rest. It is also, as Mitesh touched on, the ability to authorize, because you don't want somebody spoofing an IoT edge point trying to get into your system.
Mitesh: [00:54:51] Excellent. Okay, thanks for that follow-up. I think there are a few more questions rolling in, I'm not sure we're going to have the time to get to all of them, but I do see a question around the cost modeling, a much deeper question on that. I think Will, we'll wait for the paper to come out and certainly send that to the questioner here, but also we'll follow up with you offline to do a deeper discussion on that one.
Mitesh: [00:55:15] In the meantime maybe we have time for just a couple more questions. One here, maybe Will I'll direct it to you. I think we touched on this, but just to clarify again, could you please touch on the technicality of how MapR makes data available on different cloud services? And I guess the follow-up here is by whom the industry standard APIs are being defined. Is it MapR or the open community? So two questions here: how do we make data available on public cloud services, and second, what do we mean by these industry standard APIs? Where did that definition come from?
Will: [00:55:51] Yeah, maybe a clarification. From the way the question is worded it seems almost as if you had the impression that we were synchronizing data between the native cloud service systems themselves, which actually is not what we intended to say. What we're saying is given a MapR footprint in multiple public cloud providers, we can synchronize the data between those MapR cloud footprints, which is how we do it in the first place because we have control over both sides and we can optimize the transmission of data between them.
Will: [00:56:31] To the second part of the question of industry standard APIs, I guess if the question is who's calling them industry standard, I think it's a pretty objective definition. It's NFS and POSIX, there are standards bodies that have worked on those and certified them. And with the rest it's the Apache community.
Mitesh: [00:57:04] Perfect, thanks Will. I guess there's not enough time unfortunately to go through all the questions, but we'll certainly follow up offline here. Maybe one last question for you, Tom. We listed a number of pitfalls here, a number of mistakes that can be made. What is our rationale here? Are we trying to be antagonistic towards public cloud vendors here?
Tom Fisher: [00:57:24] Yeah that's a good question Mitesh, because sometimes it comes across that we are. And the reality is we're partners with all the major providers, including some that are emerging. The goal here is to be able to say, from an experiential perspective as well as from real world implementations of MapR today, we wanted to point out what some of those common pitfalls are, what some of those common challenges are. They're not negative on the cloud, it's just understanding what you're getting into. When you're in IT, you're always looking for the good and the bad, so the bad is not necessarily a negative, it's just whether a particular cloud provider has all the features, all the functionality, and if you take on their additional services, what are the risks? There's always inherent risk in IT. What are those risks? And that was our goal, to try to establish a framework where you think about those things. And it was a good thing that you added the sixth one, because security is much broader than just a login in the cloud. So thank you for giving me the opportunity to clarify that.
Mitesh: [00:58:37] Absolutely, thank you. I guess that's time. I'll turn it over back to David to close things up.
David: Thank you Tom, Will and Mitesh and thank you everyone for joining us. That is all the time we have for today. For more information on this topic please visit MapR.com/Solutions/Cloud or for other useful resources please visit MapR.com/resources. Thank you again and have a great rest of your day. [00:59:15]