Q&A with Sam Charrington: Kubernetes for Machine Learning, Deep Learning and AI eBook


Speakers:

Sam Charrington

Founder & Principal Analyst, CloudPulse Strategies

Ronak Chokshi

Product Marketing, MapR Technologies


What is Machine Learning? Deep Learning? How is all of this related to Kubernetes? What are some of the pitfalls organizations fall into when deploying these technologies in production? Are there good examples of how enterprises are extracting real business value from these technologies?

Yes, lots of questions. We recently sponsored an eBook published by Sam Charrington, host of the This Week in Machine Learning & AI podcast, that answers these in great detail.

If you are being asked these questions and are looking for answers, come join us in this Q&A session to learn more. We will have Sam Charrington, an esteemed speaker, author, advisor, and an expert in these areas, to talk about all of this.


Transcript

Ronak: 00:08 Hi, everyone. Welcome to the Q and A with Sam Charrington. I'm part of product marketing at MapR, and we are going to talk to Sam in this hour about a book that MapR recently sponsored and which he wrote. Now, this is a Q and A, and I will be asking Sam a few questions throughout the session, but I would also encourage all the listeners here to go to the chat window and enter any questions you may have during the session. We'll try to answer as many as we can.

Ronak: 00:49 All right. Let's talk about Sam a little bit. He's the host of the very popular This Week in Machine Learning and AI podcast. It's a leading podcast for data scientists, developers, business innovators, and other AI enthusiasts. His podcast now has about five million listens in the two and a half years it has been running. You can subscribe to it on iTunes, Stitcher, Spotify, and Google Play, and I think it airs once a week pretty much every week. He's also the founder and principal analyst of an industry research and analysis firm called CloudPulse Strategies. As part of that, he develops a lot of content, white papers, eBooks, and so on, like the one that we are going to talk about here in this webinar shortly. He is considered an AI and ML expert in the industry, consults with small and large enterprises on related strategies, and acts as an advisor for them. He's also a speaker at a variety of events. Sam, does that summarize your credentials well? Anything else you'd like to add?

Sam: 02:08 Well, first off, Ronak, thanks for having me on this webinar. I think the folks that are here for this webinar would probably enjoy checking out the podcast, and they can do that at twimlai.com. The podcast is two and a half years old. For at least the past year, we've been publishing shows at least twice a week, sometimes as many as five times a week. The middle of the year will be our third birthday, around 300 shows, and that's when we expect to hit that five million listen mark, so it's been a really exciting project, and I'm looking forward to digging in with you more about the eBook that we created around Kubernetes for machine learning and deep learning and all the context around that.

Ronak: 03:21 Great. Let's get to the topic of today's discussion. Like I said earlier, MapR has sponsored this eBook called Kubernetes for Machine Learning, Deep Learning, and AI. I encourage all listeners to check it out on mapr.com. It touches on four of the most exciting and most talked about topics of today, right? Machine learning, deep learning, AI, and Kubernetes. Sam, thank you for writing this book and being here on this webinar to talk about it. Walk us through, if you could, what led you to write this book and what you want the readers of this book and listeners here to take away.

Sam: 04:12 Yeah. This book really grew out of a series of conversations and observations that I've made in speaking with folks on the end user and enterprise side over the past six months or so. Those conversations all had a very similar flavor, and the general idea is that in enterprises, there's been a ton of investment in machine learning and artificial intelligence and data science over the past two to five years, in some cases more, but those projects have all been characterized as individual proofs of concept or POC types of efforts. Sometimes, I describe them as individual snowflakes. A data science team will get, in many cases, embedded in a line of business, and they'll work to understand the unique challenges that that line of business is dealing with, the data that they have available, and how to turn that into a machine learning model and build some kind of system around it.

Sam: 05:46 These pilot projects have been ongoing, and many of them are starting to mature right about now. The folks that are leading these machine learning and data science groups are all reporting to me that they're experiencing success with these early projects, although they're not without their challenges, and that success has led to a tremendous swell of interest on the part of the lines of business that they support to do more, to get more of these models into production, and to AI-enable, if you will, more of their projects to bring more machine learning to their organization. They see the promise and the vision and some early success, and they want to accelerate and scale their ability to get these models into production.

Sam: 07:02 With that as an objective, the way that they've been doing things to date isn't necessarily the way that they're going to achieve that goal. Treating each individual project as a snowflake, if you will, isn't the way to scale. My observation was that every individual enterprise can try to figure out the way to scale on its own, or they can look at what some early adopter organizations or AI innovators have done and draw some lessons from those. That was really the genesis of, not the eBook at first, but an AI Platforms series of podcasts that we launched last year that really took a look at what folks like Facebook and Airbnb and LinkedIn and Comcast and Shell and others have done to really industrialize or scale the way that they deliver machine learning, and in particular, the platform technologies that they've built to support ML at scale.

Sam: 08:19 This first eBook really looks at this challenge from the bottom up, because one of the common or key factors in scaling machine learning is being able to provide a distributed infrastructure for what we'll talk about later in a lot more detail: data management, experiment management, and model management. This eBook really looks at what some of these folks have done with Kubernetes to achieve this goal, and that'll be the topic of our conversation today.

Ronak: 09:03 Awesome. That's great. Before we dive into the book, I thought I'd explain or highlight to listeners here that we are talking about two separate trends, which are converging, coinciding in time really, to enable the next generation of applications and have them go into production faster. AI, which includes machine learning and deep learning, has been around for some time now, and Kubernetes, which is obviously an open source container orchestration system pioneered by Google, enables applications to go into production very, very quickly through workload portability using container technologies, and so on.

Ronak: 09:57 I also show some references, one to an article here from datacenterknowledge.com saying Kubernetes and AI is a marriage made in IT heaven, and also a blog post on Anaconda. The search volume on Kubernetes has also been on the rise for the last three or four years. In a nutshell, MapR is definitely not the only one that believes that Kubernetes and AI/ML go hand in hand in productionizing applications. Sam, every technology goes through this hype cycle, right? Where are we with these technologies, and do you see organizations and enterprises gaining real business value out of them? What's your opinion?

Sam: 10:59 Yeah, I really like the way you characterize these as converging trends, and I think that's the right way to think about it. I referenced earlier my conversations with folks on the enterprise side and where AI adoption is in terms of folks getting their feet wet with pilot or POC projects. Deloitte released a survey recently in which they spoke to enterprise early adopters and found that 55% of the executives they surveyed said that their companies had launched six or more pilot projects, which was up from 35% the prior year. Surprisingly, nearly the same percentage were reported to have undertaken six or more full deployments, so there's certainly no question that AI is on people's minds.

Sam: 12:09 There's also no question that enterprises are taking it seriously and investing in trying to get up to speed with AI. In fact, in that same Deloitte survey, the vast majority of companies, 84%, said that they plan to increase their AI investment in 2019, with a bit more than half of them reporting that they're planning to grow their investment by more than 10%. That investment, I think, is going to continue, and it's representative of the strategic importance with which organizations view AI.

Sam: 12:59 Kubernetes is on a similar trajectory. For its primary use case, which is supporting web and cloud native applications, it's much more mature, and I think in a lot of ways, you could argue that Kubernetes has become a de facto standard platform for deploying cloud native types of applications. For the AI use case, though, it's a bit earlier in the lifecycle, but at the same time, when I speak with AI early adopters, folks like Airbnb and booking.com and others, they're all taking the approach of building machine learning platform technologies with Kubernetes as the underlying substrate to manage the physical resources, or connect the workloads to those physical resources.

Sam: 14:06 In fact, at the last KubeCon, which is the big Kubernetes conference, the one held in Seattle in December in particular, there was an AI track that was very well attended, several hundred people, all very focused on this challenge of providing infrastructure to support machine learning and AI workloads. Again, it's early, but these are trends that are converging, and there's a lot of growth and investment happening in these converged areas today.

Ronak: 14:57 Great. Awesome. Thank you. Let's talk a little bit about why you're seeing these trends happen now, right? I personally like this part of the book, where you've described four areas of technology, or rather four technologies, that are coming together to truly enable AI in production: the deluge of data; innovation in hardware, data center, and cloud; improved performance of algorithms; and the plethora of tools available to data engineers, scientists, IT, et cetera. You also have some examples; you talked about booking.com, and you have another example from Home Depot in the book. Do you think this is merely a timing thing that's playing out, which is creating the hype, or were the business teams, the LOBs, always asking for these use cases to be implemented, but they just weren't possible? Tell us a little bit about what is leading to this hype or these trends.

Sam: 16:21 Yep. We'll talk a little bit in a moment about defining AI, machine learning, deep learning, and these things, but artificial intelligence is not a new concept. It's been around for a very long time, but it wasn't until relatively recently that it became something that offered accessible, tangible value to enterprises. That has a lot to do with a confluence of these four things. The amount of data that organizations have been collecting has been growing exponentially for many years. We won't belabor that point, because I imagine everyone on this call has been following this trend for quite some time. Likewise, the availability of both the cloud and specialized hardware has been a key boon to the development of new approaches to AI, in particular deep learning, which we'll talk about because it's very resource-intensive.

Sam: 17:48 Having hardware easily accessible has really driven the advancement of some of these new or improved methods very quickly over the past few years. That has all led to the ability to iterate on and refine and develop some of the core algorithms that people are very excited about, namely deep learning, which once again we'll define in just a moment. The promise and excitement that deep learning offers have led to a proliferation of new tools that serve to lower the barrier to entry to building applications based on machine learning. By tools, I'm talking about frameworks like TensorFlow and PyTorch, among a very broad set of tools that are used in this space.

Sam: 19:09 I think all of these factors have come together at the same time to enable a set of advancements around machine learning that has caught the interest and attention of enterprises and enabled use cases that, yes, to your point, Ronak, were latent use cases. I don't know that they were necessarily sitting around on a list. Hey, we always wanted to use machine learning or AI to do X, Y, Z, but the popular use cases are things that organizations have been trying to solve by other means for a long time. For example, fighting fraud, reducing or predicting customer churn, determining the next best offer, dynamic pricing, or making recommendations. Many of the machine learning applications that are often the low-hanging fruit, or the first places to start, are problems that an enterprise really understands well, and machine learning offers a new technique that they can apply to solving those problems.

Sam: 20:37 Others, things like computer vision and natural language processing, were all possible before, but they were much, much harder, and so they're re-entering enterprise radars as possibilities now that some of these advancements have made them much more accessible.

Ronak: 21:04 All right. That's a really good answer. Let's move on and break this down a little bit here. You bring out some really good points in the book about software 1.0 going to 2.0, and you distinguish between machine learning, deep learning, deep neural networks, and Kubernetes. Now, how do you define machine learning? How do you define deep learning and Kubernetes, as you have in this book, for our listeners here? You already talked a little bit about how they are related. Any more thoughts on that before we jump to the next section?

Sam: 21:52 Yep. I'll walk through some thoughts on this and some definitions, and also try to incorporate some of the audience questions that have been submitted so far, because many of them fall into this how-do-these-things-all-fit-together type of perspective. I guess I'll start with a definition of AI, machine learning, and deep learning, just to set that context.

Sam: 22:27 You know, when I think of artificial intelligence, I think of systems that can perform cognitive tasks and do things that we think of humans as being able to do. Granted, this is a very soft definition, and in a lot of ways, it's a moving target. There's a bit of a meme or a joke that artificial intelligence is the thing that we can't yet do, right? In the sense that, for example, we all use AI or machine learning on our phones all the time, like predicting the next word in our text messages or emails. At some point, it just becomes what our phones do, and it's not really AI anymore, right?

Sam: 23:21 That is a bit of a moving target, but when you think about artificial intelligence, there are lots of ways to achieve it. One way is just to hard-code a lot of rules into some kind of system. When there's some input, the system takes its rules and responds accordingly. In fact, much of the previous wave of artificial intelligence was based on these kinds of systems, what are called expert systems or rule-based systems.

Sam: 24:06 What's been really exciting more recently has been the development of machine learning and improved machine learning tools and processes and algorithms that make it more accessible. In contrast to a system that is based on a bunch of hand-coded rules, what machine learning introduces is the ability for the computer to automatically determine its own rules based on a training process. That training process is one in which you provide your machine learning system with a bunch of data. Typically, that data is labeled, meaning you've got some data, you have examples of the things that you want to predict, and then you run this through the training process, and it produces a model. That model is essentially the rules, but again, the computer figured out the rules. You didn't have to figure out the rules.

Sam: 25:26 Maybe this is a good place to introduce an example. Think about the process of doing facial recognition on images. This is something that's been done for a long time. Historically, the way this was done is you would run an image through a system that essentially used things like edge detectors and pattern detectors to try to determine where there are features like eyes and noses, and you would construct systems based on rules built around these patterns or features.

Sam: 26:18 If you think even more broadly about how you might build a facial detector based on images, and about how you might construct rules to do that, it is not an easy process. That's been changed by machine learning, and in particular deep learning, in that you basically create a dataset that has lots of pictures, and then you label those pictures as "has a face" or "doesn't have a face." You run it through this training process and, "magically," your machine learning process learns how to create a model for you, which can then detect these faces. That's machine learning. It's a statistical technique for creating models based on data.
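
To make that concrete, here is a minimal sketch of the training process Sam describes, using scikit-learn, with synthetic data standing in for real labeled examples like face/no-face images:

```python
# Minimal sketch: "training" replaces hand-coded rules with rules learned
# from labeled examples. Synthetic data stands in for real image features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled examples: X holds feature vectors, y holds the labels
# (e.g., 1 = "has a face", 0 = "doesn't have a face").
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # the computer "figures out the rules"

print("held-out accuracy:", model.score(X_test, y_test))
```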

Sam: 27:20 Deep learning. Just as machine learning is a subset of AI, deep learning is a subset of machine learning, and it uses the same statistical process, but to train a very specific type of model based on a neural network, or in particular, a deep neural network. Without getting into a lot of technical details, these deep neural networks are characterized by having many layers that are trained not individually but collectively. These layers form the basis of a model that can be used to make predictions.
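
Since PyTorch comes up later in the conversation, here is a hedged sketch of what a deep network with several collectively trained layers looks like in that framework; the layer sizes and random data are placeholders:

```python
# Sketch of a small "deep" network: several layers whose weights are all
# updated together by one training loop. Sizes and data are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(             # layers composed into one model
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),              # two outputs: face / no face
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(256, 20)           # stand-in for real features
y = torch.randint(0, 2, (256,))    # stand-in for labels

for epoch in range(10):            # collective training: one backward pass
    optimizer.zero_grad()          # updates every layer at once
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```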

Sam: 28:11 One of the interesting ways to think about the opportunity created by modern approaches to AI, deep learning in particular, and this idea of having the computer create the rules, was characterized by Andrej Karpathy, who is now the head of AI at Tesla. He wrote a blog post about software 2.0. It captures in much more elegant detail the idea that I just expressed: that having computers write the rules based on the data is a game changer in a lot of fields and opens up a whole new way of thinking about software. He talks in this post and in some of his presentations about how software 2.0 is eating away at the software landscape at Tesla and other places and allowing them to do some pretty amazing things.

Sam: 29:18 There are a lot of adjacent technologies that support this. You hear about frameworks like PyTorch and TensorFlow. These frameworks, in these two cases, live in the Python ecosystem, Python being a programming language that has found a great deal of use and favor in machine learning. PyTorch and TensorFlow are frameworks for expressing machine learning programs, in particular deep learning programs, and supporting this training process. We'll talk a little bit more about how Kubernetes supports training and AI, but in most cases, PyTorch, TensorFlow, and those types of frameworks are complementary to Kubernetes. You'd use Kubernetes to scale your ability to train PyTorch and TensorFlow types of models; one is not a substitute for the other. That in particular addresses a question that Chandra Muli asked.

Ronak: 30:47 Yes, that's correct. In fact, you answered a lot of questions here that I see in the chat window. Let's move on to my favorite topic, which is machine learning. You already talked a lot about this entire process. It's a complex process. You acquire data. You start to build the models. You start to train them, whether through a supervised or an unsupervised learning process. You deploy the models in production, and then you start to monitor them, evaluate their performance, and feed any corrections back into training. Anything else you'd like to add here for those listening?

Sam: 31:44 Yeah. There are a couple of interesting takeaways here, I think. At the top level is the idea that machine learning, and data science more broadly, is a very highly iterative process. It's an iterative process composed of iterative processes. As for the types of folks involved in this process, for a long time we've lumped them all under this banner of data scientist. More recently, that role has been evolving and specializing in such a way that now, the way folks tend to talk about it is you've got this data scientist, or in some cases what's called a research scientist, who is really at the front end of exploring the use of these kinds of statistical techniques to solve some kind of business problem.

Sam: 32:59 Often, they're trying to invent new algorithms and things like that, but there are other roles, like a data engineer, who is really focused on making sure that the data is available to even create these models, so building out data pipelines. You've also got machine learning engineers. These are software engineers who have familiarity with machine learning models and frameworks like TensorFlow and PyTorch, and who know how to build engineered systems out of these tools and make sure that they are production quality or production ready.

Sam: 33:57 When I think about this machine learning process, I think about it in terms of three core disciplines. The first is data management, and it's everything that goes into making sure that data is available both to build models and to make decisions based on models in production. Typically, that involves a lot of data acquisition and data preparation during the exploratory phase, but even once that exploratory phase has resulted in a model, all of the things that you did to prepare the data to train the model need to be built into a repeatable pipeline, so that when your model is in production, you can give it the data that it requires in order to make a decision.
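
One common way to make that preparation repeatable, sketched below with scikit-learn's Pipeline (the step names and data are illustrative), is to bundle preparation and model together so training and production inference run identical code:

```python
# Sketch: bundle data preparation and the model into one pipeline object so
# training and production inference share the exact same preparation step.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data

pipeline = Pipeline([
    ("prepare", StandardScaler()),                # data-preparation step
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)              # training: preparation is fit with the model
print(pipeline.predict(X[:5]))  # production: same preparation, same code path
```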

Sam: 35:05 The experiment management discipline is where model training and development come in. This is a big area where the difference between every project as a snowflake and a more industrialized process comes into play. In many of these stages, the iterative nature of the work makes for a lot of opportunities for automation, so within experiment management, there's an opportunity to automate a lot of the experiments that go into determining the best set of models and model parameters, what the field calls hyperparameter optimization. That is typically an iterative process. When done manually, data scientists would run these experiments and track the different model parameters they were exploring in spreadsheets. Now, we've got software tools that can do all of that, or much of it, for you.
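
As one example of those software tools, scikit-learn's GridSearchCV automates the run-every-experiment loop; the model choice and parameter grid below are arbitrary:

```python
# Sketch: automate the "try different model parameters" loop instead of
# tracking experiments by hand. The grid values are arbitrary examples.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
    cv=5,  # each combination is evaluated with cross-validation
)
search.fit(X, y)  # runs every experiment in the grid
print(search.best_params_, search.best_score_)
```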

Sam: 36:30 Then, on the model management side, that's everything that goes into your "last mile" of getting the model actually into production. There's a lot of care that needs to be taken when you do that. For example, you're making some assumptions that the statistical properties of the data that you trained on are the same as those of the data that your model is seeing in production. You really want to have tooling in place to check those assumptions to make sure that your model continues to perform well, and your model needs to be instrumented so it's easy for you to manage its performance over time. All of these tie together into this machine learning process and its opportunities for automation, and that's really where Kubernetes starts to come into play to support performing this process at scale.
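
A minimal sketch of the kind of assumption-checking tooling Sam mentions, using a two-sample Kolmogorov-Smirnov test to compare a feature's training distribution against what arrives in production (the data and the 0.05 threshold are illustrative choices):

```python
# Sketch: flag drift by comparing a feature's training distribution with
# what the model sees in production. The 0.05 threshold is a judgment call.
import numpy as np
from scipy.stats import ks_2samp

train_feature = np.random.normal(0.0, 1.0, 10_000)  # stand-in training data
live_feature = np.random.normal(0.3, 1.0, 1_000)    # shifted in production

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.05:
    print("distribution shift detected; consider retraining")
```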

Ronak: 37:49 Right. All right. That gets us a little bit further into the machine learning area and how you take it to scale. You talked about proof of concept, then putting a pipeline together, and then putting use cases into production. You highlight and explain in the book elasticity, multitenancy, immediacy, and programmability. What does that mean for the compute technologies? Do you already see a lot of GPUs and tensor processing units, and what role do they play in this? In addition, there is also a question on what are some of the soft skills required of a data engineer, data scientist, or machine learning engineer. Maybe take that question as well, if you can.

Sam: 38:55 Okay. For me, this is really about looking at the characteristics of the process that we previously described and what we can do to scale it. If you think back to data management, what we're really doing there is establishing some data pipeline, analogous to ETL or ELT, but we need to be able to run this pipeline both in a batch-oriented mode when we're training our models and in much more of a real-time mode when we're doing production inference. That requires an ability to scale across, in many cases, a large amount of infrastructure.

Sam: 40:07 On the experiment management side, we're training models. Depending on the type of model you're training, this can be very compute-intensive. Training deep learning models, which are used for tasks like computer vision and natural language processing, can, depending on the size of your dataset, run for days, weeks, or even months. Not only do they require raw scale, but this is an iterative process, so training jobs come and go, and there's a desire, if you're going to dedicate a lot of hardware to machine learning, for there to be some elasticity, both in terms of being able to grow the amount of compute you've got access to over time and in being able to reallocate on the fly how you divide your compute between training jobs, inference jobs, and some of this data pipeline work.

Sam: 41:29 You want all of this to be programmable so that you can manage the underlying infrastructure through APIs. That's really what Kubernetes is doing. Kubernetes is the layer that is making the underlying infrastructure manageable. When we jump into talking about some of the details of Kubernetes, I'll explain some of those core concepts.

Ronak: 42:07 All right. That's the perfect segue into this slide. At a high level, we know that containers allow workload portability, and Kubernetes enables the orchestration of containers, right? We also know that Kubernetes has its roots in virtualization technologies, the VMwares of the world, and so on. That really allows IT teams to abstract the infrastructure components into virtual environments, and hence makes application development easier. You have these two diagrams in the book. Tell us how Kubernetes, and within that, pods, containers, volumes, and persistent volumes, are helping IT teams do all of this better and easier, and helping application developers.

Sam: 43:16 Yep. You touched on some of the core concepts there. At the highest level, what Kubernetes is really doing is providing levels of abstraction in the way I think about both the resources that I have available to me from the compute perspective and the applications or workloads or software that I'm deploying out to those resources. Within Kubernetes, you've got this concept of a cluster, which abstracts your hardware resources, and then you've got this notion of a pod, which is really a collection of containers. It's the basic workload unit in Kubernetes. Fundamentally, what Kubernetes allows you to do is deploy your pods out to your cluster, right? That's done through a process called scheduling.

Sam: 44:20 What's unique and important about this is that scheduling is done declaratively. When you deploy a workload to Kubernetes, you specify in its configuration file the requirements of that workload, and then Kubernetes takes responsibility for figuring out which specific nodes to deploy the workload on. You might specify in the plan for a given pod that, hey, this workload needs to be on five nodes and they all need to have GPUs. You don't have to go in and say which five nodes, or think about which of those nodes have GPUs in them. You just specify the requirements, and then Kubernetes, which knows the capabilities of your nodes, can put the workloads in the right places.
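
A declarative request along those lines might look like the following sketch, written with the official Kubernetes Python client; the image name and namespace are placeholders, and "nvidia.com/gpu" is the resource name exposed by NVIDIA's device plugin:

```python
# Sketch: declare WHAT a training pod needs (a GPU); Kubernetes decides
# WHERE it runs. Image name and namespace are placeholders.
from kubernetes import client, config

config.load_kube_config()  # use local kubeconfig credentials

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="trainer",
            image="example.com/my-training-image:latest",  # placeholder
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"},  # a requirement, not a node name
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```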

Sam: 45:28 What this enables is a separation of concerns, in a sense, between the application developers, in the case of the traditional cloud native app development use case, or the data scientists, in the case of the ML and AI use case, and the team that supports the Kubernetes cluster. The data scientists can worry about what the requirements of their training job are, and the team that supports the Kubernetes cluster, whether that's IT or a machine learning infrastructure team, worries about specifying to Kubernetes what the capabilities of all the nodes are, and Kubernetes handles the mapping between the two. In doing so, it allows the data scientists to focus on their lane and IT to focus on theirs.

Sam: 46:29 This is all done continuously, meaning the scheduling process runs continuously, so if a node fails or new nodes are added or new workloads are added, Kubernetes is constantly working to ensure that what everyone said they needed is available. If your training workload is deployed, and you said you needed five nodes, and it's on five nodes but one of those dies, Kubernetes is going to recruit another node from elsewhere in the cluster to support the contract that it's got with you. That's the first piece.

Sam: 47:12 The second piece is how we support stateful applications. In the real world, most applications are stateful in some way, and certainly in machine learning and AI, applications are stateful in that they need access to data both in training and in production. Kubernetes provides abstractions called volumes. There are different types of volumes, but the main distinction is between standard volumes, which are managed at the pod level, and persistent volumes, which are managed at the cluster level. It's through these volumes that Kubernetes is able to allow workloads to connect to different data sources.

Sam: 48:09 For example, you may have a data pipeline that ingests a bunch of web logs into a data lake or data warehouse. Your infrastructure team sets up a persistent volume that connects to that data lake or data warehouse, and now, as a data scientist, you can access those web logs in a training job that you are using to create, for example, a recommendation model. You don't have to think about creating physical access to this data lake. You just state in your configuration that you need access to this logical persistent volume, and the folks that manage the Kubernetes cluster will make sure that it's available. Really, the big benefit of Kubernetes and the way that it supports these types of applications is in managing this type of abstraction. We'll elaborate on that a little bit more in the next slide as well.
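
Continuing the earlier sketch, a training pod would reference that logical volume by claim name only; the claim name, image, and mount path below are hypothetical:

```python
# Sketch: mount a persistent volume claim by name; the data scientist never
# touches the physical data lake connection. Names are hypothetical.
from kubernetes import client

pod_spec = client.V1PodSpec(
    containers=[client.V1Container(
        name="trainer",
        image="example.com/my-training-image:latest",  # placeholder
        volume_mounts=[client.V1VolumeMount(
            name="weblogs",
            mount_path="/data/weblogs",  # where training code reads the logs
        )],
    )],
    volumes=[client.V1Volume(
        name="weblogs",
        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
            claim_name="weblogs-pvc",  # set up by the infrastructure team
        ),
    )],
)
```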

Ronak: 49:29 All right. Before we talk about this slide, there's a question on whether MapR supports Apache Spark for ETL/ELT pipelines. The short answer is yes. Please go to mapr.com for more details. All right, so I agree that Kubernetes and containers are a powerful combination of technologies, and I'd like to remind the listeners here that MapR plays a very important role in the market in solving the challenges that you see on the slide here, with the ability to bring silos of data together, stitch them into a single fabric, and manage data assets across your enterprise, whether data is on premises, in the cloud, or at the edge. It gives unique flexibility to your IT teams to enable the needs of the business. Anything else you'd like to add here, Sam? We're running a little short on time, and we have a little bit more content, so I'll let you talk about anything else on the slide if you like.

Sam: 50:47 Yeah. I'll point out the highlights here. I think as an infrastructure organization supporting the data science and machine learning process, there are three key things that we need to do. The first is to get data to our workloads, the training and inference workloads. Kubernetes supports that, as we've discussed, through these notions of persistent and regular volumes. On the back end, there is a robust ecosystem of vendors that provide connectivity via these volumes. Kubernetes simplifies getting data to your training and inference workloads, and third-party solutions like data pipeline automation tools and data fabrics make this even more powerful.

Sam: 51:58 The next thing that we need to do is use our compute effectively. We need to be able to support multiple simultaneous projects. We need to allow some agility between the different workloads as they shift over time, and just make it all easy to manage. Kubernetes provides that through the abstractions that we've talked about.

Sam: 52:25 Finally, we need to eliminate the complexity of doing all of this, allow the data scientists and machine learning engineers to focus on the things that they care about, and allow infrastructure people to deal with the messy details of the underlying compute infrastructure. That as well is a big thing that Kubernetes can provide. Containers also really help with ensuring consistency, because you can train a model, capture it as a container, and put that same container into production, so you're ensuring that the thing you have in prod is the thing you developed and tested. Fundamentally, it's the same concern that led to the development and popularity of Kubernetes for supporting your traditional DevOps types of use cases.

Ronak: 53:34 All right, so let's quickly talk about the ecosystem here. Lots of players, lots of technologies and frameworks, and you touched on a few of them. Anything more you'd like to add on this slide?

Sam: 53:54 Yeah. I think at this point, I'll refer folks to the paper, which, if they haven't already taken a look at it, they can download from your website. The key thing that I wanted to convey here was a point-in-time snapshot of some of the tools and technologies available in this ecosystem, many of which are not mature. In fact, many are very immature but provide interesting examples of what's possible when you combine Kubernetes and machine learning and deep learning. This paper was published right before KubeCon, and a bunch more stuff came out there, so we'll be continuing to update it, but it's a good way to get a sense of the various pieces that you can put together to help automate and scale your machine learning and deep learning processes.

Ronak: 54:54 Okay. Now, you have some good examples, booking.com and OpenAI, which is obviously a nonprofit organization that's trying to make sure AI is safe and widely used, so great examples. Anything you'd like to add with respect to what's working and what's not working, and how do you see MapR addressing these challenges? I'll talk about our platform as well on the next slide.

Sam: 55:31 Yeah. In addition to referring folks to the eBook, we published a series of podcasts with some of the folks that we profiled in the book and others: Facebook, Airbnb, OpenAI, LinkedIn, Shell, Comcast, and more. The easiest way to get to those is via twimlai.com/aiplatforms2018, and these are podcast conversations that really dig into how folks are supporting ML and AI. On the point about how MapR is addressing these challenges, I'll go back to a few slides ago. A fundamental requirement for all of this is data, and you can't do anything if you can't get your data to your models when you're training them and when they're in production. That's where MapR comes in, providing a fabric that makes it easy to connect existing enterprise data stores, data lakes, and data warehouses, with native support for Kubernetes, and making all of those data sources available to your workloads.

Ronak: 57:09 All right. This is a high-level diagram I wanted to show all the listeners here of the MapR data platform. MapR really has re-imagined the foundational layers of applications and introduced a new layer called dataware. Dataware is an attempt to essentially decouple data from the construct of applications, middleware, and hardware. As a result, you as an organization get control. You get to control data security, placement, access, and tenancy. I saw a lot of questions along the way on all of these topics. You control data independent of any other layer. Like I said earlier, whether you have data lying in the cloud, on premises, or at the edge, the dataware concept is unique and plays out really well in terms of implementing ML and DL applications and putting them into production using persistent volumes and so on with Kubernetes.

Ronak: 58:36 All right, so to conclude the session here, I hope you enjoyed this Q and A with Sam. I did. Please go to mapr.com for the eBook that we just talked about and some more materials, some more eBooks that I'm showing here. Back to you, David.

David: 58:56 Great. Thank you, Sam, and thank you, Ronak, and thank you everyone for joining us. That is all the time we have for today. For more information on this topic and others, please visit mapr.com/resources. As I mentioned in the chat, we will be sending out a link to the recording, as well as the slides and some additional assets, shortly after this event. If we did not address your question, we will be following up with you to ensure that we get you an answer. Thank you again and have a great rest of your day.