Show 3.2: Kubernetes – What it is, Why it Matters, and How to Get Started - Part 2

Podcast transcript:

Jim Scott:

Hi. My name's Jim Scott, the host of Data Talks. Today, we're back for our second session with Michael Hausenblas from Red Hat and Sebastien Goasguen from Bitnami. And we'll be discussing topics covering serverless technology, also known as function as a service; container technology; and the all-important, ever-elusive question of how to actually store your data for technologies like Kubernetes and your containers.

So, I want to bring back up a topic that we touched on earlier. I know Michael's probably champing at the bit for this topic. So, serverless, aka function as a service, is a topic that I've been close to for quite a while. Sebastien, what do you think is the driving force behind the rise in popularity of serverless, or the function-as-a-service deployment model, today?

Sebastien G.:

For me, I come from the user side. I was a computational scientist for a long time, so I needed to have access to infrastructure to be able to solve computational problems, especially electromagnetics problems, nanoelectronics computations, and so on. So the question for a lot of users is "how do I access the computing power to solve my problem? How do I access, you know, potentially a lot of storage for my data?"

So, with serverless, it's another level in simplifying the way that you access compute power, from a high-level mindset. So, of course you can build containers and then deploy them, or build an application and run it on bare metal and so on, but it's still quite complicated. You need to learn APIs and things like this. So the next step is "let's use a PaaS, and here's my application. Now you take care of my application, you take care of scaling it," and of course scheduling it somewhere in the data center, building all the routes so that the networking works, and so on.

The bottom line is that the user who is really developing the application doesn't really care about, you know, what's happening under the hood. Okay? And that's where the tension happens with throwing things over the wall, right? The user would really like to throw things over the wall. Okay? But we need to have a stronger interaction, we need to have empathy and so on.

But serverless is just yet another iteration along that spectrum. It's another iteration on the PaaS spectrum, at least to me. And the idea is that you're now decomposing your application into much smaller business-logic units, functions, you know? And that's why they are called functions. And you just deploy those functions. So, the granularity is getting finer. It's easier for the user to deploy. And the big kicker to me in all of this is that functions are now leveraging cloud services. So functions are tying cloud services together, they're stitching cloud services together, to be able to build a much more interesting application.

A case in point is data pipelines. You know, you want to drop data into an object store; when that data is dropped you want to call a function for some type of post-processing. Once the post-processing happens you want to emit the data, then you want to store all of this in a relational database, and then you want to stream something to some other clients. That relatively complicated data pipeline is now made much, much simpler by using cloud services and little functions that you deploy. So that's why people are getting super excited about it. It's a way to build very interesting distributed applications that in the end are going to be managed by the system with a super fine granularity, which makes it easy to deploy. The user only cares about the function code. And then there are things like auto-scaling and fine-grained metering and so on.
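
To make that concrete, here is a minimal sketch of what one stage of such a pipeline might look like as a function. It assumes a Lambda-style Python runtime with boto3 available; the destination bucket and the "post-processing" step are placeholders, not anything a specific platform prescribes.

```python
import boto3  # AWS SDK for Python; preinstalled in the Lambda runtime

s3 = boto3.client("s3")

def handler(event, context):
    # Invoked by an S3 event notification when an object lands in a bucket.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Fetch the newly dropped object and apply a stand-in post-processing step.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    processed = body.decode("utf-8").upper()

    # Hand the result to the next stage of the pipeline (hypothetical bucket).
    s3.put_object(Bucket="pipeline-processed", Key=key, Body=processed.encode())
```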

Jim Scott:

Yeah, that's really great. And the thing that I find to be really important here is that, at the end of the day, businesses want to be able to write software and get it out; they want to be able to put in place service levels and expectations. So, the specifics of the underlying storage aren't necessarily important to everybody. They're important to some people, the ones who are managing against those service levels. And being able to make those changes, or to hide them at an appropriate level, is I think a great abstraction. And I think that that's one of the real driving forces behind this technology stack.

Sebastien G.:

You should ask Michael, because Michael has a ton of views on this.

Michael H.:

So, I'm a huge fan of function as a service, and I still regret that I called that book a few years ago ... a report, like 60 pages, whatever ... Serverless Ops rather than something like Function-as-a-Service Ops. But the takeaway there for me, essentially, was that it's great. It's a great fit for many, many really interesting event-driven workloads. But I doubt that many of the bigger corporate users or environments have really thought through what it effectively means. I'll give you a good example.

Imagine that, rather than splitting up a monolith into ten microservices that you can deploy [inaudible 00:06:03] for example, you split it up into 200 or 500 functions. And let's say, for the sake of argument, you're using AWS Lambda, which has a huge head start and is dominating the market. Who do you guess will be on pagers? There is no ops. And you'll certainly not get the AWS admins on your pagers, so your developers, the people who write the functions, will actually be on [inaudible 00:06:35] Are you prepared for that? Again, I'm a huge fan of function as a service; I think it will have a very big share in five to ten years' time. But people need to think about the consequences. If you're not prepared to put your developers, the people who write the functions and upload them, on pagers, or make them available for any kind of "oh, this doesn't work, this has broken down, there is a problem here," then you probably want to think twice about it.

Jim Scott:

That's a really great point. Making sure that people understand how to manage what they've been building, the actual day-to-day operation, is very important. You did bring up one important point, though, and I want to call it out here: you mentioned AWS Lambda being able to put the stuff out there without having someone manage it. It really is critical that people acknowledge and understand that where you run your function as a service matters for managing and moderating your business. Right? The types of service level that you have, your lock-in, your performance levels, all of that type of thing.

So guys, I was wondering, as we were just discussing the function-as-a-service concept, there's a really nice project that I've come across. It happened to be the entire reason why we picked up this podcast in the first place: I was asking Michael about a project called Kubeless, and that's when he brought up Sebastien to me. So I was wondering, Sebastien, can you elaborate on the merits of Kubeless and your role in the project?

Sebastien G.:

Definitely, yeah. So Kubeless, shortly put, is meant to be a clone of AWS Lambda for people who want to benefit from Lambda functionality on premises. So, definitely Lambda is extremely useful, but then you're tied to the AWS cloud, you're tied to the services that you can access on AWS. The whole idea with Kubeless is that you should be able to have these Lambda-like capabilities, FaaS functionality, on premises. You can deploy those functions on premises. Then you can use them to stitch together on-premises services, or potentially even start building some hybrid-type solutions that bring services from the cloud providers on premises. That's the thirty-second summary.
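
As a rough illustration of that thirty-second summary: a Kubeless function is just ordinary code plus a deploy command, no Dockerfile involved. The sketch below assumes the Python runtime; the handler signature and the exact CLI flags vary across Kubeless releases, so treat them as approximate.

```python
# hello.py -- a minimal function for the Kubeless Python runtime.
# The (event, context) signature follows later Kubeless releases; earlier
# versions used a slightly different handler shape.
def handler(event, context):
    # event["data"] carries the trigger payload (HTTP body, topic message, ...).
    return "Hello, " + str(event.get("data") or "world")

# Deployed and invoked roughly like this (flags vary by version):
#   kubeless function deploy hello --runtime python3.6 \
#       --from-file hello.py --handler hello.handler
#   kubeless function call hello --data "Kubernetes"
```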

Jim Scott:

So what's your role in the project?

Sebastien G.:

So I created it. It started over a year ago, just after I came back from KubeCon in Seattle. There were quite a few talks in Seattle, especially from Brendan Burns, who talked about his project, Metaparticle, and then there was a talk by Kelsey Hightower about compiling an application to Kubernetes. The idea that resonated with me was this concept that you're writing an application, and it feels like you're writing an application that's going to run serially and run locally. In fact, when you're compiling, there is this sort of auto-compiler that's going to distribute your application and break it up into microservices, so that the different parts of your app that you've written locally, for a seemingly serial execution, are actually going to become distributed microservices.

I came back from KubeCon with those ideas, and of course there was AWS re:Invent, where everybody went crazy about Lambda, and Lambda had been around for about two or three years already. I was thinking, hey, Kubernetes is actually the perfect platform to build a FaaS. We should be able to do this relatively quickly, because we have all the API objects, we have all the primitives to build a powerful system. So I started hacking with one of the guys who was working with me at Skippbox, Tuna (Adrien is his real name). He came to my house, we had a four-day hacking session together, and we put together Kubeless as a Kubernetes-native system, which makes use of something called a custom resource definition, a way in Kubernetes to extend the API. We used that and we built a controller, and very quickly we had the basis for Kubeless.
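
To give a feel for what Kubernetes-native means here: once the Function custom resource definition is registered, functions become just another API object you can list like pods. Here is a sketch with the official Kubernetes Python client; the kubeless.io group, v1beta1 version, and functions plural are assumptions that vary by release, so check them with `kubectl api-resources`.

```python
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig
api = client.CustomObjectsApi()

# List the Function objects that the Kubeless CRD adds to the Kubernetes API.
functions = api.list_namespaced_custom_object(
    group="kubeless.io",
    version="v1beta1",
    namespace="default",
    plural="functions",
)
for fn in functions["items"]:
    print(fn["metadata"]["name"])
```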

Jim Scott:

Well that's pretty cool. When I read through the documentation on Kubeless, did I take it correctly that it doesn't have a direct dependency on a specific container technology like Docker?

Sebastien G.:

That depends entirely on how you set up your Kubernetes cluster. Kubeless is not tied to a container runtime. It's tied to Kubernetes. So if you set up Kubernetes with rkt or CRI-O, you can use Kubeless with those runtimes. Right now, I think ... we have to be fair, most Kubernetes deployments are on the Docker runtime.

Jim Scott:

Oh, sure, yeah. I just wanted to get the distinction between it depending on Docker versus it just depending on Kubernetes and then working with whatever Kubernetes was set up to work with.

Sebastien G.:

Right, yep.

Jim Scott:

So, Kubeless sounds pretty awesome to me, quite frankly. It looks like it's got the makings of what could be the framework people use going forward to build and deploy function as a service in environments where they're leveraging Kubernetes. Michael, is Red Hat getting behind Kubeless at all?

Michael H.:

So, yeah, Kubeless, as you said, is an awesome environment and an awesome way to do function as a service on top of Kubernetes. I think the buy-in and the ... energy that [inaudible 00:12:52] and Sebastien's team put in there actually shows that this is a real, serious contender. Sure, I think we [inaudible 00:13:02] ... can't name names, but there are [inaudible 00:13:07] customer prospects that have big interest in using it in production. So ... from Red Hat's perspective, I would say we try to be neutral, in the sense that we support different ways to do function as a service on top of [inaudible 00:13:28]. So we're not telling people you can only use X, or Y, or Z. But we certainly work together on Kubeless and the integration bits, especially now around the, so quotes here, "custom resource definitions," which are one of two [inaudible 00:13:47] items there.

Jim Scott:

Alright, so, if we take what you just said and break it down a little bit: I've personally researched a handful of different frameworks that can be used for a serverless, function-as-a-service infrastructure. What are some of the others that you see people using, Michael? And do any have shortcomings, or is one better than the others? Because every time I talk to people about this topic, there's a lot of open air around it. They want to try it, they're not quite sure which framework to use, they don't know the pluses or minuses of one versus the other. What knowledge can you share around this for everybody?

Michael H.:

So the first big distinction that I would make is ... if you subscribe to the idea of function as a service, the unit of deployment being a function, a stateless, very short-running function that is executed, with someone else taking care of all the provisioning and all the scaling and so on, the first question is where you want to do it. Do you want to do that in a public cloud environment, where you obviously benefit from the autoscaling and ... you actually only pay for what you are using there? Or are you doing that on premises, in your own data center, where, typically, you still cannot ... rack something up within microseconds. That typically takes a few weeks or a month. In the first case, in the public cloud setting, AWS has such a head start, especially since, as you mentioned, Lambda was introduced in 2014. And AWS [inaudible 00:15:24] has been doing such an awesome job in terms of integration, and that's the bit that is really hard.

There are three parts, right. You have the triggers, like any kind of ... it could be a file upload to S3, or it could be time as a trigger, or an HTTP call, or whatever the trigger is that triggers the function. Then you have the management bit in the middle that does the actual function execution; there, probably, under the hood you see containers, and isolation, and whatnot. And then the most interesting part, where AWS has a huge head start, is the integration bits. That's essentially ... because these functions are so limited, every sort of state needs to be managed outside. And if you don't have that integration in place, then you're rather limited in what kind of workloads you can do.

So for example, let's say you have a function that converts some image based on some user event, a like or whatever. Then you need to store that image somewhere, for example in S3 or somewhere else. If you don't have that integration with your serverless or FaaS framework in place, then you can only do so much with it. So looking at which one to choose, I would definitely look at how well the triggers are supported, or which triggers are supported. Most of them are pretty good at that. The other part, the most important part [inaudible 00:16:56] to really attack serious enterprise workloads, is that integration with anything out there to keep the state around.
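
A hedged sketch of that image example in Python, to show where the integration bit comes in; the event fields, the bucket layout, and the use of Pillow are all assumptions for illustration, not something any particular framework prescribes.

```python
import io

import boto3
from PIL import Image  # Pillow; the function bundle would need to include it

s3 = boto3.client("s3")

def handler(event, context):
    # Hypothetical "user liked a photo" event carrying bucket/key fields.
    bucket, key = event["bucket"], event["key"]

    # Convert the image: here, shrink it to a thumbnail.
    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    img = Image.open(io.BytesIO(raw)).convert("RGB")
    img.thumbnail((128, 128))
    out = io.BytesIO()
    img.save(out, format="JPEG")

    # This is the integration point: the function itself is stateless, so the
    # result has to be kept somewhere outside, e.g. back in the object store.
    s3.put_object(Bucket=bucket, Key="thumbs/" + key, Body=out.getvalue())
```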

Jim Scott:

Okay, so, one of the things you mentioned, and Sebastien mentioned, and I think it's really important for people to understand, is that there are a lot of analysts out there who are now getting into talking about this type of functionality. The very basic, most simplistic way for people to understand this type of technology is: it's literally a microservice that's wrapped and deployed with a series of triggers, separating out the deployment model. To simplify it and say, here's how you auto-scale it, here's how you deliver a service level, right? It's just some small unit of work that is effectively equivalent to a function, deployed, wrapped, all set up. Now, when we look at something like AWS Lambda, the analysts out there are basically saying ... and I've not heard one person dissent from this ... if you program specifically against their API, vendor lock-in is off the scale. It's mainframe-equivalent. So that's why I really like keeping my eye on these types of projects like Kubeless, to ... make sure people are aware that they can do this type of programming model outside of the cloud-specific APIs.

Do you guys have any thoughts on the binding to the cloud specific APIs?

Sebastien G.:

Definitely. When you start using a cloud provider and you start making use of one of the services that they are the only one to provide, you're getting into a pretty severe cloud lock-in. And you don't see any efforts across the cloud providers to come up with standards, whether it's ... a standard to define virtual machines or provision virtual machines, with the basic EC2, or even S3. Even for those basic services, the industry didn't really come up with a ... a standard. So as soon as you start going to more exotic services, like the machine learning, the data streams and so on, those APIs are going to be very cloud-provider specific, and when you use them, well, it's going to be that much harder to actually move to another cloud provider.

So Lambda ... I'm not too worried about the actual interface of Lambda, what a Lambda function is, because projects like Kubeless, or even OpenWhisk ... we can relatively easily, I think, take a function and then deploy it within another system. The big trick is really the event sources and what's going to trigger that function. That's the big problem, and that's where AWS has a huge advantage, and you know, all props to them again. They can trigger functions based on Kinesis streams, on S3 uploads, SNS ... SQS events, so they're looking at all their services and they're making sure that notifications from all those other services can be used as triggers for Lambdas. And that's the real kicker. That's something that's a huge help for people building serverless-based applications. And you don't find that on the Google cloud; you find a little bit of it on the Azure cloud, but not that much. For Kubeless, that's definitely something that we're making sure we build from scratch, with a definition of a trigger, and then making sure that we can enable those scenarios. Specifically so that we can provide a way to ... not be too tied to the cloud provider.

I don't know if my answer makes sense or if my [inaudible 00:21:01] agrees, but that's ...

Jim Scott:

Yeah, I think it makes perfect sense. And honestly, you've gotta give props to Amazon, right? They did things, people liked it, they adopted it. Others are following. So they've done something right. But my really hopeful takeaway for people is: just be thoughtful about what you're doing, because if lock-in is a concern for you, know what's going on. That's it.

Sebastien G.:

And one word on AWS. I've always been a big fan of AWS, but I'm also an open source guy, so I'm a little bit sad when I see EC2, for example, and all the efforts that an entire community put into building EC2 clones, whether it was OpenStack or CloudStack or OpenNebula, you know, all those efforts. And then the efforts of trying to push a standard for VM provisioning. And then, you know, AWS doesn't really engage in those efforts, of course, because their bread and butter is the EC2 service. So, I'm very curious to see what's going to happen now. It seems that they are going to reinvigorate their open source office, so I'm looking forward to seeing more presence and more activity from AWS in the open source world, most probably through the Cloud Native Computing Foundation. It's going to be very interesting to see what they're going to contribute to Kubernetes now that they've announced they're building EKS, and then to see what they're going to do through the CNCF.

Jim Scott:

Excellent. So, I've got a question, and it might seem like a contentious question, but I think it's really critical to the future of these technologies. So, we all know Docker has garnered the lion's share of the marketplace for containers. But Docker as a company ... I think pretty much everybody in the marketplace has seen it dwindling away; Docker Swarm is effectively going to disappear, its adoption rate has plummeted among the people who were testing it and actually pulling it into their environments. Do you guys feel ... we'll start with Michael. Do you feel there's anything happening within Kubernetes that's going to cause any further drop in Docker adoption? Maybe people choosing the CRI-O implementations that are out there, maybe the Intel implementation of CRI-O? Anything like that?

Michael H.:

So I'm not going to comment on any company-related things like Docker Inc, whatever their business model or whatever. I'm really trying to focus on the technical bits. There ... yes, absolutely. So CRI-O certainly has a real chance to meet the requirements that a lot of cluster [inaudible 00:24:08] have, and it has a good chance; there are a couple of companies, including ourselves, including Red Hat, behind that. It will be part of [inaudible 00:24:17] shift very soon.

On the other hand, I also don't think that this is a criterion that keeps people awake at night in the first place, right? So no one is gonna choose a [inaudible 00:24:30] distribution purely based on whether it supports this or that one thing. Most people, I would guess, are just gonna run with whatever is the default there, unless they have already experienced issues [inaudible 00:24:46] or whatever, and they, quote-unquote, know what they're doing and potentially choose a different runtime there. So I don't know if that satisfies ...

Jim Scott:

Well, the reason for my question is literally because there are still a lot of people out there who have not started containerizing applications. And every once in a while, I hear someone say, well, I don't know, we're thinking about starting with Docker, but then I'm reading about CRI-O or rkt or something. You know, they'll have come up on something else, they'll have worked their way into the equivalent of a Docker solution, and then they're kind of contemplating: should I choose this or should I choose that? If this is on a decline and that is on an incline, then maybe I wanna choose something else. It's literally just because I do believe there are some people still questioning whether or not they should go whole hog into a Docker container environment, or whether these others are really gonna pick up at a rate fast enough for them to say, you know what? I'm just gonna go there, because it gives me more flexibility and I don't have to worry about it.

Michael H.:

Right. The point of CRI-O really is that as a developer, or as someone who's managing these container images, you do not notice a difference from your previous Docker ... Sebastien, is it 1.12 or 1.13 that we're currently using in Kubernetes? So you do not notice that. Right? You're not creating different types of images or whatever. Luckily, we're in a position where, for the images, and for the way that Kubernetes interacts with these runtimes, there are standards. There is OCI for the container runtime and images, and there is CRI, the container runtime interface, for how a specific runtime interacts with Kubernetes, with the kubelet on the worker node, in order to launch containers, pods, so on and so forth.

So from a pure end-user perspective, there is no difference. That's the beauty of it. If there were, if you had to say, oh, but now you're using CRI-O so you need to create different kinds of container images or whatever, that would be a problem, and that would be enough to go [inaudible 00:26:57]. Not sure if I explained myself well, but ...

Jim Scott:

No, I think that makes perfect sense. Did you have anything you wanted to add to that, Sebastien?

Sebastien G.:

So, on the runtime, I think choice is always very good. The fact that you can use the Docker runtime, or the CRI-O runtime, or potentially rkt if you want, even though rkt seems to have stalled a little bit ... I think choice is always good for the users. So that's one. Interestingly, from a Kubeless perspective, what I'm trying to do, or what I want to give through Kubeless, is the ability to deploy apps without even knowing that there is a container. So the UX of Kubeless, and you'll see where I'm going with this ... the UX of Kubeless doesn't make you build a container. Okay? You don't do any docker build, or any docker tag, or docker push. Okay? You just have your function and then you say, hey, Kubeless, deploy this code. And then it goes.

So you're not aware of the potential Docker UX, you're not aware of what runtime you're using. So it could work anywhere. Overall, the choice of runtime ... I think it's extremely positive, and we've seen some very good movement in the community, as Michael mentions, with OCI. We've seen Docker, Inc, the company, moving containerd into the CNCF. So there's been a lot of extremely good movement to actually come together as a community and agree on ... a certain level of the runtime, a certain level of the formats, and so on. So all of this is extremely, extremely good.

Now, you know, you also asked a question about Swarm. You know, we have no ... idea what the future of Swarm is, or what Docker Inc intends its future to be. Definitely the fact that they just put Kubernetes inside Docker for Mac, and soon Docker for Windows, is a sign that they are seeing demand from their own users and their own customers, a demand for better integration with Kubernetes. So that's extremely, extremely interesting, and it's a greater testimony to the strength of Kubernetes than it is a sign of the weakness of Swarm. So let's ... we always should, you know, be extremely positive and see the glass as half full, which is ... you know, it's a sign of Kubernetes' strength and not a sign of Swarm's weakness. I'm being very political here.

Michael H.:

I have an analogy that might help illustrate that. You will remember the times when [inaudible 00:30:13] started out, and people ... initially didn't quite understand that it's the same interface. It's [inaudible 00:30:21], it's mass-produced, it's just a better implementation. Right? And it's the same here. You have the same interface, you have a drop-in replacement for what you're already running. Why do you care? Well, you want better performance, more reliability, whatever, in terms of implementation. As long as the API [inaudible 00:30:40] is given, you don't have to change your images, so on and so forth, and you can benefit from it. Make sense?

Jim Scott:

It does.

Sebastien G.:

If I can pick up on this and go sideways a little bit: we're seeing different ways to build containers. For example, every now and then I work with Bazel, the build system ... or "Bay-zel," however you pronounce it ... because you can build containers with Bazel. So you don't write a Dockerfile anymore, you don't do a docker run, but the output is a Docker image. Okay? So that's one thing. But the most important thing to me in terms of containers is actually what's inside the container. All the different systems that we've talked about can take a Docker image format, but the big question is, as IT professionals ... in your enterprise, what are you putting in that image? How are you managing what's inside that container? If there is a CVE, you know, how do you detect that CVE and how do you rebuild the images? Are you running as non-root inside the container?

Jim Scott:

What is a CVE?

Sebastien G.:

Security vulnerabilities; CVE stands for Common Vulnerabilities and Exposures. So if there is a security vulnerability that comes out, like Spectre for example ... even though it may not be a good example here, but if there is a vulnerability in a library that you're using inside the container, how do you detect that and then trigger rebuilding the container image? That is not going to change whether you're using CRI-O or Docker or anything else, and that, I think, is the super important part of this entire discussion: the life cycle of your software, the life cycle of your application. And making sure that you're only packaging what you need. You don't need to put an entire distro inside an image. So that's the really, really important bit.
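
As a toy illustration of that lifecycle question, here is a sketch that compares the Python packages installed in an image against a hypothetical advisory feed. Real scanners such as Clair work on whole image layers rather than one language's packages, but the detect-then-rebuild idea is the same.

```python
import subprocess
from packaging.version import Version  # pip install packaging

# Hypothetical advisory feed: package name -> first fixed version.
ADVISORIES = {"requests": "2.20.0"}

def vulnerable_packages():
    """Flag installed packages that are older than the first fixed version."""
    frozen = subprocess.check_output(["pip", "freeze"], text=True)
    hits = []
    for line in frozen.splitlines():
        name, sep, version = line.partition("==")
        if not sep:
            continue  # skip editable installs and other non-pinned entries
        fixed = ADVISORIES.get(name.lower())
        if fixed and Version(version) < Version(fixed):
            hits.append((name, version, fixed))
    return hits

if __name__ == "__main__":
    for name, have, fixed in vulnerable_packages():
        # In a real pipeline, a hit here would trigger an image rebuild.
        print(f"{name} {have} is vulnerable; rebuild with >= {fixed}")
```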

Jim Scott:

All right. Well, I'm gonna shift us a little bit, but it's specifically going to be a shift based off of things that both of you have said. You've both mentioned data in different ways, and the ephemeral nature of the use cases that people are choosing to put on Kubernetes to get started. What I'm curious about is: are there any ideal pieces of infrastructure that should be in place to ensure that the user will be successful in their implementation of Kubernetes? Let me just give you an example.

So, with MapR, for the Strata event, we're announcing that we are now supporting persistent volumes, with Kubernetes and MapR working together. So when you look at that as a piece of infrastructure, we don't actually care about the underlying infrastructure that you're running on, from the physical server to the ... network you're running on. But we're now going to be providing a persistent volume for data storage, which allows the software developer to not have to worry about where the data's gonna land. Both static and dynamic provisioning models are going to be there. I'm curious: what is your guys' take on critical pieces of infrastructure for success within a business, to enable Kubernetes in this type of a view?

Michael H.:

If I can give it a stab: I don't really think that there are that many directly [inaudible 00:34:21] things ... it's absolutely great to have something awesome around storage like MapR provides, or to have an awesome software-defined networking layer, and so on and so forth. That's all very, very helpful. But in reality, what I see is that people often struggle because they want to benefit from Kubernetes but don't necessarily want to put in the initial groundwork. They don't want to invest in the things that are required to get there. By that I mean simple things like: actually, I have all my source code in Git or whatever, [inaudible 00:35:04]. Or I have a CI/CD pipeline that is actually accepted, and, you know, everyone is on board with using that CI/CD pipeline. Because at the end of the day you need to produce a container image that Kubernetes can then pull and run as the basis for a container and a pod.

So if these pieces of infrastructure, and ... not only the infrastructure but also the social component, are not in place, then I fear most folks will not have such a great time using Kubernetes. Maybe I'm a little bit too pessimistic, and I would love to learn what Sebastien thinks about that.

Jim Scott:

Sebastien, do you have thoughts?

Sebastien G.:

So I actually, to be quite honest, I don't really understand the question.

Jim Scott:

Well, let me elaborate, then. [crosstalk 00:35:54] Within Kubernetes, Michael said earlier in our questioning that it's really easy for people to get started with stateless applications.

Sebastien G.:

Mm-hmm (affirmative).

Jim Scott:

Right? So stateless applications only go so far, because stateless applications tend to not do much. When you look at stateful applications, they depend on being able to read and write data. So, whether it's writing log files that are gonna be monitored, storing metrics for performance analytics or data center monitoring, or actually doing database work, things of that nature. What are the critical pieces of infrastructure that you'd say are required? My example was persistent volumes. That's one way of exposing a location for the software being managed by Kubernetes to read and write data, because regardless of where that container gets spun up in the infrastructure, if it has to be stateful, it has to have some place to read and write, and a persistent volume is the abstraction that enables that capability without having to be bound to a physical piece of hardware.

Sebastien G.:

Yeah, yeah, definitely. So, I think, you know, when Kubernetes started, the persistent volumes and persistent volume claims ... I would have to go back to the actual time when they were created. But at the beginning, the data story, or I should say the storage story, was not that strong. But then things evolved in Kubernetes, and now you have the concept of a storage class, of a default storage class, which allows operators to say: what's physically my default storage provider? So these days, if an application developer, you know, needs one gigabyte, 500 megabytes, you can just declare a claim, declare a need for storage. If the ... Kubernetes cluster has been configured properly, dynamic provisioning will take over and will provision that persistent volume. That persistent volume, ideally ... depending on the underlying storage, should be available on all nodes. Okay?

So all of that is to say that the storage story, and the capabilities in terms of storage, have greatly improved in Kubernetes over the last two years. And when I do Kubernetes training, I do a basic demo of running WordPress. These days, if you do a basic WordPress deployment on GKE, the Google Kubernetes Engine, and you create a persistent volume claim for backing your MySQL database, Google automatically provisions a GCE persistent disk and attaches it to your pod wherever it lands. So now you get into a situation where, even though this is stateful, the actual underlying infrastructure is quite good for stateful. The GCE disk is dynamically provisioned, auto-formatted, and it mounts wherever the pod lands. So this is actually quite good.
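
To make the claim side of that concrete: from the developer's point of view, a claim is just a small API object. Here is a sketch using the official Kubernetes Python client, with placeholder names and sizes; given a default storage class, dynamic provisioning does the rest.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Declare the need for storage; the cluster's default storage class
# (e.g. GCE persistent disks on GKE) satisfies it via dynamic provisioning.
claim = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="mysql-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        resources=client.V1ResourceRequirements(requests={"storage": "1Gi"}),
    ),
)
v1.create_namespaced_persistent_volume_claim(namespace="default", body=claim)
```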

So that's one part of the discussion. The other part is that there is a lot of work still going on with a standard spec like CSI, the container storage interface, which is kind of the equivalent of CNI for networking; roughly speaking, basically a standard spec for attaching storage in Kubernetes. So there is work going on in CSI. Then we are seeing new software join the CNCF, things like Rook, which under the hood uses Ceph distributed storage. And then, probably, the MapR solution ultimately will come into play. So we're seeing more and more high-performance distributed storage solutions that will come into play, using, you know, standard specs for using them as PVs.

So I think the stateful story is actually getting stronger and stronger. But that said, it's not because the stateful story's getting stronger that you should start putting all your apps on Kubernetes. I would recommend people indeed start with stateless, start simple, and actually think about what you're trying to do. What's your goal? Why are you actually using Kubernetes? Why are you containerizing that part of the app? And then, you know, move slowly from there before tackling the hard problems that need some advanced storage configuration, and scaling, and so on.

Jim Scott:

Yeah, and if I can give an analogy just for people to understand, what you just said is exactly what I've been telling people for years about all of these new fast-moving, highly scalable technologies. You've gotta get comfortable with the technology before you start running and doing everything with the technology. Don't just jump in head first and say, 'woohoo! Everything's gonna be amazing!' Because it is a sure way to have failure.

So, I am curious, when you look at something like a persistent volume: you guys talk to a lot of people out there. Do you hear people say, oh yeah, ... we're using a NAS or a SAN, or we're using the NFS interface specifically, and this is how we're handling dynamic volumes? And I'm asking this outside of the cloud use cases specifically, because if you're running at Google you're gonna use their storage, and if you're running at Amazon you're gonna use their storage for the Kubernetes plugins. But when it comes to people running these in their own data centers, separating that out from the cloud, is there specific hardware they're using for their storage?

Sebastien G.:

I can just say a quick thing about this. When you think about moving a workload to Kubernetes, you should break things up and try to tackle the easy bits first. So what I've seen is ... a lot of people are making use of an Oracle database, a [inaudible 00:41:52] database. And instead of starting by talking about moving the database inside Kubernetes, you know, what you should start doing first is take the actual logic that's using the database. You can containerize that and run it in Kubernetes, and then bring in the external Oracle database as a headless Kubernetes service. You bring that in as a Kubernetes object and you start using it.

You can do it with other types of storage as well. Let's say you have a Ceph-based object store, or an S3-based object store, that's not running in Kubernetes. It's actually running in an existing infrastructure; I'm not gonna call it legacy, but it's running outside your Kubernetes infrastructure. And now you deploy the bits that make use of that S3 storage, you deploy them in Kubernetes, and they access that S3. So it's really a step-by-step process. First, start deploying ... that business logic inside Kubernetes: containerize it, deploy it, make use of the existing storage infrastructure. Make sure that all of that runs, and then you can start thinking maybe about, hey, actually, how do I run my S3 object store ... inside Kubernetes? Is it a Minio-based system? Is it MapR-based storage underneath? GlusterFS-based, whatever. Michael, I don't know if you have ...
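
Here is a hedged sketch of that "bring the external database in as a headless Kubernetes service" step, again with the Python client; the service name, the port, and the database address are placeholders. A headless service with no pod selector plus a hand-written Endpoints object gives in-cluster code a stable DNS name for something running outside the cluster.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Headless service (no cluster IP): gives in-cluster code a stable DNS name
# (legacy-db.default.svc) without selecting any pods.
v1.create_namespaced_service(namespace="default", body={
    "metadata": {"name": "legacy-db"},
    "spec": {"clusterIP": "None", "ports": [{"port": 1521}]},  # e.g. Oracle listener
})

# With no pod selector, we supply the Endpoints ourselves, pointing at the
# database that lives outside the cluster.
v1.create_namespaced_endpoints(namespace="default", body={
    "metadata": {"name": "legacy-db"},
    "subsets": [{
        "addresses": [{"ip": "10.0.0.42"}],  # placeholder external DB address
        "ports": [{"port": 1521}],
    }],
})
```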

Michael H.:

Yeah. Yeah. I largely agree with what Sebastien said there in terms of the strategies ... to sum it up in one sentence: keep your stateful, persistent part off of Kubernetes in the beginning, wrap it up in an API, and start with the stateless part. ... Consume the database [inaudible 00:43:49] as a service or whatever, and then move it onto Kubernetes later. One thing that hasn't changed, though, is the fact that data has gravity. And Jim knows exactly what I mean by that; our old CTO, [inaudible 00:44:03] at MapR, used to say that very well. That means, you know, especially if you talk about multi-cluster deployments, failover, or whatever, it's not that easy, right? It's [crosstalk 00:44:14]

Jim Scott:

It's a really critical topic. It really is. And I'm glad you brought it up, because it's actually one of the reasons I'm excited about MapR finally having a persistent volume plugin for Kubernetes. I do believe Kubernetes has dominated the install base when it comes to cloud deployments. But I think the benefit here is that I've got Kubernetes, which can effectively be set up and run anywhere, and I've got something like MapR, which can also be set up to run anywhere, so I can have my persistent storage available in my on-premises data center. Then, when I want to take workloads and run them in the cloud, or have a disaster recovery scenario in a different cloud provider, or run multi-cloud, I don't have to worry about any vendor lock-in through that data-gravity scenario. Right? Where I've got petabytes of data storage and, you know, I might have a mixed data center model. To me, this is one of the most important pieces of this capability: it finally starts to open the door to more than just the cloud offering. It's run it anywhere you want to.

Michael H.:

Right. And that's where I think we still see ... we as a community, people who are using Kubernetes in production ... it's very, very early days. Right? We don't yet have enough data points to establish best practices around that. My hope, to be honest, since I'm fairly familiar with the [inaudible 00:45:48] offering, is that what you guys have done and provided and contributed to that space helps us improve this really hard issue, right? I mean, multi-cluster deployments, especially around stateful stuff, are really, really hard.

Jim Scott:

It is. It's complicated, and until people have gotten their fingers into it to really see how it works and where the potential failure points are, they do not grasp it. Even something as simple as ... you know, when I go to a lot of conferences and talks, I tell people: hey, look, I've got this great tweet that I reference about Amazon S3 having failed. Right? And it's happened a couple of times in the last year. And it's not to say don't trust the cloud. It's literally to say: plan for anything to be able to fail. When I was in ad tech, we had a data center provider turn the power off to our cage. Like, you don't plan for that. It's the scenario where your whole data center just exploded. Right? I mean, they turned the power off, we were able to recover everything, and we got back to it. We didn't miss a beat, though, because we had planned for complete data center failure. But until you go through it, it is absolutely implausible that you're just gonna fall backwards into "yep! It worked perfectly!" It's just, it's complicated.

Michael H.:

Yeah, I saw that tweet the other day, and that proves that I do actually read tweets, Jim. It said: if you don't have backups of the database, you don't have a database; you have a prolonged state of optimism, or something like that. Wonderful.

Jim Scott:

Wow, this went really fast. That's all that we have for today. Please join us for tomorrow's episode, when we continue this discussion with Michael and Sebastien. We'll be covering social media influencers and role models that these guys have. We'll also be discussing the foundation for creating a roadmap to ensure success with these new technologies, and how to make the most, or least, of the cloud, as well as our rapid-fire segment. There are a couple of additional resources worth mentioning: kubeapps.com, where you can find all of the applications that you can easily run on Kubernetes; and also, I wrote a book titled A Practical Guide to Microservices and Containers, which can be found at mapr.com/ebooks. For all of us here at Data Talks, I'd like to give you a hearty thank you for listening. I'm Jim Scott, on Twitter @kingmesal. Be sure to tell all of your friends, your coworkers, your loved ones about the Data Talks podcast. Before we sign off, I'd like to leave you with one final thought.
