In this week’s Whiteboard Walkthrough Part I, Ted Dunning, Chief Application Architect at MapR, explains the key capabilities required of a streaming platform in the context of micro-services and the advantages they offer.
Note: This video describes what is required to support micro-services architecture with streaming data. If you would like to know more about a message transport technology that meets these requirements, see Chapter 5 “MapR Streams” in the book Streaming Architecture or read about Streams as part of the MapR Data Platform.
Here is the unedited transcription:
What I'd like to talk about today is some of the key requirements for streaming platforms, in particular in the context of micro-services. Micro-services have been defined as services that have isolation and limited context. Isolation means that the implementation of other services can't really matter to the one we're looking at, as long as all the inputs and outputs meet the contracts agreed between services. Limited context means that a change here can only affect a small range around us. For example, the format of our inputs or outputs might change, which would require our immediate neighbors to change what formats they accept, but we can't change this service and thereby force a change in some service far away. Those two requirements are enough for us to understand a lot about how the streaming connections between services might happen.
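One way to picture the contract between isolated services is a shared message format. The minimal sketch below assumes JSON messages and a hypothetical `OrderPlaced` message type (neither comes from the talk). The consumer reads only the fields the contract defines and ignores extras, so a producer that adds a field keeps that change local, in the spirit of limited context.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical message contract between a producer and a consumer service."""

    order_id: str
    amount_cents: int

    def to_json(self) -> str:
        # Serialize only the fields the contract defines.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "OrderPlaced":
        data = json.loads(raw)
        # Pick out the contract fields and ignore anything extra, so a producer
        # that starts sending a new field does not force this consumer to change.
        return cls(order_id=data["order_id"], amount_cents=data["amount_cents"])


# A newer producer adds a "currency" field; the older consumer still parses it.
newer_message = '{"order_id": "A-17", "amount_cents": 1250, "currency": "EUR"}'
print(OrderPlaced.from_json(newer_message))  # → OrderPlaced(order_id='A-17', amount_cents=1250)
```

Tolerating unknown fields is only one convention. Removing or renaming a field would still break the contract and require coordinating with neighbors.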
Let's review for just a moment. I'm going to draw services as blocks, and they're all going to be blank today, because we're only talking about services in the abstract. Services can interact with other entities (other services, people, user interfaces, or anything like that) by one of two general means. One is a query-response interaction, often implemented in micro-services using REST technology, that is, some sort of HTTP-mediated request and response. The other is a stream-oriented interaction between services. Often you even see them together: close to the users you usually get more query-response, but closer to the back-end analysis you very often get much more streaming. You might have a user interface action of some sort, perhaps a query engine or a recommendation engine, where a query must get a response quickly, but there is also some deferred work: updating statistics, adjusting the overall recommendation model, that sort of thing. That work can be deferred out of the critical path of the query response, and so we have a transition to a streaming sort of data transfer.
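The split between a fast query-response path and deferred streaming work can be sketched as follows. This is an illustration under stated assumptions: the in-memory `deque` stands in for a real stream, and `POPULAR`, `recommend`, and `update_statistics` are hypothetical names, not part of any recommendation engine.

```python
from collections import deque

# Stand-in for a stream topic. A real deployment would use a persistent
# message transport; this in-memory deque only illustrates the data flow.
deferred_updates: deque = deque()

# Hypothetical precomputed "model": the most popular items per category.
POPULAR = {"books": ["b1", "b2", "b3"], "music": ["m1", "m2"]}


def recommend(user_id: str, category: str) -> list[str]:
    """Query-response path: answer quickly, defer everything else."""
    shown = POPULAR.get(category, [])[:2]
    # Statistics updates are not on the critical path, so hand them off.
    deferred_updates.append({"user": user_id, "category": category, "shown": shown})
    return shown


def update_statistics(counts: dict[str, int]) -> None:
    """Streaming path: a separate service drains the deferred work later."""
    while deferred_updates:
        event = deferred_updates.popleft()
        counts[event["category"]] = counts.get(event["category"], 0) + 1


print(recommend("u1", "books"))  # → ['b1', 'b2']
stats: dict[str, int] = {}
update_statistics(stats)
print(stats)  # → {'books': 1}
```

The caller of `recommend` never waits for `update_statistics`; the two can run at different times, which is exactly where the requirements discussed next come in.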
Let's concentrate for now on streaming. Now, we talked about how we wanted isolation between the processes involved. We've got two processes, two micro-services here.
What does isolation really imply? First of all, the implementations of these two services have to be relatively independent. The details of how each service does its job should not matter, as long as both do their jobs within the constraints of the organization. One possible implementation is that the two services share resources sequentially: one uses the resources, and then the other. That would mean they might be implemented in some sort of batch-wise fashion. Even though you think of them as streams, the first one, on the left, might run for a little bit and then stop, and then the one on the right might run for a little bit and then stop. Each one would have full access to all physical resources while it's running, but it would not interfere with the other, because the other would get full resources when its turn comes.
That's a fine implementation, but we might also change it. The timing of when these services run, and whether they run intermittently or continuously, are implementation details that should not matter. If either one runs intermittently, there will be times when it runs and times when it doesn't. So there might well be a time when the first service sends a message and exits while the second process still isn't running. The implication is that any messages sent on the stream must be persistent: some streaming infrastructure must hold each message while neither service is actually executing. Thus, the first requirement of an effective streaming platform for micro-services is persistence.
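To make the persistence requirement concrete, here is a minimal sketch of an append-only log kept in a local file. It is a toy, assuming one JSON message per line and offsets that are simply line indexes; a real streaming platform adds replication, concurrency control, and much more. `FileLog` is a hypothetical name.

```python
import json
import os


class FileLog:
    """A minimal append-only log on local disk; a sketch, not a real platform."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, message: dict) -> None:
        # One JSON document per line. The file outlives the producer process,
        # so the message survives while no service is running.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")

    def read_from(self, offset: int) -> list[dict]:
        # Return every message at or after `offset` (a zero-based line index).
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for i, line in enumerate(f) if i >= offset]
```

A producer can create a `FileLog` on some path, append a message, and exit. A consumer started later creates its own `FileLog` on the same path and calls `read_from` with the offset it last processed, so neither service needs the other to be running.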
Another thing we want here is uniformity. If there isn't uniformity in exactly how every service sends and receives messages, then we effectively have coupling, because each service has to find out which streaming infrastructure its neighbors chose, and that is a way for the implementation of one service to affect another.
We want uniformity, and to get that uniformity in the presence of choice, we have to make at least one streaming platform completely easy to use: free, as in air. One property of the air you breathe is that you don't normally worry, when you take a big, deep breath, whether there will be enough air in the room for that whole breath. There must be enough performance headroom in the streaming platform that you can take as deep a breath as you like: you can send as many messages as you like, as quickly as you like. That's our second requirement, performance.
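Uniformity can be pictured as every service being written against one shared interface, so that no service has to learn which transport its neighbors picked. In the sketch below, `Stream`, `InMemoryStream`, and `uppercase_service` are hypothetical names for illustration, not the API of MapR Streams, Kafka, or any other product, and the in-memory implementation tracks a single reader per topic only to keep the example short.

```python
from typing import Protocol


class Stream(Protocol):
    """The one interface every service uses to send and receive messages."""

    def send(self, topic: str, message: bytes) -> None: ...

    def poll(self, topic: str) -> list[bytes]: ...


class InMemoryStream:
    """Hypothetical stand-in implementation, for illustration only."""

    def __init__(self) -> None:
        self._topics: dict[str, list[bytes]] = {}
        self._read_positions: dict[str, int] = {}

    def send(self, topic: str, message: bytes) -> None:
        self._topics.setdefault(topic, []).append(message)

    def poll(self, topic: str) -> list[bytes]:
        # Return messages not yet seen by this (single) reader.
        messages = self._topics.get(topic, [])
        start = self._read_positions.get(topic, 0)
        self._read_positions[topic] = len(messages)
        return messages[start:]


def uppercase_service(stream: Stream) -> None:
    """A service written only against `Stream`, not against any one platform."""
    for msg in stream.poll("raw"):
        stream.send("clean", msg.upper())


s = InMemoryStream()
s.send("raw", b"hello")
uppercase_service(s)
print(s.poll("clean"))  # → [b'HELLO']
```

The interface alone does not deliver the "free as in air" property; that depends on the platform behind it having enough headroom that no service hesitates to send.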
We now have two of the key requirements, persistence and performance. In addition, we really need scale in order to make using streams a free and easy decision, like “Should I take another breath?” Yeah. “Should I take two? Two small ones? One big one?” We need a sense that the stream can scale, not just perform in the moment, but scale in terms of how many messages it can buffer, how long they can be retained, and how many topics there might be. So we need persistence, performance, and scale. Each of these has a number of nuances, but that's a nice, easy way to remember what to look for when you select a platform.
You need to select a platform that will persist messages for as long as you might like. Sometimes that will be seconds; other times it might be weeks. It's very convenient to have messages stick around for quite a while so that you can debug old problems, or so that you can add new services that want to look back in time a little. Sometimes you might even want persistence to extend to years, which lets you hold onto the raw input of a service for as long as any regulatory agency might require.
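Retention is usually expressed as a time window: messages newer than the window are kept, older ones expire. The sketch below shows that rule on an in-memory list; `Record` and `apply_retention` are hypothetical names, and real platforms typically expire data in larger segments rather than message by message.

```python
from dataclasses import dataclass

DAY = 24 * 3600  # seconds


@dataclass(frozen=True)
class Record:
    timestamp: float  # seconds since the epoch when the message was written
    payload: str


def apply_retention(log: list[Record], now: float, retention_seconds: float) -> list[Record]:
    """Drop records older than the retention window; keep everything newer."""
    cutoff = now - retention_seconds
    return [r for r in log if r.timestamp >= cutoff]


now = 400 * DAY
log = [
    Record(now - 300 * DAY, "ten months old"),
    Record(now - 10 * DAY, "ten days old"),
    Record(now - 60, "one minute old"),
]
print(len(apply_retention(log, now, retention_seconds=7 * DAY)))    # → 1
print(len(apply_retention(log, now, retention_seconds=365 * DAY)))  # → 3
```

With a one-week window, a new service added today could only look back seven days; with a one-year window, it could replay almost everything in this example.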
Performance: how high can it go? I already know people who need 10,000,000 messages per second for certain services, so it's quite plausible that rates go that high. Even if you think your business is much smaller and much lower speed, to have the freedom to use messages at any point you probably want headroom to at least hundreds of thousands, perhaps millions, of messages per second. In the extreme, you may want performance into the hundreds of millions or billions of messages per second. That gives you enough headroom that you can never find the boundaries; they will be out of reach, so the platform will feel unlimited.

Scale comes in different ways: how much data is stored, and how much variety there is in the data that's stored. For instance, if we're talking about Kafka here, how many topics can we have? Thousands, millions, or billions? Being able to use as many topics as a design calls for can substantially simplify the job of one service or another. Different technologies will give you different boundaries there, and I think it's important to pick technologies with as large a scale boundary as you can find.
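A rough capacity check turns "headroom" into arithmetic. The sketch below assumes a Kafka-style platform where throughput scales by adding partitions; the per-partition rate, headroom factor, and message size are made-up illustrative figures, not measurements of any product. Only the 10,000,000 messages per second comes from the talk.

```python
import math


def partitions_needed(peak_msgs_per_sec: float, headroom: float,
                      per_partition_msgs_per_sec: float) -> int:
    """Partitions required so that peak load times headroom fits."""
    return math.ceil(peak_msgs_per_sec * headroom / per_partition_msgs_per_sec)


def bandwidth_bytes_per_sec(msgs_per_sec: float, avg_message_bytes: float) -> float:
    return msgs_per_sec * avg_message_bytes


# Assumed figures, for illustration only; measure your own platform.
PEAK = 10_000_000          # messages per second, the rate mentioned above
HEADROOM = 2.0             # plan for twice the observed peak
PER_PARTITION = 100_000    # hypothetical sustained rate of one partition
AVG_BYTES = 1_000          # hypothetical average message size

print(partitions_needed(PEAK, HEADROOM, PER_PARTITION))  # → 200
print(bandwidth_bytes_per_sec(PEAK, AVG_BYTES) / 1e9)    # → 10.0 (GB/s)
```

If a platform's limits on partitions or topics sit far above numbers like these, the boundary stays out of reach in the sense the talk describes; if they sit close, services start to feel the constraint.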
Those are, as I see it, the key requirements of streaming platforms: the requirements that enable you to have truly independent, decoupled micro-services. Thanks very much.