Introduction to the Zeta Architecture – Whiteboard Walkthrough

Editor's Note: In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, gives you an introduction to the Zeta Architecture, a high-level enterprise architectural construct which enables simplified business processes and defines a scalable way to increase the speed of integrating data into the business.

Here's the transcription:

Welcome, my name is Jim Scott, Director of Enterprise Strategy and Architecture at MapR, and this is a Whiteboard Walkthrough. Today, I'm going to be talking about the Zeta Architecture and giving you an overview. This is a next generation architecture: nice pretty diagrams, a few bullets that we are going to run over. This next generation architecture is meant to be handled as an enterprise architecture. What that means is this is going to form the foundation of a business.

Typically what happens is most new businesses will come along and they'll start building their enterprise applications, and what they'll do is they'll actually mash solution architecture and enterprise architecture together. What this is, is actually a re-imagining of the enterprise architecture on a big data platform. If you start over with your big data and you start separating out your applications, you can build your applications on a proper enterprise architecture, which means that all of the benefits that come from this enterprise architecture can be realized in all of the solutions you build for your business.

Each of those solutions may encompass one or more applications but the goal of a solution architecture is actually to solve a specific problem. The goal of the enterprise architecture is actually to deliver the benefits to the enterprise as a whole that you'll need. Let's go over some of those things that we are going to need in this next generation architecture.

First off, all of the hardware that you have in your business, we want to be able to leverage it all. What that means is that if you have database servers, you have web servers, whatever they may be, we want to be able to utilize them in a heterogeneous fashion. We don't want to have little isolated pods of compute power sitting in different spots in the business, because typically what you see with things like web servers is that they'll only have about a 5% utilization over an entire day, or they may spike from peak period to peak period, but through the rest of the day they are sitting mostly underutilized.

The next goal that we have is isolation. What I mean by this is that we want to create isolation for your processes, for troubleshooting, things of this nature that will help you identify when there is a problem. This is a very important facet to enterprise applications and this is actually a driving feature as to why people have done things in enterprises that they've been doing for many years now, and so we want to create isolation that will support the operation of the business and support troubleshooting when there are issues.

The next thing is we want to improve operational processes. Operational processes take time to build, define, follow and train people on, and because of this we want to make sure that all of the processes that we have in place are as simple to handle from a business standpoint as possible. When you have a lot of different types of systems in your business, you have to have systems administrators for each of those different types. One of the goals that we would have with this type of operational process is we want to make it so that people will have an easier time administering the enterprise.

The next thing that we want to talk about is DevOps. We want to improve these processes. A good DevOps procedure to build software and move it from development to QA and to production is to typically build the same binary and just move it between environments. Some problems exist with configurations and moving those between environments, because the shape of the environment between development, QA and production is typically very different, which can make those configurations very different as well. So we want to make sure that DevOps is considered as a major facet of this new enterprise architecture.
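To make the "same binary, different configuration" idea concrete, here is a minimal Python sketch. It is not part of the talk: the environment names, the `APP_ENV` variable, the host names and the `load_config` helper are all assumptions chosen for illustration. The point is only that the code stays identical across environments and the environment-specific shape is looked up when the program starts.

```python
import os

# Hypothetical per-environment settings; every name and value here is illustrative.
ENVIRONMENTS = {
    "dev": {"db_host": "localhost", "replicas": 1},
    "qa": {"db_host": "qa-db.internal", "replicas": 2},
    "prod": {"db_host": "prod-db.internal", "replicas": 6},
}


def load_config(env_name=None):
    """Return the settings for one environment, chosen at startup rather than at build time."""
    # APP_ENV is an assumed variable name; fall back to "dev" when it is not set.
    env_name = env_name or os.environ.get("APP_ENV", "dev")
    if env_name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env_name}")
    # Return a copy so callers cannot mutate the shared table.
    return dict(ENVIRONMENTS[env_name])


qa_config = load_config("qa")
prod_config = load_config("prod")
```

The same file would be deployed to development, QA and production; only the selected entry differs.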

Then the last major goal that we have is we want to be able to enable real-time business continuity. What does that mean? Well, effectively this means you don't want to have to wait to recover from any type of major fault. You don't want to have to wait in the case of a disaster. If you have two data centers running, you don't want both data centers to be lost, or you don't want all of the traffic that your business may have on one data center to just die. You want it to go to another data center. You want to make sure that all of the data you have in data center A is in data center B in real time, so that if data center A does in fact blow up, you know that you haven't lost any data, or in the worst case scenario you've only lost a few bytes, a few kilobytes.
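As a rough illustration of that goal, the following Python sketch mirrors every write to two in-memory "data centers" and keeps serving reads after one of them goes offline. It is a toy under stated assumptions, not a description of how any real product replicates data: the `DataCenter` and `MirroredStore` classes are hypothetical, and the sketch ignores networks, ordering and catch-up after recovery.

```python
class DataCenter:
    """A toy stand-in for one site; it only holds records in memory."""

    def __init__(self, name):
        self.name = name
        self.records = []
        self.online = True


class MirroredStore:
    """Applies each write to every online site before returning (hypothetical helper)."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, record):
        live = [dc for dc in self.sites if dc.online]
        if not live:
            raise RuntimeError("no data center available")
        for dc in live:
            dc.records.append(record)

    def read_all(self):
        # Serve reads from the first site that is still online.
        for dc in self.sites:
            if dc.online:
                return list(dc.records)
        raise RuntimeError("no data center available")


a, b = DataCenter("A"), DataCenter("B")
store = MirroredStore([a, b])
store.write("order-1")
store.write("order-2")

a.online = False  # data center A "blows up"
surviving = store.read_all()  # B still has every record written so far
```

Because both sites received each record at write time, losing A costs no data in this sketch; a real system would have to bound how much in-flight data can be lost.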

These are the major goals that we have for this enterprise architecture. Next is this nice hexagon that we have. In this hexagon what I've done is I've laid out the different components that it will take inside of an enterprise to deliver this architecture. At the very bottom we have the distributed file system. I've placed it in this location for a specific reason, which is that it's at the foundation supporting all the rest of this platform.

On the side we've got the real-time data store. This is necessary because we need to be able to handle operations in real time. One of the things that people have learned over the last few years dealing with big data is that batch processing alone isn't enough; things are moving toward more real-time activities and we need to be able to support that. Online transactional processing systems require real time. On the other side we have a pluggable compute model and execution engine. What this delivers is the ability to plug in any type of platform that we want, like Apache Drill or Hadoop MapReduce, if you are using that still; some people are transitioning away from it. You can even use Apache Spark in this instance.

Above this, what we've got is containers. The entire concept of a container is to create an isolated environment that you can move between machines and have no concerns about the environment changing. On the other side, as I mentioned earlier, solution architecture is here. The solution architecture is here because it goes beyond just these physical pieces of software and it actually gets into how you want to start tying all of these different pieces together and how you want your interactions to work. It could even have an impact on the libraries that you use, like machine learning algorithms.

Then on the top are the enterprise applications. The easiest example of something that fits in that category would be a web server. That web server would have the ability to write directly to the distributed file system or log directly to one of the real-time data stores, however you want to configure it to benefit your business. Then in the very center we have the global resource management. Now the reason I placed this in the center is because it touches everything in some capacity. Once you have this in place and once you know how to run your business in a more dynamic way, it kind of fades into the background like a good administrative interface should.

Everything revolves around it, but the base of the platform is here in the distributed file system. When you look at it, these two components right here are the key components to deliver this enterprise architecture. Now some of the software that falls into this in the open source community would be things like MapR XD, HDFS, Amazon S3. Over in the real-time data store you have MapR Database, you have HBase. I don't put something like Cassandra in this category right now because Cassandra doesn't play on the same distributed file systems as other people, or other platforms rather.

Over on the pluggable side, I mentioned a couple of examples before, but there are many more that are out there. Then in the container area, if you look at the options that are there, Docker is probably the most prominent, but there are others like Rocket by CoreOS, and there is also the Kubernetes project that Google contributed. Over in the solution architecture, you'd look at things like the Lambda architecture, machine learning algorithms, recommenders.

Then on the top, if you wanted to look at an example of building out a business application, say perhaps an advertising platform, this is something where you'd probably use something like a recommendation engine, you'd probably build Docker containers, and you'd have something like MapR Database on top of the MapR Distributed File and Object Store. You'd probably have MapReduce, you'd have Spark, you'd have Apache Drill for all the different business intelligence needs over here.

Then in the center you'd have the ability to use something like Mesos for resource management. You can use YARN, and you can use Myriad, which actually enables Mesos to manage YARN, which gives you the ability to dynamically spin up YARN clusters. All of these things combined create a platform that enables you to dynamically expand and contract the resources in any part of your business. Whatever is most important now is where you can put your resources.
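The idea of expanding and contracting resources from one shared pool can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `allocate` function, the workload names and the weights are hypothetical, and real resource managers such as Mesos or YARN use far richer scheduling than a proportional split.

```python
def allocate(total_cores, weights):
    """Split a shared pool of cores among workloads in proportion to their weight."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {name: 0 for name in weights}
    # Integer share for each workload, rounded down.
    shares = {name: total_cores * w // total_weight for name, w in weights.items()}
    # Hand any cores left over from rounding to the highest-weighted workloads.
    leftover = total_cores - sum(shares.values())
    for name in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


# Daytime: web traffic matters most, so it gets the larger share of the pool.
daytime = allocate(100, {"web": 3, "analytics": 1})
# Overnight: the same hardware is shifted toward analytics.
overnight = allocate(100, {"web": 1, "analytics": 3})
```

Changing only the weights moves capacity between workloads without adding or dedicating hardware, which is the behavior the talk attributes to global resource management.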

At the end of the day, this architecture is going to give you the flexibility you need to stop worrying about all of the little details underneath, focus on the problems of your business, solve them and deliver them on a platform, and derive all of the benefits of every one of these components combined. No more log shipping from your web server to a distributed file system or a cluster that's separate, because it's now one and the same. This architecture is the next generation enterprise architecture.

That's it for this Whiteboard Walkthrough. Please follow us on Twitter @MapR, and if you'd like, leave comments below if you have questions, if you have concerns, or if there are things you'd like to know about this that weren't answered here. Thank you!

This blog post was published July 01, 2015.
