A Practical Guide to Microservices and Containers

by James A. Scott

Application Agility

Digital transformation is built upon a foundation of flexible applications and infrastructure. This requires a different approach to building applications than has been used in the past, one driven by the need for agility. Traditional, monolithic applications don’t lend themselves to agility. Agile applications may be used in many contexts. They typically perform a limited number of functions that can be called via APIs. A simple example of an agile application is a newsletter sign-up form or payment widget that can be embedded in a website. The application performs a single function, but may be used in millions of ways in millions of places.

Agile applications are enabled by virtualization, which allows them to be launched quickly and shut down as soon as they’re no longer needed. Two important infrastructure elements of agile applications are microservices and containers – we’ll discuss each here.

Microservices
A microservices architecture is a method of developing applications as a network of independently deployable, modular services in which each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. Think of it like a honeycomb. Each cell in a honeycomb is independent from all the others and may be used for a different purpose. Each single cell isn’t very useful, but when combined with each other, a strong and flexible network is created that supports many uses.

There’s nothing new about this concept. The vision of a service oriented architecture (SOA) was first sketched out in the 1980s as a way to unify monolithic islands of automation toward a common functional goal. It’s no coincidence that the concept of SOA arrived at the same time as the internet made large-scale peer-to-peer networking possible.

Applications based upon services need other services to manage them. The concept of middleware is based upon this idea. Middleware coordinates groups of services to ensure that data flows smoothly between them and that services are available when needed. The enterprise service bus (ESB) is another way to coordinate services. This is an integration architecture that uses a common communications layer (the bus) to orchestrate a variety of point-to-point connections between providers and consumers. For example, an ESB may call upon separate services for a shopping cart, a credit approval process and a customer account to present a unified checkout window to an online shopper. In most cases, a common data storage layer is shared by all services.

The difference between a microservices approach and an ESB is that microservices architectures have no single coordinating layer. Each service communicates independently with the others. This enables applications to be assembled quickly and flexibly by teams working in separate functions, and each service can be developed in the programming language most appropriate to its task. The goal in microservices development is to deconstruct the application into the smallest possible parts so that services can be shared and combined easily with other applications.
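To make this concrete, here is a minimal sketch of a single-purpose service of the kind described above, using only the Python standard library. The “signup” service name and the JSON reply are invented for illustration; a real deployment would sit behind a proper web framework and server.

```python
# A minimal single-purpose microservice: it does exactly one thing and
# exposes it over plain HTTP, so any other service can call it.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SignupHandler(BaseHTTPRequestHandler):
    """A hypothetical newsletter sign-up service."""
    def do_GET(self):
        body = json.dumps({"service": "signup", "status": "ready"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), SignupHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any other service (or web page) can now call it over standard HTTP.
url = f"http://127.0.0.1:{server.server_port}/"
with urllib.request.urlopen(url) as resp:
    reply = json.loads(resp.read())
print(reply["status"])  # → ready
server.shutdown()
```

Because the service is reached only through its HTTP interface, the caller neither knows nor cares what language it is written in – which is exactly the independence described above.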

There are several robust frameworks that developers can use to build microservices-based applications. Frameworks incorporate libraries of essential services that developers need to build microservices-based applications. Here are some examples.

  • Spring Boot is a highly regarded framework for dependency injection, a technique for building highly decoupled systems. It’s known for simplicity, flexibility and support for distributed programming techniques like inversion of control and aspect-oriented programming. It also permits developers to choose among multiple web servers, such as Tomcat, Jetty and Undertow.
  • Jersey is a framework for developing RESTful web services in Java. RESTful refers to a popular development technique in which messages between microservices are handled with the web-standard HTTP protocol. Known for its ease of use, Jersey provides support for JAX-RS, a Java application program interface (API) specification for the development of web services.
  • Swagger is a framework for developing with APIs. It enables consistent, machine-readable descriptions of APIs that can also serve as documentation. Swagger can also automatically generate client libraries for an API in a wide variety of languages.

The microservices approach to application development has been enabled by a number of new technologies and development techniques:

  • High-speed, low-latency networks enable sophisticated distributed applications to be constructed of services from many providers. For example, an application can call a secure document signing service or fraud detection service in milliseconds in order to close a sale.
  • Containers are lightweight alternatives to virtual machines that contain only the infrastructure elements necessary to perform a service. They can be launched and shut down quickly with minimal management overhead. Each service can be encapsulated in its own container and stored in a library.
  • RESTful APIs are defined by whatis.com as application program interfaces (API) that use HTTP requests to GET, PUT, POST and DELETE data. They’re a low-bandwidth way for services to communicate with each other using a standard set of commands. This makes them well-suited to loosely coupled applications on the internet. The population of services exposed as APIs is exploding. ProgrammableWeb lists more than 17,000 APIs, up from just 300 a decade ago. Not all microservices are RESTful, but all are message-driven.
  • Distributed databases have multiple services handling data requests. Each service works on a subset of the data and coordinates results with other services via an orchestration platform. This allows for highly scalable applications to be built at low cost using commodity servers.
  • DevOps is an agile programming technique that emphasizes modularity, frequent releases and a constant feedback cycle.
  • DataOps combines the concepts of DevOps with the management and availability of data models. This makes it possible to move intelligent machine learning models into production quickly, without the old approach of throwing the model over the wall with fingers crossed that someone else will figure out how to put it into production.

An important element of the microservices concept is that services communicate with only a few other services using lightweight, flexible protocols. This might entail using a remote procedure call (RPC) protocol such as REST or a messaging system such as Apache Kafka or MapR-ES. For developers who are steeped in the techniques of building monolithic applications, microservices requires a different way of thinking. It requires thinking of programs as flows rather than states. The power of this model is its flexibility and scalability.

Microservices and Big Data

It’s no coincidence that microservices and big data have gained popularity at the same time. Both employ similar approaches to managing data. The big data approach, as embodied in Hadoop, partitions data into smaller subsets that are processed close to the data with the batches aggregated into a single result. This creates a highly scalable architecture using low cost hardware.

Why: Types of Scaling

Microservices scale in much the same way. Their modularity supports independent updates/deployments and helps to avoid single points of failure, which can help prevent large-scale outages. There are three basic approaches to scalability1:

X-axis scaling runs multiple copies of an application on a shared data set. Workloads are managed by a load balancer. This approach is relatively simple, but it requires more cached memory and does not scale well with complex applications.
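As a sketch of the idea, the following toy load balancer rotates requests across identical copies of a service; the replica names and request labels are hypothetical.

```python
# Sketch of x-axis scaling: identical replicas behind a round-robin
# load balancer, all working against one shared data set.
import itertools

shared_data = {"inventory": 100}              # stands in for the common store

def make_replica(name):
    """Every replica runs the same code against the shared data."""
    def handle(request):
        return f"{name} served {request} (inventory={shared_data['inventory']})"
    return handle

replicas = [make_replica(f"replica-{i}") for i in range(3)]
rotation = itertools.cycle(replicas)          # the load balancer's rotation

# The balancer hands each incoming request to the next copy in turn.
responses = [next(rotation)(req) for req in ["req-1", "req-2", "req-3", "req-4"]]
print(responses[0])  # → replica-0 served req-1 (inventory=100)
print(responses[3])  # → replica-0 served req-4 (inventory=100)
```

Note that every replica must be able to reach the entire data set, which is why this approach tends to demand more cache memory as the application grows.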


Y-axis scaling splits the application into multiple services, each of which is responsible for a specific function. For example, one service might handle checkouts while another manages customer data. The results are harmonized at the application level. This approach lends itself well to complex applications.


Z-axis scaling is similar to x-axis scaling in that each server runs an identical copy of the code, but in this case each server manages only a subset of the data, with a primary key used to partition the data across servers. A router sends each request to the appropriate partition and a query aggregator combines the results. This approach is more memory-efficient and scalable in transaction scenarios, but it has some of the same complexity drawbacks as x-axis scaling.
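A toy version of this routing scheme might look like the following; the user records and the number of partitions are invented for illustration.

```python
# Sketch of z-axis scaling: every server runs identical code, but each
# owns only the slice of data its primary keys hash to.
N_PARTITIONS = 3
partitions = [dict() for _ in range(N_PARTITIONS)]  # one store per server

def route(key):
    """The router picks a partition from the record's primary key."""
    return partitions[hash(key) % N_PARTITIONS]

def put(key, value):
    route(key)[key] = value

def query_all(predicate):
    """The query aggregator fans out to every partition and merges results."""
    results = []
    for p in partitions:
        results.extend(v for v in p.values() if predicate(v))
    return results

for user_id in range(10):
    put(f"user-{user_id}", {"id": user_id, "active": user_id % 2 == 0})

active = query_all(lambda v: v["active"])
print(len(active))  # → 5
```

A point lookup touches only one partition, which is what makes the scheme memory-efficient; only cross-partition queries pay the aggregation cost.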


Microservices are well-suited for modern application development if the design of the application is structured to use services from the ground up. However, microservices can also be called from legacy applications to support new functionality.

Microservices Design Architecture Patterns and Examples

Event-driven (trigger-based) microservices provide a structured way to coordinate multiple services in a highly scalable manner by using message queues and parallel processing. Some common deployment patterns for microservices include event streaming, event sourcing, polyglot persistence and command query responsibility segregation. We’ll look at each in brief.

Event streams leverage streaming engines such as Apache Kafka or MapR-ES to capture and process streaming data in parallel. Streams of events are grouped into logical collections called “topics,” which are then partitioned for parallel processing by microservices. Events are processed in the order in which they are received, but remain persistent and available to other consumer services for a specific time period or permanently. Messages may have multiple consumers, and services may perform different functions on the same data. This enables high scalability and flexibility, since services can be provisioned on a mix-and-match basis depending on the needs of the application. Converging file, database, and streaming services via a publish-and-subscribe framework also enables analytical workloads to be constructed consisting of a combination of recent and historical data.
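The core semantics described above – events persist in order, and independent consumers read the same data at their own pace – can be mimicked with a toy in-memory log. Real engines such as Kafka or MapR-ES add partitioning, replication and retention policies on top of this idea; the topic and consumer names here are hypothetical.

```python
# A toy publish-and-subscribe log: events are appended once and remain
# available, while each consumer tracks its own read position (offset).
from collections import defaultdict

class Topic:
    def __init__(self):
        self.log = []                      # append-only, stays available
        self.offsets = defaultdict(int)    # one read position per consumer

    def publish(self, event):
        self.log.append(event)

    def poll(self, consumer):
        """Each consumer sees every event once, in arrival order."""
        start = self.offsets[consumer]
        self.offsets[consumer] = len(self.log)
        return self.log[start:]

orders = Topic()
orders.publish({"sku": "A1", "qty": 2})
orders.publish({"sku": "B7", "qty": 1})

# Two services consume the same stream for different purposes.
total_items = sum(e["qty"] for e in orders.poll("billing"))
skus_seen = [e["sku"] for e in orders.poll("analytics")]
print(total_items, skus_seen)  # → 3 ['A1', 'B7']
```

Because the log is never mutated, adding a third consumer later costs nothing: it simply starts reading from offset zero.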

Event sourcing is an architectural pattern in which the state of the application is determined by a sequence of events, each of which is recorded in an append-only event store or stream. For example, each event could be an incremental update to an entry in a database. The entry is the accumulation of events pertaining to that entry. The events can be used to reconstruct a series of transactions by working backwards through the events in the stream, and they can also be replayed to rebuild the application’s state as it existed at any point in time.
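A minimal sketch of the pattern, using a hypothetical bank account: the balance is never stored directly, only derived by replaying the append-only stream.

```python
# Event sourcing in miniature: state is the accumulation of events.
events = []   # the append-only event store

def record(kind, amount):
    events.append({"kind": kind, "amount": amount})

def replay(stream):
    """Current state = fold over every event recorded so far."""
    balance = 0
    for e in stream:
        balance += e["amount"] if e["kind"] == "deposit" else -e["amount"]
    return balance

record("deposit", 100)
record("withdraw", 30)
record("deposit", 5)

print(replay(events))        # → 75
print(replay(events[:2]))    # replaying a prefix reconstructs past state → 70
```

Replaying a prefix of the stream is exactly the “working backwards” property described above: any historical state can be recovered because no event is ever overwritten.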


Polyglot persistence assumes that applications are evolving to use a variety of data storage techniques, often several within the same application. Rather than force-fitting data to the application, functions are defined as microservices, each of which works on the most appropriate data store. This may include applications which use a combination of structured and unstructured data. A distributed file system manages data access across a wide range of applications defined as microservices while an event store serves as the system of record. All changes to the application state are persisted to the event store which enables the state to be rebuilt by rerunning events in the stream.


Command and query responsibility segregation (CQRS) separates the read model and queries from the write model and commands, often using event sourcing. Traditional, monolithic applications perform write and read functions on the same database, which creates a bottleneck.


In a CQRS model, reads and writes may be processed separately – even in separate databases – with updates communicated as messages. For example, if a user is looking at a webpage containing product ratings and submits a review, the change is routed to a separate command model for processing and the result is communicated to the query model to update the webpage. CQRS works well with event-based microservices, but can add significant complexity, particularly when used with monolithic applications.
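The product-review example can be sketched as follows; the event shape and the denormalized view are invented for illustration, and a real system would propagate the events over a message queue rather than a direct call.

```python
# Sketch of CQRS: commands append to a write model; a projection updates
# a separate read model, which queries hit without touching the write side.
write_log = []                 # command model: raw rating events
read_view = {}                 # query model: precomputed (count, total)

def submit_rating(item, stars):          # command path
    event = {"item": item, "stars": stars}
    write_log.append(event)
    apply_event(event)                   # the "message" to the query side

def apply_event(event):                  # query-side projection
    count, total = read_view.get(event["item"], (0, 0))
    read_view[event["item"]] = (count + 1, total + event["stars"])

def get_rating(item):                    # query path: a cheap lookup
    count, total = read_view[item]
    return total / count

submit_rating("widget", 4)
submit_rating("widget", 5)
print(get_rating("widget"))  # → 4.5
```

The read side never computes averages at query time; it only looks them up, which is what removes the shared-database bottleneck described above.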


Microservices and Containers

Containers, which are discussed in the next section, are an ideal deployment mechanism for microservices. Containers are essentially lightweight virtual machines that can be provisioned with only the infrastructure and management tools that are needed for the service. They can be stored in a library, launched quickly and shut down easily when no longer needed. Because each is self-contained, they can run different services and tools without conflicting with each other.

Containers are ideal platforms for deploying microservices, but they aren’t yet ideal for every development scenario. For one thing, they were originally intended to be stateless, meaning that they don’t store data locally, but rather retrieve it from somewhere else. Microservices that require persistent, stateful storage require special consideration if they’re to be containerized. Examples would be services that process streaming data or events in a queue waiting to be written to a database. Complexity increases when different data stores are involved. This can quickly overwhelm the capabilities of a network file share.

New technologies are rapidly emerging to make containers more appropriate for enterprise-class applications. One of them is Kubernetes, an open-source platform for automating the deployment, scaling, and operations of application containers across clusters of hosts. Originally developed by Google, Kubernetes is a highly functional and stable platform that is rapidly becoming the favored orchestration manager for organizations that are adopting containerized microservices.

New technologies are also coming online to provide flexible, stateful storage. For example, the MapR Converged Data Platform can accommodate streams, files and tables into a single file system. It scales smoothly and provides a single platform for functions like authentication, authorization and management. Using a converged platform enables microservices to remain stateless without losing the benefits of data persistence. Such an environment is easier to manage and scale.

A Once-in-30-year Shift is Underway2


As noted earlier, 40% of respondents to one recent study are already using containers in production and only 13% have no current plans to use containers in the coming year. Datadog reports that the average company quintuples its use of containers within the first nine months. For a technology that is barely three years old, that is a stunning adoption rate. Even though issues like persistent storage support and security have yet to be fully resolved, IT organizations are moving ahead enthusiastically.

What are containers, and why the groundswell of support to use them? Containers enable each workload to have exclusive access to resources such as processor, memory, service accounts and libraries, which are essential to the development process. Containers run as a group of namespaced processes within an operating system, which makes them fast to start and maintain. They can be configured to include all of the supporting elements needed for an application, which makes them especially popular with developers. Unlike virtual machines, containers can be spun up in seconds and can be stored in libraries for reuse. They are also portable; an application that executes in a container can theoretically be ported to any operating system that supports that type of container.

Kubernetes has been a big step toward making containers mainstream. Kubernetes introduced a high-level abstraction layer called a "pod" that enables multiple containers to run on a host machine and share resources without the risk of conflict. A pod can be used to define shared services like a directory or storage and expose it to all the containers in the pod. This simplifies the administration of large containerized environments.

Kubernetes also handles load balancing to ensure that each container gets the necessary resources. Kubernetes monitors container health and can automatically roll back changes or shut down containers that don't respond to pre-defined health checks. It automatically restarts failed containers, reschedules containers when nodes die and can shift containers seamlessly between servers on premises and in the cloud. Altogether, these features give IT organizations unprecedented productivity benefits, enabling a single administrator to manage thousands of containers running simultaneously.

Docker is the most popular container platform by a wide margin, but alternatives are available, such as rkt from CoreOS, LXD from Canonical and Azure Container Instances from Microsoft. It’s important to note that container makers have been careful to avoid the standards wars that undermined the Unix market more than 20 years ago. The Open Container Initiative is an industry standards initiative that is working to create a basic set of format and runtime specifications that permit interoperability without limiting innovation. It enjoys broad support. A related open-source project called CRI-O would put Kubernetes at the center, enabling it to work with any container engine that is compliant with OCI specifications.

In the last year or so, containers have become widely viewed as enablers of greater efficiency in DevOps, big data implementations, and microservices. This important new role stems from how efficiently they share resources compared with VMs. Instead of waiting hours or longer for virtual machines to be provisioned, developers can outfit their own containers with the infrastructure they require and launch them whenever needed. Containers can also easily be deployed across different platforms, including cloud deployments.

Among their many desirable attributes, containers can be launched or abandoned in real time. This makes them a great match for workloads that are subject to sudden bursts of data activity. A standard VM would instead be forced through a fresh boot process, which consumes time that containers can spend on actual production work. Containers also raise consolidation benefits to new levels, because there is no need to boot a separate operating system every time a new container is launched. In every virtualized workload, the guest operating system takes up some portion of the footprint; containers share the host operating system, so that overhead largely disappears. This frees up valuable space for additional memory, processing and other vital development resources.

So not surprisingly, the test and development area is a fertile one in which containers are taking root. As security concerns wane over time, and as container development enables greater capabilities for storing data securely, more and more production workloads will leverage this fast-growing technology. Ultimately, developers using containers will seamlessly move their test and development projects directly into production without major porting efforts. This ability to transfer workloads from one environment to another is destined to become much more important in the emerging hybrid IT environment, in which infrastructure is a combination of existing legacy systems, on-premise and off-premise private cloud, and public cloud.

It’s no surprise that containers and microservices have grown in popularity in lockstep with each other; they go together perfectly. Microservices are well-tuned to a container environment because they typically perform a limited set of tasks and are called upon only when needed. Containers are a perfect vessel for microservices. Services can be stored in a library and spun up quickly upon demand, then shut down to be accessed again directly from the library.

As a general rule, containers are stateless, meaning that they don’t contain persistent information. When they shut down, any data that was in memory or stored inside the container goes away. Since microservices are miniature processing engines, they typically don’t require persistent data. Containers also include all of the support software needed to run the application. This minimizes the risk of conflicts and failure due to other environmental variables. Microservices embedded in containers are self-contained, portable and consistent.

The stateless nature of containers can be a problem in some cases, particularly as the number of instances grows. While it is possible to store data inside containers, it’s not considered a best practice. A better approach is to keep data in a separate data store and then access it upon the launch of the container. Containers enable a wide variety of big data scenarios. For example:

  • A web server can run in a container to enable public access to data without risking exposure of sensitive information in adjacent containers.
  • That web server can selectively pull user profile data from an RDBMS (SQL) database in another container and combine it with analytics data from a third container running a NoSQL database to deliver individualized shopping recommendations without compromising security.
  • Resource efficiency also makes containers good candidates for event-driven applications that use streaming data delivered by Apache Kafka or MapR-ES. Multiple streams can be processed in parallel and combined for delivery to a Spark analytics engine, for example.
  • Machine learning algorithms running in separate containers can access the same data for different kinds of analysis, greatly improving the speed and quality of results.

Containers are quick to launch, but loading data into containers can be slow by comparison. For that reason, it’s tempting to keep a persistent copy of data obtained from, say, a Kafka stream inside the container. The problem is that containers work best when they’re stateless, and storing data inside them makes them stateful, or heavy. A profusion of stateful containers can quickly become an administrative nightmare, as well as a security risk.

A better approach is to separate data into a persistent and flexible data store that can be accessed by any container. The problem with that approach is that not all data stores are appropriate for all types of data. NAS filers, for example, can’t accommodate block storage, and some storage subsystems are too slow to handle streaming data at all.

MapR approaches this problem with its Converged Data Platform for Docker, along with a Persistent Application Client Container (PACC). Together, these technologies make it possible for containers to store their operating state upon shutdown and to load any kind of data – including structured, unstructured and streaming data – from a single persistent store. The approach is linearly scalable and provides a single platform that includes authentication and authorization within one global namespace.

One of the most powerful features of containers is that they can be customized using scripts contained in a Dockerfile, which is a text file that contains all the commands, in order, needed to build a given image. Organizations can start with a basic set of container images that provide basic services for a particular application. For example, a web application container might provide an Apache server, local database and commands for opening a particular server port.

Dockerfile scripts can specify additional resources to be loaded or invoked for a base Docker container depending upon the needs of the application. This reduces complexity by separating the configuration process from the container itself. In general, containers should be kept as simple as possible and modified using Dockerfile scripts at load time. Dockerfile itself provides a limited syntax of just 11 commands that cover most usage scenarios: ADD, CMD, ENTRYPOINT, ENV, EXPOSE, FROM, MAINTAINER, RUN, USER, VOLUME, WORKDIR.

These can be combined to quickly build special-purpose containers, as exemplified in this Elasticsearch Dockerfile:

# Dockerfile to build Elasticsearch container images
# Elasticsearch Dockerfile

# Pull base image.
FROM dockerfile/java:oracle-java8

# File Author / Maintainer
MAINTAINER Example McAuthor

ENV ES_PKG_NAME elasticsearch-1.5.0

# Install Elasticsearch.
RUN cd / && \
  wget https://download.elasticsearch.org/elasticsearch/elasticsearch/$ES_PKG_NAME.tar.gz && \
  tar xvzf $ES_PKG_NAME.tar.gz && \
  rm -f $ES_PKG_NAME.tar.gz && \
  mv /$ES_PKG_NAME /elasticsearch

# Define mountable directories.
VOLUME ["/data"]

# Mount elasticsearch.yml config
ADD config/elasticsearch.yml /elasticsearch/config/elasticsearch.yml

# Define working directory.
WORKDIR /data

# Define default command.
CMD ["/elasticsearch/bin/elasticsearch"]

# Expose ports.
#   - 9200: HTTP
#   - 9300: transport
EXPOSE 9200 9300
To build an image based upon this Dockerfile, the user simply types:
sudo docker build -t elastic-search .
To run the instance, the user types:
sudo docker run --name es-instance -i -t elastic-search

Many organizations are now conducting nearly all of their new development work using containers in order to ensure the maximum portability, scalability and flexibility. Containers are also increasingly being used to host legacy applications. Together with microservices, containers are rapidly becoming an essential building block of agile applications.

Application Agility

Consider the example of legacy banking applications. Constructed in the age of mainframes for use with green-screen terminals by data processing professionals, these applications must now be accessible to any customer from any device. What’s more, they must be intuitive to use. It’s no surprise that many banks have spent years overhauling their application portfolios for the age of self-service banking.

Today’s business environment permits no such luxury of time. Consider one of the new breed of mobile payments apps like Square. Its developers must cope with a constant stream of new devices, payment methods and customer requests. The company closely monitors social media activity to measure customer satisfaction and identify bugs, which it has a few days to fix, at best. The most popular mobile apps are updated weekly in order to maintain feature parity with their competitors. In many cases, those apps must be synchronized with desktop versions that work on any platform and in any browser.

The frenetic pace of business in the age of digital transformation demands maximum application agility. Thanks to cloud computing, the barriers to entry have fallen, and the only way market leaders maintain their positions is by innovating faster than everyone else. Switching costs are low and customers have seemingly endless choices.

Machine Learning

In a recent National Public Radio interview, Australian data scientist and entrepreneur Jeremy Howard described how to build a machine learning program that translates back and forth between English and French in near-real-time. “You download a lot of sentences in English or French, write three or four lines of code, let it work overnight, come back in the morning and see if it works,” he said. A programmer can do this without knowing both languages. “It’s pretty hard to describe exactly what is done and often I don’t understand at all how my own programs work,” Howard said. He added that he once wrote a deep-learning algorithm that figured out how to spot lung cancer better than a panel of four top radiologists, despite knowing nothing about lung cancer.

Machine learning is the next frontier of application development. The pace of business is speeding up to the point that humans are too slow for some tasks – like live language translation. Machine learning is a form of predictive analytics that charges through large volumes of raw data to look for patterns. It continually tests the patterns and tries to assess their relevance to the task, iterating on the good ones and discarding the bad ones. In some respects it’s like the opposite of a query engine. Instead of submitting queries and searching for answers, the machine suggests queries that yield interesting results.
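The iterate-and-discard loop described above can be illustrated with the simplest possible “pattern”: fitting a single slope to data by gradient descent. The data set and step size are invented for illustration; real libraries automate exactly this loop at vastly larger scale.

```python
# Toy machine learning: repeatedly test a candidate pattern (a slope w)
# against the data and nudge it in whichever direction reduces the error.
data = [(x, 2 * x) for x in range(1, 6)]   # hidden pattern: y = 2x

w = 0.0                                    # initial guess at the pattern
for _ in range(200):
    # mean-squared-error gradient with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.01 * grad                       # small step against the error

print(round(w, 2))  # → 2.0
```

Each pass keeps whatever adjustment reduced the error and discards the rest – a miniature version of the pattern-testing cycle that frameworks like TensorFlow perform over millions of parameters.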

Agile development is a good use case for machine learning. Libraries of algorithms can be encapsulated in containers and run continuously as processing cycles permit. The results guide developers toward more useful applications. A growing number of machine learning libraries are available as open source. Here are some of the most popular:

TensorFlow - Originally developed by Google for its own internal use, TensorFlow is highly regarded for its ease of use and flexibility. It consists of a Python library that supports a graph of data flows: nodes in the graph perform mathematical operations, while edges carry data to the nodes in the form of tensors, which are multidimensional arrays. Among the notable applications of TensorFlow is Google’s image recognition feature.

Caffe - Caffe is a C++/CUDA deep-learning framework originally developed by the Berkeley Vision and Learning Center. It is especially well tuned for applications related to image recognition and processing. Caffe is optimized for use on graphical processing units (GPUs) acting as coprocessors. Machine learning algorithms tested on the GPUs can be easily redeployed into production on the same computer. A newer version called CaffeOnSpark enables deep learning models to be deployed onto a single cluster, but be warned that it is not yet production quality.

DeepLearning4J (DL4J) - This is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala and integrated with Spark. It’s designed for business applications using distributed GPUs and CPUs. DL4J can import neural net models from most major frameworks and provides a cross-team toolkit for data scientists, data engineers, DevOps and DataOps teams.

MXNet - This open-source deep learning framework is primarily used to train and deploy deep neural networks. It’s known for its flexibility and scalability, and it is compatible with a wide variety of programming languages, including C++, Python, Julia, Matlab, JavaScript, Go, R, Scala, Perl and Wolfram.

Theano - Python developers are the target of this open source project developed at the University of Montreal and named after a Greek mathematician. It’s a library for fast numerical computation that can be run on a CPU or GPU, and it’s considered a foundational library for creating deep learning models or wrapper libraries. Theano includes a compiler for mathematical expressions and is noted for the efficiency and speed of its code.

Computational Network Toolkit (CNTK) - Microsoft developed CNTK as a way to streamline its own machine learning efforts around speech and image recognition. It’s a unified computational network framework that describes neural networks as a series of computational steps in which each node is an input value and each edge is a matrix operation. It can run on anything from a laptop to a cluster with multiple GPU nodes. It was the primary machine learning engine used to build Microsoft Cortana.

Torch - Noted both for its ease of use and its flexibility, Torch is primarily used for building scientific algorithms quickly and easily. It comes with a large ecosystem of domain-specific packages for such functions as computer vision, signal processing and image recognition, and is considered an excellent choice for use in a GPU-enabled environment using Nvidia’s CUDA parallel computing platform and application programming interface model.

PaddlePaddle - Not one to be left out of the party, Chinese search giant Baidu released PaddlePaddle as open-source following similar moves by Microsoft, Google, Facebook and Amazon. Baidu says its toolkit is easier to use than others, but just as powerful and more efficient, requiring only about one quarter the amount of code demanded by other deep learning libraries. Among the applications are machine vision, natural language understanding and route optimization.

Each of these libraries has its own set of tools and processes, which may conflict with each other when running in the same VM. For example, many frameworks are written in Python, which has seen seven new releases since version 3.0 was introduced in 2008. Trying to run machine learning libraries against incompatible versions of Python in a conventional VM could cause conflicts or prevent programs from running. By containerizing the libraries, each can run in isolation without affecting the others. Containers can likewise moderate differences in CPU requirements, cache, network addresses and other environmental factors that might otherwise threaten the stability of the entire system.
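That isolation is typically expressed in the container image itself. A minimal, hypothetical Dockerfile that pins one deep learning library to the interpreter version it was tested against (the base image tag and file names are illustrative, not tied to any particular framework):

```dockerfile
# Pin the exact Python this framework supports; a sibling container
# built FROM a different python:* tag can run on the same host
# without either seeing the other's interpreter or packages.
FROM python:3.6-slim
WORKDIR /app
COPY requirements-dl.txt .
RUN pip install -r requirements-dl.txt
COPY train.py .
CMD ["python", "train.py"]
```

Each library gets its own image with its own interpreter and dependency set, so version conflicts between frameworks never surface at runtime.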

That’s why NorCom, a full-chain supplier for big data solutions, is using containers as the foundation for the deep learning algorithms it’s developing for use in autonomous driving applications. The company uses a purpose-built deep learning framework to efficiently manage massive data sets generated by sensors and cameras in self-driving cars. By running containers on the MapR Converged Data Platform, NorCom is getting the speed, scale and reliability to enable multiple deep learning applications to analyze data continuously.

Using Docker containers helps the company scale with agility. The MapR Converged Data Platform’s support for multiple deep learning frameworks gave it the flexibility to choose the best framework for each use case. By using a single platform across all data centers, cloud deployments, and edge clusters, the company can quickly roll out containerized machine learning models that can be applied to newly created data anywhere in the data fabric. That data can also be immediately added to the training data set because both reside on the same platform.

NorCom is building for the future by planning to support a broad variety of data types, including files, documents, images, database tables and data streams across multiple edge, on-premises and cloud environments. Containers enable IT to support transactional workloads, with high-availability, data protection and disaster recovery capabilities built in.³

Application Performance Management

IT infrastructure has traditionally been managed from the inside out. The focus was on optimizing the performance of discrete components, such as systems, networks and databases, but without a holistic view of overall application performance. Now, digital transformation is driving organizations to expose applications in a multitude of ways, ranging from APIs to web interfaces to mobile apps. Successful big data deployments continue to get bigger and more complex. Microservices, containers and automation tools make developers more productive and applications more agile, but they also introduce new complexity. With new data sources, new use cases, new workloads, and new user constituencies, managing growth requires a complete understanding of what is currently happening in the system. This demands a more integrated, top-down approach to management that begins with the user experience and works back to the management of underlying components.

The task of managing applications in a traditional IT environment was simple compared to today’s multifaceted and constantly changing workloads. Hadoop clusters, for example, can become dense as they grow, making it difficult for administrators to pinpoint bottlenecks and outages. An environment today may encompass thousands of volumes and millions of objects. Service levels are impacted by a vastly larger number of people and applications that are accessing corporate networks. The cloud offers relief, but also more complexity.

Administrators must constantly monitor and fine-tune their environments to address questions like these:

  • How is my storage usage trending? Do I need to add storage nodes?
  • Are all of my cluster nodes evenly utilized?
  • Why isn’t my service coming up?
  • Why is my job slower today than yesterday? Why did it fail?
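The storage-trend question, for instance, reduces to fitting a growth rate to collected usage metrics. A minimal sketch in Python (the function name and sampling format are illustrative, not from any monitoring product; production systems use far more robust forecasting):

```python
def days_until_full(samples, capacity_tb):
    """Estimate days until storage is exhausted from usage samples.

    samples: list of (day_index, used_tb) observations, oldest first.
    Fits a least-squares slope (TB/day) and projects forward.
    """
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    slope = (sum((d - mean_x) * (u - mean_y) for d, u in samples)
             / sum((d - mean_x) ** 2 for d, _ in samples))
    if slope <= 0:
        return None  # usage flat or shrinking; no projected exhaustion
    _, latest_used = samples[-1]
    return (capacity_tb - latest_used) / slope
```

With three daily samples showing usage growing 2 TB per day toward a 20 TB ceiling, the projection tells the administrator how many days remain to add storage nodes.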

Application monitoring centralizes the collection of data from across the environment, including logs, metrics, alerts, outages and slowdowns. Administrators can search these logs using tools that excel at leveraging and understanding unstructured data. Typically, monitoring is done from a single dashboard that is customized for the needs of individual administrators. Ideally, the environment also integrates third-party open standards for monitoring.

MapR Monitoring Architecture

A robust application performance management system peers into core filesystem and database sources as well as ecosystem components such as containers and microservices, and orchestration tools like YARN and Kubernetes. At its heart is a collection layer that gathers metrics and logs from each node at a pre-configured frequency. Collecting a new metric, or monitoring a new service, is as simple as adding a new plug-in to your metrics collector or log shipper.
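Such a plug-in can be very thin. As a sketch of what the collection layer ultimately ships downstream, here is a Python helper that formats one data point in OpenTSDB's telnet-style `put` protocol (the metric and tag names are examples):

```python
import time

def to_opentsdb_line(metric, value, tags, ts=None):
    """Format one data point as an OpenTSDB telnet 'put' line:
    put <metric> <unix_timestamp> <value> <tag1=v1> <tag2=v2> ...
    """
    ts = int(ts if ts is not None else time.time())
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {ts} {value} {tag_str}"
```

A collector plug-in samples a value (CPU, disk, a service counter), calls something like this, and writes the line to the time-series database's socket.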

Time series databases like OpenTSDB enable millisecond-resolution monitoring and historical analysis. Visualization can be provided by open-source components like Grafana and Kibana, which let administrators customize their dashboards.

Say you want to monitor your cluster and you’re only interested in looking at CPU and memory across the cluster. You can add that to a dashboard, and that should be it. But if you want to look at all of the information on a particular node, you can tag that particular node into your system and look at memory, disk, and CPU next to each other for that particular node. Not only can you build customizable dashboards, but if you have your own environment for monitoring other components of your infrastructure, you can easily integrate it using the APIs that OpenTSDB and Elasticsearch provide. That way, you have a single pane of glass to look at everything within your monitoring environment.
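Integrating through those APIs is straightforward because the request bodies are plain JSON. A sketch of building a request for OpenTSDB's HTTP `/api/query` endpoint, where adding a host tag narrows the query to one node as described above (the metric and tag values are examples):

```python
import json

def build_tsdb_query(metric, aggregator="avg", start="1h-ago", tags=None):
    """Build a JSON body for a POST to OpenTSDB's /api/query endpoint.

    Passing tags={"host": "node1"} restricts the series to a single
    node; omitting tags aggregates across the cluster.
    """
    query = {"aggregator": aggregator, "metric": metric}
    if tags:
        query["tags"] = tags
    return json.dumps({"start": start, "queries": [query]})
```

A dashboard or an external monitoring environment posts this body to OpenTSDB and renders the returned series, which is how the "single pane of glass" integration works in practice.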

Microservices Performance Management

Microservices introduce an additional wrinkle to application performance management because each service has its own set of resources, dependencies and overhead requirements. A microservices-heavy environment may require many remote calls, each dealing with issues such as network latency, queues, connection reliability and workloads. There are also overlapping interdependencies. For example, if microservice X relies on output delivered from microservice Y, a slowdown in Y may appear as a problem in both microservices, even though only one is to blame.

The best way to troubleshoot such complex environments is with logs, stream processing engines and search engines like Elasticsearch that handle unstructured data well. It’s important that administrators understand the interdependencies of all microservices in production in order to determine the root of a performance problem. One way to help downstream efforts is by leveraging build and deployment automation tools such as Jenkins, Chef and Puppet. Dependencies can thus be mapped prior to deployment, making troubleshooting easier. Automation can also be used to constantly exercise applications the same way a user would, looking for early signs of developing problems. Affected workloads can then be failed over automatically, or an alert can be sent to an administrator for intervention. Also keep in mind that event streams capture traffic flow, which enables events to be replayed later when performing root cause analysis.
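The X/Y situation above can be untangled mechanically once the dependency map is known. A deliberately simplified sketch of that filtering step (real topologies need tracing data, but the idea is the same):

```python
def root_causes(slow, depends_on):
    """Given the set of currently slow services and a map of
    service -> services it calls, keep only the slow services none
    of whose dependencies are also slow -- the likely true culprits
    rather than downstream victims of someone else's slowdown."""
    return {s for s in slow
            if not any(d in slow for d in depends_on.get(s, ()))}
```

In the example from the text, if X calls Y and both report as slow, the filter discards X and points the administrator at Y.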

Container Performance Management

The process of managing containers is quite similar to that of managing microservices, but the variables are different. For example, an administrator needs to know how many containers are running, the CPU and memory utilization of each and the health of the network they’re running on. Performance management requires knowing the communication dependencies between individual containers, the image that’s deployed in individual containers and the services that run there.

Because containers can be spun up and shut down quickly, traditional performance management technology is less effective in these environments. Stateless, containerized applications benefit from a persistent and comprehensive data services layer to not only provide resilient storage for containers, but also the database and messaging/streaming capabilities that many containerized operational applications require. Fortunately, Docker provides rich APIs for such functions as starting and running containers, creating logs and metrics and managing images, networks and volumes. However, the ecosystem of management tools is still incubating, forcing many companies to roll all or part of their own solutions.
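As an example of what those APIs expose, Docker's stats endpoint returns raw CPU counters rather than a utilization percentage; the percentage is derived from two consecutive samples. A Python sketch of that derivation, mirroring the calculation the `docker stats` CLI performs (the field names follow Docker's stats JSON):

```python
def cpu_percent(stats):
    """Compute container CPU utilization from one Docker stats sample,
    which carries both the current counters (cpu_stats) and the
    previous sample's counters (precpu_stats)."""
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - stats["precpu_stats"]["cpu_usage"]["total_usage"])
    sys_delta = (stats["cpu_stats"]["system_cpu_usage"]
                 - stats["precpu_stats"]["system_cpu_usage"])
    if sys_delta <= 0 or cpu_delta < 0:
        return 0.0  # first sample or counter reset; no rate yet
    ncpus = stats["cpu_stats"].get("online_cpus", 1)
    return cpu_delta / sys_delta * ncpus * 100.0
```

A monitoring agent polls this per container and ships the result into the same time-series pipeline used for node metrics, so short-lived containers still leave a performance record behind.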

In sum, achieving the goal of application agility demands an environment that is unconstrained by the traditional limitations of systems management. Developers need the flexibility to launch instances when they need them, not when IT says they can have them. Those instances should contain the tools developers need to do their work, not just the tools that the existing infrastructure can support. Tasks that used to require manual intervention now can – and should – be automated. Seamless orchestration across multiple servers should be assumed. All of these capabilities are available today for organizations that are ready to embrace them.

1Images courtesy of DevCentral