Editor's Note: This is an excerpt from the book, "A Practical Guide to Microservices and Containers: Mastering the Cloud, Data, and Digital Transformation".
It is one thing to make a grand statement like "go run it in the cloud," but quite another to "just do it." Moving any or all workloads to a cloud environment, or even to a hybrid model, requires careful planning, and there are a number of considerations when moving operations of any type to the cloud.
Before jumping to the cloud, there are two important terms I want to ensure you are adequately acquainted with: data gravity, the tendency of large data sets to pull applications and services toward wherever the data lives, making the data progressively harder to move; and cloud neutrality, designing workloads so they are not bound to any single cloud provider.
Businesses struggle the most when moving from one scale of operations to another. For example, going from nothing to one of something is a huge and complex task because everything has to be built in the first iteration. Growing from one of something to ten can be handled, but going from ten to one hundred creates a completely different set of needs for deploying and managing infrastructure.
If you have never experienced this, then I implore you to apply the lessons learned by others early in your lifecycle and leverage the newer orchestration techniques and technologies available, so that deploying and managing one of something is the same as deploying and managing a hundred.
Agile infrastructure should minimize the need for human intervention in routine tasks, such as resource deployment and management. Overworked IT administrators and paperwork can introduce significant delays that undermine the value of cloud environments. Automation makes cloud computing fast and efficient by using software tools to handle these tasks.
For example, automation can enable the setup of multiple virtual machines with identical configurations using a single script written in Puppet, an open-source configuration management tool that lets applications and infrastructure be defined in a declarative, human-readable language. Puppet manifests can be shared and used to enforce changes across data center and cloud platforms.
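As a sketch of the idea, a minimal Puppet manifest might declare a package, its configuration file, and the service that depends on them; applied to many nodes, the same manifest yields identically configured machines. The module path and file names here are hypothetical, not from any particular deployment:

```puppet
# Ensure nginx is installed on every node this manifest is applied to.
package { 'nginx':
  ensure => installed,
}

# Manage the configuration file from a (hypothetical) module source;
# any change to it notifies the service below to restart.
file { '/etc/nginx/nginx.conf':
  ensure  => file,
  source  => 'puppet:///modules/nginx/nginx.conf',
  require => Package['nginx'],
  notify  => Service['nginx'],
}

# Keep the service running and enabled at boot.
service { 'nginx':
  ensure  => running,
  enable  => true,
  require => File['/etc/nginx/nginx.conf'],
}
```

Because the manifest is declarative, running it once or a hundred times converges every node to the same state, which is exactly the one-is-the-same-as-a-hundred property described above.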
Ansible is an open-source automation platform that can be used for tasks like configuration management, application deployment, and task automation. It can also automate cloud provisioning and intra-service orchestration, using a "playbook" metaphor that permits multiple automation tasks to be combined to powerful effect.
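A minimal playbook sketch along these lines (the host group, template, and file names are hypothetical) might combine package installation, templated configuration, and a restart handler in a single run across a fleet of machines:

```yaml
# Hypothetical playbook: converge an entire web tier in one run.
- name: Provision web tier
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Deploy site configuration from a template
      ansible.builtin.template:
        src: site.conf.j2
        dest: /etc/nginx/conf.d/site.conf
      notify: Restart nginx

  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

The handler only fires when the template task actually changes a file, so repeated runs are idempotent, which is what makes playbooks safe to combine into larger orchestrations.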
Kubernetes, as a global resource manager and orchestrator, brings the same kinds of automation and orchestration capabilities to containers, with features tailored to the unique characteristics of containerized applications. Kubernetes is optimized for orchestrating large numbers of containers, ensuring that each has the resources it needs and providing for things like health monitoring, restarts, and load balancing.
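Those capabilities can be sketched in a minimal Deployment manifest; the replica count, resource requests, and liveness probe (which triggers automatic restarts on failure) are illustrative values, not a production configuration:

```yaml
# Hypothetical Deployment: three replicas with resource requests and a
# liveness probe; Kubernetes restarts any container that fails the probe.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          resources:
            requests:
              cpu: 100m      # scheduler places pods where this fits
              memory: 128Mi
          livenessProbe:
            httpGet:
              path: /
              port: 80
```

Changing `replicas` from 3 to 100 is a one-line edit, while a Service in front of the pods handles the load balancing mentioned above.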
Kubernetes isn't a replacement for Puppet and Ansible, but is another resource that works specifically at the container layer and that can be managed by those automation tools. The combination of VM automation and Kubernetes gives IT organizations unprecedented productivity advantages compared to manual systems administration.
The Internet of Things is creating vast new volumes of data, a flood that International Data Corp. (IDC) expects will reach 44 zettabytes annually by 2020. To help visualize that amount: if you covered a football field with 32-gigabyte iPhones and kept stacking layers on top of each other, by the time you reached 44 zettabytes the stack would rise 14.4 miles into the air. At that altitude, the temperature is about -65°F and the barometric pressure is roughly 1/30 that at the surface of the earth. IDC further estimates that machine-generated data will account for 40 percent of the digital universe in 2020, up from 11 percent a decade prior.
These unprecedented data volumes require a new approach to processing, since traditional server, storage, and network models won't scale enough. This is why edge computing is rapidly emerging as a new architecture.
Edge computing distributes resources to the far reaches of the network, as close to the devices that generate data as possible. Edge servers collect streaming data, analyze it, and make decisions as necessary. These servers can pass selected or summary data to the cloud over the network, but most of the processing takes place locally.
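As an illustration of the pattern rather than any particular product, a Python sketch of an edge node might reduce a raw sensor stream to a compact summary before anything crosses the network; the function name, threshold, and field names are hypothetical:

```python
from statistics import mean

def summarize_readings(readings, threshold=90.0):
    """Process a window of raw sensor readings locally at the edge.

    Only this small summary (not the raw stream) would be forwarded
    to the cloud; alerting decisions happen on the edge server itself.
    """
    alerts = [r for r in readings if r > threshold]
    return {
        "count": len(readings),        # raw samples seen in the window
        "mean": mean(readings),        # aggregate shipped upstream
        "max": max(readings),
        "alerts": len(alerts),         # decisions already made locally
    }

# One window of temperature samples processed where they were generated:
summary = summarize_readings([71.2, 70.8, 93.5, 72.0])
```

The cloud tier then operates on summaries like this one, performing the broader analytics while real-time reactions stay local.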
Edge computing has some important implications for IT infrastructure and application development. Many applications will need to be restructured to distribute logic across the network, and storage will likewise need to be decentralized. This will surface the reliability and data integrity issues inherent in broadly decentralized networks. Cloud servers will become control nodes for intelligent edge devices, performing summary analytics while leaving real-time decision making to edge servers.
Containerized microservices will be an important technology in the construction of IoT backplanes. Distributed processing frameworks will require federated, multi-domain management with intelligence moving fluidly to the places it's most needed. Automation and orchestration tools like Kubernetes will evolve to meet this demand.
Cloud computing has made servers transparent, and serverless computing, also called event-driven computing or Function-as-a-Service (FaaS), takes this to another level. It reimagines application design and deployment, with computing resources provided only as needed from the cloud. Instead of being deployed to a discrete server and containerized, microservices-based routines are launched in the cloud and call upon server resources only as needed.
The idea is to remove infrastructure concerns from the code, thereby enabling microservices to interact more freely with each other and to scale as needed. The user pays only for server resources as they are provisioned, without any costs associated with idle capacity.
Amazon Web Services Lambda is an example of serverless computing. It's used to extend other AWS services with custom logic that runs in response to events, such as API calls, storage updates, and database updates. PLEASE PLEASE PLEASE take this as a WARNING that any software implementation sitting atop a cloud API for FaaS suffers from 100 percent vendor lock-in: it ties you to that provider's infrastructure, data store, and code APIs. There is no bigger lock-in model out there than hosted FaaS.
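To make the model concrete, here is a minimal Python sketch in the shape of a Lambda-style handler. The event fields are hypothetical; the platform invokes the function once per event and bills only for that execution, while locally we can simply call it directly:

```python
import json

def handler(event, context=None):
    """Hypothetical event-driven function.

    No server is provisioned, managed, or paid for between invocations;
    the function exists only while an event is being processed.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# In production the platform passes the event; locally we call it ourselves:
resp = handler({"name": "edge"})
```

Note that everything outside the function body (the event shape, the response contract, how it is triggered) is defined by the provider's API, which is precisely where the lock-in warning above bites.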
While still a fledgling technology, serverless computing has great potential to enable the development of applications that are far more scalable and flexible than those that are bound by servers or VMs. Containers and microservices will be key to the development of this new model.
The fundamentals are simple; the choices are plentiful. The implementation and approach take thought to carry out successfully. My personal recommendation is to take concepts like data gravity and cloud neutrality very seriously. They may seem abstract in the early stages of an implementation, but ignoring them can cause some of the most severe pain there is in technology.