Developers, data scientists, and IT operations are working together to build intelligent apps with new technologies and architectures because of the flexibility, speed of delivery, and maintainability that they make possible. This post will go over some top trending technologies, such as machine learning, containers, Kubernetes, event streams (Kafka API), DataOps, and cloud to edge computing, which are driving this revolution.
AI, Machine Learning, Deep Learning
Predictive machine learning uses algorithms to find patterns in data and then uses a model that recognizes those patterns to make predictions on new data.
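To make the "find patterns, then predict" idea concrete, here is a minimal sketch using ordinary least squares on a single feature. The data set, the function names (fit_line, predict), and the choice of a straight-line model are all assumptions made for illustration; real applications typically use richer models and a machine learning library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Learn a slope and intercept from training data (ordinary least squares)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply the learned pattern to a value the model has not seen."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data: hours of machine use vs. observed temperature.
hours = [1, 2, 3, 4, 5]
temps = [52.0, 54.0, 56.0, 58.0, 60.0]

model = fit_line(hours, temps)    # training: find the pattern
prediction = predict(model, 6)    # prediction on new data
print(model, prediction)
```

The split into a training step and a prediction step is the part that carries over to larger systems: the model is built once from historical data and then applied repeatedly to new data.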
Why is this so hot? Analytical technology has changed dramatically over the last decade, with more powerful and less expensive distributed computing across commodity servers, streaming analytics, and improved machine learning technologies, enabling companies to store and analyze both far more data and many different types of it. According to Gartner, over the next few years, virtually every app, application, and service will incorporate some level of AI or machine learning.
The microservice architectural style is an approach to developing an application as a suite of small, independently deployable services built around specific business capabilities.
A monolithic application puts all of its functionality into a single process; scaling requires replicating the whole application, which has limitations. With microservices, functionality is put into separate services, allowing these services to be distributed and replicated across servers.
A microservices approach is well-aligned to a typical big data deployment. You can gain modularity, extensive parallelism, and cost-effective scaling by deploying services across many commodity hardware servers. Microservices modularity facilitates independent updates/deployments and helps to avoid single points of failure, which can help prevent large-scale outages.
A common architecture pattern combined with microservices is event sourcing, using an append-only publish-subscribe event stream such as MapR Event Streams (which provides a Kafka API).
MapR Event Store provides high performance messaging, which can scale to very high throughput levels, easily delivering millions of messages per second on modest hardware. The publish/subscribe Kafka API provides decoupled communications, wherein producers don't know who subscribes, and consumers don't know who publishes, making it easy to add new listeners or new publishers without disrupting existing processes.
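The decoupling described above can be sketched with a small in-memory stand-in for an append-only, topic-based stream. This is not the Kafka API or any MapR client; the EventStream class, its publish and poll methods, and the topic and consumer names are hypothetical, chosen only to show that publishers and subscribers never reference each other.

```python
from collections import defaultdict

class EventStream:
    """In-memory stand-in for an append-only, topic-based event stream."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> append-only list of messages
        self._offsets = defaultdict(int)   # (topic, consumer) -> next unread position

    def publish(self, topic, message):
        # The producer only names a topic; it has no idea who is listening.
        self._topics[topic].append(message)

    def poll(self, topic, consumer):
        """Return the messages this consumer has not read yet and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(topic, consumer)]
        self._offsets[(topic, consumer)] = len(log)
        return log[start:]

stream = EventStream()
stream.publish("ratings", {"item": "book-1", "stars": 5})
stream.publish("ratings", {"item": "book-2", "stars": 3})

analytics_batch = stream.poll("ratings", "analytics")

# A listener added later reads the same log without any change to the producer.
audit_batch = stream.poll("ratings", "audit")

stream.publish("ratings", {"item": "book-1", "stars": 4})
analytics_next = stream.poll("ratings", "analytics")
```

Because each consumer tracks its own position in the log, adding the "audit" listener in this sketch does not affect what "analytics" reads, which is the property that makes new subscribers cheap to introduce.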
When you combine these messaging capabilities with the simple concept of microservices, you can greatly enhance the agility with which you build, deploy, and maintain complex data pipelines. Pipelines are constructed by simply chaining together multiple microservices, each of which listens for the arrival of some data, performs its designated task, and optionally publishes its own messages to a topic.
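A rough sketch of such a chained pipeline follows. The topic names, the two services (validate_rating, enrich_rating), and the in-memory queues standing in for stream topics are assumptions for illustration; in a real deployment each service would be a separate process reading from and writing to the event stream.

```python
from collections import defaultdict, deque

topics = defaultdict(deque)  # topic name -> pending messages (stand-in for a stream)

def validate_rating(msg):
    """Pass on ratings in the range 1..5; drop anything else."""
    return msg if 1 <= msg["stars"] <= 5 else None

def enrich_rating(msg):
    """Add a derived field for downstream consumers."""
    return {**msg, "positive": msg["stars"] >= 4}

# Each entry wires one service between an input topic and an output topic.
pipeline = [
    ("ratings.raw", validate_rating, "ratings.valid"),
    ("ratings.valid", enrich_rating, "ratings.enriched"),
]

def run_once():
    """Let each service drain its input topic and publish any results."""
    for source, service, sink in pipeline:
        while topics[source]:
            result = service(topics[source].popleft())
            if result is not None:
                topics[sink].append(result)

topics["ratings.raw"].extend([
    {"item": "book-1", "stars": 5},
    {"item": "book-2", "stars": 9},   # invalid, dropped by the first service
    {"item": "book-3", "stars": 2},
])
run_once()
enriched = list(topics["ratings.enriched"])
```

Extending the pipeline in this sketch means appending one more (input topic, service, output topic) entry; the existing services are not modified.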
For example, an online shopping application's item rating functionality, as shown below,
could be decomposed into the following microservices:
With event-driven microservices, new functionality can easily be added by deploying new services.
Development teams can deploy new services or service upgrades more frequently and with less risk, because the production version does not need to be taken offline. Both versions of the service simply run in parallel, consuming new data as it arrives and producing multiple versions of output. Both output streams can be monitored over time; the older version can be decommissioned when it ceases to be useful.
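Running two versions side by side can be sketched as follows. The scoring rules, the version labels, and the event shape are hypothetical; the point is only that both versions consume the same events and write to separate outputs that can be compared before the older one is retired.

```python
def score_v1(event):
    """Current production version: flags only 1-star ratings as complaints."""
    return {"item": event["item"], "complaint": event["stars"] == 1}

def score_v2(event):
    """Candidate version: flags ratings of 2 stars or fewer."""
    return {"item": event["item"], "complaint": event["stars"] <= 2}

versions = {"v1": score_v1, "v2": score_v2}
outputs = {name: [] for name in versions}   # one output stream per version

def on_event(event):
    """Deliver the same incoming event to every deployed version."""
    for name, service in versions.items():
        outputs[name].append(service(event))

for event in [
    {"item": "a", "stars": 1},
    {"item": "b", "stars": 2},
    {"item": "c", "stars": 5},
]:
    on_event(event)

# Monitoring step: list the items on which the two versions disagree.
disagreements = [
    old["item"]
    for old, new in zip(outputs["v1"], outputs["v2"])
    if old["complaint"] != new["complaint"]
]
```

Decommissioning the older version in this sketch amounts to removing its entry from versions; the producer of the events is unaffected.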
Combining event streams with machine learning can handle the logistics of machine learning in a flexible way by:
Architectures for these types of applications are discussed in more detail in the ebooks Machine Learning Logistics, Streaming Architecture, and Microservices and Containers.
A container image packages an entire runtime environment: an application, plus all its dependencies, libraries and other binaries, and configuration files needed to execute the application. Compared to virtual machines, containers have similar resource and isolation benefits, but are more lightweight, because containers virtualize the operating system instead of the hardware. Containers are more portable and efficient; they take up less space, use far fewer system resources, and can be spun up in seconds.
DevOps and Containers
Similar to how the agile software development movement broke down the handoff between business requirements, development, and testing, DevOps breaks down silos between developers and operations with a collaborative process.
Containers provide greater efficiency for developers: instead of waiting for operations to provision machines, DevOps teams can quickly package an application into a container and deploy it easily and consistently across different platforms, whether a laptop, a private data center, a public cloud, or hybrid environment.
Containers and Microservices
Containers are perfect for microservices; each service can be packaged, and each instance deployed as a container, providing the following benefits:
Container and Cloud
The National Institute of Standards and Technology defines a cloud as access to a pool of computing resources that can be rapidly provisioned and made available with four deployment models: private, community, public, and hybrid. With containers, developers can deploy their microservices directly into production without porting efforts. This ability to deploy across different platforms is destined to become much more important in the emerging hybrid IT environment, in which infrastructure is a combination of existing legacy systems, on-premises and off-premises, private cloud and public cloud.
Orchestration of Containers and Cloud
Kubernetes has been a big step toward making containers mainstream. Kubernetes automates "container orchestration": deployment, scaling, and management of containerized applications.
Kubernetes introduced a high-level abstraction layer called a "pod" that enables multiple containers to run on a host machine and share resources without the risk of conflict. A pod can be used to define shared services, like a directory or storage, and expose them to all the containers in the pod.
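As an illustration of a pod sharing storage between containers, the sketch below builds a pod manifest as a Python dictionary and serializes it to JSON, a format Kubernetes accepts alongside YAML. The pod name, container names, image references, and mount path are all hypothetical placeholders, and the manifest is deliberately minimal.

```python
import json

# A scratch volume that lives as long as the pod does.
shared_volume = {"name": "shared-data", "emptyDir": {}}

# Both containers mount the same volume at the same path.
mount = {"name": "shared-data", "mountPath": "/data"}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "rating-service"},
    "spec": {
        "volumes": [shared_volume],
        "containers": [
            {
                "name": "app",
                "image": "example.com/rating-app:1.0",      # placeholder image
                "volumeMounts": [mount],
            },
            {
                "name": "log-shipper",
                "image": "example.com/log-shipper:1.0",     # placeholder image
                "volumeMounts": [mount],
            },
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saved to a file, a manifest like this could be submitted with kubectl apply -f; the two containers would then see the same files under /data.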
This simplifies the management of machines and services, enabling a single administrator to manage thousands of containers running simultaneously.
Kubernetes allows you to orchestrate across on-site deployments to public or private clouds and to hybrid deployments in between. On-premises computation is also moving quickly to containerized orchestration, and when you can interchangeably schedule services anywhere, you have a real revolution.
Just as the broader IT world has embraced the concept of DevOps, which uses new technologies and processes to bring application developers and operations together in a cohesive and mutually beneficial manner, the data world today is moving toward DataOps. DataOps is an emerging practice utilized by large organizations with teams of data scientists, developers, and other data-focused roles that train machine learning models and deploy them to production. The goal of using a DataOps methodology is to create an agile, self-service workflow that fosters collaboration and boosts creativity while respecting data governance policies. A DataOps practice supports cross-functional collaboration and fast time-to-value. It is characterized by processes as well as the use of enabling technologies, such as the MapR Data Platform.
Combining microservices, containers, and event streams with DataOps makes managing and evaluating multiple models and easily deploying new models more efficient and agile.
IoT, Edge Computing, Machine Learning, and the Cloud
According to CIO magazine, the Internet of Things (IoT) will break out in 2018, with businesses incorporating IoT technologies into their products and processes. From automobile manufacturers to oil and gas companies, businesses across the globe seek to derive real business value from outcomes like predicting equipment failures, avoiding accidents, improving diagnostics, and more. There is a growing requirement for edge computing, which brings analytics and machine learning models close to IoT data sources. What makes edge computing different is the ability to enable real-time analytics, leveraging local compute for running and feeding machine learning models. In the world of IoT, fast analytics is essential for anomaly detection, fraud detection, aircraft monitoring, oil rig monitoring, manufacturing monitoring, utility monitoring, and health sensor monitoring, where alerts may need to be acted upon rapidly. Imagine how, if machine learning had detected the BP valve pressure anomaly before the Deepwater Horizon explosion in the Gulf of Mexico, the largest environmental disaster in U.S. history could have been avoided.
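To give a feel for the kind of lightweight analytics that can run next to a sensor, here is a sketch of a rolling z-score anomaly detector. The class name, window size, threshold, and pressure readings are assumptions for illustration, and a rolling z-score is a simple statistical stand-in for whatever trained model a production edge deployment would actually run.

```python
from collections import deque
from statistics import mean, pstdev

class EdgeAnomalyDetector:
    """Flags readings that deviate strongly from a short window of recent values."""

    def __init__(self, window=5, threshold=3.0):
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Return True if value is anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.readings) == self.readings.maxlen:
            mu = mean(self.readings)
            sigma = pstdev(self.readings)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        if not is_anomaly:
            # Only normal readings update the baseline, so one spike
            # does not distort the statistics used for later readings.
            self.readings.append(value)
        return is_anomaly

detector = EdgeAnomalyDetector(window=5, threshold=3.0)

# Hypothetical valve pressure readings arriving one at a time.
pressures = [100.0, 101.0, 99.0, 100.5, 100.0, 100.2, 140.0, 100.1]
alerts = [p for p in pressures if detector.observe(p)]
print(alerts)
```

The detector keeps only a handful of recent values in memory and makes its decision locally, so an alert can be raised without a round trip to the cloud; the full history can still be forwarded later for model training.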
Cloud to the Edge, also called Fog, is one of Gartner's top technology trends for 2018, in which a cloud service-oriented model is combined with edge computing for distributed processing that spans the continuum between the cloud and edge. Ted Dunning, Chief Application Architect at MapR, predicts that we will see a full-scale data fabric extend right to the edge next to devices, and, in some cases, we will see threads of the fabric extend right into the devices themselves.
A confluence of several different technology shifts has dramatically changed the way that applications are being built. The combination of machine learning, event-driven microservices, containers, DataOps, and cloud to edge computing is accelerating the development of next-generation intelligent applications, which are taking advantage of modern computational paradigms, powered by modern computational infrastructure.
The MapR Data Platform integrates global event streaming, real-time database capabilities, and scalable enterprise storage with a collection of data processing and analytical engines to power this new generation of data processing pipelines and intelligent applications.