With the advent of the Internet of Things, organizations are constantly on a mission to track the whereabouts of their assets in real time, whether on the move or stationary. On the other end of the spectrum, they also want to be able to look at the historical data accumulated over time to project the future growth of business and make the right investments based on good analytical results.
Traditionally, these outcomes have been hard to achieve. They required significant investment and careful planning of a system whose hardware and software could monitor movement and capture the data generated by thousands, if not tens of thousands, of sensors moving in all directions at the same time.
The good news is that now organizations have various modern tools on hand to make it all happen – cloud, microservices, software as a service (SaaS), big data analytics, etc. Open source tools such as Kubernetes to orchestrate containers for microservices, Apache Spark for real-time analytics and machine learning to project future growth, and an advanced modern data platform such as the MapR Data Platform that combines the power of cloud and open source software and supports the strictest of privacy regulations like GDPR, are all at your disposal.
MapR Data Fabric For Kubernetes
Azure Kubernetes Service (AKS) manages your hosted Kubernetes environment, making it quick and easy to deploy and manage containerized applications without container orchestration expertise. It also eliminates the burden of ongoing operations and maintenance by provisioning, upgrading, and scaling resources on demand, without taking your applications offline. Customers who want a tightly integrated platform to scale their data analytics can benefit from combining AKS with MapR running on Azure.
Below is a list of the benefits of MapR Data Fabric for Kubernetes:
The microservices running as containers are designed to be ephemeral and highly disposable, since they only provide compute. This means that any data kept in a container's local storage is lost if the container dies for any reason. The MapR Data Fabric for Kubernetes addresses this challenge: by securely persisting data to MapR XD, containers become stateful and can pick up their tasks where they left off after a restart.
By decoupling the storage (MapR Data Fabric) and compute (Kubernetes) infrastructure, organizations can scale storage and compute independently, without having to overspend on one to grow the other, as happens in a system where storage and compute resources are coupled.
Persisted data in MapR XD will be managed by MapR for disaster recovery, data protection, access control, auditing, etc.
In the proposed architecture, the containers are orchestrated by Kubernetes and communicate with MapR through the FUSE-based POSIX client running on the Kubernetes workers. The client inherits the security features that MapR offers, including wire-level encryption and MapR tickets for authentication. See https://mapr.com/whitepapers/security-and-big-data-governance-mapr/.
Static and dynamic volume provisioning are both supported; however, if you have a large number of containers that need volumes, then dynamic provisioning will automatically and effortlessly handle the creation/deletion of the volumes, according to the policies defined in a storage class.
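To illustrate dynamic provisioning, here is a minimal sketch of what a container's volume request might look like. It builds a standard Kubernetes PersistentVolumeClaim that names a storage class; the claim name `influxdb-data`, the class name `mapr-dynamic`, and the size are hypothetical placeholders, not values taken from this demo.

```python
import json


def build_pvc(name: str, storage_class: str, size: str) -> dict:
    """Return a Kubernetes PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Naming a storage class is what asks the cluster to provision
            # the volume on demand instead of binding to a pre-made one.
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names: substitute the storage class defined in your cluster.
    claim = build_pvc("influxdb-data", "mapr-dynamic", "5Gi")
    print(json.dumps(claim, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -`. The policies themselves (which cluster, which security settings) live in the storage class, so the claim stays this small no matter how many containers request volumes.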
Together, these benefits yield a system in the cloud that can scale compute and storage independently and does not require an army of in-house IT professionals to keep the system up and the software updated.
The diagram below describes the architecture of this demo. There are three containers orchestrated by AKS, and a MapR Sandbox is also created in the same subnet where the Kubernetes workers are located, so the containers can mount the MapR volumes.
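The first benefit in the list, containers resuming where they left off, depends on the application writing its progress to the persistent volume. The following is a minimal sketch of that pattern, assuming the volume is mounted at a directory named by a hypothetical `STATE_DIR` environment variable. The file name and the `last_seq` field are illustrative only.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state on first start."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"last_seq": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename is atomic on POSIX filesystems; that the mounted volume
    # preserves this guarantee is an assumption of this sketch.
    os.replace(tmp, path)


if __name__ == "__main__":
    # STATE_DIR is a hypothetical variable pointing at the mounted volume.
    path = os.path.join(os.environ.get("STATE_DIR", "."), "checkpoint.json")
    state = load_checkpoint(path)
    state["last_seq"] += 1  # stand-in for one unit of real work
    save_checkpoint(path, state)
    print(state)
```

Running the script repeatedly increments the counter across runs, which is the behavior a restarted container would see as long as the checkpoint lives on the persisted volume rather than in the container's local storage.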
Lambda Architecture in the SJC Flight Tracking Demo
The first container is a microservice that grabs the flight data in real time; the data includes the flight paths of all flights arriving at SJC as well as the airline, flight number, altitude of the planes, etc. The second container hosts a time-series InfluxDB microservice that mounts a MapR dynamic volume (similar to MySQL, PostgreSQL, or MSSQL, where a mount point in the OS is used to store the database). The third container is a Grafana service that visualizes the time-series data.
The ingested data forks upon entering the system and follows two paths. On one, the data is processed, persisted, and visualized in real time. On the other, the raw data is persisted into a MapR XD volume, where airport planners can look back at the operational efficiency of the airport to determine whether they should expand its capacity to accommodate future growth in traveler numbers. Here is how they can do it: the raw historical data can be analyzed by the various open source tools that come with the MapR Data Platform, such as YARN, Spark, Drill, Hive, Zeppelin, etc. Note that these tools can also be containerized via the MapR Persistent Application Client Container (PACC) and moved into Kubernetes.
Visualization of SJC Flight Tracking Demo with Grafana
Analyzing Raw Data with Apache Drill
You can also use the MapR Control System (MCS) portal to manage the dynamic volumes you just created, including disaster recovery, quotas, access logs, and more.
The step-by-step instructions are available for those of you who want to try out this hands-on demo: