Dale Kim, Sr. Director of Industry Solutions at MapR, describes the monitoring capabilities of the MapR Data Platform, which give you a single view of all cluster operations. Built on popular open source technologies, the monitoring system is customizable and extensible to meet the requirements of your big data deployment.
Here's the unedited transcription:
Hi, I'm Dale Kim of MapR Technologies, and welcome to my Whiteboard Walkthrough. In this session, I'd like to talk about the monitoring capabilities in the MapR Data Platform. MapR has always had advantages in the areas of ease of use, performance, scalability, reliability, and security. The monitoring capabilities can address ease of use in two ways.
One is that if you get a complete picture of all cluster operations, you're better able to expand your cluster to handle more users, more data sets, more workloads, and so on. The other is that, since we're an open source company, it makes sense that we adopt open source technologies for monitoring. Because these technologies are popular and already widely used, it's easier for you to plug in your own capabilities and to use existing skill sets to monitor what's going on in your MapR cluster. You might have seen a Whiteboard Walkthrough from my colleague, Prashant Rathi, about the architecture of our monitoring capabilities. Some of the key principles behind that architecture are convergence, customizability, and extensibility.
Let me first talk about convergence. In the MapR Data Platform, we're trying to address a problem that we see in big data all the time. A big data environment necessarily includes different compute engines, different data sets, and many users. With all of those different workloads, some of them batch and many of them real time, you end up with different types of technologies handling them. As a result, if you use a traditional approach, you'll have many different silos, leading to cluster sprawl.
What MapR is trying to address there is this notion of a converged data platform: you still have different engines handling different workloads, but all in a single, operationally efficient cluster. That's the key. If you have expensive hardware and a lot of CAPEX and OPEX that you want to minimize, you want to get the most value and the highest efficiency out of your platform. When it comes to convergence, the monitoring capabilities in MapR give you that big-picture view of all the different workloads and all the different technologies running in your stack.
Not only do we collect information about all your nodes, but we also incorporate information from some of the platform services in the MapR Data Platform, including MapR XD, our file system, and our high-performance NoSQL database, MapR Database. In addition, we monitor things like YARN, Spark, and Drill, so you get that single view of all your cluster operations. Again, this helps you identify what changes or resources you'll need to handle additional workloads as your big data requirements grow.
When it comes to customizability, on the far right of the diagram you see that we have Grafana and Kibana as two options for visualizing the metrics and logs that we collect. They provide a lot of options for creating different dashboards and different views of metrics. Those dashboards and views can be shared on our MapR Community, so that you can work with other members who are monitoring their MapR systems and pick up best practices about what information you should monitor and view as you build out and maintain your clusters.
Finally, in terms of extensibility, we address that in a couple of areas. On the collection side, we have collectors and shippers that can be swapped out, or extended to cover additional compute engines that you add to the platform. So as you add a new engine on MapR, you might have a plugin that talks to the collector or shipper, so that the engine's additional metrics are easily included as part of your overall system. Then on the visualization side, because we use OpenTSDB and Elasticsearch as storage engines, and both have open APIs, you can plug in whatever visualization tools you want. So you can keep using the tools you're already familiar with: there's no steep learning curve, no new technologies to learn, and you can take advantage of what you already know. That plugs in easily with all the monitoring requirements of your big data environment.
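Because OpenTSDB exposes a plain HTTP/JSON API, plugging in another tool mostly means speaking that API. The following is a minimal sketch, not MapR's integration code: it assumes a hypothetical OpenTSDB address (4242 is OpenTSDB's default port) and hypothetical metric and tag names such as "myengine.queries.active". It builds the JSON shapes that OpenTSDB's /api/put and /api/query endpoints accept, and flattens a query response into time-ordered points that a custom chart or alerting script could consume. Elasticsearch's REST search API plays the same role for logs.

```python
import json
import time
import urllib.request

# Assumption: hypothetical OpenTSDB address; 4242 is OpenTSDB's default port.
OPENTSDB_URL = "http://opentsdb-host.example:4242"


def make_datapoint(metric, value, tags, timestamp=None):
    """Build one datapoint in the JSON shape accepted by OpenTSDB's /api/put."""
    if not tags:
        # OpenTSDB rejects datapoints that carry no tags.
        raise ValueError("OpenTSDB requires at least one tag per datapoint")
    return {
        "metric": metric,
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "value": value,
        "tags": dict(tags),
    }


def make_query(metric, start="1h-ago", aggregator="avg", tags=None, downsample=None):
    """Build a request body for OpenTSDB's /api/query endpoint."""
    sub_query = {"aggregator": aggregator, "metric": metric}
    if tags:
        sub_query["tags"] = dict(tags)  # e.g. {"host": "*"} returns one series per host
    if downsample:
        sub_query["downsample"] = downsample  # e.g. "1m-avg"
    return {"start": start, "queries": [sub_query]}


def parse_series(response):
    """Map each returned series' tag set to its (timestamp, value) points, sorted by time."""
    series = {}
    for result in response:
        key = tuple(sorted(result.get("tags", {}).items()))
        # "dps" maps timestamps (as strings) to values.
        series[key] = sorted((int(ts), val) for ts, val in result["dps"].items())
    return series


def post_json(path, body):
    """POST a JSON body to the OpenTSDB HTTP API and return the decoded reply, if any."""
    req = urllib.request.Request(
        OPENTSDB_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        raw = resp.read()
    # /api/put normally answers 204 No Content, so an empty body becomes None.
    return json.loads(raw) if raw else None
```

A plugin for a new engine might call post_json("/api/put", [make_datapoint(...)]), while a visualization tool might call post_json("/api/query", make_query("myengine.queries.active", tags={"host": "*"}, downsample="1m-avg")) and pass the result to parse_series. Keeping the payload builders separate from the network call makes them easy to test and to reuse with whatever HTTP client your tooling already has.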
That's my presentation on the monitoring capabilities in the MapR Data Platform. If you have any comments or questions, feel free to comment below. Of course, if there are any other topics you want to hear more about, you can comment below as well. Thanks for watching.