At MapR, we have built a converged data platform, which is the underlying technology for a variety of mission-critical, 24x7 customer applications. We realized very early that testing Hadoop in development environments is dramatically different from deploying Hadoop into the data center with the rest of your enterprise architecture.
Managing production Hadoop clusters can be complex. As a use case evolves from POC to testing to production, and as new users and data sources are added, comprehensive monitoring becomes critical for diagnosing problems quickly. Administrators and users are often faced with questions like:
Today we are proud to announce the Spyglass Initiative, which focuses on easy management, deep visibility, and full control.
With this first release, MapR Monitoring empowers administrators with cluster monitoring capabilities, including metric and log collection from nodes, services and jobs, with dashboards to display information in a useful way.
To provide a robust monitoring infrastructure that allows customization and extensibility, the architecture is built on carefully chosen components from the open-source community. Let’s take a look at how it comes together and where you can integrate:
The data sources include operational data from every node in your cluster infrastructure, as well as MapR core services (MapR XD, MapR Database) and ecosystem services (e.g., YARN, Drill, Hive, and Spark).
The system collects metrics and logs from all these sources using open-source collection plugins. Capturing data at configurable intervals is an essential first step before you can analyze it.
Each storage engine is suited to the type of data being aggregated. OpenTSDB is the time-series database that runs on top of MapR Database to store and aggregate metrics. With metric tags, you can filter the metrics at the cluster level, node level, application queue level, or even user level.
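To make tag-based filtering concrete, here is a minimal sketch of how a request body for OpenTSDB's `/api/query` HTTP endpoint could be assembled. The metric names (`cpu.percent`, `yarn.memory.allocated`), the tag keys (`fqdn`, `queue`, `user`), and the helper `build_tsdb_query` are assumptions made for illustration; the names your cluster actually exposes will differ.

```python
import json


def build_tsdb_query(metric, tags, start="1h-ago", aggregator="avg"):
    """Build the JSON body for a POST to OpenTSDB's /api/query endpoint.

    `tags` narrows the result to series carrying those tag values,
    which is how node-, queue-, or user-level filtering is expressed.
    """
    return {
        "start": start,
        "queries": [
            {
                "aggregator": aggregator,
                "metric": metric,
                "tags": dict(tags),
            }
        ],
    }


# Node-level filter (hypothetical metric and tag names).
node_query = build_tsdb_query("cpu.percent", {"fqdn": "node1.example.com"})

# Queue- and user-level filter (hypothetical metric and tag names).
user_query = build_tsdb_query(
    "yarn.memory.allocated", {"queue": "default", "user": "alice"}
)

print(json.dumps(node_query, indent=2))
```

The sketch only builds the payload; sending it is a plain HTTP POST to the OpenTSDB host and port configured in your cluster.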
Elasticsearch provides the storage for all logs. With the power of centralized search, you can search for any combination of nodes, services, or message severity levels.
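As a rough illustration of such a combined search, the sketch below builds an Elasticsearch query DSL body that filters on node, service, and severity at the same time. The field names (`fqdn`, `service_name`, `level`) and the helper `build_log_search` are assumptions for this example, not the schema that MapR Monitoring uses; check your index mapping for the real field names.

```python
def build_log_search(node=None, service=None, severity=None, size=50):
    """Build an Elasticsearch _search body from optional exact-match filters.

    Each supplied argument becomes a `term` clause inside a `bool`
    filter, so all supplied conditions must hold (logical AND).
    """
    filters = []
    if node is not None:
        filters.append({"term": {"fqdn": node}})  # hypothetical field
    if service is not None:
        filters.append({"term": {"service_name": service}})  # hypothetical field
    if severity is not None:
        filters.append({"term": {"level": severity}})  # hypothetical field

    if not filters:
        # No conditions given: match every log entry.
        return {"size": size, "query": {"match_all": {}}}
    return {"size": size, "query": {"bool": {"filter": filters}}}


# Errors logged by one service on one node (all values are made up).
body = build_log_search(
    node="node1.example.com", service="nodemanager", severity="ERROR"
)
print(body)
```

The resulting dictionary can be sent as JSON to the `_search` endpoint of the index that holds your logs.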
There are some awesome front-ends, utilities, libraries, and resources that are supported by the OpenTSDB community and Elasticsearch community. Given the rapid pace of adoption and rich UI feature set, we have chosen Grafana and Kibana as the default tools in our bundle.
These visualization tools are very popular for the level of customization they offer and users benefit from creating dashboards that are intuitive, actionable and specific to them. MapR also announced the MapR Exchange Community where users can share their dashboards and take monitoring to the next level. Below is an example of what a node dashboard may look like.
Based on our experience over the years, we wanted to make things easy for customers who want a single pane of glass into their entire infrastructure. You can achieve this in a couple of ways: by integrating with your existing storage infrastructure (e.g., Elasticsearch), or by using the OpenTSDB and Elasticsearch APIs to visualize the data with the tool of your choice.
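If you take the API route, one common step is reshaping the query result into rows that another tool can ingest. The sketch below assumes the standard OpenTSDB `/api/query` response shape, a list of series that each carry a `dps` map of timestamp to value. The helper `flatten_tsdb_response` and the sample values are made up for illustration.

```python
def flatten_tsdb_response(response):
    """Flatten an OpenTSDB /api/query response into sorted rows.

    Each row is (timestamp, metric, tags, value), a shape that is
    easy to load into a spreadsheet, a DataFrame, or another
    visualization tool.
    """
    rows = []
    for series in response:
        for ts, value in series["dps"].items():
            rows.append((int(ts), series["metric"], series["tags"], value))
    # Sort by time, then by metric name, for a stable ordering.
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


# Illustrative sample only; the metric, tag, and values are invented.
sample = [
    {
        "metric": "cpu.percent",
        "tags": {"fqdn": "node1.example.com"},
        "aggregateTags": [],
        "dps": {"1461000060": 14.0, "1461000000": 12.5},
    }
]

rows = flatten_tsdb_response(sample)
for row in rows:
    print(row)
```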
The first phase of the Spyglass initiative is built to: