MapR Monitoring

MapR Monitoring enables a complete metrics tracking and log analytics system on cluster operations in the MapR Data Platform.

Get complete visibility of the cluster operations in your big data deployment in a customizable and extensible framework.

MapR Monitoring Key Features

  • Node/infrastructure monitoring with metrics such as CPU utilization and I/O throughput
  • Cluster space utilization monitoring, including trends
  • YARN/MapReduce application monitoring
  • Service daemon monitoring
  • Customizable and extensible modular layers for collection, aggregation/storage, and visualization

The MapR Community Exchange provides a forum for big data professionals to share custom visualization dashboards and best practices.

MapR Monitoring is included in both the MapR Enterprise Edition and the freely downloadable MapR Community Edition.

MapR Monitoring provides the capabilities you need to monitor all cluster operations in a MapR Data Platform deployment. It collects, stores, and displays cluster metrics and system log files to help you understand the utilization in your MapR cluster. It leverages popular open source technologies with which you are likely already familiar. This helps to lower the learning curve, while also offering the flexibility for you to customize and extend the environment.

MapR Monitoring is included as a standard feature in all editions of the MapR Data Platform, including the Community Edition.

Converged, Customizable, and Extensible

Successful big data deployments continue to get bigger and more complex. With new data sources, new use cases, new workloads, and new user groups, managing that growth requires a complete understanding of what is currently happening in the system. MapR Monitoring helps you to manage successful big data deployments by giving you a converged, customizable, and extensible platform for cluster-wide visibility.

Convergence is an increasingly important objective in big data deployments. As you add more engines (such as Apache Spark, Apache Drill, and NoSQL databases) to handle big data, you run the risk of increasing operational complexity through cluster sprawl. The MapR Data Platform integrates key big data capabilities (Hadoop/Spark, SQL, webscale storage, NoSQL, event streams) into a common platform with architectural optimizations for high efficiency. MapR Monitoring then gives you a converged view of all cluster operations so you can best manage your deployment.

MapR Monitoring is highly customizable, giving you the flexibility to collect and view data to meet your specific requirements. This is especially important in multi-tenant environments where you have a specific set of operations that you want to track in a way that gives you the best insights. This will help you address potential resource contention issues between different applications and user groups.

Extensibility is also critical for growing environments, especially when new innovative technologies emerge. MapR Monitoring provides APIs that let you track any new compute engines you add to the platform and that allow you to plug in to your existing tools.

Log Analytics
Customizable Dashboards for Visualizing Metrics

Dashboards and log analytics visualizations allow complete visibility into your cluster.

MapR Monitoring leverages popular open source technologies

Product Spotlight

MapR Monitoring is a complete system for monitoring your MapR clusters. It also provides an extensible architecture with public APIs that let you plug in your own components at the collection, storage, and visualization layers.

Data Sources

The monitored data sources include metrics and logs at many levels, including node/infrastructure, cluster space utilization, YARN/MapReduce applications, and service daemons. Examples of available data include:

Node/infrastructure

  • Global aggregates (e.g., average, minimum, maximum) of node operations, including CPU and disk utilization
  • Per-node metrics such as I/O throughput by disk
  • MapR Distributed File and Object Store (MapR XD) reads and writes
  • NoSQL database (MapR Database) puts, gets, scans, and cache metrics
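
As a rough illustration of what a global aggregate means here, the sketch below reduces per-node readings to a cluster-wide average, minimum, and maximum. The node names, the sample values, and the function name `global_aggregates` are hypothetical and are not part of MapR Monitoring.

```python
from statistics import mean


def global_aggregates(per_node):
    """Reduce a {node: value} mapping to cluster-wide average, minimum, and maximum."""
    if not per_node:
        raise ValueError("per_node must contain at least one node")
    values = list(per_node.values())
    return {
        "average": mean(values),
        "minimum": min(values),
        "maximum": max(values),
    }


# Hypothetical per-node CPU utilization, in percent.
cpu_percent = {"node-a": 40.0, "node-b": 60.0, "node-c": 80.0}
summary = global_aggregates(cpu_percent)
```

The same reduction applies to any per-node metric, such as disk utilization.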

Cluster space utilization

  • Cluster-wide storage utilization
  • Storage utilization trends to predict what future storage requirements will be
  • Utilization per MapR volume, and per accountable entity (e.g., data, volume, snapshot, and total size)
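
One simple way to turn a storage utilization trend into a forecast is a straight-line fit over recent samples. The sketch below assumes linear growth, which is a simplification, and uses made-up sample values; the function names are hypothetical, and this is not presented as the method MapR Monitoring itself uses.

```python
def fit_linear_trend(days, used_tb):
    """Least-squares line through (day, used_tb) samples; returns (slope, intercept)."""
    n = len(days)
    if n < 2 or n != len(used_tb):
        raise ValueError("need at least two samples of equal length")
    mean_x = sum(days) / n
    mean_y = sum(used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_tb))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_tb, capacity_tb):
    """Days after the last sample until the fitted line reaches capacity, or None if usage is not growing."""
    slope, intercept = fit_linear_trend(days, used_tb)
    if slope <= 0:
        return None
    return (capacity_tb - intercept) / slope - days[-1]


# Hypothetical daily samples: usage grows by 2 TB per day on a 30 TB cluster.
remaining = days_until_full([0, 1, 2, 3], [10.0, 12.0, 14.0, 16.0], capacity_tb=30.0)
```

With these sample values the fitted line reaches capacity seven days after the last sample. Real growth is rarely this regular, so treat such a projection as a planning aid rather than a guarantee.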

YARN/MapReduce applications

  • Global YARN and Spark trend data
  • Pending and active containers
  • Allocation and use of vCores and RAM
  • Per-queue data on containers, vCores, and RAM

Service daemon

  • Per-service data such as CPU usage by type/memory
  • MapR core services logs (e.g., CLDB, MapR-FS, gateway, NFS, ZooKeeper)
  • Ecosystem services logs (e.g., Drill, YARN ResourceManager, YARN NodeManager, Hive, Hue, Oozie)

Collection

The collection layer is responsible for delivering data from the sources to the storage layer. The collection layer leverages open source technologies and is easily extensible. This provides the flexibility to add monitoring for new compute engines in MapR. With MapR advantages around interoperability (especially with NFS), you can run a wide variety of technologies in the same cluster and easily get a complete, converged view of all operations.
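
To give a feel for what a custom collector might produce, the sketch below samples disk usage on the local node and shapes it as a data point with a metric name, timestamp, value, and tags, which is the general form accepted by the OpenTSDB HTTP API's `/api/put` endpoint. The metric name `custom.disk.used_percent`, the tag keys, and the function name are assumptions made for illustration; this datasheet does not specify how collectors are written.

```python
import shutil
import socket
import time


def disk_usage_datapoint(path="/", now=None):
    """Sample disk usage for `path` and return it as a metric data point."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo-filesystems that report a total size of zero.
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "metric": "custom.disk.used_percent",  # hypothetical metric name
        "timestamp": int(now if now is not None else time.time()),
        "value": round(used_percent, 2),
        "tags": {"host": socket.gethostname(), "path": path},  # hypothetical tag keys
    }


datapoint = disk_usage_datapoint("/", now=1700000000)
```

A collector would typically take such samples on a fixed interval and forward them to the storage layer.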

Aggregation and Storage

The aggregation and storage layer consists of two popular open source storage technologies, OpenTSDB and Elasticsearch. OpenTSDB is used for storing metrics data, since it was built for storing and analyzing time series data (the format in which metrics data is delivered). Because OpenTSDB is not itself a storage engine, it acts as the data aggregator on top of MapR Database, the integrated NoSQL database that is part of the MapR Data Platform. Elasticsearch is ideal for storing and searching log data. These engines have a portfolio of associated visualization tools, but with their open APIs, you have the ability to plug in the visualization tool of your choice.
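
Because both engines expose HTTP APIs, a custom tool can talk to them with plain JSON. The sketch below builds a request body for OpenTSDB's `/api/query` endpoint and a query DSL body for Elasticsearch's `_search` endpoint. The metric name, the log field names `message` and `@timestamp`, and the function names are assumptions; the names in an actual MapR Monitoring deployment may differ.

```python
import json


def opentsdb_query(metric, start="1h-ago", aggregator="avg", tags=None):
    """Build a request body for OpenTSDB's POST /api/query endpoint."""
    return {
        "start": start,
        "queries": [
            {"aggregator": aggregator, "metric": metric, "tags": tags or {}}
        ],
    }


def elasticsearch_log_search(text, since="now-1h", size=20):
    """Build an Elasticsearch query DSL body: recent log lines matching `text`, newest first."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # assumed timestamp field
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],  # assumed log text field
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
    }


# Hypothetical metric name; the wildcard tag value groups results by host.
metrics_body = json.dumps(opentsdb_query("cpu.percent", tags={"host": "*"}))
logs_body = json.dumps(elasticsearch_log_search("error"))
```

Each body would be sent as an HTTP POST with a JSON content type to the corresponding endpoint; the host and port depend on how the services are deployed in your cluster.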

Visualization

The visualization layer ties into the aggregation and storage layer to provide customizable dashboards on cluster operations. This layer is also based on popular open source technologies with community-based technical support: Grafana and Kibana. Grafana provides visualizations for the metrics data managed by OpenTSDB, and Kibana provides visualizations for log data in Elasticsearch. Although these are some of the most advanced open source visualization tools available today, you still have the option to swap them out for your preferred visualization tool.

About MapR

MapR enables organizations to create disruptive advantage and long-term value from their data with the industry’s only Data Platform, which delivers distributed processing, real-time analytics, and enterprise-grade requirements across cloud and on-premises environments, while leveraging the significant ongoing development in open source technologies including Spark and Hadoop. Organizations with the most demanding production needs, including sub-second response for fraud prevention, secure and highly available data-driven insights for better healthcare, petabyte analysis for threat detection, and integrated operational and analytic processing for improved customer experiences, run on MapR. A majority of customers achieves payback in fewer than 12 months and realizes greater than 5X ROI. MapR ensures customer success through world-class professional services and with free on-demand training that 50,000 developers, data analysts, and administrators have used to close the big data skills gap. Amazon, Cisco, Google, HPE, Microsoft, SAP, and Teradata are part of the worldwide MapR partner ecosystem. Investors include Google Capital, Lightspeed Venture Partners, Mayfield Fund, NEA, Qualcomm Ventures, and Redpoint Ventures.

MapR Monitoring Datasheet