Monitoring Nodes

You can check the health of the nodes in the cluster from the MapR Control System (MCS), where nodes are organized by service or by topology, or from the CLI.

Note: MCS can display the graphs and charts only if the metrics collection infrastructure was installed with the cluster. If it was not installed, perform an Incremental Install to add it.

Monitoring Node Health Using the MapR Control System

To monitor the health of nodes:
  1. Log in to MCS and click:
    • Overview to view the health of the nodes in the Node Health pane.
    • Nodes to view the health of the nodes in the Node Health pane.
  2. Select one of the following from the dropdown menu in the Node Health pane.
    • By Service to organize the display of nodes by services.

      This is the default view in the Overview page. For each service, this view lists the nodes on which the service is running and the nodes on which it is down.

      Note: The color of the node (which reflects the status of the service) is even when a service is stopped (not running) on the node.
    • By Topology to organize the display of nodes by topology.

      This is the default view in the Nodes page. This view lists the topologies and shows the health of the nodes in each topology, using one of the following states:

      Healthy: The node is healthy.
      Degraded: The node is degraded and/or may need attention. A node is considered to be in the degraded state if:
      • No heartbeat has been received from the MapR file system/NFS node for over 60 seconds.
      • One or more services are down on the node.
      • One or more alarms are raised on the node.
      Maintenance: The node is in maintenance mode.
      Critical: The node has one or more critical issues. A node is considered to be in the critical state if:
      • No heartbeat has been received from the node for more than 5 minutes.
      • All MapR Filesystem disks on the node are dead or offline.
      • All containers on the node are being re-replicated because the node was removed or unregistered, or because no heartbeat has been received from the node for more than 1 hour.
      • The file server is dead or inactive because no heartbeat has been received for a long time.
      • The NFS server on the node is dead.
      • The MapR install directory is full.
      • The node reported high MapR Filesystem memory usage.
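
The degraded and critical criteria above can be read as a small decision procedure. The following Python sketch is an illustration only and is not how MCS computes node health. The `NodeStatus` fields and the `classify_node` function are hypothetical names, only a subset of the listed conditions is modeled, and two things are assumed rather than documented: that any single condition is enough to put a node in a state, and that the checks run in the order critical, maintenance, degraded.

```python
from dataclasses import dataclass

# Heartbeat thresholds taken from the criteria listed above.
DEGRADED_HEARTBEAT_SECS = 60
CRITICAL_HEARTBEAT_SECS = 5 * 60


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one node; field names are illustrative only."""
    secs_since_heartbeat: float = 0.0
    services_down: int = 0
    alarms_raised: int = 0
    in_maintenance: bool = False
    all_disks_offline: bool = False
    nfs_server_dead: bool = False
    install_dir_full: bool = False


def classify_node(node: NodeStatus) -> str:
    """Return 'critical', 'maintenance', 'degraded', or 'healthy'.

    Assumption: the order of the checks is a guess; the documentation
    does not say which state wins when several apply.
    """
    if (node.secs_since_heartbeat > CRITICAL_HEARTBEAT_SECS
            or node.all_disks_offline
            or node.nfs_server_dead
            or node.install_dir_full):
        return "critical"
    if node.in_maintenance:
        return "maintenance"
    if (node.secs_since_heartbeat > DEGRADED_HEARTBEAT_SECS
            or node.services_down > 0
            or node.alarms_raised > 0):
        return "degraded"
    return "healthy"
```

For example, under these assumptions a node whose last heartbeat was 90 seconds ago would be classified as degraded, and the same node would become critical once the gap exceeds 5 minutes.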

Monitoring Node Resource Utilization in the MapR Control System

Log in to MCS and click Nodes. The Current Resource Utilization pane shows the nodes that consumed the most CPU and memory (as a percentage). The shade of each bubble indicates the node's resource utilization; darker shades indicate nodes that are nearing disk capacity.

Monitoring Node Health Using the CLI or REST API

You can check the general health of the nodes by issuing the following command:

maprcli node heatmap -cluster <cluster>

This command displays a heatmap for the nodes on the specified cluster; a subset of the output can also be visualized in MCS. For complete reference information, see node heatmap.
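
If you want to script this check, the following Python sketch builds the command shown above and a possible REST request for the same information. It is an illustration under stated assumptions, not a reference: the function names are hypothetical, and the REST URL assumes the general `https://<host>:<port>/rest/<command path>` form with port 8443 and a `cluster` query parameter. Verify both against the node heatmap reference for your release.

```python
import shlex
from typing import List
from urllib.parse import urlencode


def heatmap_cli_command(cluster: str) -> List[str]:
    """Argument list for the maprcli command shown above.

    The list can be passed to subprocess.run() on a node where maprcli is installed.
    """
    return ["maprcli", "node", "heatmap", "-cluster", cluster]


def heatmap_rest_url(host: str, cluster: str, port: int = 8443) -> str:
    """Assumed REST equivalent; the path, port, and parameter name are assumptions."""
    query = urlencode({"cluster": cluster})
    return f"https://{host}:{port}/rest/node/heatmap?{query}"


if __name__ == "__main__":
    # Placeholder host and cluster names, for illustration only.
    print(shlex.join(heatmap_cli_command("my.cluster.com")))
    print(heatmap_rest_url("mcs-host.example.com", "my.cluster.com"))
```

Building the command as an argument list rather than a single string avoids shell quoting problems if a cluster name ever contains unusual characters.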