MapR 5.0 Documentation : Volume Alarms

Volume alarms indicate problems in individual volumes. The following tables describe the MapR volume alarms.

Data Unavailable

UI Column

Data Alarm

Logged As

VOLUME_ALARM_DATA_UNAVAILABLE

Meaning

This is a potentially very serious alarm that may indicate data loss. Some of the data on the volume cannot be located. This alarm indicates that enough nodes have failed to bring the replication factor of part or all of the volume to zero. For example, if the volume is stored on a single node and has a replication factor of one, the Data Unavailable alarm will be raised if that node fails or is taken out of service unexpectedly. If a volume is replicated properly (and is therefore stored on multiple nodes), the Data Unavailable alarm can indicate that a significant number of nodes are down.

Resolution

Investigate any nodes that have failed or are out of service.

  • You can see which nodes have failed by looking at the Cluster Node Heatmap pane of the Dashboard.
  • Check the cluster(s) for any snapshots or mirrors that can be used to re-create the volume. You can see snapshots and mirrors in the MapR-FS view.

Data Under-Replicated

UI Column

Replication Alarm

Logged As

VOLUME_ALARM_DATA_UNDER_REPLICATED

Meaning

The volume replication factor is lower than the desired replication factor set in Volume Properties. This can be caused by failing disks or nodes, or the cluster may be running out of storage space.

Resolution

Investigate any nodes that are failing. You can see which nodes have failed by looking at the Cluster Node Heatmap pane of the Dashboard. Determine whether it is necessary to add disks or nodes to the cluster.

This alarm is generally raised when the nodes that store the volumes or replicas have not sent a heartbeat for five minutes. To prevent re-replication during normal maintenance procedures, MapR waits a specified interval (by default, one hour) before considering the node dead and re-replicating its data. You can control this interval by setting the cldb.fs.mark.rereplicate.sec parameter using the config save command.
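As a minimal sketch of setting that interval, the following Python helper builds (but does not run) a config save invocation for the cldb.fs.mark.rereplicate.sec parameter. The helper name and the two-hour example value are illustrative assumptions, and the -values JSON form is assumed from the general maprcli config save syntax, so check it against the CLI reference for your release before use.

```python
import json

def build_rereplicate_command(seconds):
    """Return the argv list for setting cldb.fs.mark.rereplicate.sec.

    Hypothetical helper: it only builds the command, it does not execute it.
    """
    if seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    # Assumption: config save accepts a JSON object of parameter/value pairs.
    values = json.dumps({"cldb.fs.mark.rereplicate.sec": str(int(seconds))})
    return ["maprcli", "config", "save", "-values", values]

# Example: wait two hours instead of the default one hour.
command = build_rereplicate_command(2 * 60 * 60)
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to review the exact command before applying it to a cluster.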

Inodes Limit Exceeded

UI Column

Inodes Exceeded Alarm

Logged As

VOLUME_ALARM_INODES_EXCEEDED

Meaning

The volume contains too many files.

Resolution

This alarm indicates that not enough volumes are set up to handle the number of files stored in the cluster. Typically, each user or project should have a separate volume.

Large Row

UI Label

Large Row

Logged As

VOLUME_ALARM_LARGE_ROW

Meaning

A row in a table within the specified volume has reached 75% of the maximum supported row size of 2 GB. The alarm provides the row key and the name of the table. If the row size exceeds 2 GB, subsequent MapR-DB operations on the corresponding table region will fail with an I/O error.
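The two thresholds described above can be worked through in a short Python sketch. It assumes that "2 GB" means 2 x 1024^3 bytes, which this page does not state; the function and constant names are illustrative only.

```python
# Assumption: "2 GB" is taken as 2 GiB (2 * 1024**3 bytes).
MAX_ROW_BYTES = 2 * 1024**3
ALARM_FRACTION = 0.75  # the alarm is raised at 75% of the maximum

def row_status(row_bytes):
    """Classify a row size against the alarm and failure thresholds."""
    if row_bytes > MAX_ROW_BYTES:
        return "over_limit"   # operations on the region are expected to fail
    if row_bytes >= ALARM_FRACTION * MAX_ROW_BYTES:
        return "alarm"        # Large Row alarm territory
    return "ok"

# Under this assumption the alarm threshold is 1.5 GiB.
alarm_threshold = int(ALARM_FRACTION * MAX_ROW_BYTES)
print(alarm_threshold, row_status(alarm_threshold))
```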

Resolution

Ensure that client applications that access the table are managing row data correctly, so that no row exceeds 2 GB. The method of resolving the alarm depends on the way in which client applications were managing row data.

For example, if client applications allowed too many versions of cell data, delete excess versions. If client applications neglected to remove old columns or column families, remove those manually.

Mirror Failure

UI Column

Mirror Alarm

Logged As

VOLUME_ALARM_MIRROR_FAILURE

Meaning

A mirror operation failed.

Resolution

Make sure the CLDB is running on both the source cluster and the destination cluster. Look at the CLDB log (/opt/mapr/logs/cldb.log) and the MapR-FS log (/opt/mapr/logs/mfs.log) on both clusters for more information. If the attempted mirror operation was between two clusters, make sure that both clusters are reachable over the network. Make sure the source volume is available and reachable from the cluster that is performing the mirror operation.
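When reviewing the logs named above, it can help to tally which volume alarm identifiers appear. The sketch below assumes that the "Logged As" identifier appears verbatim somewhere in a log line; the actual log line format is not described on this page, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches identifiers such as VOLUME_ALARM_MIRROR_FAILURE.
ALARM_PATTERN = re.compile(r"\bVOLUME_ALARM_[A-Z_]+\b")

def count_volume_alarms(lines):
    """Count volume alarm identifiers found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(ALARM_PATTERN.findall(line))
    return counts

# Hypothetical sample lines, not real log output.
sample = [
    "alarm raised VOLUME_ALARM_MIRROR_FAILURE for volume projects",
    "alarm raised VOLUME_ALARM_SNAPSHOT_FAILURE for volume projects",
    "alarm raised VOLUME_ALARM_MIRROR_FAILURE for volume users",
    "unrelated line",
]
counts = count_volume_alarms(sample)
print(counts.most_common())
```

To scan a real file, pass an open file object, for example `count_volume_alarms(open("/opt/mapr/logs/cldb.log"))`.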

No Nodes in Topology

UI Column

No Nodes in Vol Topo

Logged As

VOLUME_ALARM_NO_NODES_IN_TOPOLOGY

Meaning

The path specified in the volume's topology no longer corresponds to a physical topology that contains any nodes, either due to node failures or changes to node topology settings. While this alarm is raised, MapR places data for the volume on nodes outside the volume's topology to prevent write failures.

Resolution

Add nodes to the specified volume topology, either by moving existing nodes or adding nodes to the cluster. See Node Topology.

Snapshot Failure

UI Column

Snapshot Alarm

Logged As

VOLUME_ALARM_SNAPSHOT_FAILURE

Meaning

A snapshot operation failed.

Resolution

Make sure the CLDB is running. Look at the CLDB log (/opt/mapr/logs/cldb.log) and the MapR-FS log (/opt/mapr/logs/mfs.log) for more information. If the attempted snapshot was a scheduled snapshot that was running in the background, try a manual snapshot.

Topology Almost Full

UI Column

Vol Topo Almost Full

Logged As

VOLUME_ALARM_TOPOLOGY_ALMOST_FULL

Meaning

The nodes in the specified topology are running out of storage space.

Resolution

Move volumes to another topology, enlarge the specified topology by adding more nodes, or add disks to the nodes in the specified topology.

Topology Full Alarm

UI Column

Vol Topo Full

Logged As

VOLUME_ALARM_TOPOLOGY_FULL

Meaning

The nodes in the specified topology have run out of storage space.

Resolution

Move volumes to another topology, enlarge the specified topology by adding more nodes, or add disks to the nodes in the specified topology.

Volume Advisory Quota Alarm

UI Column

Vol Advisory Quota Alarm

Logged As

VOLUME_ALARM_ADVISORY_QUOTA_EXCEEDED

Meaning

A volume has exceeded its advisory quota.

Resolution

No immediate action is required. To avoid exceeding the hard quota, clear space on the volume or stop further data writes.

Volume with Non-Local Containers

UI Column

Local Volume containers non-local

Logged As

VOLUME_ALARM_DATA_CONTAINERS_NONLOCAL

Meaning

This is a local volume and its containers should all reside on the same node. Some containers were created on another node, which may cause performance issues in MapReduce jobs.

Resolution

Recreate the local volume or contact support.

Volume Quota Alarm

UI Column

Vol Quota Alarm

Logged As

VOLUME_ALARM_QUOTA_EXCEEDED

Meaning

A volume has exceeded its quota. Further writes to the volume will fail.

Resolution

Free some space on the volume or increase the volume hard quota.
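The relationship between the advisory quota and the hard quota alarms can be summarized in a small Python sketch. This is only a model of the descriptions on this page, not how the cluster evaluates quotas: the function name, the megabyte units, and the strict "greater than" comparison are assumptions.

```python
def quota_alarms(used_mb, advisory_mb=None, hard_mb=None):
    """Return the quota alarm names that would apply to a volume.

    A quota of None means that no quota of that kind is set (assumption).
    """
    alarms = []
    if advisory_mb is not None and used_mb > advisory_mb:
        # Advisory quota: a warning only, writes still succeed.
        alarms.append("VOLUME_ALARM_ADVISORY_QUOTA_EXCEEDED")
    if hard_mb is not None and used_mb > hard_mb:
        # Hard quota: further writes to the volume fail.
        alarms.append("VOLUME_ALARM_QUOTA_EXCEEDED")
    return alarms

# Example: usage is past the advisory quota but still under the hard quota.
print(quota_alarms(used_mb=900, advisory_mb=800, hard_mb=1000))
```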