MapR 5.0 Documentation : fsck

Filesystem check (fsck) is used to find and fix inconsistencies in the filesystem.


Every storage pool has its own log, which journals updates to that storage pool. All operations on a storage pool are transactional: each operation is journaled to the log before it is applied to the storage pool metadata. If a storage pool is not shut down cleanly, its metadata can become inconsistent. To make the metadata consistent the next time the storage pool is loaded, fsck replays the log to recover any data before it performs any check or repair. fsck then walks the storage pool to verify all MapR-FS metadata (and data correctness, if specified on the command line), and reports all potentially lost or corrupt containers, directories, tables, files, filelets, and blocks in the storage pool.

The local fsck visits every allocated block in the storage pool and recovers any blocks that are part of a corrupted or unconnected metadata chain.

fsck can be used on an offline storage pool after a node failure, a disk failure, or a MapR-FS process crash, or simply to verify the consistency of data when a software bug is suspected.

Typical process flow:

  • Execute the fsck command on the storage pools (or disks) as discussed below.
  • Execute the gfsck command on the cluster, volumes, or snapshots that were affected.

The local fsck command can be run in two modes:

  • Verification mode - fsck only reports errors; it does not attempt to fix or modify any data on disk. fsck can be run in verification mode on an offline storage pool at any time, and if it does not report any errors the storage pool can be brought online without any risk of data loss. To run the fsck utility in verification mode, use any parameter except the -r parameter.
  • Repair mode - fsck attempts to restore a bad storage pool. If the local fsck is run in repair mode on a storage pool, some volumes might need a global fsck (gfsck) after bringing the storage pool online. There is potential for loss of data in this case. To run the fsck utility in repair mode, use the -r parameter.
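The mode rule above can be expressed as a small check. This is a minimal Python sketch, not part of the MapR tooling: the function name `fsck_mode` and the pool name `SP1` are hypothetical, and the sketch assumes each flag is passed as its own token (combined flags such as `-dr` are not handled).

```python
def fsck_mode(args):
    """Classify an fsck argument list as 'repair' or 'verification'.

    Per the description above, fsck runs in repair mode only when -r is
    given; any other combination of parameters is verification mode.
    Assumption: flags are separate tokens, so '-dr' is not recognized.
    """
    return "repair" if "-r" in args else "verification"


if __name__ == "__main__":
    # 'SP1' is a hypothetical storage pool name used for illustration.
    print(fsck_mode(["-n", "SP1"]))
    print(fsck_mode(["-n", "SP1", "-r"]))
```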

Using the /opt/mapr/server/fsck utility with the -r option produces different results depending on the scenario. The fsck utility does not interpret the scenario nor does it have a safe mode.

  • If a disk is offline because of an imbalanced b-tree, using fsck -r may result in data loss from bad containers; that data is lost if additional replicas are unavailable.
  • If a disk is offline because of an I/O error, using fsck -r produces indeterminate results. A disk that is throwing I/O errors is questionable in terms of data content and reliability. For example, an operation that completed on the disk but was never returned may have partial data remaining on the disk. Using fsck -r retains any partial data.
  • If a disk is offline because of slow I/O, using fsck -r does not produce data loss.

The most conservative approach is to first run fsck without the -r option (verification mode) and check the output. If the output is OK, then run fsck with the -r option.
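The two-pass approach can be scripted along the following lines. This is a sketch under stated assumptions rather than a MapR-provided tool: the helper names `run_fsck` and `verify_then_repair` are hypothetical, and because this page does not define what an acceptable verification report looks like, that decision is left to a caller-supplied `output_ok` function.

```python
import subprocess
from typing import Callable, List

FSCK = "/opt/mapr/server/fsck"  # path as given on this page


def run_fsck(argv: List[str]) -> str:
    """Run fsck and return its combined output (requires a MapR node)."""
    done = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return done.stdout


def verify_then_repair(target: List[str],
                       output_ok: Callable[[str], bool],
                       run: Callable[[List[str]], str] = run_fsck) -> bool:
    """Run a verification pass, then a repair pass only if approved.

    target    -- either ['-n', <sp name>] or a list of device paths
    output_ok -- caller-supplied judgment of the verification output
                 (an assumption of this sketch; no criteria are defined here)
    Returns True if the repair pass was started, False otherwise.
    """
    report = run([FSCK] + target)        # no -r: nothing on disk is modified
    if not output_ok(report):
        return False                     # stop here and investigate
    run([FSCK] + target + ["-r"])        # repair mode: can lead to data loss
    return True
```

Passing a replacement for `run` allows the flow to be rehearsed without touching a storage pool.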

Syntax

/opt/mapr/server/fsck
    [ -d ]
    [ <device-paths> | -n <sp name> ]
    [ -l <log filename> ]
    [ -p <MapR-FS port> ]
    [ -N ]
    [ -P ]
    [ -h ]
    [ -j ]
    [ -m <memory in MB> ]
    [ -b ]
    [ -r ]
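The syntax above can be assembled programmatically. The following Python sketch builds an argument list from the documented flags only; the class and field names (`FsckOptions`, `build_fsck_argv`, and so on) are hypothetical, `-h` is omitted, and the sketch assumes that exactly one of the device paths or the `-n` storage pool name is supplied.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FSCK = "/opt/mapr/server/fsck"  # path taken from the Syntax section


@dataclass
class FsckOptions:
    device_paths: List[str] = field(default_factory=list)
    sp_name: Optional[str] = None     # -n <sp name>
    check_data_crc: bool = False      # -d
    log_file: Optional[str] = None    # -l <log filename>
    port: Optional[int] = None        # -p <MapR-FS port>
    no_status_bar: bool = False       # -N
    purge_deleted: bool = False       # -P
    skip_log_replay: bool = False     # -j
    cache_mb: Optional[int] = None    # -m <memory in MB>
    check_db: bool = False            # -b
    repair: bool = False              # -r


def build_fsck_argv(opts: FsckOptions) -> List[str]:
    """Build the fsck argument list in the order shown in the syntax."""
    # Assumption of this sketch: exactly one way of naming the pool is used.
    if bool(opts.device_paths) == bool(opts.sp_name):
        raise ValueError("give either device paths or a storage pool name")
    argv = [FSCK]
    if opts.check_data_crc:
        argv.append("-d")
    if opts.sp_name:
        argv += ["-n", opts.sp_name]
    else:
        argv += opts.device_paths
    if opts.log_file:
        argv += ["-l", opts.log_file]
    if opts.port is not None:
        argv += ["-p", str(opts.port)]
    for flag, enabled in (("-N", opts.no_status_bar),
                          ("-P", opts.purge_deleted),
                          ("-j", opts.skip_log_replay)):
        if enabled:
            argv.append(flag)
    if opts.cache_mb is not None:
        argv += ["-m", str(opts.cache_mb)]
    if opts.check_db:
        argv.append("-b")
    if opts.repair:
        argv.append("-r")
    return argv
```

For example, `build_fsck_argv(FsckOptions(sp_name="SP1"))` yields a verification-mode command for a hypothetical pool named `SP1`; the resulting list can be passed to `subprocess.run` on a node where the utility is installed.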

Parameters



Parameter

Description

-d

Performs a CRC check on data blocks. By default, fsck does not validate the CRC of user data pages. With this check enabled, fsck can take a long time to finish.

<device-paths>

Paths to the disks that make up the storage pool.

-n

Storage pool name. This option works only if all the disks are in disktab. Otherwise, specify each disk that makes up the storage pool individually, using the <device-paths> parameter.

-l

The log filename. Default: /tmp/fsck.<pid>.out

-p

The MapR-FS port. Default: 5660

-N

Disables the status bar.

-P

Purges deleted containers in repair.

-h

Help

-j

Skips log replay. Set this parameter only when log recovery fails. Log recovery can fail if the damaged blocks of a disk belong to the log, or if log recovery finds CRC errors in the metadata blocks. Using this parameter typically leads to greater data loss.

-m

Sets the cache size for blocks (MB).

-b

Checks database consistency.

-r

Runs in repair mode. USE WITH CAUTION AS THIS CAN LEAD TO LOSS OF DATA.