gfsck
The gfsck
(global filesystem check) command performs a consistency
check and repair operation on a volume or volume snapshot, including the following
entities:
- All cross container links (for example, from file to filelets, or from table to tablets)
- The tabletmap key range
- The attributes of filelets (uid/gid/mode)
This command identifies the unreachable files, directories, and tables in the volume and moves them to /lost+found to be repaired. It also identifies and fixes any unreachable DB inodes or dangling pointers to lost inodes.
Use the gfsck utility when local fsck repairs some containers at the highest epoch or when some containers are lost (either when a lower epoch container was promoted as master or when a container was permanently lost) at the highest epoch.
Permissions Required
You need root user permissions to run , unless you are checking tiering-enabled volumes, which requires you to be the mapr user.
Typical process flow
- Take the affected storage pools offline with the
mrconfig sp offline
command. - Execute the
fsck
command on the storage pools (or disks). - Bring the storage pools back online with the
mrconfig sp online
command. - Run the
gfsck
command on the cluster, volumes, or snapshots affected.Warning: If there are alarms, such as DataUnavailableAlarm or DataUnderReplicatedAlarm, do not run the gfsck command with the-r
(--repair
) option. Running the gfsck command with the-r
(--repair
) option, might result in data loss. If necessary, first run gfsck without the-r
(--repair
) option and attempt to repair only after analyzing the command output.
Syntax
/opt/mapr/bin/gfsck
[ -h|--help ]
[ -c|--clear ]
[ -d|--debug ]
[ -b|--dbcheck ]
[ -r|--repair ]
[ -y|--assume-yes ]
[ -Gquick|--check-gateway-consistency-only ]
[ -Gfull|--check-gateway-vcd-consistency ]
[ -Dquick|--check-object-presence ]
[ -DFull --check-object-data ]
[ -GdelLog|--delete-Gateway-Log ]
[ cluster=<cluster name> ]
[ rwvolume=<volume name> ]
[scanthreads=<inode scanner threads count> ]
[ snapshot=<snapshot name> ]
[ snapshotid=<snapshot-id> ]
Parameters
Parameter |
Description |
---|---|
-b|--dbcheck |
Checks that every key in a tablet is within that tablet's startKey and endKey range. Use this option only if you suspect database inconsistency, as this option is very I/O intensive. |
-c|--clear |
Clears previous warnings before performing the global filesystem check. |
-d|--debug |
Provides information for debugging. |
-Gquick|--check-gateway-consistency-only |
(mapr user). Checks if the entries in the metadata tables maintained internally for objects and tiers are consistent. If inconsistent, reports the error. |
-Gfull|--check-gateway-vcd-consistency |
(mapr user). Checks if the entries in the metadata tables maintained internally for objects and containers are consistent. If inconsistent, reports the error. |
-Dquick|--check-object-presence |
(mapr user). Specified with either -Gquick or
-Gfull . Checks and reports if the object in the metadata tables
exists in the tier or not. |
-GdelLog|--delete-Gateway-Log |
(mapr user). Deletes the metalog created by MAST Gateway if it crashes. |
-h|--help | Prints usage text. |
-y --assume-yes | Assumes that containers without valid copies (as reported by CLDB) can be
deleted automatically. If this option is not specified, gfsck
pauses for user input to verify that containers can be deleted. Enter yes to
delete, no to exit gfsck , or ctrl-C to quit. |
cluster | Name of the cluster (default: default cluster) |
rwvolume | Name of the volume (default: null) |
snapshot | Name of the snapshot (default: null) |
snapshotid | The snapshot ID (default: 0) |
Example (Debug mode)
In debug mode, run the gfsck
command on the read/write volume named
mapr.cluster.root
:
/opt/mapr/bin/gfsck rwvolume=mapr.cluster.root -d
Sample output is as follows:
Starting GlobalFsck:
clear-mode = false
debug-mode = true
dbcheck-mode = false
repair-mode = false
assume-yes-mode = false
cluster = my.cluster.com
rw-volume-name = mapr.cluster.root
snapshot-name = null
snapshot-id = 0
user-id = 0
group-id = 0
get volume properties ...
rwVolumeName = mapr.cluster.root (volumeId = 205374230, rootContainerId = 2049, isMirror = false)
put volume mapr.cluster.root in global-fsck mode ...
get snapshot list for volume mapr.cluster.root ...
starting phase one (get containers) for volume mapr.cluster.root(205374230) ...
container 2049 (latestEpoch=3, fixedByFsck=false)
got volume containers map
done phase one
starting phase two (get inodes) for volume mapr.cluster.root(205374230) ...
get container inode list for cid 2049
+inodelist: fid=2049.32.131224 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.33.131226 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.34.131228 pfid=-1.33.131226 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.35.131230 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.36.131232 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.38.262312 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.39.262314 pfid=-1.38.262312 typ=1 styp=0 nch=0 dMe:false dRec: false
got container inode lists (totalThreads=1)
done phase two
starting phase three (get fidmaps & tabletmaps) for volume mapr.cluster.root(205374230) ...
got fidmap lists (totalFidmapThreads=0)
got tabletmap lists (totalTabletmapThreads=0)
done phase three
=== Start of GlobalFsck Report ===
file-fidmap-filelet union --
2049.39.262314:P --> primary (nchunks=0) --> AllOk
no errors
table-tabletmap-tablet union --
empty
orphan directories --
none
orphan kvstores --
none
orphan files --
none
orphan fidmaps --
none
orphan tables --
none
orphan tabletmaps --
none
orphan dbkvstores --
none
orphan dbfiles --
none
orphan dbinodes --
none
containers that need repair --
none
incomplete snapshots that need to be deleted --
none
user statistics --
containers = 1
directories = 2
kvstores = 0
files = 1
fidmaps = 0
filelets = 0
tables = 0
tabletmaps = 0
schemas = 0
tablets = 0
segmaps = 0
spillmaps = 0
overflowfiles = 0
bucketfiles = 0
spillfiles = 0
=== End of GlobalFsck Report ===
remove volume mapr.cluster.root from global-fsck mode (ret = 0) ...
GlobalFsck completed successfully (7142 ms); Result: verify succeeded
To verify if the object is present on the tier, run the
gfsck
command on the tiering-enabled read/write volume named
for_test5
:
/opt/mapr/bin/gfsck rwvolume=for_test5 -Gfull -Dquick
Sample output is as follows:
Starting GlobalFsck:
clear-mode = false
debug-mode = false
dbcheck-mode = false
repair-mode = false
assume-yes-mode = false
cluster = Cloudpool19
rw-volume-name = for_test5
snapshot-name = null
snapshot-id = 0
user-id = 2000
group-id = 2000
get volume properties ...
put volume for_test5 in global-fsck mode ...
get snapshot list for volume for_test5 ...
starting phase one (get containers) for volume for_test5(16558233) ...
got volume containers map
done phase one
starting phase two (get inodes) for volume for_test5(16558233) ...
got container inode lists
done phase two
starting phase three (get fidmaps & tabletmaps) for volume for_test5(16558233) ...
got fidmap lists
got tabletmap lists
completed secondary index field path info gathering
completed secondary index consistency check
Starting DeferMapCheck..
completed DeferMapCheck
done phase three
=== Start of GlobalFsck Report ===
file-fidmap-filelet union --
no errors
table-tabletmap-tablet union --
empty
containers that need repair --
none
user statistics --
containers = 6
directories = 6
files = 1
filelets = 2
tables = 0
tablets = 0
=== End of GlobalFsck Report ===
Putting volume for_test5 in GWgfsckMode . . . . .
Tier stats----
numVerifiedVcds 5719
numVerifiedObjects 32
numCorruptedVcd 0
numCorruptedObjects 0
Removing volume for_test5 from GWgfsckMode
remove volume for_test5 from global-fsck mode (ret = 0) ...
GlobalFsck completed successfully (37039 ms); Result: verify succeeded