MapR 5.0 Documentation : Working with Snapshots, Mirrors, and Schedules

Snapshots, mirrors, and schedules help you protect your data from user error, make backup copies, and, in larger clusters, balance load for frequently accessed data. These features are available under the M5 license.

  • If you are working with an M5 virtual machine, you can use this section to get acquainted with snapshots, mirrors, and schedules.
  • If you are working with the M3 virtual machine, you should proceed to the sections about Getting Started with Hive, Pig, and Getting Started with HBase.

Taking Snapshots

A snapshot is a point-in-time image of a volume that protects data against user error. Although other strategies such as replication and mirroring provide good protection, they cannot protect against accidental file deletion or corruption. You can create a snapshot of a volume manually before embarking on risky jobs or operations, or set a snapshot schedule on the volume to ensure that you can always roll back to specific points in time.

Try creating a snapshot manually:

  1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
  2. Select the checkbox beside the volume MyVolume (which you created during the previous tutorial).
  3. Expand the MapR Virtual Machine window or scroll the browser to the right until the New Snapshot button is visible.
  4. Click New Snapshot to display the Snapshot Name dialog.
  5. Type a name for the new snapshot in the Name field.
  6. Click OK to create the snapshot.
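
The same manual snapshot can also be requested from the command line. The following is a minimal sketch, assuming the maprcli tool is on the PATH of a cluster node and that its volume snapshot create command accepts -volume and -snapshotname as shown. The helper name snapshot_create_argv and the snapshot name before-risky-job are hypothetical, chosen only for illustration. The code only builds and prints the command; it does not run it.

```python
import shlex


def snapshot_create_argv(volume, snapshot_name):
    """Build the argument list for a manual snapshot of one volume.

    Assumption: `maprcli volume snapshot create` takes the volume name
    via -volume and the new snapshot's name via -snapshotname.
    """
    if not volume or not snapshot_name:
        raise ValueError("volume and snapshot_name must be non-empty")
    return [
        "maprcli", "volume", "snapshot", "create",
        "-volume", volume,
        "-snapshotname", snapshot_name,
    ]


# "before-risky-job" is a hypothetical snapshot name.
argv = snapshot_create_argv("MyVolume", "before-risky-job")
print(shlex.join(argv))
```

To actually create the snapshot, the list could be passed to subprocess.run(argv, check=True) on a node where maprcli is installed.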

Try scheduling snapshots:

  1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
  2. Display the Volume Properties dialog by clicking the volume name MyVolume (which you created during the previous tutorial), or by selecting the checkbox beside MyVolume and clicking the Properties button.
  3. In the Replication and Snapshot Scheduling section, choose a schedule from the Snapshot Schedule dropdown menu.
  4. Click Modify Volume to save changes to the volume.

Viewing Snapshot Contents

All snapshots of a volume are available in a directory called .snapshot at the volume's top level. For example, the snapshots of the volume MyVolume, which is mounted at /myvolume, are available in the /myvolume/.snapshot directory. You can view the snapshots using the hadoop fs -ls command or via NFS. The .snapshot directory is hidden: it does not appear when you list the contents of the volume's top-level directory, but you can still reach it by its explicit path.

  • To view the snapshots for /myvolume on the command line, type hadoop fs -ls /myvolume/.snapshot
  • To view the snapshots for /myvolume in the file browser via NFS, navigate to /myvolume and use CTRL-L to specify an explicit path, then add .snapshot to the end.
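
Because .snapshot does not show up in directory listings, scripts have to construct its path explicitly. The sketch below derives the snapshot directory from a volume's mount path and builds the hadoop fs -ls command described above. The helper names snapshot_dir and list_snapshots_argv are hypothetical, and the code only prints the command rather than running it.

```python
import posixpath


def snapshot_dir(mount_path):
    """Return the hidden snapshot directory for a volume mounted at mount_path."""
    return posixpath.join(mount_path, ".snapshot")


def list_snapshots_argv(mount_path):
    """Build the `hadoop fs -ls` command that lists a volume's snapshots."""
    return ["hadoop", "fs", "-ls", snapshot_dir(mount_path)]


print(snapshot_dir("/myvolume"))
print(" ".join(list_snapshots_argv("/myvolume")))
```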

Creating Mirrors

A mirror is a full read-only copy of a volume, which you can use for backups, data transfer to another cluster, or load balancing. A mirror is itself a type of volume; after you create a mirror volume, you can sync it with its source volume manually or set a schedule for automatic sync.

Try creating a mirror volume:

  1. In the Navigation pane, expand the MapR-FS group and click the Volumes view.
  2. Click the New Volume button to display the New Volume dialog.
  3. Select the Local Mirror Volume radio button at the top of the dialog.
  4. Type my-mirror in the Mirror Name field.
  5. Type MyVolume in the Source Volume Name field.
  6. Type /my-mirror in the Mount Path field.
  7. To schedule mirror sync, select a schedule from the Mirror Update Schedule dropdown menu.
  8. Click OK to create the volume.

You can also sync a mirror manually; it works just like taking a manual snapshot. View the list of volumes, select the checkbox next to a mirror volume, and click Start Mirroring.

Working with Schedules

The MapR Virtual Machine comes pre-loaded with a few schedules, but you can also create your own. Once you have created a schedule, you can use it for snapshots and mirrors on any volume. Each schedule contains one or more rules that determine when to trigger a snapshot or a mirror sync, and how long to keep the snapshot data resulting from the rule.

Try creating a schedule:

  1. In the Navigation pane, expand the MapR-FS group and click the Schedules view.
  2. Click New Schedule.
  3. Type My Schedule in the Schedule Name field.
  4. Define a schedule rule in the Schedule Rules section:
    1. From the first dropdown menu, select Every 5 min.
    2. Use the Retain For field to specify how long the data is to be preserved. Type 1 in the box, and select hour(s) from the dropdown menu.
  5. Click Save Schedule to create the schedule.

You can use the schedule "My Schedule" to perform a snapshot or mirror operation automatically every 5 minutes. If you use "My Schedule" to automate snapshots, each snapshot is preserved for one hour, so about 12 snapshots of the volume exist at any given time (60 minutes of retention divided by a 5-minute interval).
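
The figure of 12 snapshots follows from dividing the retention period by the snapshot interval. The short sketch below makes that arithmetic explicit for any rule. The function name snapshots_retained is hypothetical, and the calculation is a steady-state estimate that ignores the exact moment at which old snapshots expire, so the real count may briefly differ by one.

```python
def snapshots_retained(interval_minutes, retain_minutes):
    """Estimate how many snapshots exist at once under a single schedule rule.

    A snapshot is taken every interval_minutes and kept for retain_minutes,
    so roughly retain_minutes / interval_minutes snapshots coexist.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    if retain_minutes < 0:
        raise ValueError("retain_minutes must not be negative")
    return retain_minutes // interval_minutes


# The rule from this tutorial: every 5 minutes, retained for 1 hour.
print(snapshots_retained(5, 60))  # prints 12
```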

Next Steps

If you haven't already, try the following tutorials: