MapR 5.0 Documentation : Compression

MapR provides compression for files stored in the cluster. Compression is applied automatically to uncompressed files unless you turn compression off. The advantages of compression are:

  • Compressed data uses less bandwidth on the network than uncompressed data.
  • Compressed data uses less disk space.

Choosing a Compression Setting

MapR supports three different compression algorithms:

  • lz4 (default)
  • lzf
  • zlib

Compression algorithms can be evaluated for compression ratio (higher compression means less disk space used), compression speed, and decompression speed. The following table compares the three supported algorithms. The data is based on a single thread on a Core 2 Duo at 3 GHz.

  Compression Type | Compression Ratio | Compression Speed | Decompression Speed
  lz4              | -                 | 330 MB/s          | 915 MB/s
  lzf              | -                 | 197 MB/s          | 465 MB/s
  zlib             | -                 | 14 MB/s           | 210 MB/s

Note that compression speed depends on various factors including:

  • block size (the smaller the block size, the faster the compression speed)
  • single-thread vs. multi-thread system
  • single-core vs. multi-core system
  • the type of codec used

Setting Compression on Files

Compression is set at the directory level. Any files written by a Hadoop application, whether via the file APIs or over NFS, are compressed according to the settings for the directory where the file is written. Sub-directories on which compression has not been explicitly set inherit the compression settings of the directory that contains them.

If you change a directory's compression setting after writing a file, the file keeps its original setting: a file written in an uncompressed directory does not become compressed when you later turn compression on, and vice versa. Further writes to the file use the file's existing compression setting.

Only the owner of a directory can change its compression settings or other attributes. Write permission is not sufficient.
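The inheritance rule above can be sketched with ordinary local directories. This is only a toy model: on a real cluster MapR-FS resolves the effective setting internally; the `.dfs_attributes` file name follows the convention documented in the sections below, and `lz4` is the cluster default.

```shell
# Toy model of directory-level compression inheritance, using local
# directories in place of MapR-FS.
set -e
root=$(mktemp -d)
mkdir -p "$root/projects/test/logs"
# Explicitly set lzf compression on the "projects" directory only.
printf 'Compression=lzf\n' > "$root/projects/.dfs_attributes"

# Walk upward from a directory until a .dfs_attributes file with a
# Compression line is found; that value is the effective setting.
effective_compression() {
  dir=$1
  while [ -n "$dir" ] && [ "$dir" != "/" ]; do
    if [ -f "$dir/.dfs_attributes" ]; then
      val=$(sed -n 's/^Compression=//p' "$dir/.dfs_attributes")
      if [ -n "$val" ]; then echo "$val"; return; fi
    fi
    dir=$(dirname "$dir")
  done
  echo "lz4"   # cluster default when nothing is set explicitly
}

# The sub-directory has no explicit setting, so it inherits lzf.
effective_compression "$root/projects/test/logs"   # prints: lzf
```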

File Extensions of Compressed Files

By default, MapR does not compress files whose filename extensions indicate they are already compressed. The default list of filename extensions is as follows:

  • bz2
  • gz
  • lzo
  • snappy
  • tgz
  • tbz2
  • zip
  • z
  • Z
  • mp3
  • jpg
  • jpeg
  • mpg
  • mpeg
  • avi
  • gif
  • png

The list of filename extensions not to compress is stored as comma-separated values in the mapr.fs.nocompression configuration parameter, and can be modified with the config save command. For example, you can add parquet to the default list:

maprcli config save -values '{"mapr.fs.nocompression":"bz2,gz,lzo,snappy,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png,parquet"}'

The list can be viewed with the config load command. Example:

maprcli config load -keys mapr.fs.nocompression
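Because the parameter is a single comma-separated string, a safe way to extend it is to append to the current value rather than retype the whole list. The sketch below builds the updated string locally; the list is the default shown above, and parquet is the example extension being added. The maprcli invocation in the trailing comment is the same config save command shown earlier.

```shell
# Build an updated mapr.fs.nocompression value. In practice, read the
# current value first with: maprcli config load -keys mapr.fs.nocompression
current="bz2,gz,lzo,snappy,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png"

# Append the new extension only if it is not already in the list.
case ",$current," in
  *,parquet,*) updated=$current ;;
  *)           updated="$current,parquet" ;;
esac

echo "$updated"
# Pass the result to config save:
#   maprcli config save -values "{\"mapr.fs.nocompression\":\"$updated\"}"
```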

Turning Compression On or Off on Directories

You can turn compression on or off for a given directory in two ways:

  • Set the value of the Compression attribute in the .dfs_attributes file at the top level of the directory.
    • Set Compression=lzf|lz4|zlib to turn compression on for a directory.
    • Set Compression=false to turn compression off for a directory.
  • Use the command hadoop mfs -setcompression on|off|lzf|lz4|zlib <dir>.

If you choose -setcompression on without specifying an algorithm, lz4 is used by default. This algorithm provides good compression speed at MapR's block size of 64 KB.


Suppose the volume test is NFS-mounted under /mapr/. You can turn off compression by editing the .dfs_attributes file at the top level of the mounted directory and setting Compression=false. To accomplish the same thing from the hadoop shell, use the following command:

hadoop mfs -setcompression off /projects/test
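The NFS-side edit can be sketched locally; the temporary directory here is only a stand-in for the NFS-mounted volume directory. On a real mount, edit the existing .dfs_attributes file rather than overwriting it, since it may hold other attributes besides Compression.

```shell
# Local stand-in for editing .dfs_attributes over NFS: the temp directory
# models the mounted volume directory.
dir=$(mktemp -d)
printf 'Compression=false\n' > "$dir/.dfs_attributes"
cat "$dir/.dfs_attributes"   # prints: Compression=false
```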

You can view the compression settings for directories using the hadoop mfs -ls command. For example,

# hadoop mfs -ls /
Found 23 items
vrwxr-xr-x Z   - root root         13 2012-04-29 10:24  268435456 /.rw
               p mapr.cluster.root writeable 2049.35.16584 -> 2049.16.2  scale-50.scale.lab:5660 scale-51.scale.lab:5660 scale-52.scale.lab:5660
vrwxr-xr-x U   - root root          7 2012-04-28 22:16   67108864 /hbase
               p mapr.hbase default 2049.32.16578 -> 2050.16.2  scale-50.scale.lab:5660 scale-51.scale.lab:5660 scale-52.scale.lab:5660
drwxr-xr-x Z   - root root          0 2012-04-29 09:14  268435456 /tmp
               p 2049.41.16596  scale-50.scale.lab:5660 scale-51.scale.lab:5660 scale-52.scale.lab:5660
vrwxr-xr-x Z   - root root          1 2012-04-27 22:59  268435456 /user
               p users default 2049.36.16586 -> 2055.16.2  scale-50.scale.lab:5660 scale-52.scale.lab:5660 scale-51.scale.lab:5660
drwxr-xr-x Z   - root root          1 2012-04-27 22:37  268435456 /var
               p 2049.33.16580  scale-50.scale.lab:5660 scale-51.scale.lab:5660 scale-52.scale.lab:5660 

The symbols for the various compression settings are explained here:

  Symbol | Compression Setting
  U      | Uncompressed, or previously compressed by another algorithm
  Z      | Compressed

These are the two symbols that appear in the listing above.

Setting Compression During Shuffle

By default, MapReduce uses compression during the Shuffle phase. You can use the
-Dmapreduce.maprfs.use.compression switch to turn compression off during the Shuffle phase of a MapReduce job. For example:

hadoop jar xxx.jar -Dmapreduce.maprfs.use.compression=false