Central configuration has been around since the 2.0 release, but many people are not using this time-saving feature. This post explains briefly how to use it and how it can simplify the way you run MapR.
Customized Configuration Files – Why You Use Them
Each MapR service has a set of configuration files associated with it. Each configuration file has a set of default values that can be customized for your purposes.
For example, the TaskTracker service has a configuration file, hadoop/hadoop-0.20.2/conf/mapred-site.xml, that contains the parameter mapred.tasktracker.map.tasks.maximum. The default value is -1, which means that the number of map task slots is calculated by a formula. To override the default, you assign a new value to this parameter and load the mapred-site.xml file to each node where you want the new value to apply. Without central configuration, this can be very time-consuming, especially on a large cluster.
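To make the override concrete, here is a minimal Python sketch that builds a mapred-site.xml fragment setting this parameter. It assumes the usual Hadoop `<configuration>`/`<property>` layout. The helper name `build_override` and the value 4 are illustrative only; neither comes from MapR.

```python
import xml.etree.ElementTree as ET


def build_override(name: str, value: str) -> str:
    """Return a Hadoop-style <configuration> document holding one property."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


# The value 4 is an arbitrary illustration, not a tuning recommendation.
xml_text = build_override("mapred.tasktracker.map.tasks.maximum", "4")
print(xml_text)
```

A real mapred-site.xml normally carries other properties too, so in practice you would edit the existing file rather than generate a new one from scratch.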
Central Configuration to the Rescue
Customized configuration files are stored in a volume, mapr.configuration (mounted at /var/mapr/configuration), that is created just for central configuration. The directory structure looks like this:
These files are polled at regular intervals (every five minutes, by default) to check if they are more recent than the version stored locally in the /opt/mapr directory. If a more recent version of a configuration file is found, it is copied to the /opt/mapr directory.
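The "copy if more recent" check can be sketched in a few lines of Python. This is not the pullcentralconfig script itself, only an illustration of the comparison described above. It assumes that "more recent" means a later file modification time, and the function name `pull_if_newer` is hypothetical.

```python
import os
import shutil


def pull_if_newer(central_path: str, local_path: str) -> bool:
    """Copy central_path over local_path when the central copy is more recent.

    Returns True if a copy was made, False otherwise.
    Assumption: "more recent" is judged by modification time.
    """
    if not os.path.exists(central_path):
        return False  # nothing customized for this file
    if os.path.exists(local_path) and (
        os.path.getmtime(central_path) <= os.path.getmtime(local_path)
    ):
        return False  # local copy is already up to date
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    # copy2 preserves the modification time, so the next poll sees the
    # two copies as equally recent and does not copy again.
    shutil.copy2(central_path, local_path)
    return True
```

A poller would call this once per configuration file on each interval, for example with a path under /var/mapr/configuration as the source and the matching path under /opt/mapr as the destination.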
The pullcentralconfig Script
At the heart of the central configuration feature is the pullcentralconfig script. Here’s how it works:
Example
Central configuration saves time in large-cluster scenarios like this:
Suppose you have a cluster with 120 nodes, and 100 of them are running the TaskTracker service. Now suppose that 90 of these TaskTracker nodes (named host1 – host90 in this example) need to use the same customized version of mapred-site.xml. Instead of loading the customized file to each node individually, you can create the file and load it to the /var/mapr/configuration/default directory. The pullcentralconfig script does the rest.
Now suppose that the remaining 10 TaskTracker nodes (host91 – host100) each use a different version of the mapred-site.xml file. These node-specific configuration files are stored under /var/mapr/configuration/nodes, each in a node-specific sub-directory.
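The split between shared and node-specific files can be pictured as a lookup, sketched here in Python. It rests on three assumptions that this post does not state: each sub-directory under nodes is named after the host, files inside default and nodes mirror their relative paths under /opt/mapr, and a node-specific file takes precedence over the default one. The function names are hypothetical.

```python
import os
import posixpath
from typing import Callable, List, Optional

CENTRAL_ROOT = "/var/mapr/configuration"
MAPRED_SITE = "hadoop/hadoop-0.20.2/conf/mapred-site.xml"


def candidate_paths(hostname: str, relative_path: str) -> List[str]:
    """Places a file for `hostname` could live, node-specific first (assumed order)."""
    return [
        posixpath.join(CENTRAL_ROOT, "nodes", hostname, relative_path),
        posixpath.join(CENTRAL_ROOT, "default", relative_path),
    ]


def resolve(
    hostname: str,
    relative_path: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the first candidate that exists, or None if nothing is customized."""
    for path in candidate_paths(hostname, relative_path):
        if exists(path):
            return path
    return None


# Simulate the example cluster: one shared file plus a file for host91.
stored = {
    posixpath.join(CENTRAL_ROOT, "default", MAPRED_SITE),
    posixpath.join(CENTRAL_ROOT, "nodes", "host91", MAPRED_SITE),
}
print(resolve("host5", MAPRED_SITE, exists=stored.__contains__))
print(resolve("host91", MAPRED_SITE, exists=stored.__contains__))
```

Under these assumptions, host5 gets the shared file from default, while host91 gets its own copy from nodes/host91.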
To assign each customized configuration file to its corresponding node, follow these steps:
Now that you have the customized configuration files stored in /var/mapr/configuration/default and /var/mapr/configuration/nodes, the pullcentralconfig script does the rest. Just restart the service (TaskTracker in this case) for the changes to take effect. No more commands are necessary to load customized configuration files to each node!
For a full description of central configuration, go to doc.mapr.com/display/MapR/Central+Configuration.