MapR 3.x Documentation : Quick Installation Guide

The MapR quick installer automates the deployment of a MapR cluster.

The nodes in a MapR cluster can be one of the following types: 

Node Type        Description
Control nodes    Manage the operation of the cluster. Control nodes host the ZooKeeper, CLDB, JobTracker, and Webserver services.
Data nodes       Store and process data using Hadoop ecosystem tools such as MapReduce, Hive, or MapR Tables.
Dual nodes       Combine control and data node functionality.
Client nodes     Provide controlled user access to the cluster.

For more information about node types, see Node Types.

Before You Start

  • Determine how many control nodes your cluster will have. The MapR installer supports one or three control nodes. Three control nodes are typically sufficient for clusters up to approximately 100 nodes.

  • Ensure that each node in your cluster has access to the internet. If any node lacks internet access, perform an advanced installation instead.

  • Determine which nodes in your cluster will serve as data or client nodes. The MapR installer supports any number of data or client nodes.

  • For each node in the cluster, identify which disks you want to allocate to the MapR file system. If the same set of disks and partitions applies to all nodes in the cluster, you can run the installer in interactive mode. To specify a distinct set of disks and partitions for individual cluster nodes, you must use a configuration file. The installer’s interactive mode and configuration files are discussed in depth later in this document.
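As an illustration, the planning rules above can be expressed as a small pre-flight check. This is a hypothetical Python helper, not part of the MapR installer. It assumes you keep the planned host names in lists, counts only control and data nodes toward the approximately-100-node guideline (an assumption), and reports that guideline as a warning rather than a hard limit.

```python
from typing import List

# Hypothetical planning helper; not part of the MapR quick installer.
SUPPORTED_CONTROL_NODE_COUNTS = (1, 3)  # the installer supports one or three control nodes
THREE_CONTROL_GUIDELINE = 100  # approximate cluster size three control nodes typically handle


def check_cluster_plan(control: List[str], data: List[str]) -> List[str]:
    """Return human-readable problems with a planned layout (empty list if none).

    Assumption: only control and data nodes count toward cluster size.
    """
    problems = []
    if len(control) not in SUPPORTED_CONTROL_NODE_COUNTS:
        problems.append(
            f"{len(control)} control nodes planned; the quick installer supports only 1 or 3"
        )
    total = len(control) + len(data)
    if total > THREE_CONTROL_GUIDELINE:
        problems.append(
            f"{total} nodes planned, beyond the ~{THREE_CONTROL_GUIDELINE}-node guideline "
            "for three control nodes; see About Installation"
        )
    return problems
```

For example, check_cluster_plan(["control-host-01", "control-host-02"], ["data-01"]) returns a single message saying that two control nodes are not supported.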

For more information and guidelines about the MapR installation process, see About Installation.

Quick Installer Requirements

The quick installer runs on the following operating systems:

  • Red Hat Enterprise Linux (RHEL) or CentOS, version 6.1 or later, with the EPEL repository installed.

  • Ubuntu Linux version 12.04

The quick installer installs MapR on nodes that meet the following requirements:

  • Python 2.6 or later must be installed.

  • The operating system must be one of the following:

    • Ubuntu 12.04

    • CentOS/Red Hat 6.1 or later

    • SUSE 11 or later


Before you launch the MapR installation from a SUSE node, run the following commands to create a symbolic link named libssl.so.10 that points to libssl.so.0.9.8 under /usr/lib64:

cd /usr/lib64
ln -s libssl.so.0.9.8 libssl.so.10
  • The operating system on each node must satisfy the quick installer's package dependencies:

    Operating System    Package Dependencies
    Ubuntu              python-pycurl, libssl0.9.8, sshpass
    CentOS/Red Hat      python-pycurl, libselinux-python, openssl098e, sshpass, openssh-clients
    SUSE                python-pycurl, libopenssl0_9_8, sshpass
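To check a node against the dependency table above, you could compare it with the list of installed package names. The following is a hypothetical Python sketch: the OS-family keys are invented for illustration, and it assumes you collect bare package names yourself, for example with rpm -qa --qf '%{NAME}\n' on CentOS/Red Hat or SUSE, or dpkg-query -W -f='${Package}\n' on Ubuntu.

```python
from typing import Dict, List, Set

# Quick installer package dependencies, copied from the table above.
# The dictionary keys are hypothetical labels, not installer settings.
QUICK_INSTALLER_DEPS: Dict[str, List[str]] = {
    "ubuntu": ["python-pycurl", "libssl0.9.8", "sshpass"],
    "redhat": ["python-pycurl", "libselinux-python", "openssl098e", "sshpass",
               "openssh-clients"],
    "suse": ["python-pycurl", "libopenssl0_9_8", "sshpass"],
}


def missing_packages(os_family: str, installed: Set[str]) -> List[str]:
    """Return the required packages that are absent from `installed`."""
    try:
        required = QUICK_INSTALLER_DEPS[os_family]
    except KeyError:
        raise ValueError(f"unsupported OS family: {os_family!r}") from None
    return [pkg for pkg in required if pkg not in installed]
```

An empty result means the node already has every package in the table for its operating system.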

Before You Install

You can install the MapR distribution for Hadoop on a set of nodes from any machine that can connect to the nodes. The machine you install from does not need to be one of the cluster nodes. The following steps set up the installing machine:

  1. Download the mapr-setup file from one of the following URLs:
    • For an Ubuntu installation: http://package.mapr.com/releases/v3.1.1/ubuntu/
    • For a Red Hat or CentOS installation: http://package.mapr.com/releases/v3.1.1/redhat/
    The following example uses the wget utility to download the mapr-setup file for an Ubuntu installation:
    $ wget http://package.mapr.com/releases/v3.1.1/ubuntu/mapr-setup
  2. Navigate to the directory where you downloaded the mapr-setup file and enable execute permissions with the following command:
    $ chmod 755 mapr-setup
  3. Run mapr-setup from the directory where you downloaded it to unpack the installer files to the /opt/mapr-installer directory. The user running mapr-setup must have write access to the /opt and /tmp directories. Alternatively, run mapr-setup with sudo privileges, as in the following command:
    $ sudo ./mapr-setup
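If you script the preparation of the installing machine, steps 1 and 2 might look like the following hypothetical Python sketch. The URL layout comes from the 3.1.1 links in step 1; assuming the same v<version>/<os>/ layout for other releases is unverified. Step 3 (running ./mapr-setup with sudo) is left as a manual step.

```python
import os
import urllib.request

# Hypothetical helpers. The v<version>/<os>/ layout is taken from the 3.1.1 URLs
# in step 1; whether other releases follow it is an assumption.
BASE_URL = "http://package.mapr.com/releases"
OS_DIRS = {"ubuntu": "ubuntu", "redhat": "redhat", "centos": "redhat"}


def mapr_setup_url(os_family: str, version: str = "3.1.1") -> str:
    """Return the mapr-setup download URL for an OS family."""
    key = os_family.lower()
    if key not in OS_DIRS:
        raise ValueError(f"no mapr-setup download listed for {os_family!r}")
    return f"{BASE_URL}/v{version}/{OS_DIRS[key]}/mapr-setup"


def download_mapr_setup(os_family: str, dest_dir: str = ".", version: str = "3.1.1") -> str:
    """Steps 1 and 2: download mapr-setup, then make it executable (chmod 755)."""
    path = os.path.join(dest_dir, "mapr-setup")
    urllib.request.urlretrieve(mapr_setup_url(os_family, version), path)
    os.chmod(path, 0o755)
    return path
```

For instance, mapr_setup_url("centos") evaluates to http://package.mapr.com/releases/v3.1.1/redhat/mapr-setup, because CentOS uses the Red Hat packages.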

You are now ready to install.

Using the MapR Quick Installer

You can run the MapR quick installer in interactive mode from the command line, or provide a configuration file. Details about the format and syntax of the configuration file are provided later in this document.

Before you begin installing, verify that all the nodes are configured to have the same login information. If you are using the quick installer in interactive mode, described later in this document, verify that all of the nodes have the same disks for use by the MapR Hadoop Platform.


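This installer enables password-authenticated ssh login, which remains enabled after installation. To disable ssh password authentication after installation, set the following line in the sshd_config file (typically /etc/ssh/sshd_config) and restart the ssh service. If the file already contains an active PasswordAuthentication line, change it rather than adding a second one, because sshd uses the first value it reads:
PasswordAuthentication no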
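If you manage many nodes, you might script this edit. The following hypothetical Python function rewrites the text of an sshd_config file: it removes active PasswordAuthentication lines from the global section and inserts a single PasswordAuthentication no line just before the first Match block (directives after a Match line apply only to that block), or at the end of the file if there is no Match block. Comments and settings inside Match blocks are left untouched. Writing the file back as root, validating it with sshd -t, and restarting ssh remain manual steps.

```python
import re

# Hypothetical helper for editing sshd_config text; not part of the MapR installer.
_ACTIVE_DIRECTIVE = re.compile(r"^\s*PasswordAuthentication\b", re.IGNORECASE)
_MATCH_BLOCK = re.compile(r"^\s*Match\s", re.IGNORECASE)


def disable_password_auth(config_text: str) -> str:
    """Return sshd_config text whose global section sets 'PasswordAuthentication no'."""
    out = []
    inserted = False
    for line in config_text.splitlines():
        if not inserted and _MATCH_BLOCK.match(line):
            # Global directives must come before the first Match block.
            out.append("PasswordAuthentication no")
            inserted = True
        if not inserted and _ACTIVE_DIRECTIVE.match(line):
            continue  # drop the old global setting; commented lines do not match
        out.append(line)
    if not inserted:
        out.append("PasswordAuthentication no")
    return "\n".join(out) + "\n"
```

Keeping the new line in the global section, rather than appending it blindly to the end of the file, avoids it silently applying only to the last Match block.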

Installing from the Command Line with Interactive Mode

The default invocation of the MapR quick installer requires the root user or sudo privileges, as in the following example:

# sudo /opt/mapr-installer/bin/install -K -s new

For more information on the syntax and options for the quick installer, see the Quick Installer Options section later in this document.

Interactive Mode Sample Session

The following output reflects a typical interactive-mode session with the MapR quick installer. User input appears after each prompt.

Verifying install pre-requisites

... verified

===============================================================================
=                                                                             =
=  __  __                ____    ___              _          _  _             =
= |  \/  |  __ _  _ __  |  _ \  |_ _| _ __   ___ | |_  __ _ | || |  ___  _ __ =
= | |\/| | / _` || '_ \ | |_) |  | | | '_ \ / __|| __|/ _` || || | / _ \| '__|=
= | |  | || (_| || |_) ||  _ <   | | | | | |\__ \| |_| (_| || || ||  __/| |   =
= |_|  |_| \__,_|| .__/ |_| \_\ |___||_| |_||___/ \__|\__,_||_||_| \___||_|   =
=                 |_|                                                         =
=                                                                             =
===============================================================================

Version: 2.0.125

An Installer config file is typically used by experienced MapR admins to skip through the interview process.

Do you have a config file (y/n) [n]: n

Enter the hostnames of all the control nodes separated by spaces or commas []: control-host-01,control-host-02,control-host-03


Only 1 or 3 control nodes are supported.


Host name resolution must be consistent across all cluster nodes and the multi-node installer's driver node (the node from which the installation is launched). For example, either specify every node by its fully qualified domain name (FQDN), or specify none of the nodes by FQDN.
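One way to catch inconsistent naming before answering the prompts is sketched below. These hypothetical Python helpers split a host list the way the prompts accept it (separated by spaces or commas) and check that either every name or no name looks fully qualified. Treating a name that contains a dot as an FQDN is a simplifying assumption: it misclassifies IP addresses and does not test actual name resolution.

```python
import re
from typing import Iterable, List


def parse_host_list(text: str) -> List[str]:
    """Split prompt input such as 'host-01,host-02 host-03' into host names."""
    return [name for name in re.split(r"[\s,]+", text.strip()) if name]


def consistent_naming(hostnames: Iterable[str]) -> bool:
    """True if every name looks fully qualified, or none does.

    Assumption: a name containing a dot (ignoring a trailing root dot) counts
    as an FQDN. DNS is not consulted.
    """
    kinds = {"." in name.rstrip(".") for name in hostnames}
    return len(kinds) <= 1
```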

Enter the hostnames of all the data nodes separated by spaces or commas []:
Set MapR User Name [mapr]:
Set MapR User Password [mapr]:
Is this cluster going to run MapReduce? (y/n) [y]:
Is this cluster going to run Apache HBase? (y/n) [n]:
Is this cluster going to run MapR M7? (y/n) [y]:
Note: MapR Tables require the M7 license level.
Enter the full path of disks for hosts separated by spaces or commas []: 
/dev/sdb


In interactive mode, the MapR quick installer uses the same set of disks and partitions for each node in the cluster. To specify disks and partitions individually for each node, use a configuration file.

Once you’ve specified the cluster’s configuration information, the MapR quick installer displays the configuration and asks for confirmation:

       Current Information (Please verify if correct)
       ==============================================

       Accessibility settings:

           Cluster Name: "my.cluster.com"
           MapR User Name: "mapr"
           MapR Group Name: "mapr"
           MapR User UID: "2000"
           MapR User GID: "2000"
           MapR User Password (Default: mapr): "****"

       Functional settings:

           WireLevel Security: "n"
           MapReduce Services: "y"
           MapR M7: "y"
           HBase: "n"
           Disks to use: "/dev/sdb"
           Client Nodes: ""
           Control Nodes: "control-host-01,control-host-02,control-host-03"
           Data Nodes: ""
           Repository (will download core software from here): "http://package.mapr.com/releases"
           Ecosystem Repository (will download packages like Pig, Hive etc from here): "http://package.mapr.com/releases/ecosystem"
           MapR Version to Install: "3.1.1"
           Java Version to Install: "OpenJDK7"
           Allow Control Nodes to function as Data Nodes (Not recommended for large clusters): "n"

       Metrics settings:

           Metrics DB Host and Port: ""
           Metrics DB User Name: ""
           Metrics DB User Password: ""
           Metrics DB Schema: ""

(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: m

At this point you can enter c to continue with the installation, or enter m to modify the configuration, as in this session.


Before you proceed, you should change the default MapR user password (mapr) to make the cluster more secure. Select the p option from the modification menu shown below.

Here is the complete list of configuration properties you can change:

           Pick an option to modify
           ========================

           N] Cluster Name: "my.cluster.com"
           u] MapR User Name: "mapr"
           g] MapR Group Name: "mapr"
           U] MapR User UID: "2000"
           G] MapR User GID: "2000"
           p] MapR User Password: "****"
           S] WireLevel Security: "n"
           d] Disk Settings: "/dev/sdb"
           c] Client Nodes: ""
           C] Control Nodes: "control-host-01,control-host-02,control-host-03"
           D] Data Nodes: ""
           b] Control Nodes to function as Data Nodes: "n"
           v] Version: "3.1.1"
           L] Local Repository: "False"
           mr] MapReduce: "y"
           m7] MapR M7: "y"
           hb] HBase: "n"
           uc] Core Repo URL: "http://package.mapr.com/releases"
           ue] Ecosystem Repo URL: "http://package.mapr.com/releases/ecosystem"
           dbh] Metrics DB Host and Port: ""
           dbu] Metrics DB User: ""
           dbp] Metrics DB Password: ""
           dbs] Metrics DB Schema: ""
           cont] Continue
           : cont

(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: c
SSH Username: juser
SUDO Username: root
SSH password: ****
sudo password [defaults to SSH password]: ****

The quick installer first sets up the control nodes in parallel, then sets up the data nodes in groups of ten. It automatically downloads and installs prerequisite packages.
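To estimate how an installation will proceed, the ordering described above can be modeled as a sequence of waves. This hypothetical Python sketch only reproduces that ordering (all control nodes first, then data nodes ten at a time); it is not the installer's code, and client nodes are not modeled because their ordering is not described here.

```python
from typing import Iterator, List

DATA_NODE_BATCH_SIZE = 10  # data nodes are set up in groups of ten


def install_waves(control: List[str], data: List[str],
                  batch_size: int = DATA_NODE_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield groups of hosts in the order the quick installer sets them up."""
    if control:
        yield list(control)  # all control nodes are set up in parallel
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]
```

With three control nodes and 23 data nodes, this yields four waves: the three control nodes, then groups of 10, 10, and 3 data nodes.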

Quick Installer Options

All options to the MapR quick installer are optional. If you use any options, you must follow them with either the new or the add subcommand to specify a new installation or an addition to an existing installation.

Usage:

mapr-install [-h] [-s] [-U SUDO_USER] [-u REMOTE_USER]
             [--private-key PRIVATE_KEY_FILE] [-k] [-K]
             [--skip-checks] [--quiet] [--cfg CFG_LOCATION]
             [--debug] [--password REMOTE_PASS]
             [--sudo-password SUDO_PASS]
             {new,add} ...

Option                                      Description
-h or --help                                Displays help text.
-u or --user <remote user>                  Specifies a user name that the MapR quick installer uses to connect to the cluster nodes.
-k or --ask-pass                            Requests the remote ssh password interactively.
--password <remote password>                Specifies the remote ssh user’s password. You cannot use this option together with the --private-key option.
--private-key <path to private key file>    Specifies the path to a private key file used to authenticate the connection. You cannot use this option together with the --password option.
-s or --sudo                                Executes operations on the target nodes using sudo. You must use this option if the user specified with the -u option is not root.
-U or --sudo-user <sudo user>               Specifies the user name of the sudo user. This user name is root on most systems.
-K or --ask-sudo-pass                       Requests the sudo password interactively.
--sudo-password <sudo password>             Specifies the sudo user’s password.
--skip-checks                               Skips the requirements pre-checks.
--quiet                                     Runs the installer in non-interactive mode.
--cfg <path to config file>                 Installs using the configuration file at the specified path.
--debug                                     Runs the installer in debug mode, which reports installer activity more verbosely.
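When scripting the installer, it can help to assemble the argument list in one place so that the rules in the table are checked before anything runs. The following hypothetical Python helper covers only a subset of the options. It enforces that the new or add subcommand comes last and that -s is used when the -u user is not root. It deliberately omits --password and --sudo-password so that passwords never appear in the argument list, which also means the --password/--private-key conflict cannot arise.

```python
from typing import List, Optional

INSTALLER = "/opt/mapr-installer/bin/install"


def build_install_command(mode: str,
                          remote_user: Optional[str] = None,
                          use_sudo: bool = False,
                          sudo_user: Optional[str] = None,
                          ask_pass: bool = False,
                          ask_sudo_pass: bool = False,
                          private_key: Optional[str] = None,
                          cfg: Optional[str] = None,
                          quiet: bool = False) -> List[str]:
    """Assemble quick installer arguments (hypothetical helper, subset of options)."""
    if mode not in ("new", "add"):
        raise ValueError("mode must be 'new' or 'add'")
    if remote_user and remote_user != "root" and not use_sudo:
        raise ValueError("-s is required when the remote user is not root")
    argv = [INSTALLER]
    if remote_user:
        argv += ["-u", remote_user]
    if use_sudo:
        argv.append("-s")
    if sudo_user:
        argv += ["-U", sudo_user]
    if ask_pass:
        argv.append("-k")  # prompt for the remote ssh password
    if ask_sudo_pass:
        argv.append("-K")  # prompt for the sudo password
    if private_key:
        argv += ["--private-key", private_key]
    if cfg:
        argv += ["--cfg", cfg]
    if quiet:
        argv.append("--quiet")
    argv.append(mode)  # the subcommand follows all options
    return argv
```

You could then run the result with subprocess.run(["sudo"] + argv, check=True). Prompting with -k and -K is generally preferable to --password and --sudo-password, because command-line arguments are visible to other users through tools such as ps.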

 

The MapR Quick Installer Configuration File

The example file config.example in the /opt/mapr-installer/bin directory shows the expected format of an installation configuration file.

# Each Node section can specify nodes in the following format
# Node: disk1, disk2, disk3
# Specifying disks is optional, in which case the default disk information
# from the Default section will be picked up

[Control_Nodes]

control-01: /dev/disk1, /dev/disk2, /dev/disk3
control-02: /dev/disk3, /dev/disk9
control-03: /dev/sdb, /dev/sdc, /dev/sdd

[Data_Nodes]

data-01
data-02: /dev/sdb, /dev/sdc, /dev/sdd
data-03: /dev/sdd
data-04: /dev/sdb, /dev/sdd

[Client_Nodes]

client-01
client-02
client-03
client-04

[Options]

MapReduce = true
YARN = false
HBase = false
M7 = true
ControlNodesAsDataNodes = true
WirelevelSecurity = false
LocalRepo = false

[Defaults]

ClusterName = my.cluster.com
User = mapr
Group = mapr
Password = default_mapr_password
UID = 2000
GID = 2000
Disks = /dev/sdz
CoreRepoURL = http://package.mapr.com/releases
EcoRepoURL = http://package.mapr.com/releases/ecosystem
Version = 3.1.1
MetricsDBHost =
MetricsDBUser =
MetricsDBPassword =
MetricsDBSchema =

For a new installation, all of the sections must be present in the configuration file, though the [Data_Nodes] and [Client_Nodes] sections can be left empty. For additions to an existing installation, the [Control_Nodes], [Data_Nodes], and [Client_Nodes] sections must be present, although they can be left empty. Other sections in the configuration file are silently ignored for additions.

The Disks key in the [Defaults] section provides a fallback for any node listed in the [Control_Nodes], [Data_Nodes], or [Client_Nodes] section without disk information.

You can leave the values of the keys in the [Defaults] section empty, but each key must be present.
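The section and key rules above can be checked before you run the installer. This hypothetical Python helper reads the file with the standard configparser module; the file looks INI-like, but assuming the installer accepts exactly what configparser accepts is unverified. It checks that the required sections are present, checks for a new installation that every [Defaults] key exists, and resolves each node's disk list, falling back to the Disks key in [Defaults] for a new installation.

```python
import configparser
from typing import Dict, List, Optional

NODE_SECTIONS = ("Control_Nodes", "Data_Nodes", "Client_Nodes")
NEW_ONLY_SECTIONS = ("Options", "Defaults")
DEFAULT_KEYS = ("ClusterName", "User", "Group", "Password", "UID", "GID", "Disks",
                "CoreRepoURL", "EcoRepoURL", "Version", "MetricsDBHost",
                "MetricsDBUser", "MetricsDBPassword", "MetricsDBSchema")


def _split_disks(value: Optional[str]) -> List[str]:
    """Turn '/dev/sdb, /dev/sdc' into ['/dev/sdb', '/dev/sdc']; None or '' gives []."""
    if not value:
        return []
    return [disk.strip() for disk in value.split(",") if disk.strip()]


def resolve_node_disks(text: str, mode: str = "new") -> Dict[str, Dict[str, List[str]]]:
    """Validate config text and return {section: {host: [disks]}} (hypothetical helper)."""
    # allow_no_value accepts bare host lines such as "data-01";
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.optionxform = str  # keep host names and keys exactly as written
    parser.read_string(text)

    required = NODE_SECTIONS + (NEW_ONLY_SECTIONS if mode == "new" else ())
    missing = [name for name in required if not parser.has_section(name)]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))

    fallback: List[str] = []
    if mode == "new":  # other sections are ignored for additions
        absent = [key for key in DEFAULT_KEYS if not parser.has_option("Defaults", key)]
        if absent:
            raise ValueError("[Defaults] is missing keys: " + ", ".join(absent))
        fallback = _split_disks(parser.get("Defaults", "Disks"))

    return {
        section: {host: _split_disks(disks) or list(fallback)
                  for host, disks in parser.items(section)}
        for section in NODE_SECTIONS
    }
```

Applied to the example file, data-01 (listed without disks) would resolve to ['/dev/sdz'] from the [Defaults] section.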

The Quick Installer Manifest File

The MapR quick installer generates a manifest file in the /opt/mapr-installer/var directory named manifest.yml. The manifest file stores your cluster’s installation state. When you specify the add option, the quick installer checks the manifest for the cluster’s current installation state.

Because the manifest file is generated on the node from which you installed MapR, you must run the quick installer from that same node when you perform an addition to an existing installation. New installations do not reference a manifest file, so you can perform them from any node.
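A wrapper script could guard against running an addition from the wrong machine by checking for the manifest first. This hypothetical Python helper only tests whether the file exists at the path given above; it does not read or validate the manifest's contents.

```python
import os

MANIFEST_PATH = "/opt/mapr-installer/var/manifest.yml"  # location given in this guide


def choose_subcommand(adding_nodes: bool, manifest_path: str = MANIFEST_PATH) -> str:
    """Return 'new' or 'add', refusing 'add' when this machine has no manifest."""
    if not adding_nodes:
        return "new"  # new installations do not read the manifest
    if not os.path.isfile(manifest_path):
        raise RuntimeError(
            f"{manifest_path} not found: run 'add' from the machine that performed "
            "the original installation"
        )
    return "add"
```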

Troubleshooting

The Quick Installer fails with permissions errors: Many Ubuntu systems disable the root login for security reasons.

Resolution: Start the quick installer with the following options:

# sudo /opt/mapr-installer/bin/install -u <user> -s -U root [--sudo-password <password> | --ask-sudo-pass] new

You must use exactly one of the --sudo-password or --ask-sudo-pass options. The --sudo-password option requires you to type the sudo password on the command line. The --ask-sudo-pass option requests the sudo password interactively.

Client disconnection disrupts the installation process: To prevent a client disconnection from interrupting the installation, run the MapR quick installer from a screen or tmux session.

Using the MapR Quick Installer on a cloud installation: Cloud computing services assign you a private key for use with your cloud computing nodes. Typically, private key files use the .pem extension. To use this private key with the MapR quick installer, verify that the permissions for the file are 0600 (-rw-------). You can use the chmod command to set the permissions, as in the following example:

$ chmod 0600 filename.pem

Once the file has the correct permissions, specify the path to the private key file with the --private-key option.
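The permission check described above can also be done programmatically. This hypothetical Python helper reads the file's permission bits and, if they are not 0600, sets them with os.chmod, which has the same effect as the chmod command shown.

```python
import os
import stat


def ensure_private_key_mode(path: str) -> bool:
    """Make sure `path` has mode 0600 (-rw-------); return True if it was changed."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == 0o600:
        return False
    os.chmod(path, 0o600)  # same effect as: chmod 0600 filename.pem
    return True
```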

The installer hangs at the ‘Configure MapR software’ step: The installer reports its activity with output similar to the following example:

* 16:25:31 Install OpenJDK packages
* 16:27:42 MapR Repository Initialization
* 16:27:42 MapR Repository Initialization for RedHat
* 16:28:27 Install MapR Packages
* 16:29:04 Disable MapR Services until configuration
* 16:29:05 Configure MapR software

One possible cause of this hang is that the specified MapR user already exists on one of the nodes. In that case, the installer does not overwrite the credentials of the existing user and cannot authenticate to that node.

Resolution: Examine the log files to determine the precise cause of the error.

The apt-get utility fails with a ‘cannot get lock’ error message: The MapR Quick Installer requires root privileges, and this error can occur when root privileges are not available.

Resolution: Check the sudo or sudo-user settings on the cluster nodes, then run the MapR Quick Installer with the -u <user> -s -U root -K options and the new subcommand, as in the following example:

# sudo /opt/mapr-installer/bin/install -u <user> -s -U root -K new

Post Installation

To complete the post installation process, follow these steps:

  1. Access the MCS by entering the following URL in your browser, substituting the IP address of a control node on your cluster:
    https://<ip_address>:8443

    Compatible browsers include Chrome, Firefox 3.0 and above, Safari (see Browser Compatibility for more information) and Internet Explorer 10 and above. 

  2. If a message about the security certificate appears, click Proceed anyway.
  3. Log in with the MapR user name and password that you set during the installation.
  4. To register and apply a license, click Manage Licenses in the upper right corner, and follow the instructions to add a license via the web.
    See Managing Licenses for more information.
  5. Create separate volumes so you can specify different policies for different subsets of data. See Managing Data with Volumes for more information.
  6. Set up topology so the cluster is rack-aware for optimum replication. For more information, see Setting Up Topology.
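If the MCS page does not load in step 1, it can help to confirm that the control node accepts connections on port 8443 from your workstation. These hypothetical Python helpers build the MCS URL and attempt a plain TCP connection; a successful connection only shows that something is listening, not that the MCS is healthy. IPv6 literals would need brackets in the URL, which this sketch does not handle.

```python
import socket

MCS_PORT = 8443  # port given in step 1


def mcs_url(host: str, port: int = MCS_PORT) -> str:
    """Return the MCS URL for a control node's IP address or host name."""
    return f"https://{host}:{port}"


def mcs_reachable(host: str, port: int = MCS_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```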