How to Install and Generate Reports with Clush

Clush is an open source tool that executes commands in parallel across the nodes in your cluster. This post describes how to install clush and use it to generate a report detailing the configuration of every node in your cluster (what MapR support classifies as a “cluster audit”). Completing the audit successfully will greatly simplify deployment of the MapR Distribution of Hadoop.

The clush utility need only be installed on one node, usually the primary node in the cluster or an edge node. The package (“clustershell”) is available from the primary Ubuntu repositories, or as part of the CentOS EPEL repositories.

  1. Ubuntu: apt-get install clustershell
  2. RedHat/CentOS: yum --enablerepo=epel install clustershell
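
If the same setup notes have to cover both distribution families, the choice between the two commands above can be sketched as a small helper. This is only an illustration: the function name clustershell_install_cmd and the sample release strings are assumptions, and the helper prints the command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: print the clustershell install command that matches
# a distribution release string (for example, the contents of /etc/*release).
clustershell_install_cmd() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *ubuntu*)
            echo "apt-get install clustershell" ;;
        *centos*|*redhat*|*"red hat"*)
            echo "yum --enablerepo=epel install clustershell" ;;
        *)
            echo "unsupported distro: $1" >&2
            return 1 ;;
    esac
}

# Sample release strings, made up for illustration.
clustershell_install_cmd "Ubuntu 12.04 LTS"
clustershell_install_cmd "CentOS release 6.4 (Final)"
```

On a real node the argument would typically come from something like "$(cat /etc/*release)", and the printed command would be run as root or through sudo.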

Completing the configuration of clush requires two additional steps:

  1. Define the node groups against which clush will run. These groups are defined in /etc/clustershell/groups. Make sure the “all” group matches the nodes in your cluster. A common definition will be something like:
    all: node[1-8]
  2. Deploy ssh keys such that passwordless ssh is supported between the clush node and every node in the “all” definition created in step 1. For environments in which passwordless ssh is prohibited for the root user, it is best to select a user that has sudo privileges on the nodes to be managed.
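
One possible way to carry out step 2 is a short loop over the node names. The sketch below assumes the node1 through node8 naming from the example group definition, a key pair already created with ssh-keygen, and a hypothetical user named admin; the function deploy_keys and the DRY_RUN switch are illustrative names, not part of clush. By default it only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helper: copy the current user's public key to node<first>..node<last>.
# Assumes a key pair already exists (for example, created with: ssh-keygen -t rsa).
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
deploy_keys() {
    local user=$1 first=$2 last=$3 n
    for n in $(seq "$first" "$last"); do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "ssh-copy-id ${user}@node${n}"
        else
            # Prompts for the remote password once per node.
            ssh-copy-id "${user}@node${n}"
        fi
    done
}

# "admin" is a placeholder account with sudo rights on the managed nodes.
deploy_keys admin 1 8
```

Setting DRY_RUN=0 runs ssh-copy-id for real. Once clustershell is installed, nodeset -e @all prints the expanded node list and could replace the seq-based naming assumed here.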

Once the keys are in place, you should be able to test clush with a simple command: clush -a date
The “-a” flag specifies that the command should be executed on every node in the “all” group.

Helpful Hint: Acknowledging the ssh identities of all nodes in a large cluster can be time-consuming. Adding the following extra option to clush on the first execution will automatically add those identities to the user’s ssh known_hosts file: -o "-oStrictHostKeyChecking=no"

Don’t worry: you need only use that extra option the first time you connect to the remote nodes with ssh.

Another useful flag for the clush command is “-b”, which merges common output from the command executed on the remote nodes. Compare the output of clush -a date with that of clush -ba date

The “-B” option performs the same merging for both stdout and stderr for the invoked commands.
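
To see why merged output is easier to audit, the grouping idea can be imitated locally without a cluster. The sketch below is not how clush implements -b; it is a small awk illustration, with made-up node names and kernel strings, that collects the nodes reporting identical text onto one line. The function names sample_output and merge_common are assumptions.

```shell
#!/bin/bash
# Simulated per-node output in the "node: text" form that unmerged clush output uses.
# Node names and kernel strings are made up for illustration.
sample_output() {
    printf '%s\n' \
        'node1: Linux 2.6.32' \
        'node2: Linux 2.6.32' \
        'node3: Linux 3.10.0' \
        'node4: Linux 2.6.32'
}

# Group node names by identical output text, keeping first-seen order.
merge_common() {
    awk -F': ' '
        {
            text = substr($0, length($1) + 3)   # everything after "node: "
            if (text in nodes) {
                nodes[text] = nodes[text] "," $1
            } else {
                order[++n] = text
                nodes[text] = $1
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s => %s\n", nodes[order[i]], order[i]
        }'
}

sample_output | merge_common
```

With the sample data, three nodes collapse into a single line and node3 stands out on its own, which is exactly the kind of anomaly the audit is meant to surface.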

With clush now configured, the following script can be used to gather details from all the nodes planned for your MapR cluster. Resolving the anomalies reported by cluster-audit helps ensure that the MapR installation proceeds smoothly. The script:

#!/bin/bash
# 2013-Oct-06  vi: set ai et sw=3 tabstop=3:

# A sequence of parallel shell commands looking for system configuration
# differences between all the nodes in a cluster.
# The script requires that the clush utility (a parallel execution tool)
# be installed and configured for passwordless ssh connectivity to
# all the nodes under test.

scriptdir="$(cd "$(dirname "$0")"; pwd -P)"
distro=$(cat /etc/*release | grep -m1 -i -o -e ubuntu -e redhat -e 'red hat' -e centos) || distro=centos
[ $(id -u) -ne 0 ] && SUDO=sudo
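shopt -s nocasematch
# Separator line printed between audit sections
sep='====================================================================='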

# Arguments to pass in to our clush execution
clcnt=$(nodeset -c @all)
parg="-B -a -f $clcnt"
parg2="$parg -o -qtt"

echo ==================== Hardware audits ================================
date; echo $sep
# probe for system info ###############
clush $parg2 "${SUDO:-} dmidecode | grep -A2 '^System Information'"; echo $sep
clush $parg2 "${SUDO:-} dmidecode | grep -A3 '^BIOS I'"; echo $sep

# probe for cpu info ###############
clush $parg "grep '^model name' /proc/cpuinfo | sort -u"; echo $sep
clush $parg "lscpu | grep -v -e op-mode -e ^Vendor -e family -e Model: -e Stepping: -e BogoMIPS -e Virtual -e ^Byte -e '^NUMA node(s)' | awk '/^CPU MHz:/{sub(\$3,sprintf(\"%0.0f\",\$3))};{print}'"; echo $sep

# probe for mem/dimm info ###############
clush $parg "cat /proc/meminfo | grep -i ^memt | uniq"; echo $sep
clush $parg2 "echo -n 'DIMM slots: '; ${SUDO:-} dmidecode |grep -c '^[[:space:]]*Locator:'"; echo $sep
clush $parg2 "echo -n 'DIMM count is: '; ${SUDO:-} dmidecode | grep -c '^[[:space:]]Size: [0-9]* MB'"; echo $sep
clush $parg2 "${SUDO:-} dmidecode | awk '/Memory Device$/,/^$/ {print}' | grep -e '^Mem' -e Size: -e Speed: -e Part | sort -u | grep -v -e 'NO DIMM' -e 'No Module Installed' -e Unknown"; echo $sep

# probe for nic info ###############
#clush $parg "ifconfig | grep -o ^eth.| xargs -l ${SUDO:-} /usr/sbin/ethtool | grep -e ^Settings -e Speed" 
#clush $parg "ifconfig | awk '/^[^ ]/ && \$1 !~ /lo/{print \$1}' | xargs -l ${SUDO:-} /usr/sbin/ethtool | grep -e ^Settings -e Speed" 
clush $parg2 "${SUDO:-} lspci | grep -i ether"
clush $parg2 "${SUDO:-} ip link show | sed '/ lo: /,+1d' | awk '/UP/{sub(\":\",\"\",\$2);print \$2}' | xargs -l ${SUDO:-} ethtool | grep -e ^Settings -e Speed"
#clush $parg "echo -n 'Nic Speed: '; /sbin/ip link show | sed '/ lo: /,+1d' | awk '/UP/{sub(\":\",\"\",\$2);print \$2}' | xargs -l -I % cat /sys/class/net/%/speed"
echo $sep

# probe for disk info ###############
clush $parg2 "echo 'Storage Controller: '; ${SUDO:-} lspci | grep -i -e raid -e storage -e lsi"; echo $sep
clush $parg "dmesg | grep -i raid | grep -i scsi"; echo $sep
case $distro in
   ubuntu)
      clush $parg2 "${SUDO:-} fdisk -l | grep '^Disk /.*:'"; echo $sep
      ;;
   redhat|centos|red*)
      clush $parg "lsblk -id | awk '{print \$1,\$4}'|sort | nl"; echo $sep
      ;;
   *) echo Unknown Linux distro! $distro; exit ;;
esac
clush $parg -u 30 "df -hT | cut -c23-28,39- | grep -e '  *' | grep -v -e /dev"; echo $sep
#clush $parg "echo 'Storage Drive(s): '; fdisk -l 2>/dev/null | grep '^Disk /dev/.*: ' | sort | grep mapper"
#clush $parg "echo 'Storage Drive(s): '; fdisk -l 2>/dev/null | grep '^Disk /dev/.*: ' | sort | grep -v mapper"

echo ==================== Linux audits ================================
echo $sep
clush $parg "cat /etc/*release | uniq"; echo $sep
clush $parg "uname -srvm | fmt"; echo $sep
clush $parg date; echo $sep
clush $parg2 "${SUDO:-} sysctl vm.swappiness net.ipv4.tcp_retries2 vm.overcommit_memory"; echo $sep
echo -e "/etc/sysctl.conf values should be:\nvm.swappiness = 0\nnet.ipv4.tcp_retries2 = 2\nvm.overcommit_memory = 0"
echo $sep

case $distro in
   ubuntu)
      # Ubuntu SElinux tools not so good.
      clush $parg2 "${SUDO:-} apparmor_status | sed 's/([0-9]*)//'"; echo $sep
      clush $parg "echo -n 'SElinux status: '; ([ -d /etc/selinux -a -f /etc/selinux/config ] && grep ^SELINUX= /etc/selinux/config) || echo Disabled"; echo $sep
      clush $parg2 "echo 'Firewall status: '; ${SUDO:-} service ufw status | head -10"; echo $sep
      clush $parg2 "echo 'IPtables status: '; ${SUDO:-} iptables -L | head -10"; echo $sep
      clush $parg2 'echo "NTP status "; ${SUDO:-} service ntp status'; echo $sep
      clush $parg "echo 'NFS packages installed '; dpkg -l '*nfs*' | grep ^i"; echo $sep
      ;;
   redhat|centos|red*)
      clush $parg "ntpstat | head -1" ; echo $sep
      clush $parg "echo -n 'SElinux status: '; grep ^SELINUX= /etc/selinux/config" ; echo $sep
      clush $parg2 "${SUDO:-} chkconfig --list iptables" ; echo $sep
      #clush $parg "/sbin/service iptables status | grep -m 3 -e ^Table -e ^Chain"
      clush $parg2 "${SUDO:-} service iptables status | head -10"; echo $sep
      #clush $parg "echo -n 'Frequency Governor: '; for dev in /sys/devices/system/cpu/cpu[0-9]*; do cat \$dev/cpufreq/scaling_governor; done | uniq -c"
      clush $parg2 "echo -n 'CPUspeed Service: '; ${SUDO:-} service cpuspeed status"
      clush $parg2 "echo -n 'CPUspeed Service: '; ${SUDO:-} chkconfig --list cpuspeed"; echo $sep
      clush $parg 'echo "NFS packages installed "; rpm -qa | grep -i nfs |sort' ; echo $sep
      clush $parg 'echo Missing RPMs: ; for each in make patch redhat-lsb irqbalance syslinux hdparm sdparm dmidecode nc rpcbind nfs-utils dstat redhat-lsb-core git gcc openjdk-devel; do rpm -q $each | grep "is not installed"; done' ; echo $sep
      ;;
   *) echo Unknown Linux distro! $distro; exit ;;
esac
shopt -u nocasematch

#clush $parg "grep AUTOCONF /etc/sysconfig/network" ; echo $sep
clush $parg "echo -n 'Transparent Huge Pages: '; cat /sys/kernel/mm/transparent_hugepage/enabled" ; echo $sep
clush $parg "stat -c %a /tmp | grep -q 1777 || echo /tmp permissions not 1777" ; echo $sep
clush $parg 'java -version; echo JAVA_HOME is ${JAVA_HOME:-Not Defined!}'; echo $sep
clush $parg 'java -XX:+PrintFlagsFinal -version |& grep MaxHeapSize'; echo $sep
echo Hostname lookup
clush $parg 'hostname -I'; echo $sep
echo DNS lookup
clush $parg 'host $(hostname -f)'; echo $sep
echo Reverse DNS lookup
clush $parg 'host $(hostname -i)'; echo $sep
clush $parg "ls -d /opt/mapr/* | head" ; echo $sep
clush $parg2 "echo -n 'Open file limit(should be >=32K): '; ${SUDO:-} su - mapr -c 'ulimit -n'" ; echo $sep
clush $parg2 "echo 'mapr login for Hadoop '; getent passwd mapr && { ${SUDO:-} echo ~mapr/.ssh; ${SUDO:-} ls ~mapr/.ssh; }"; echo $sep
clush $parg2 "echo 'Root login '; getent passwd root && { ${SUDO:-} echo ~root/.ssh; ${SUDO:-} ls ~root/.ssh; }"; echo $sep
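
The audit prints the sysctl values next to the recommended ones and leaves the comparison to the reader. As a rough sketch of how that comparison could be automated for one node's output, the helper below reads "key = value" lines (the format sysctl prints) and flags any that differ from the recommendations echoed by the script. The function name check_sysctl and the sample input are assumptions, not part of the audit script.

```shell
#!/bin/bash
# Recommended values, taken from the audit script's own echo line.
recommended='vm.swappiness = 0
net.ipv4.tcp_retries2 = 2
vm.overcommit_memory = 0'

# Hypothetical helper: read "key = value" lines on stdin and report every key
# whose value differs from the recommendation. Returns 1 if any mismatch is found.
check_sysctl() {
    local key eq value want status=0
    while read -r key eq value; do
        want=$(printf '%s\n' "$recommended" | awk -v k="$key" '$1 == k {print $3}')
        # Skip keys that have no recommendation.
        if [ -z "$want" ]; then continue; fi
        if [ "$value" != "$want" ]; then
            echo "MISMATCH $key: found $value, expected $want"
            status=1
        fi
    done
    return $status
}

# Made-up values for a single node, for illustration only.
printf '%s\n' 'vm.swappiness = 60' 'net.ipv4.tcp_retries2 = 2' 'vm.overcommit_memory = 0' \
    | check_sysctl || echo 'sysctl settings need attention'
```

On a live node the input could come from the same sysctl command the audit runs, piped straight into check_sysctl.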

This blog post was published June 08, 2015.
