Recently, MapR launched the MapR Data Science Refinery, a novel way to deliver data science functionality and connectivity for your MapR Data Platform.
One of the great advantages of this product is that you can deploy the workspace wherever you choose to do your work: an edge node, a cloud instance, or even your personal laptop!
The steps below show how to run the MapR Data Science Refinery on a Mac.
First, you need to install and start the Docker environment for your operating system. You'll be given a choice between Docker Community Edition (CE) and Docker Enterprise Edition (EE); either works for this purpose.
There are some basic commands for Mac here:
If you want to enable shell completion, for example, create symlinks to these files:
ln -s /Applications/Docker.app/Contents/Resources/etc/docker.bash-completion /usr/local/etc/bash_completion.d/docker
ln -s /Applications/Docker.app/Contents/Resources/etc/docker-machine.bash-completion /usr/local/etc/bash_completion.d/docker-machine
ln -s /Applications/Docker.app/Contents/Resources/etc/docker-compose.bash-completion /usr/local/etc/bash_completion.d/docker-compose
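The three symlink commands above can also be wrapped in a small loop that skips any completion script your Docker install does not ship. This is only a sketch: the function name `link_docker_completions` is made up for illustration, and the default paths are simply the ones used in the commands above.

```shell
# Link Docker's bundled bash-completion scripts into a completion directory.
# Both arguments are optional and default to the paths used in this article.
link_docker_completions() {
  local src_dir="${1:-/Applications/Docker.app/Contents/Resources/etc}"
  local dest_dir="${2:-/usr/local/etc/bash_completion.d}"
  local name src

  for name in docker docker-machine docker-compose; do
    src="$src_dir/$name.bash-completion"
    if [ -f "$src" ]; then
      # -f replaces a stale link left over from an earlier install.
      ln -sf "$src" "$dest_dir/$name"
      echo "linked $name"
    else
      echo "skipped $name (not found in $src_dir)"
    fi
  done
}
```

Calling `link_docker_completions` with no arguments uses the default paths; open a new terminal afterwards so the completion scripts are picked up.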
Once Docker is installed and running, you need to pull the image into your local Docker image store. Our Docker Hub is located here, and the command to run from your Mac terminal to pull the most recent version of the CentOS image is:
docker pull maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
After you've run this, you can confirm that the image is now available locally by running:
$ docker images
REPOSITORY                                 TAG                        IMAGE ID     CREATED   SIZE
docker.io/maprtech/data-science-refinery   v1.0_6.0.0_4.0.0_centos7   <IMAGE ID>
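If you would rather check for the image from a script than read the table, one possible approach relies on `docker images -q`, which prints only the IDs of matching images and prints nothing when there is no match. The helper name `image_is_local` is an assumption for illustration, not part of Docker or MapR tooling.

```shell
# Succeed if the given image reference (repository:tag) is present locally.
# `docker images -q REF` prints matching image IDs, or nothing if none match.
image_is_local() {
  [ -n "$(docker images -q "$1" 2>/dev/null)" ]
}

# Hypothetical usage: pull only when the image is missing.
#   IMAGE="maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7"
#   image_is_local "$IMAGE" || docker pull "$IMAGE"
```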
For a secure cluster, the only other prerequisite at this point is your MapR-SASL ticket, which must be available somewhere on this host. For the steps to generate this ticket, please see this document:
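Before launching the container, it can help to confirm that the ticket file is actually where you think it is. The sketch below checks only that the file exists, is readable, and is not empty; it does not validate the ticket's contents or expiry. The function name `check_ticket_file` is hypothetical.

```shell
# Preflight check for the ticket file that will be mounted into the container.
# Returns 0 if the file exists, is readable, and is non-empty.
check_ticket_file() {
  local ticket="$1"

  if [ -z "$ticket" ]; then
    echo "no ticket path given" >&2
    return 2
  fi
  if [ ! -f "$ticket" ]; then
    echo "ticket file not found: $ticket" >&2
    return 1
  fi
  if [ ! -r "$ticket" ]; then
    echo "ticket file not readable: $ticket" >&2
    return 1
  fi
  if [ ! -s "$ticket" ]; then
    echo "ticket file is empty: $ticket" >&2
    return 1
  fi
  echo "ticket file looks usable: $ticket"
}
```

Run it as `check_ticket_file </path/to/ticket/file>`, using the same path you plan to pass to the container.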
Next, you simply use the docker run command, replacing the placeholders in angle brackets with your own values as needed. For more information on this command and its options, please visit this document:
docker run -it \
  -p 9995:9995 \
  -e HOST_IP=<docker-host-ip> \
  -p 10000-10010:10000-10010 \
  -e MAPR_CLUSTER=<cluster-name> \
  -e MAPR_CLDB_HOSTS=<cldb-ip-list> \
  -e MAPR_CONTAINER_USER=<user-name> \
  -e MAPR_CONTAINER_PASSWORD=<password> \
  -e MAPR_CONTAINER_GROUP=<group-name> \
  -e MAPR_CONTAINER_UID=<uid> \
  -e MAPR_CONTAINER_GID=<gid> \
  -e MAPR_TICKETFILE_LOCATION=</path/to/ticket/file> \
  -e MAPR_MOUNT_PATH=/mapr \
  --cap-add SYS_ADMIN \
  --cap-add SYS_RESOURCE \
  --device /dev/fuse \
  -e MAPR_HS_HOST=<historyserver-ip> \
  -e ZEPPELIN_NOTEBOOK_DIR=<path-for-notebook-storage> \
  -e MAPR_TZ=<time-zone> \
  -v </path/to/ticket/file>:/tmp/mapr_ticket:ro \
  maprtech/data-science-refinery:latest
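With this many -e settings, one optional variation is to collect them in an env file and pass it with `docker run --env-file`, which also keeps the password out of your shell history. The sketch below is an assumption about how you might organize this, not a MapR-documented workflow: the function name `write_refinery_env_file` and the file name `refinery.env` are made up, and the variable names are the ones from the command above. It reads each value from an exported shell variable of the same name and skips any that are not set.

```shell
# Write the -e settings into an env file, one KEY=VALUE per line, for use with
# `docker run --env-file`. Values come from exported variables of the same name.
write_refinery_env_file() {
  local out="$1"
  local name value

  : > "$out" || return 1
  chmod 600 "$out"   # the file will hold MAPR_CONTAINER_PASSWORD

  for name in HOST_IP MAPR_CLUSTER MAPR_CLDB_HOSTS MAPR_CONTAINER_USER \
              MAPR_CONTAINER_PASSWORD MAPR_CONTAINER_GROUP MAPR_CONTAINER_UID \
              MAPR_CONTAINER_GID MAPR_TICKETFILE_LOCATION MAPR_MOUNT_PATH \
              MAPR_HS_HOST ZEPPELIN_NOTEBOOK_DIR MAPR_TZ; do
    # printenv exits non-zero when the variable is not in the environment.
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value" >> "$out"
    else
      echo "not set, skipped: $name" >&2
    fi
  done
}

# Hypothetical usage (the remaining flags are the non -e flags from the command above):
#   write_refinery_env_file refinery.env
#   docker run -it --env-file refinery.env -p 9995:9995 -p 10000-10010:10000-10010 \
#     --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse \
#     -v </path/to/ticket/file>:/tmp/mapr_ticket:ro maprtech/data-science-refinery:latest
```

Docker reads env-file values literally, so do not wrap them in quotes inside the file.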
That's it! Now you can log in to Zeppelin by visiting the UI at the following address:
Log in using the credentials that you provided in the docker run command. Authorization for the jobs themselves, whether Spark, POSIX, or JDBC, is provided by your MapR-SASL ticket.
In addition, you can peruse the file system from inside the container using POSIX or Hadoop syntax from the CLI or Zeppelin. This is made possible by the MapR POSIX Client For Containers, which allows MapR customers to mount their global namespace to their Docker container.
$ ls -la /mapr/my.cluster.com/
total 3
drwxr-xr-x 10 mapr mapr 9 Nov 27 08:55 .
dr-xr-xr-x  3 root root 1 Dec 16 17:43 ..
drwxr-xr-x  3 mapr mapr 1 Nov 27 08:51 apps
drwxr-xr-x  2 mapr mapr 0 Nov 27 08:48 hbase
[...]
$ hadoop fs -ls /
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Found 8 items
drwxr-xr-x   - mapr mapr          1 2017-11-27 08:51 /apps
drwxr-xr-x   - mapr mapr          0 2017-11-27 08:48 /hbase
[...]
After running the docker run command, you may see the following error:
Started service mapr-posix-client-container [FAILED]
This error can be safely ignored as it is a remnant of an issue with the MapR Persistent Application Client Container (PACC).
Your web browser may also warn you that the site is unsafe when you visit the Apache Zeppelin UI.
This is expected behavior if you haven't installed an SSL certificate for this instance.
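The same certificate situation affects command-line checks. If you want to confirm from a terminal that the Zeppelin UI is answering, a minimal sketch is to poll it with curl, using `-k` to skip certificate verification while no SSL certificate is installed. The function name `wait_for_zeppelin` is hypothetical, and the URL form `https://<docker-host-ip>:9995` is an assumption based on the port published in the docker run command; substitute whatever address you actually use.

```shell
# Poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_zeppelin() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -k: accept the untrusted certificate; -s: quiet; -o /dev/null: discard the body.
    if curl -k -s -o /dev/null --max-time 5 "$url"; then
      echo "reachable after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}

# Hypothetical usage:
#   wait_for_zeppelin "https://<docker-host-ip>:9995"
```

Only use `-k` against a host you control; once a proper certificate is installed, drop the flag.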