In the ever-expanding world of IoT, with so many choices and places to start, it’s easy to get decision paralysis amid all the options, architectures, and platforms. Often, the best way to develop more ideas is simply to get started and try a real-world example, eventually creating something that helps your customers or enables a new data-driven project. Whether your needs revolve around software, sophisticated electronics, hundreds of millions of data points, or “all of the above,” one way to cut through the clutter is to try something and see if it works, then build on it as more ideas surface and you get a sense of how to move forward and what needs improvement.
In this post, we’ll show you how to get started with an example IoT project using Hadoop that is relatively simple to create, based on the BeagleBone Black and the MapR Sandbox for Hadoop. It lays the groundwork for an architecture that can scale to many millions of data points. (A good discussion of this and other boards as IoT building blocks is here).
In this project, you get the best of both worlds: the barriers to getting started are low, yet there are none of the typical PoC tradeoffs of agility or scale. We’ll build a one-node data gathering platform that sends data back to a MapR cluster, stores the data in OpenTSDB tables within MapR Database, and enables further analytics/dashboarding with SQL, Tableau, or your front end of choice. This makes a great launching pad for other IoT projects you may envision down the road.
It’s worth pointing out that using MapR for your architecture gives you some serious advantages right from the beginning. For example, once you’ve grown a large dataset and are using it in production, you will want to take consistent snapshots, which will be important since the data is, by its nature, changing fast.
Software and Hardware Prerequisites
Configuring the MapR Sandbox
First, download and install the sandbox VM according to the instructions. We’ll need to do a little configuration to get things talking to each other. The steps you take to connect the board here may differ slightly depending on your mix of host OS, board software revision, and any other preexisting configuration done to the board, so it's best to start from scratch.
After following the instructions to download and import the VM into VirtualBox, we need to enable USB on the VM, as it is disabled by default. To do this, go to the settings for the VM, and check the box “Enable USB Controller.”
Boot the VM normally. You should see a welcome screen that shows the URL for configuration. Press Alt-F2 to bring up another console and login as ‘mapr’ with password ‘mapr’.
Since the Sandbox runs Linux, we can use the handy mkudevrule.sh script (from beagleboard.org) to create udev rules to make the Sandbox automatically configure the board interface when it connects. Download and run that script on the Sandbox with root privileges:
chmod +x mkudevrule.sh
sudo ./mkudevrule.sh
(enter the same ‘mapr’ password when prompted)
Before moving to the next step, boot the BeagleBone board via the getting started instructions. Note that since you’re using the MapR Sandbox (which is based on Linux), you will be following the Linux instructions. If you are powering the board over USB, simply plug it into the host computer. Once the board has booted, you will need to connect the USB device to the VM (and disconnect it from the host OS) by right-clicking on the symbol in the lower right corner of VirtualBox, as shown below.
You should now have a new Ethernet interface for communicating with the BeagleBone. Bring it up and give it an address (the board runs a DHCP server by default). As root,
ifconfig eth1 up
Check that you have an IP address with ‘ifconfig eth1’.
Note that this interface might be assigned to a different subnet than the one shown in this example, so substitute your own values as necessary.
At the end of the OpenTSDB configuration tutorial you should have started ‘tsdb’ with some command-line options or edited a configuration file so that it contains similar options. An example command line invocation from the OpenTSDB root directory is:
./build/tsdb tsd --port=4242 --staticroot=build/staticroot
--cachedir=/tmp/opentsdb_tmp --zkquorum=10.10.101.50:5181 --table=/tsdb
It’s worth pointing out here that the ‘--auto-metric’ flag should not be used in production, but is OK in a private testing environment.
Configuring the BeagleBone Black
We’ll use the open-source tcollector software, written in Python, loaded onto the board to collect data and send it back to OpenTSDB. This is where you could easily add your own sensors to the board and write your own “collector,” which is very simple with tcollector: just write your data to stdout with a timestamp. In this case we’ll use the built-in collectors that come packaged with tcollector.
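To give a flavor of what “just write your data to stdout with a timestamp” looks like, here is a minimal sketch of a custom collector. The metric name, tag, and 15-second interval are made up for illustration; tcollector invokes collectors placed under a numbered subdirectory (e.g., collectors/15) on that interval, while collectors under collectors/0 are expected to loop forever:

```python
#!/usr/bin/env python
# Hypothetical run-once collector for tcollector (metric name is made up).
# tcollector reads each collector's stdout; one data point per line in the
# form: <metric> <unix-timestamp> <value> [tag=value ...]
import time

def format_point(metric, value, ts=None, **tags):
    """Render one data point in the line format tcollector expects."""
    ts = int(ts if ts is not None else time.time())
    tag_str = "".join(" %s=%s" % (k, v) for k, v in sorted(tags.items()))
    return "%s %d %s%s" % (metric, ts, value, tag_str)

def read_uptime():
    """Seconds since boot, from the first field of /proc/uptime (Linux)."""
    try:
        with open("/proc/uptime") as f:
            return float(f.read().split()[0])
    except IOError:
        return 0.0  # non-Linux fallback so the sketch stays runnable

if __name__ == "__main__":
    # Placed under collectors/15/, this runs every 15 seconds, so printing
    # one point and exiting is enough.
    print(format_point("board.uptime", read_uptime(), host="beaglebone"))
```

Swapping read_uptime() for a read from your own sensor hardware is all it takes to start recording a new metric.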
On the Sandbox, clone the tcollector sources and transfer them to the BeagleBone (over USB networking, the board defaults to the address 192.168.7.2):
git clone git://github.com/OpenTSDB/tcollector.git
tar cf tc.tar tcollector
scp tc.tar root@192.168.7.2:/tmp
Open an ssh session into your BeagleBone and untar the sources:
tar xvf /tmp/tc.tar
Go into the tcollector directory and edit the file ‘startstop’. Change the line with the variable TSD_HOST to the name of the sandbox. An easy way to do this is to use ‘thost’ for the name, and add an entry to /etc/hosts for ‘thost’.
Since we are going to be sending timestamps back to OpenTSDB, it is a good idea for testing purposes to set the clock on the BeagleBone with ‘date’. For example, to set the date to February 27th, 2015 5:00pm, use ‘date -u 022717002015.00’. For production environments you may want to use something like ‘ntpdate’. There are even third-party real-time clocks that can be used with the board, like the Dallas DS1307.
Firing it All Up
Let’s get all this running and look at some data points.
From the BeagleBone, run the following command in the tcollector directory to start collecting:
./startstop start
Check the log to see if you’re sending data and/or if anything is wrong:
tail -f /var/log/tcollector.log
Now let’s check the OpenTSDB web interface to verify that we are getting data. Doing this requires adding a port forwarding rule in VirtualBox, in the Settings->Network->Advanced section for the single Ethernet interface. Add a port forwarding rule to port 4242 on the localhost to the address of the sandbox VM, as shown below.
After completing this step, enter ‘http://127.0.0.1:4242’ in your browser on the host machine. You should see the OpenTSDB web interface.
Enter a recent time range for viewing data, and you should start to see graphs, which means data points are being recorded. A couple of good example metrics built in to tcollector are proc.loadavg.1min and proc.loadavg.5min; enter one of these as a metric and you should start to see data. OpenTSDB’s graphing interface (rendered internally with gnuplot) is relatively simple and meant more for testing than production, but it’s a handy way to make sure the data you expect is hitting the database.
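The same data can be pulled programmatically rather than through the graphing UI. As a rough sketch (host and port match the port-forwarding rule above, and the metric is one of tcollector’s built-ins), OpenTSDB’s classic ‘/q’ endpoint returns one data point per line as plain text when given the ‘ascii’ option:

```python
# Sketch: build a query against OpenTSDB's /q HTTP endpoint and parse its
# plain-text ('ascii') output. Host, port, metric, and start time below are
# the values used in this walkthrough, not requirements.
try:
    from urllib.parse import urlencode        # Python 3
except ImportError:
    from urllib import urlencode              # Python 2

def build_query_url(host, port, metric, start):
    """URL for OpenTSDB's /q endpoint with plain-text output."""
    params = urlencode({"start": start, "m": "sum:" + metric})
    return "http://%s:%d/q?%s&ascii" % (host, port, params)

def parse_point(line):
    """ASCII output lines look like: <metric> <timestamp> <value> [tags...]"""
    fields = line.split()
    return fields[0], int(fields[1]), float(fields[2])

# From the host machine, usage would look something like:
#   from urllib.request import urlopen
#   url = build_query_url("127.0.0.1", 4242, "proc.loadavg.1min",
#                         "2015/02/27-16:00:00")
#   for raw in urlopen(url).read().decode().splitlines():
#       print(parse_point(raw))
```

This is the kind of hook a custom dashboard or downstream analytics job could build on once data is flowing.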
Adding More Functionality
Now you have an end-to-end setup running with a single board. The tcollector instance sends back all kinds of data about the board itself and you can use this as a platform to start sending more.
At some point, once you are collecting enough data or want an operational dashboard, adding a frontend is essential for showing the value. One such frontend is Grafana, but there are several others out there. At the recent Strata event in San Jose, we had a demo that collected temperature sensor readings from this same environment, using a live Grafana dashboard to show the temperature at the booth.
Many different kinds of sensors are available. Once you have the right set of sensor hardware connected to the board, you can add a ‘collector’ for tcollector with just a few lines of Python.
Lastly, if you’re designing an architecture for a time-series data framework, check out the completely free eBook, Time Series Databases, published by my colleagues Ted Dunning and Ellen Friedman here at MapR.
_(Image Credit: Main IoE image: Cisco)_