To justify the project, I needed to better define the problem; that is, I needed to quantify how "off" my arrays were, which meant gathering data. I knew I would be working with this data in Jupyter notebooks, and since I had already written a module for connecting Jupyter Notebooks to Apache Drill, logging the data as JSON was a simple choice: it would let me work with my data while using the visualization tools available in Jupyter Notebooks.
I created a GitHub repo at https://github.com/johnomernik/solarpi. Much of the work I am doing can be found there, including test scripts that let me exercise the individual components (data gathering, motor control, solar calculations, etc.). With sample code for the sensors and a neat Python module I found called Pysolar, I was off to the races.
For each array, I collected all of the sensor data along with the solar altitude and azimuth at the time of collection. This logic lives in the sample_data.py script. It's simple: you configure some basic information in the env.list file, then run sample_data.py to write the results to a JSON file. I then copied those daily JSON files from each array back to a directory on my MapR Data Platform and began to explore.
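The logging loop boils down to: read the sensors, compute the sun's position with Pysolar, and append one JSON object per line. Here is a minimal sketch of that idea; the field names and the make_record/append_record helpers are my illustration, not the actual code from sample_data.py:

```python
import json
from datetime import datetime, timezone

def make_record(accel_x, solar_alt, solar_az, array_id="north"):
    """Build one sample in the shape Drill will query later.

    The solar position would come from Pysolar, e.g.:

        from pysolar.solar import get_altitude, get_azimuth
        solar_alt = get_altitude(lat, lon, datetime.now(timezone.utc))
        solar_az = get_azimuth(lat, lon, datetime.now(timezone.utc))
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "array_id": array_id,
        "accel_x": accel_x,
        "solar_alt": solar_alt,
        "solar_az": solar_az,
    }

def append_record(record, path):
    # One JSON object per line: Drill reads newline-delimited JSON
    # directly, so no ETL step is needed before querying.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Writing one object per line (rather than one big JSON array) is what lets Drill query the files as-is later.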
For this project, and for all screenshots of data and graphs shown here, I am using Jupyter Notebooks with Apache Drill for my queries. I am using a module I wrote called "jupyter_drill" (https://github.com/johnomernik/jupyter_drill) that lets me interact with Apache Drill easily and run queries on the JSON data produced by the Raspberry Pis.
Drill was a great choice because I did not have to do any ETL work to use the data: I just copied the JSON files to a directory in MapR XD and ran the queries you see here. The jupyter_drill module returns results both as a table (if there are fewer than 1,000 rows) and as a Pandas DataFrame, which is how I feed the plot.ly graphing modules to display the data. The variable prev_drill always holds a DataFrame of the results of my last Drill query. While this may seem to gloss over some important ETL facets, it actually shows how easy it is to work with data in Apache Drill, and why I chose Drill for this project: more time in the data, less time trying to make heads or tails of the data and the processes to load it.
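A representative notebook cell might look like the sketch below. The path and column names are my assumptions about the layout, not the exact schema from the repo; the plotting line shows the general prev_drill-plus-plot.ly pattern described above:

```python
# A Drill query run directly against the raw JSON logs in MapR XD;
# no ETL step, the directory of newline-delimited JSON is the table.
sql = """
SELECT `timestamp`, array_id, accel_x, solar_alt, solar_az
FROM dfs.`/data/solarpi/2019-11-03`
ORDER BY `timestamp`
"""

# In the notebook, jupyter_drill leaves the last result set in a Pandas
# DataFrame named prev_drill, so charting is one more line, e.g.:
#
#   import plotly.express as px
#   px.line(prev_drill, x="timestamp", y="accel_x", color="array_id")
```

The column-per-array coloring is what produces the blue/orange north-vs-south comparison charts shown below.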
After settling on using the X-axis reading from the accelerometer (see my previous post on the hardware I used for my reasoning on that), I started collecting data on the north array (left in the pictures) on November 2nd, and on the south array on November 3rd. Unfortunately, November is NOT a great month in Wisconsin for solar performance, so finding days when the optical sensor was working well was difficult. I also found that the sensor returns values from approximately 1250 (fully turned east for morning sun) to -1250 (fully turned west for evening sun). I have since done further calibration, described later in this blog post.
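Turning that roughly ±1250 raw range into a tilt angle can be sketched as below. This assumes a simple linear mapping and a hypothetical ±45° of travel; the real calibration (covered later) would account for sensor offset and the accelerometer's sine response:

```python
def raw_to_tilt(raw, raw_max=1250.0, max_tilt_deg=45.0):
    """Map a raw X-axis accelerometer reading to an approximate tilt angle.

    raw_max and max_tilt_deg are placeholder assumptions: ~1250 is the
    observed full-east reading, and 45 degrees is a hypothetical maximum
    travel, not a measured value for these arrays.
    """
    # Clamp noisy readings to the expected range, then map linearly:
    # +raw_max -> +max_tilt_deg (east), -raw_max -> -max_tilt_deg (west).
    raw = max(-raw_max, min(raw_max, raw))
    return (raw / raw_max) * max_tilt_deg
```

A linear map is only a first approximation, since a tilt sensor's axis reading actually varies with the sine of the angle, but it is good enough to see whether the two arrays disagree.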
Even with the poorly calibrated sensors and poor weather days, I found some interesting things. Here is a query on a single day (November 3rd) showing the tracking position for both arrays. Blue is the north array (left in pictures); orange is the south array (right in pictures).
Here is another example of a confused array tracking poorly (north array only):
Even without knowing what was causing the obvious errors, I wanted to understand how close the tracker gets to the ideal angle when things are working well. I’ll cover this topic in Part 4 of my blog post series.