If I had to use one word to describe my internship at MapR, it would be “transformative.” As a rising senior at the University of Southern California majoring in Computer Science, I (incorrectly) felt like I knew quite a bit about the field. I was looking forward to an opportunity to learn all about distributed computing, and Hadoop in particular. I did gain a lot of experience with Hadoop, but the technical skills and knowledge I gained during my internship turned out to be trivial in comparison to what I learned about team management, project planning, company organization and teamwork. In the end, it's not the newly acquired technical skills that I'm most thankful for, but the opportunity to interact with such sharp, inspirational individuals.
I spent the semester before my summer internship at MapR abroad, studying the French language and French culture. After months away from computer work, I was eager to finally get back to work on such exciting technology, only to discover on my first day that I was once again in a place where I couldn't speak the language! HDFS, JobTracker, TaskTracker, YARN, Sqoop, Mahout, Flume, Hive and many more were all foreign words to me, and MapR has its own improved versions (or whole replacements) of all these concepts. During my first days at MapR, the skills I gained abroad proved to be extremely useful: inferring as much meaning as I could from the few words I knew, as well as identifying words I should ask someone about ASAP. Fortunately, I found myself comfortable discussing all these concepts in no time, thanks to much assistance from the many helpful MapR employees.
I was expecting to be a lone intern for the duration of my stay, working independently on a single project. Instead, I was assigned several tasks throughout my internship, each allowing me to collaborate with various individuals and teams and challenging me in entirely different ways.
During my first week, I got my feet wet with MapR and Hadoop through the use of Mahout's recommendation library. Adam Bordelon sent me some research papers he had found useful while working on Amazon's recommendation system. After working through Ted Dunning's book, Mahout in Action, and reading Ted's many answers on several troubleshooting communities, I sat in on a private presentation from Ted Dunning himself on the latest work being done in Mahout. The simple exercise of getting to know the product demonstrated to me early on the caliber of employees with whom I would be working.
Having familiarized myself with MapR and Hadoop, I next began an investigation of Apache YARN, with the goal of running the new YARN Map/Reduce components on an underlying MapR filesystem. This was a challenge for me, as it required a deep understanding of the new YARN system (and I just barely understood the old system), as well as an understanding of how the MapR filesystem is used to replace Apache's HDFS. After several weeks of investigating and even a little code-writing, I finally had the YARN Resource Manager and Node Manager running TeraSort on MapR XD. I then had to present my work to a room of upper-management figures including CTO and co-founder M.C. Srivas. That's something that would never happen at most software engineering internships! Srivas' deep technical understanding of the product allowed him to pose countless questions during my presentation that I had never considered.
I greatly appreciated the opportunity to participate in big-picture product planning as we began to plan the beta release of MapR's full integration with the YARN components. I learned first-hand the challenges of accurately planning a project that's expected to last months while also trying to satisfy deadlines of other projects. It was my first exposure to true agile development. I was also able to observe the nuances of team management from my manager, Seshu Adunuthula. After watching him sift through debug logs and code as well as handle day-to-day personnel management, I learned that the managerial position requires both more technical work than I expected and a much more human understanding of the team than I had previously thought.
I also completed assignments involving feature additions to Hive as well as modifying Apache Hadoop's Test Suite to work with MapR's products. Each of these projects allowed collaboration with other parts of the engineering team, namely QA and the Hadoop ecosystem team. They also involved another presentation to upper management.
All in all, my internship at MapR changed the way I think about Computer Science. I learned both the architecture and use of Hadoop, Hive, MapR XD and Mahout. I also learned more about C++ and Java, particularly how they can work together through JNI calls and shell execution. I wrote my first Bash shell scripts and my first Python scripts. I learned Vim, and gained a much better understanding of the Linux command line. I also gained experience with Mercurial, Git, Make, Ant, and Maven. I found the start-up environment to be dynamic and fast-paced and loved the opportunity to watch a company grow. I learned so many lessons about start-up strategy in addition to tech industry standards. But I still maintain that my biggest takeaway of the summer was from the employees: there were so many role models at MapR who pushed me to redefine my own “limits” as far as what I am capable of now and, more importantly, what I will be capable of later in life. I am sure that years from now, I'll still be grateful for my summer at MapR.
Spencer is currently completing his degree at USC.