Top 10 Hadoop Blogs of 2014

As we close out the year, here is a look back at our 10 most popular blog posts of 2014. Our top posts cover machine learning and time series data, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.

  1. Five Steps to Avoiding Java Heap Space Errors By Aaron Eng. Keeping these five steps in mind can save you a lot of headaches and help you avoid Java heap space errors.
  2. Deep Learning vs. Neural Network Learning: Whiteboard Walkthrough By Sungwook Yoon. Data Scientist Sungwook Yoon talks about deep learning because he finds that some data scientists who know a lot about machine learning think deep learning is the same thing as a neural network. This video clarifies the differences.
  3. It’s About Time: Time Series Databases By Ellen Friedman. Recording the time at which a measurement was made or an event occurred can make data much more useful for revealing valuable insights. Time Series Databases: New Ways to Store and Access Data, published by O’Reilly, examines the fundamental concepts and practical methods for implementing scalable, cost-effective time series databases.
  4. Let Spark Fly: Advantages and Use Cases for Spark on Hadoop - Webinar Follow Up By Michele Nemschoff. Apache Spark is currently one of the most active projects in the Hadoop ecosystem, and there has been plenty of hype about it in the past several months. In the latest webinar from the Data Science Central webinar series, titled “Let Spark Fly: Advantages and Use Cases for Spark on Hadoop,” we cut through the noise to uncover the practical advantages of having the full set of Spark technologies at your disposal.
  5. Loading a Time Series Database at 100 Million Points Per Second By Jim Scott. There are many use cases for time series data, and they usually require handling a substantial data ingest rate. Rates of more than 10,000 points per second are common, and rates of 1 million points per second are less common but not outrageously high either.
  6. Comparing MapR XD and HDFS NFS and Snapshots By Bruce Penn. In my 2.5 years at MapR, a common question I get from customers is, “Isn’t HDFS going to eventually catch up to MapR XD?” The simple answer is a resounding “NO,” and the reasons lie in the foundations of the two architectures. I will first describe these differences and then outline how much the implementations differ in their value to customers.
  7. 14 Benefits and Forces That Are Driving The Internet of Things By Dr. Kirk Borne. The Internet of Things (IoT) will be huge in several ways. The forces driving it and the benefits motivating it are increasingly numerous, as more and more organizations, industries, and technologists catch the IoT bug.
  8. Finding the Zebra in a Herd of Ponies - A new look at anomaly detection By Ellen Friedman. In A New Look at Anomaly Detection, the second publication in the O’Reilly Practical Machine Learning series, Ted Dunning and I look at finding the outlier: the zebra in a herd of ponies, the fish swimming against the school, the rare event.
  9. Top 10 Reasons for Using Apache Drill - Now as Part of MapR Distribution Including Hadoop By Neeraja Rentachintala. Since Apache Drill 0.4 was released in August for experimentation on the MapR Distribution, there has been tremendous interest in the customer and partner community in the promise and potential of Drill to unlock new types of data in their Hadoop/NoSQL systems for interactive analysis throughout the organization. Note: Read Apache Drill Carries Momentum into 2015 for the latest news on Drill.
  10. Top 10 Big Data Challenges – A Serious Look at 10 Big Data V’s By Dr. Kirk Borne. About 13 years ago, Doug Laney of the META Group (now Gartner) wrote an amazing report that showed both great insight and great foresight. The paper’s title was “3D Data Management: Controlling Data Volume, Velocity, and Variety.” The 3 V’s of big data were born on that day, February 6, 2001. My only not-so-serious quibble with the paper is that he should have started the title this way: “3V Data Management…” Nevertheless, from that point forward, the big data game was officially on!

This blog post was published December 31, 2014.