Information wants to be free. Open source is free. Moore’s law is making computing free. Free, Free, Free.
Enough already with the free. In the real world, computing costs money. Making great products costs money. More efficient computing saves money. If you’re running a serious big data infrastructure, you must first focus on getting value, but once that’s done, you must make sure you are not bleeding money in a variety of ways.
Is your cluster too big? Are you wasting storage? Do you spend too much time on admin? Are you building up technical debt? How much is downtime wasting? If you aren’t asking these questions, you are surely wasting money.
Here’s why. In the world of open source, especially one increasingly powered by the cloud, complexity and inefficiency drive up costs. Open source projects are great at breaking new ground and encouraging a community of innovation. Open source projects are not great at productizing and optimizing code for use by nonprogrammers. Here are some points to consider:
The world of big data is being stretched by the tension between the dynamic innovation and freedom of open source and the cost to forge a product that is efficient, powerful, and easy to use. The Hadoop ecosystem is breaking new ground and filling whitespace for developers in a breathtaking way. But is the Hadoop ecosystem creating efficient products that are easy to use?
All the major Hadoop providers, of course, say that they are creating a fabulous way to use the Hadoop ecosystem. But which ones have done the work to solve the hardest problems? You don’t have to guess. You can ask vendors for references, compare costs, and do the math.
Here are some of the dimensions to look at. (For a deeper dive on these issues, see the recent CITO Research white paper, “Five Questions to Ask Before Choosing a Hadoop Distribution”):
There is no such thing as a free lunch. There is no escaping the strengths and weaknesses of open source projects. In my view, free does not mean efficient, secure, or easy to use. Some Hadoop providers essentially say, “Don’t worry. We are getting better. Hadoop’s open source ecosystem will eventually be just like a great commercial product.”
MapR takes a different approach, which recognizes that getting a product right requires extra work, some of which simply doesn’t happen inside an open source community. MapR argues as follows, “We are going to take responsibility for everything needed to create great commercial products for enterprise use in a way that is compatible with APIs. It won’t be free, but it will be worth it.” (And, recognizing that free does have value, MapR has a free Community Edition that allows customers to get started, without commercial support and lacking some enterprise features such as high availability.)
Free may be good enough if efficiency, system uptime, and ease of use really don’t matter to you. Going without these traits costs money. You can do the analysis and figure out how much. Then you can determine how free free really is and make a decision that’s right for you.
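To make "do the analysis" concrete, here is a minimal sketch of the kind of back-of-the-envelope comparison the questions above point to. Everything in it is an assumption for illustration: the `ClusterCosts` class, its field names, and every number are hypothetical placeholders, not measurements or vendor pricing. Swap in your own figures; the result depends entirely on them.

```python
from dataclasses import dataclass


@dataclass
class ClusterCosts:
    """Hypothetical yearly cost inputs for one deployment option."""
    nodes: int
    cost_per_node: float            # yearly hardware or cloud cost per node
    admin_hours_per_week: float     # staff time spent on administration
    admin_hourly_rate: float
    downtime_hours_per_year: float
    downtime_cost_per_hour: float   # business cost of an hour of outage
    license_per_node: float = 0.0   # zero for a free distribution

    def annual_total(self) -> float:
        infrastructure = self.nodes * self.cost_per_node
        licenses = self.nodes * self.license_per_node
        admin = self.admin_hours_per_week * 52 * self.admin_hourly_rate
        downtime = self.downtime_hours_per_year * self.downtime_cost_per_hour
        return infrastructure + licenses + admin + downtime


# Placeholder numbers only; replace them with figures from your own environment.
free_option = ClusterCosts(
    nodes=100, cost_per_node=5000,
    admin_hours_per_week=80, admin_hourly_rate=75,
    downtime_hours_per_year=20, downtime_cost_per_hour=10000,
)
commercial_option = ClusterCosts(
    nodes=70, cost_per_node=5000,
    admin_hours_per_week=40, admin_hourly_rate=75,
    downtime_hours_per_year=5, downtime_cost_per_hour=10000,
    license_per_node=4000,
)

for name, option in [("free", free_option), ("commercial", commercial_option)]:
    print(f"{name}: {option.annual_total():,.0f} per year")
```

The point of the sketch is the shape of the calculation, not the outcome: license fees are only one line item, and cluster size, admin time, and downtime each get their own. With different inputs, the free option can easily come out ahead.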