This blog post is the second in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer (Read Part 1). In his book, Davi outlines the rationale and key challenges of securing big data systems and applications, and he’s included some terrific anecdotes to make the entire book a quick and insightful read.
In this chapter, Davi discusses why big data security is so necessary. Here is an excerpt:
Heavy data, shortened to HeavyD, started off as something of a joke. Humor supposedly helps with difficult subjects and creates new insights. Whereas big is relative to volume, heavy relates to the force required. It is a bit like saying high security, but even more scientific.
We are working to capture and interpret an infinite amount of analog data with our highly limited digital tools. From that perspective, given Moore's law of progress in compute power, today's big data tools would no longer be considered big in about five years' time.
Managers today seek "enhanced insight and decision making," but they will not escape the fundamentals of data like integrity or availability. This is clear. At the same time, we are entering a new world in security where the old rules may no longer apply. We need a different approach than we have used in the past to handle new speeds, sizes, and changes, while also honoring concepts we know so well from where we started. Petabytes are already no longer an exception or considered big.
Our concept of what is really heavy should already be heading into the exabyte range (1,000 petabytes). Some of our present security tools will survive these transitions from light to heavy data, and some may not.
This explanation of HeavyD illustrates the importance of designing security controls to maintain consistency across this growing scope of data.
Newcomers to big data may regard these new systems as non-production or experimental, and may doubt the necessity of securing big data at all, assuming that the size and variety of the data itself provides an intimidating barrier to potential intruders. According to Davi, “The reality is that big data environments presently require investment and planning beyond any default configuration (default safe configuration simply is not the way IT runs). Before we open up the power of big data, consider whether we have put in place the Confidentiality, Integrity and Availability we need to keep things pointed in the right direction.”
Data integrity is really about keeping data sufficiently trustworthy so we don't end up believing the wrong things, seeing the wrong data, or making mistakes because of false knowledge. After we ensure we won't lose access to the data we need (availability and integrity), we can address the issue of making sure only the right people see the data we worked so hard to preserve. Complicated systems can be fragile in unexpected ways if not tested properly for safety, and expensive to fix down the road. It's better to think about these issues before you start the journey.
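To make the integrity idea concrete, here is a minimal sketch (not from the book) of the most common building block: record a cryptographic checksum while the data is known to be good, then compare against it later so any modification is detectable. The record contents and field names are hypothetical.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return a SHA-256 digest that serves as an integrity baseline."""
    return hashlib.sha256(data).hexdigest()

# Record a baseline while the data is known to be good...
record = b"customer_id=42,balance=100.00"
baseline = checksum(record)

# ...and later compare digests: unchanged data verifies,
# while any modification produces a different digest.
tampered = b"customer_id=42,balance=999.00"
print(checksum(record) == baseline)    # unchanged data matches the baseline
print(checksum(tampered) == baseline)  # tampered data does not
```

This detects corruption and accidental change; protecting against a deliberate attacker who can also rewrite the stored digest requires keyed hashes (e.g., HMAC) or signatures, which is a design decision beyond this sketch.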
Davi provides another illustrative story:
“The image below is an ancient drawing of a Japanese rock garden or "karesansui" (枯山水) with a kitten leaving tracks as it approaches the building.
How does one achieve true confidentiality in this world of billions of sensors picking up traces of your every move? If planned properly, the system in the garden requires an entire refresh to set it back to an undisturbed state. The pebbles or sand are a simple construct, with lines raked to reveal any intruders.
When you think in current technology terms about the stones you're "touching," it helps to look at the apps running on your mobile phone and who is watching them for movement. Here's a simple illustration of who might be sitting on the wooden platform watching their stones in the tech garden:
The point is, if you know something about how the system works (as with the rock garden), you may be able to come up with a plan to disable it or change the outcome.
The people asking whether security is worth the effort usually are those building big data tools and environments. We can turn the question around on them and ask, “Why bother building anything?” In the normal course of building a new system, bugs are found, mistakes are made, and systems fail. Should we even try when we know nothing is perfect, and know that we will always be leaving some money on the table?
“The value of a properly functioning system should include security in its definition.”
Availability, for example, is an easy place to start and discuss shared values. Integrity usually has shared value as well, especially when you highlight that high availability of bad data may not be better than no availability at all.
A website forced offline by resource exhaustion means something discrete can be added to relieve the pressure and the site brought back online. But when a customer sees someone else's credit card in their checkout cart, leaving the site offline may be the best option until the integrity of accounts can be re-established. Fixing integrity often gets more complicated.
Confidentiality has been the hardest sell to developers because it aligns less clearly with the objectives of big data projects. Everyone knows what an outage does. Most people know what quality failures will do. Privacy sometimes confuses people because it sits in opposition to the very purpose of big data projects: gathering information for knowledge. Science projects, the birthplace of big data systems, tend to be about doing things faster rather than keeping them private. They work with public data by definition, such as looking up into space to record interesting light; they can't make the night sky private. But a major shift has been happening with the adoption of big data tools into industries where privacy has some very important cost and benefit considerations. Security can help preserve value, if we can explain how it becomes an advantage for a business that thirsts to gather as much data as possible in pursuit of knowledge.
So, does securing big data even work? The bottom line is yes; security really works and makes a positive difference. Proof, unsurprisingly, is in the data. The value and impact of secure big data systems will only become greater as customers look for competitive differentiation, seeking big data platforms that are designed with trust in mind, such as the MapR Data Platform. And the best way to demonstrate trust is through measurement, the same way people are demonstrating value in all their big data deployments.
“The difference is really just the measurements and values of security often are set externally by regulators, accountants or security managers, obscured by marketing and translation to engineering.”
In the next blog post on the subject, we'll take a look at how big data changes security, as outlined in Chapter 3 of The Six Elements of Securing Big Data by Davi Ottenheimer.
Compliments of MapR.