Securing NoSQL Databases: Use the Force


With stories of the theft of millions of credit card records and sensitive employee data at some of the world’s largest companies and government agencies dominating recent headlines, it’s not surprising that organizations are doubling down on security. Security is finally starting to get top management’s attention. Ponemon Institute’s 2015 security report found that 55 percent of top executives rated security as a significant concern, up from just 13 percent the year before. The rash of breaches in 2014 almost certainly contributed to this change.

Ponemon also reported that negligent employees are the greatest source of endpoint risk. Organizations can have the best security technology in the world, but if it’s not implemented with proper policies and best practices, it’s as good as worthless.

Computer security is becoming more complex as attacks grow in scope and sophistication. Security experts agree that a multi-layered approach combining perimeter defenses, access controls, and encryption works best.

As the use of NoSQL databases has grown with the popularity of big data, a common perception has taken hold that NoSQL technology is less secure than the more mature relational architecture. But that perception doesn’t apply to every NoSQL database, and most come with a suite of robust security mechanisms. The trick is to understand and apply them.

Think security first

It’s easy for developers to fall into the trap of setting security procedures aside for the sake of expediency, or believing that existing perimeter defenses are sufficient. However, no fence is impenetrable. The Verizon 2015 Data Breach Investigations Report found that attackers can lurk for months or even years inside a compromised system, evading detection while steadily scraping up information.

That doesn’t mean that you need to secure everything. Applications using public data, for example, don’t require the same protections as those housing sensitive employee or financial information. But think ahead to how the application might be used in the future. It’s more effective to bake security into the software from the beginning than to layer it on top later. You still need processes and rules in place to prevent the equivalent of leaving your house key under the welcome mat, but the security controls in your software can at least get you started in protecting your data. Here are some basic measures all users of NoSQL databases can employ.

Controlling data access

Authentication is the meat and potatoes of information security. It validates a user’s identity and matches it with profiles that specify access privileges. It’s a required starting point for all other security measures.

There are many authentication protocols. Kerberos is the best known, and many NoSQL databases support it. Authentication can usually be integrated with user registries like LDAP and Active Directory to make management easier.
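In production the identity check is delegated to Kerberos, LDAP, or Active Directory rather than written by hand. Still, the two steps described above (verify who the user is, then look up what that user may do) can be sketched with Python’s standard library alone. The in-memory user registry, the role names, and the iteration count below are hypothetical; the sketch stores salted PBKDF2 hashes instead of passwords and compares them in constant time.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Set

# Assumed PBKDF2 work factor; pick a value appropriate for your hardware.
ITERATIONS = 100_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash so a stolen credential table resists brute force."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_user(password: str, roles: Set[str]) -> dict:
    """Create a registry entry; the plaintext password is never stored."""
    salt = secrets.token_bytes(16)
    return {"salt": salt, "hash": hash_password(password, salt), "roles": set(roles)}


# Hypothetical in-memory registry standing in for LDAP or Active Directory.
USERS = {
    "alice": make_user("correct horse battery staple", {"analyst"}),
    "bob": make_user("s3cret-pass", {"analyst", "hr"}),
}


def authenticate(username: str, password: str) -> Optional[Set[str]]:
    """Step 1: verify identity. Step 2: return the user's access profile (roles)."""
    user = USERS.get(username)
    if user is None:
        return None
    candidate = hash_password(password, user["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    if hmac.compare_digest(candidate, user["hash"]):
        return user["roles"]
    return None
```

The returned role set is what later access-control checks would consult; an unknown user and a wrong password both yield `None`.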

Access control builds on authentication: once the database knows who a user is, it decides what that user may see. Security-focused NoSQL databases like Apache HBase, Apache Accumulo, and MapR Database let you restrict access to specific parts of a record. For example, you may make ZIP code information public, restrict street addresses to a much smaller group of users, and limit Social Security number access to just a select few.
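A simplified, hypothetical illustration of field-level control: each field carries the set of roles allowed to read it, and a record is filtered before it is returned. Real systems are richer; Accumulo, for instance, attaches boolean visibility expressions to individual cells, whereas this sketch supports only an "any of these roles" rule and hides any field that has no policy entry. The policy table and role names are assumptions.

```python
from typing import Dict, Set

# Hypothetical policy: which roles may read each field.
FIELD_POLICY: Dict[str, Set[str]] = {
    "zip": {"public"},
    "street": {"support", "hr"},
    "ssn": {"hr"},
}


def visible_fields(record: dict, roles: Set[str]) -> dict:
    """Return only the fields the caller may see.

    Every caller implicitly holds the "public" role, and any field without a
    policy entry is hidden (deny by default).
    """
    effective = set(roles) | {"public"}
    return {
        name: value
        for name, value in record.items()
        if FIELD_POLICY.get(name, set()) & effective
    }


customer = {"zip": "95054", "street": "1 Main St", "ssn": "123-45-6789", "notes": "VIP"}
```

An anonymous caller sees only the ZIP code, a support user also sees the street address, and only the `hr` role sees the Social Security number; the unlisted `notes` field is never returned.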

If your database doesn’t permit this level of control, you will have to implement a work-around such as distributing customer data across multiple records, which introduces complexity and security issues of its own. That’s why fine-grained access control is an important factor to consider when choosing a NoSQL database.

You can also restrict access through your choice of query tools. One example is Apache Drill, a query engine that enables relational-like SQL operations to be performed on NoSQL databases. Drill supports views that enable you to restrict the output of values in a record to specific users, roles, or groups. If you’re considering using NoSQL in an analytical environment, as many companies now are, Drill is a good open source option to consider. It currently works with a few NoSQL databases, and the list is likely to grow.
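Conceptually, a view is a fixed projection that is granted to certain groups in place of the base data, roughly what `CREATE VIEW customers_by_zip AS SELECT zip FROM customers` expresses in SQL. The Python sketch below models only that idea; the `View` class, group names, and data are hypothetical and do not represent Drill’s API or its permission model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class View:
    """Hypothetical stand-in for a SQL view: a fixed projection of the base
    data that is granted to specific groups instead of the data itself."""

    name: str
    columns: Tuple[str, ...]
    allowed_groups: FrozenSet[str]

    def select(self, rows: Iterable[dict], user_groups: Iterable[str]) -> List[dict]:
        # Check the grant first; callers never read the base rows directly.
        if not self.allowed_groups & set(user_groups):
            raise PermissionError(f"no access to view {self.name!r}")
        # Project each row down to the view's columns only.
        return [{c: row[c] for c in self.columns if c in row} for row in rows]


customers = [
    {"zip": "95054", "street": "1 Main St", "ssn": "123-45-6789"},
    {"zip": "10001", "street": "9 Elm Ave", "ssn": "987-65-4321"},
]

zip_view = View("customers_by_zip", ("zip",), frozenset({"marketing", "analysts"}))
```

Analysts querying `zip_view` get ZIP codes only, and a user outside the granted groups gets a `PermissionError` instead of any data.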

Know what’s happening

Audits won’t prevent bad guys from attacking, but they can help you spot and contain suspicious activity quickly. Auditing tools also provide benefits beyond security, revealing useful information such as which data sets are accessed most frequently and which operations are creating bottlenecks. This is useful for performance tuning.
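As a small, hypothetical example of the performance side of auditing, the sketch below assumes audit entries have already been parsed into `(user, operation, dataset, elapsed_ms)` tuples. That format is made up for illustration, since real audit logs differ by product. It ranks data sets by access count and flags slow operations against an assumed threshold.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical parsed audit entries: (user, operation, dataset, elapsed_ms).
AuditEntry = Tuple[str, str, str, int]

AUDIT_LOG: List[AuditEntry] = [
    ("alice", "read", "customers", 12),
    ("bob", "read", "orders", 340),
    ("alice", "read", "customers", 15),
    ("carol", "scan", "orders", 2100),
    ("bob", "read", "customers", 11),
]


def most_accessed(log: List[AuditEntry], n: int = 2) -> List[Tuple[str, int]]:
    """Rank data sets by how often they appear in the audit trail."""
    return Counter(dataset for _, _, dataset, _ in log).most_common(n)


def slow_operations(log: List[AuditEntry], threshold_ms: int = 1000) -> List[AuditEntry]:
    """Return entries whose elapsed time suggests a bottleneck."""
    return [entry for entry in log if entry[3] >= threshold_ms]
```

Here `customers` is the hottest data set, and the full `scan` of `orders` stands out as the operation worth tuning.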

For security purposes, audit logs can identify unusual spikes in activity, access from unknown IP addresses, or repeated attempts to access secure information. These may indicate that attacks are in progress. By reading access logs in real time, you can identify and contain these anomalous situations quickly. A growing number of analytics-based security tools are improving the power and sophistication of this type of analysis.
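Two of the signals described above, access from unknown addresses and repeated failed attempts, can be sketched over parsed audit events. The event fields, the allow-listed networks, and the failure threshold are all assumptions for illustration; a real deployment would also window the counts by time and feed alerts into its monitoring tools.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical allow-list of networks that normally reach the database.
KNOWN_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]
MAX_FAILURES = 3  # assumed threshold; tune to your environment


def is_known(ip: str) -> bool:
    """True if the address falls inside any allow-listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_NETWORKS)


def find_anomalies(events: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """events: dicts with 'user', 'ip', and 'ok' (bool) keys.

    Returns (unknown_ips, users_with_repeated_failures).
    """
    events = list(events)
    unknown_ips = sorted({e["ip"] for e in events if not is_known(e["ip"])})
    failures = Counter(e["user"] for e in events if not e["ok"])
    suspicious_users = sorted(u for u, n in failures.items() if n >= MAX_FAILURES)
    return unknown_ips, suspicious_users


events = [
    {"user": "alice", "ip": "10.1.2.3", "ok": True},
    {"user": "mallory", "ip": "203.0.113.7", "ok": False},
    {"user": "mallory", "ip": "203.0.113.7", "ok": False},
    {"user": "mallory", "ip": "203.0.113.7", "ok": False},
    {"user": "bob", "ip": "192.168.1.20", "ok": False},
]
```

With this sample, the outside address and the user with three failed attempts are flagged, while a single failed login from an internal address is not.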

Some NoSQL databases have automatic auditing capabilities built in, giving you all the benefits of auditing without worrying about adding that capability via application-level code. If your database doesn’t have native auditing capabilities, you will have to roll your own, which you probably don’t want to do.
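If you do have to roll your own, one common pattern is to wrap every data-access function so that an audit record is written whether the call succeeds or fails. A minimal sketch, assuming a hypothetical `read_record` data-access function and an in-memory list standing in for an append-only, tamper-evident audit store:

```python
import functools
import json
import time

# Hypothetical sink; in practice, an append-only store the application cannot rewrite.
AUDIT_TRAIL = []


def audited(operation):
    """Decorator that records who touched which data set, when, and whether it succeeded."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(user, dataset, *args, **kwargs):
            entry = {"ts": time.time(), "user": user, "op": operation, "dataset": dataset}
            try:
                result = fn(user, dataset, *args, **kwargs)
                entry["ok"] = True
                return result
            except Exception:
                entry["ok"] = False
                raise
            finally:
                # Runs on success and on failure, so denied attempts are logged too.
                AUDIT_TRAIL.append(json.dumps(entry))
        return inner
    return wrap


@audited("read")
def read_record(user, dataset, key):
    # Hypothetical data access standing in for a real NoSQL client call.
    if dataset == "secrets":
        raise PermissionError("denied")
    return {"key": key}
```

The drawback, and the reason native auditing is preferable, is that any code path that bypasses the decorated functions also bypasses the audit trail.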

Encrypt data in motion and at rest

Some people mistakenly believe encryption is a form of access control, but that’s not its primary purpose. Encryption scrambles data so that it becomes unreadable to anyone who tries to access data directly from the storage device. This includes protecting data even if drives are physically stolen. There are many forms of encryption you can use, depending on the sensitivity of the data and the required speed of access, but the key thing to understand is that encryption is available both when data is on a storage medium (“data-at-rest”) and when it’s traversing the computer bus or network (“data-in-motion”).

A good technique for deploying data-at-rest encryption is to use self-encrypting drives or the Linux Unified Key Setup (LUKS) for Linux-based deployments. There are many other hardware- and software-based options available. All have trade-offs between security and performance, especially when it comes to key management, so understand your current needs and growth plans when you choose.

Another family of data-at-rest techniques preserves the appearance of legitimate data while masking the real information. For example, Social Security numbers in a customer record still have nine digits, but they’re the wrong digits. These technologies include masking, tokenization, and format-preserving encryption.
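Two of these techniques can be sketched with the standard library: static masking, which hides most digits, and vault-based tokenization, which swaps the real value for a random token of the same format and keeps the mapping in a separate store. The `TokenVault` class is hypothetical and keeps its mapping in memory only. Format-preserving encryption (for example, NIST’s FF1 mode) is a different, key-based technique and is not shown.

```python
import secrets


def mask_ssn(ssn: str) -> str:
    """Static masking: replace all but the last four digits, keeping the layout."""
    total = sum(c.isdigit() for c in ssn)
    out, seen = [], 0
    for c in ssn:
        if c.isdigit():
            seen += 1
            out.append(c if seen > total - 4 else "X")
        else:
            out.append(c)
    return "".join(out)


class TokenVault:
    """Hypothetical tokenization vault: real values are replaced by random tokens
    of the same NNN-NN-NNNN format, and only the vault can map them back."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, ssn: str) -> str:
        if ssn in self._to_token:  # same value always gets the same token
            return self._to_token[ssn]
        while True:
            digits = "".join(secrets.choice("0123456789") for _ in range(9))
            token = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
            if token not in self._to_value and token != ssn:
                break
        self._to_token[ssn] = token
        self._to_value[token] = ssn
        return token

    def detokenize(self, token: str) -> str:
        return self._to_value[token]


vault = TokenVault()
```

For example, `mask_ssn("123-45-6789")` yields `XXX-XX-6789`, while `vault.tokenize("123-45-6789")` yields nine random digits in the same layout that only the vault can reverse.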

Why would you want to do this? In some cases, formats or field lengths need to be preserved after information is encrypted. Such data is also valuable to analytical applications, which can identify patterns without requiring the precise underlying data.
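For the analytical case, a keyed, deterministic mapping lets you count and join on a protected column without exposing the real values, because the same input always produces the same pseudonym. This sketch uses HMAC-SHA256, a substitute for the techniques named above rather than one of them, with a hypothetical key. Because there are only 10^9 possible nine-digit outputs, the key must stay secret (otherwise every SSN could be tried), and occasional collisions are possible.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministically map a value to a nine-digit, SSN-shaped pseudonym.

    One-way: unlike a token vault, there is no way to map back to the original.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the first 8 bytes to nine decimal digits (the modulo bias is negligible).
    n = int.from_bytes(digest[:8], "big") % 10**9
    s = f"{n:09d}"
    return f"{s[:3]}-{s[3:5]}-{s[5:]}"


# Hypothetical key; in practice, fetch it from a key-management service.
KEY = b"replace-with-a-secret-key-from-your-kms"

# Analysts can still see that one customer visited twice, without seeing any SSN.
visits = ["123-45-6789", "987-65-4321", "123-45-6789"]
visit_counts = Counter(pseudonymize(v, KEY) for v in visits)
```
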

Encryption was invented to protect data-in-motion (remember The Imitation Game?), and that is still its most common use. It’s important to have this capability automatically enabled whenever sensitive data is traversing a network across data centers, between servers in a cluster, and between applications and servers.
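On the application side, Python’s `ssl` module illustrates what "enabled by default" should mean for a client connecting to a database node: certificate verification, hostname checking, and a floor on protocol versions. The function names, host, port, and CA path are placeholders. Encryption between cluster nodes, including mutual TLS, is configured in the database itself.

```python
import socket
import ssl
from typing import Optional


def make_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS settings for talking to a database node.

    create_default_context() already turns on certificate verification and
    hostname checking; this additionally refuses anything older than TLS 1.2.
    `cafile` is a hypothetical path to your cluster's CA bundle.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_tls_connection(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Open a TCP connection and upgrade it to TLS before any data is sent."""
    sock = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname enables SNI and lets check_hostname validate the certificate.
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```

For example, `open_tls_connection("db1.example.internal", 9443, make_client_context("/etc/pki/cluster-ca.pem"))` would refuse to talk to a server whose certificate does not chain to that CA or does not match the hostname.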

In summary…

Security is hard, and with the proliferation of big data deployments, it is likely to get harder. But with Ponemon estimating the average cost of a data breach at nearly $3.8 million, security isn’t a “nice to have.” The good news is that existing controls within NoSQL databases provide ample security for the vast majority of use cases, as long as you supplement those controls with the right business processes.


This blog post was published August 11, 2016.