Recent reports suggest hackers are actively compromising insecure Hadoop deployments. These attacks appear to be targeting a service port (NameNode: 50070) that MapR does not use, so MapR instances are not susceptible to this specific exploit. The underlying attack methodology, however, is simply to find and exploit internet-accessible ports that do not require authentication. I strongly encourage MapR users to lock down open ports – as you would for any software deployment – and to follow a few additional security best practices to minimize the risk of compromise. The remainder of this blog post outlines a few of these best practices and suggests how to prepare for and respond to a potential breach.
MapR Security Best Practices
Before exploiting target systems, attackers typically “scan” them in an effort to better understand which instances are accessible and which ports/services on those instances are available or vulnerable. Open ports – especially those for which authentication is not enforced – represent extremely low-hanging fruit to an attacker. Steps should be taken immediately to secure these ports.
- Security Best Practice #1: Lock Down Your Ports. Ensure open ports are only accessible from the network subnet(s) that require access to your MapR instance. Ports that are open and accessible to the internet are at greater risk, but ports that are internally-accessible are susceptible as well – for example, to insider attacks. Talk to your network administrator to lock these ports down. There are several ways to achieve this lockdown, including shutting down the port/service entirely if you know it’s not used, putting in place a firewall and defining the appropriate rules, and authorizing inbound access via “security groups” if using AWS or “network security groups” if using Azure. Determining which ports are open/accessible is beyond the scope of this article, but note that tools such as Nmap or commercially available network/vulnerability scanners can easily do the job.
Figure 1. Open ports should be locked down using firewall rules or other means.
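As a rough illustration of the kind of check a scanner performs, here is a minimal Python sketch that reports which TCP ports on a host accept a connection. It is no substitute for Nmap or a commercial scanner, and the host and port numbers in the usage example are placeholders I have assumed, not a list of MapR service ports. Run it only against systems you are authorized to test.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    reachable = []
    for port in ports:
        try:
            # A completed TCP handshake means something is listening and
            # no firewall rule blocked us from this vantage point.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            # Refused, filtered, or timed out: treat as not reachable.
            pass
    return reachable


if __name__ == "__main__":
    # Placeholder host and ports; substitute the ones your cluster uses.
    print(open_ports("127.0.0.1", [22, 8443, 7222]))
```

Running the same check from an internal subnet and from outside your network shows whether your firewall rules or security groups restrict access the way you intended.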
- Security Best Practice #2: Require Authentication for All Services. While it’s important for ports to be accessible exclusively from the network segment(s) that require access, you need to go a step further to ensure that only specific users are authorized to access the services running on these ports. All MapR services — regardless of their accessibility — should require authentication. A good way to enforce this for MapR platform components is by turning on security. Note that MapR is the only big data platform that allows for username/password-based authentication with the user registry of your choice, obviating the need for Kerberos and all the complexities that Kerberos brings (e.g., setting up and managing a KDC). MapR supports Kerberos, too, so environments that already have it running can use it with MapR if preferred.
Access to a service is rarely all-or-nothing. For example, as a user of the cluster and a member of the Marketing group, perhaps I should only have access to the Marketing data. That is, my access should be limited. You can set permissions on data as well as various administrative capabilities in MapR, which brings us to our next security best practice.
- Security Best Practice #3: Set Permissions. Authenticated users should not have carte blanche against cluster data or administrative capabilities. Restrict users’ access based on need-to-know or need-to-do. In particular:
- Set MapR XD File Permissions. Set these permissions via POSIX mode bits or, alternatively, MapR’s highly expressive Access Control Expressions (ACEs). ACEs allow for the assignment of permissions via Boolean expressions, so you can say things like “Only the folks belonging to both the Marketing group and Headquarters group should have access to this file.” This is represented concisely by the Access Control Expression: “g:Marketing&g:HQ”. You don’t have this level of expressiveness when using POSIX mode bits or POSIX ACLs for that matter. [See “Using ACEs for MapR XD.”]
- Set Field Level Access Control Using Apache Drill Views. Drill users have the additional option of enforcing field-level access control by defining Drill views and then setting MapR XD file permissions, via ACEs, on the view file.
- Set MapR Database Table Permissions. Set granular permissions via ACEs on MapR Database columns, column families, documents, and elements, so that only the users and groups that need access to this data have it. [See “Enabling Table Authorizations with ACEs.”]
- Set MapR Event Store Permissions. MapR Event Store allows you to specify via ACEs who can read messages from a topic, who can publish messages to a topic, and much more. [See “Streams Security.”]
- Set Permissions at Volume Level. Take advantage of MapR’s whole volume ACE feature to easily restrict permissions to data at the volume level. This feature is especially useful for clusters running multi-tenant environments, as a safeguard to ensure tenants do not improperly gain access to another tenant’s data. By leveraging whole volume ACEs, access to all files, tables, and streams in a given volume can be restricted to a single group with one ACE, in one fell swoop, which dramatically simplifies permissions management.
Figure 2. In this illustration, we show how whole volume ACEs can be used by MapR Administrators to forcefully restrict access to volume contents – in this case to members of the Finance group.
- Set Permissions on Administrative Actions. Finally, you should set restrictions on the actions users can take on a cluster. [See our “Volume Capabilities” page, for example, for information on setting permissions on actions that can be performed on a volume.]
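To make the Boolean semantics of an expression like “g:Marketing&g:HQ” concrete, here is a toy evaluator in Python. It is only a sketch of how such expressions can be read, not MapR’s implementation. It assumes a small subset of the syntax (`u:` for users, `g:` for groups, the operators `&`, `|`, `!`, and parentheses), and the function names `tokenize` and `allows` are my own.

```python
import re

# One token: an operator/parenthesis, or a u:/g: principal.
_TOKEN = re.compile(r"\s*(?:([()&|!])|([ug]):([\w.\-]+))")


def tokenize(expr):
    """Split an ACE-like string into operator strings and (kind, name) tuples."""
    expr = expr.strip()
    pos, tokens = 0, []
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected text at position {pos}: {expr[pos:]!r}")
        op, kind, name = match.groups()
        tokens.append(op if op else (kind, name))
        pos = match.end()
    return tokens


def allows(expr, user, groups):
    """Evaluate the expression for `user`, who belongs to the set `groups`."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # negation, parentheses, and principals
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if isinstance(tok, tuple):
            kind, name = tok
            return name == user if kind == "u" else name in groups
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

With this sketch, `allows("g:Marketing&g:HQ", "alice", {"Marketing", "HQ"})` is true, while a user who belongs only to Marketing is denied. That “both groups” condition is what a single set of POSIX mode bits cannot express.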
We’re making great progress securing the cluster, but we’re not done yet. Although we’ve just set permissions on all our data, unauthorized users can still conceivably gain access to sensitive data. The vectors of attack in this case are much more difficult to pull off, but you should be aware of the possibilities and take precautionary measures as necessary.
- Security Best Practice #4: Protect Your Data. Broadly, there are two ways attackers can read data that they are not authorized to see. One way is to simply sniff network traffic between nodes, for example, in the hopes of capturing sensitive information, file data, etc. This is very hard to pull off, given the type of access an attacker would need on your network, but it is possible assuming your network traffic is unencrypted. The second way for an attacker to read unauthorized data is to steal the drives that data is sitting on. Again, this is very hard to pull off, given the type of access the attacker needs to your data center or wherever your cluster is hosted. Nevertheless, if you’re interested in protecting yourself against these types of attacks you should:
- Encrypt Data on the Wire. When you turn on security (as described above), node-to-node communications are encrypted using AES-256, which is one of the most secure ciphers available. In addition, web-accessible UIs are secured using HTTPS/TLSv1.2. You can optionally ensure your sensitive files are encrypted in-transit by enabling encryption at the file level.
- Encrypt Data at Rest. MapR supports Linux Unified Key Setup (LUKS) or self-encrypting drives (SEDs) for encrypting data at rest. Many of our partners also have encryption offerings, supporting higher levels of granularity, different encryption schemes, etc.
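If you want to confirm from the client side that a web UI really negotiates TLS 1.2 or newer, a short Python sketch using the standard `ssl` module can help. The helper names `strict_context` and `negotiated_version` are my own, and the host and port you pass in are whatever your deployment uses. The default context also verifies certificates, so a cluster with a private CA would need that CA loaded first (for example with `ctx.load_verify_locations`).

```python
import socket
import ssl


def strict_context():
    """Build a client context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate verification stays on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_version(host, port, timeout=5.0):
    """Connect to host:port and return the negotiated protocol name."""
    ctx = strict_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # The handshake fails with ssl.SSLError if the server only
            # offers protocol versions below the configured minimum.
            return tls.version()
```

A handshake failure here means the endpoint is offering only older protocol versions (or a certificate the client does not trust), which is worth investigating either way.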
From time to time, the MapR security team comes across vulnerabilities that might impact our software. We come across these vulnerabilities from a variety of possible sources, including but not limited to: internal testing, customer-driven penetration tests, and publicly-available JIRAs. We take these (potential) vulnerabilities very seriously and work quickly to evaluate them for applicability and severity. Patches, if necessary, are developed, tested, and communicated to our customers; all of this is done within time frames dictated by the severity of the issue at hand.
- Security Best Practice #5: Stay Up-To-Date on Patches. MapR issues security vulnerability advisories to our customers. You can stay up-to-date on known vulnerabilities for version 5.2 here. Make sure you take the recommended action(s) to mitigate these vulnerabilities, including patching your MapR software.
They say there are two types of companies: companies that know they’ve been hacked, and companies that don’t know they’ve been hacked yet. It’s a rather alarmist statement, but most security professionals know there is a grain of truth to this. Now that I’ve got your attention, please...take a deep breath. Relax. And turn your attention to best practices #6 and #7. In both of these tips, we emphasize detection and response to supplement the preventive measures described above.
- Security Best Practice #6: Turn On Auditing. Audit logs will not prevent an attack, but they can alert you quickly to a potential compromise and aid the investigation afterward. Find instructions for managing MapR auditing here. We recommend that you enable auditing of:
- All cluster administrative actions.
- Access to data (Files, DB Tables) that is deemed especially sensitive or critical.
Figure 3. MapR’s audit feature allows for logging of data access, cluster operations, and authentication requests.
Figure 4. MapR audit logs are stored in JSON format, which means you can easily query the log files with Apache Drill. In this illustration, we show how Tableau can be used to visualize MapR audit logs (after connecting via Apache Drill) and to spot anomalies such as after-hours data access.
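You don’t need a BI tool to get started with anomaly spotting. Because the logs are JSON, a few lines of Python can flag after-hours access. The sketch below assumes one JSON record per line and uses field names (`timestamp.$date`, `operation`, `uid`, `status`) that are assumptions modeled on typical audit records. The sample data is invented for illustration, so check the field names against your own log files before relying on it.

```python
import json
from datetime import datetime, time

# Invented sample records; real audit logs will carry more fields.
SAMPLE_LOG = """\
{"timestamp":{"$date":"2017-01-24T14:05:11.000Z"},"operation":"READ","uid":1001,"status":0}
{"timestamp":{"$date":"2017-01-25T02:47:30.000Z"},"operation":"READ","uid":1002,"status":0}
{"timestamp":{"$date":"2017-01-25T23:15:02.000Z"},"operation":"DELETE","uid":1001,"status":0}
"""


def after_hours(lines, start=time(8, 0), end=time(18, 0)):
    """Return the records whose timestamp falls outside [start, end).

    Timestamps are compared as written (UTC clock time); convert to local
    time first if your business hours are defined in another zone.
    """
    flagged = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        stamp = datetime.strptime(
            record["timestamp"]["$date"], "%Y-%m-%dT%H:%M:%S.%fZ"
        )
        if not (start <= stamp.time() < end):
            flagged.append(record)
    return flagged


if __name__ == "__main__":
    for rec in after_hours(SAMPLE_LOG.splitlines()):
        print(rec["timestamp"]["$date"], rec["operation"], rec["uid"])
```

The same filter is easy to express as a Drill query once the log directory is queryable. The point is simply that structured logs make this kind of check cheap.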
- Security Best Practice #7: Have a Plan for Disaster Recovery. Don’t wait until disaster strikes to figure out what you’re going to do. Your business continuity plan should include steps to take in case your MapR data gets corrupted or compromised. MapR’s built-in disaster recovery capabilities (such as snapshots and mirroring) are on par with any premium enterprise software distribution. Ensure you have backup copies of your data by taking advantage of these features. Click here for information on MapR snapshots, for example.
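Snapshots are most useful when they are taken on a schedule and named predictably. As a small sketch, the Python helper below builds a `maprcli volume snapshot create` command with a timestamped snapshot name. The volume name and naming scheme are assumptions of mine, and you should confirm the exact `maprcli` options against the documentation for your MapR version before running anything.

```python
from datetime import datetime, timezone


def snapshot_command(volume, now=None):
    """Build the argument list for a timestamped snapshot of `volume`."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming scheme: <volume>-<UTC timestamp>.
    name = f"{volume}-{now:%Y%m%dT%H%M%SZ}"
    return [
        "maprcli", "volume", "snapshot", "create",
        "-volume", volume,
        "-snapshotname", name,
    ]


if __name__ == "__main__":
    # On a cluster node you might pass this list to
    # subprocess.run(cmd, check=True); here we only print it.
    print(" ".join(snapshot_command("finance")))
```

Building the command as a list rather than a shell string avoids quoting problems if a volume name ever contains unusual characters.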
Bonus Best Practice
Bruce Schneier, noted cryptographer and security technologist, popularized the notion a while back that security is about the combination of “people, process, and technology.” The tips above fall primarily under a mix of process and technology only. For example, when you “require authentication,” you are leveraging underlying technology components, such as a user registry, and relying on the process of configuring your platform to use these authentication technologies. When you “turn on auditing,” you are leveraging an underlying technology feature (MapR audits), and relying on the process of turning on audits, monitoring the log entries, visualizing logs for anomalies, etc. But people – your people in particular – also represent a very important line of defense against hackers. Make sure your organization’s employees are educated about current attack vectors and know how to spot and report basic attacks, such as phishing attempts. One inadvertent click or opening of an email attachment could lead to disastrous consequences.
These guidelines are not meant to be comprehensive. Following them is no guarantee against exploit. Customers with specific questions or concerns about the security of their deployment are encouraged to speak directly with their account representative. MapR’s trained professionals are here to help.
We offer several tips for securing your MapR deployment in this blog. No method is foolproof, but following the above best practices will help minimize your risk and speed up recovery efforts should disaster strike. While the recent attacks targeting open NameNode ports are not likely to affect MapR, all big data installations represent big targets for unsavory actors, and we recommend taking every precaution possible to protect your investment.
 ThreatGeek. Revenge of the DevOps Gangster: Open Hadoop Installs Wiped Worldwide
 MapR employs a no-NameNode architecture, as discussed here.
 The term “vulnerability” here is defined as any flaw in our distribution that affects the confidentiality, integrity, or availability of a MapR deployment.
This blog post was published January 25, 2017.