Using Hadoop as an enterprise-level tool demands the data protection and disaster recovery capabilities provided by the MapR distribution for Hadoop. As the amount of enterprise-critical data in the cluster grows, securing access to that data becomes just as critical. The enterprise-ready capabilities of the MapR distribution for Hadoop enable long-term storage of large data sets and warehousing of archived data, which can be selectively re-processed for specific analyses to reveal insights useful to the enterprise.
Because data must be shared between nodes on the cluster, an intruder may be able to intercept the data as it is transmitted between nodes and from the client to the cluster. Networked computers are also vulnerable to attacks in which an intruder impersonates an authorized user and then acts improperly as that user. In addition, each networked machine retains all the security vulnerabilities of a single node.
A secure environment is predicated on the following capabilities:
- Authentication: Restricting access to a specified set of users. Robust authentication prevents third parties from representing themselves as legitimate users.
- Authorization: Restricting an authenticated user's capabilities on the system. Flexible authorization systems enable a system to grant a user the set of capabilities needed to perform desired tasks, while preventing the use of any capabilities outside of that scope.
- Encryption: Restricting an external party's ability to read data. Data transmission between nodes in a secure MapR cluster is encrypted, preventing an attacker with access to that communication from gaining information about the transmission's contents.
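The difference between authentication and authorization can be illustrated with a minimal sketch. This is not how MapR implements security; the shared secret, the permission table, and all function names below are hypothetical, chosen only to show that a request is first checked for identity and then for permitted actions. Encryption of data in transit is omitted here.

```python
import hashlib
import hmac

# Hypothetical shared secret and permission table, for illustration only.
SECRET_KEY = b"demo-secret"
PERMISSIONS = {"alice": {"read", "write"}, "bob": {"read"}}


def issue_token(user: str) -> str:
    """Sign the user name so the server can later verify who presented it."""
    return hmac.new(SECRET_KEY, user.encode(), hashlib.sha256).hexdigest()


def authenticate(user: str, token: str) -> bool:
    """Authentication: accept the request only if the token matches the user."""
    return hmac.compare_digest(issue_token(user), token)


def authorize(user: str, action: str) -> bool:
    """Authorization: allow only the actions granted to this user."""
    return action in PERMISSIONS.get(user, set())


def handle_request(user: str, token: str, action: str) -> str:
    """Check identity first, then check the requested capability."""
    if not authenticate(user, token):
        return "denied: authentication failed"
    if not authorize(user, action):
        return "denied: not authorized"
    return "allowed"
```

In this sketch, a user who presents another user's token fails authentication, while a correctly authenticated user who requests an action outside the granted set fails authorization.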
The Security Overview provides a high-level summary of the specific methods used by MapR to implement authentication, authorization, and encryption in a cluster.
The Configuring MapR Security section provides specific procedures for configuring and managing particular aspects of the MapR security infrastructure.