Big Data and Apache Hadoop for the Banking and Securities Industry

Solution Overview

The 3 V’s commonly associated with big data (volume, variety, and velocity) do not have equal impact on the banking and securities industry. Investment banks have dealt with high velocity for a long time, but volume is a relatively new factor and is emerging as the strongest driver for banks to look at big data and Apache™ Hadoop®.

Banks are looking for solutions to workloads that existing technologies either cannot handle easily or can handle only at too high a cost. The global financial crisis that began in 2007 brought sweeping changes to the entire banking industry, with banks having to comply with more stringent capital requirements. Risk regulation is a central topic of the Basel III norms, which aim to ensure that banks maintain appropriate exposure across asset classes and counterparties.

Given this backdrop, it should come as no surprise that almost 50% of projects in banks are driven by regulatory or legal requirements. Projects that improve agility, through faster response times for analytics, less rigid data integration processes, and automation of processes such as SOX compliance, account for approximately 25%. IT cost-saving projects make up the final 25% of initiatives, as banks continue to seek ways not only to boost efficiency but also to conserve capital under the new, more stringent capital requirements.

The industry-leading MapR Distribution including Hadoop is ideally suited for a wide variety of use cases in the banking and securities industry, some of which are detailed below.

Client and Counterparty Credit Risk Analytics

Increased regulation has put enormous pressure on banks to demonstrate risk analysis to auditors, and banks that cannot show risk analytics can face regulatory fines. Central to this is the ability to calculate client and counterparty risk more precisely with credit valuation adjustments (CVA). These calculations require intense computational power and large amounts of storage. The data also lives in numerous places and is owned by separate trading desks. A leading bank is using the MapR Distribution to bring together this disparate data and provide the requisite power and storage to achieve the following benefits:

  • Compute trillions of credit value calculations per day on a cost-effective, parallel compute platform
  • Enable fast and easy access to all the data sources through a high-performance, distributed NFS storage architecture
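To make the credit value adjustment workload concrete, the following is a minimal Python sketch of one commonly used discretized approximation of unilateral CVA: loss given default multiplied by the sum, over time buckets, of discount factor times expected exposure times marginal default probability. It is an illustration under stated assumptions, not the method used by the bank described above. The function names, the flat discount rate, the 40% recovery rate, and the input record layout are all hypothetical.

```python
import math


def cva(expected_exposure, default_probs, times,
        recovery_rate=0.4, risk_free_rate=0.02):
    """Approximate unilateral CVA for one exposure profile.

    CVA ~= (1 - R) * sum_i DF(t_i) * EE(t_i) * PD(t_{i-1}, t_i)

    expected_exposure: expected exposure EE(t_i) per time bucket
    default_probs:     marginal default probability per time bucket
    times:             bucket end times t_i in years
    The flat risk_free_rate and recovery_rate defaults are assumptions.
    """
    if not (len(expected_exposure) == len(default_probs) == len(times)):
        raise ValueError("all input sequences must have the same length")

    loss_given_default = 1.0 - recovery_rate
    total = 0.0
    for ee, pd, t in zip(expected_exposure, default_probs, times):
        discount_factor = math.exp(-risk_free_rate * t)
        total += discount_factor * ee * pd
    return loss_given_default * total


def cva_by_counterparty(profiles):
    """Sum CVA per counterparty across exposure profiles from several desks.

    Each profile is a dict with hypothetical keys: "counterparty",
    "expected_exposure", "default_probs", and "times". Simple summation
    ignores netting agreements, which a production calculation would
    need to take into account.
    """
    totals = {}
    for profile in profiles:
        name = profile["counterparty"]
        value = cva(profile["expected_exposure"],
                    profile["default_probs"],
                    profile["times"])
        totals[name] = totals.get(name, 0.0) + value
    return totals
```

Each exposure profile is independent of the others, which is why this kind of calculation spreads naturally across a parallel compute platform: profiles can be partitioned by desk or counterparty, evaluated separately, and combined in a final aggregation step.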

Regulatory Compliance

Regulations such as Dodd-Frank and the Volcker Rule have resulted in banks having to capture all relevant data in a central repository and perform analytics to understand exposure and perform stress tests. Failure to meet these requirements can result in steep fines for banks.

The MapR Distribution allows banks to meet their regulatory compliance objectives and achieve the following benefits:

  • Establish an enterprise data hub that contains all relevant compliance data (structured & unstructured) and can be securely accessed
  • Perform intensive stress testing using a high-performance, distributed computing platform comprising fewer servers
  • Enable analysts to use existing BI & visualization tools to develop needed dashboards for end-user consumption

Real-Time Securities Fraud and Rogue Detection

Incidents like the London Whale trading losses of 2012, which cost a large global bank $6.2 billion, have heightened the need for stronger real-time monitoring of trades in order to prevent rogue trading. Banks need to stay ahead of patterns such as front-running and market manipulation. Doing so requires running extensive and expensive simulations to find new patterns of rogue activity, and then detecting those patterns in real time. By using the MapR Distribution, a major regional bank is able to realize the following benefits:

  • Perform real-time anomaly detection on known patterns of activities and use learned patterns from prior modeling and simulations
  • Correlate transaction data with other streams (chat, email, etc.) in a cost-effective parallel processing environment
  • Reduce query time from hours to minutes on large volumes of data
  • Build a single platform for operational applications and analytics that reduces total cost of ownership (TCO)
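As a rough illustration of real-time anomaly detection on a stream of trades, the sketch below flags a value (for example, a trade notional) that deviates from a rolling window of recent values by more than a chosen number of standard deviations. This is a deliberately simple statistical stand-in, not the detection logic used by the bank described above. The class name, window size, threshold, and minimum history are all assumptions chosen for the example.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flag values that are far from the mean of a rolling window."""

    def __init__(self, window=50, threshold=3.0, min_history=5):
        self.values = deque(maxlen=window)  # most recent observations
        self.threshold = threshold          # z-score cutoff (assumed)
        self.min_history = min_history      # observations needed before scoring

    def observe(self, value):
        """Score the value against the current window, then record it.

        Returns True when the value lies more than `threshold` standard
        deviations from the window mean. The value is appended to the
        window whether or not it was flagged.
        """
        is_anomaly = False
        if len(self.values) >= self.min_history:
            mean = sum(self.values) / len(self.values)
            variance = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std_dev = math.sqrt(variance)
            if std_dev > 0 and abs(value - mean) / std_dev > self.threshold:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly


# Usage: feed values one at a time as they arrive.
detector = RollingZScoreDetector(window=10, threshold=3.0, min_history=5)
flags = [detector.observe(v) for v in [100, 101, 99, 100, 102, 98, 500]]
```

In practice the known patterns learned from prior modeling and simulation would replace the simple z-score rule, and one detector instance could be kept per trader, desk, or instrument so that each stream is scored against its own history.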

Marketing Analytics

Marketing organizations in banks are inundated with an increasing variety of data resulting from the explosion of channels such as the web, social media, and mobile. They have to understand not only individual consumer behavior but also the institutional behavior of their larger customers in order to offer more tailored products and services. Using MapR, a major bank is able to analyze institutional client behavior and realize the following benefits:

  • Provide data scientists in the marketing function with a platform that provides high-speed data access and supports multiple tools (Python, Pig, R) for faster time to value
  • Deliver the right upsell/recommendations on products directly to consumers at the right time by making use of all the relevant data including market data, clickstream, and social graphs

MapR Distribution including Hadoop Highlights

  • Direct Access NFS. Direct data ingestion, familiar access methods, existing tools/libraries continue to work.
  • Integrated security. Built-in data access controls.
  • Volume support. Disparate user groups and data are separated by logical volumes.
  • Job placement control and resource management. Jobs run simultaneously in the same cluster.
  • High availability and disaster recovery. Business continuity and stronger business-level service level agreements (SLAs).
  • Data protection. Consistent snapshots with point-in-time audits and recovery.
  • Support for structured, semistructured, and unstructured data. All data fits in the enterprise data architecture.
  • High performance. Fast, responsive access to data, and higher throughput.
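The Direct Access NFS point above means that data on the cluster can be read with ordinary file APIs once the cluster is mounted over NFS. The sketch below shows what that could look like with only the Python standard library. The function name, the CSV layout with a "desk" column, and the example mount path are hypothetical and are not taken from MapR documentation.

```python
import csv
from collections import Counter
from pathlib import Path


def count_trades_by_desk(mount_root):
    """Count trade records per desk from CSV files under a mounted directory.

    mount_root is any directory path; with an NFS-mounted cluster it would
    be the mount point (for example a path such as /mapr/<cluster>/trades,
    which is an assumed layout). Each CSV file is assumed to have a header
    row that includes a "desk" column.
    """
    counts = Counter()
    for csv_path in sorted(Path(mount_root).glob("*.csv")):
        # Plain file I/O: no Hadoop-specific client library is involved.
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                counts[row["desk"]] += 1
    return counts
```

Because the code only touches the file system interface, the same function works unchanged against a local directory, which is convenient for testing before pointing it at a mounted cluster.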

Key Benefits

  • Simplified architecture with easy data access to all enterprise data in a single repository
  • Fast, responsive access to data to enable real-time operations
  • Low cost storage along with the benefits of high-end storage platforms
  • High uptime for the reliability to meet stringent SLAs and avoid costly downtime
  • Support for big data-driven operational applications
  • Built for extreme scalability at low costs

About MapR

MapR delivers on the promise of Hadoop with a proven, enterprise-grade platform that supports a broad set of mission-critical and real-time production uses. MapR brings unprecedented dependability, ease of use, and world-record speed to Hadoop, NoSQL, database, and streaming applications in one unified big data platform. MapR is used by more than 500 customers across financial services, retail, media, healthcare, manufacturing, telecommunications, and government organizations, as well as by leading Fortune 100 and Web 2.0 companies. Investors include Lightspeed Venture Partners, Mayfield Fund, NEA, and Redpoint Ventures.