Machine Learning at American Express: Benefits and Requirements

Curious to know how American Express uses machine learning successfully, in production, at very large scale? An audience of over 300 recently got a peek into this big data story at an event organized by the Hive Data Think Tank in Palo Alto. The presenters were Chao Yuan, SVP at American Express, who heads the Modeling and Decision Science Team for the US Consumer Business, and co-presenter Ted Dunning, Chief Application Architect at MapR Technologies. Chao talked about a collection of production big data use cases in which American Express has seen big benefits from using machine learning to improve decisions and make better use of its data. Ted then explained what a big data platform must provide to support large-scale machine learning projects such as these in production settings.

Data from both sides of the business

American Express is used to operating at large scale. In business for 165 years, it has continually transformed itself to keep up with changing demand, going from a company that was primarily in shipping to a travel business and now to a major credit card issuer that handles over 25% of US credit card spending. In 2014 the company reached a milestone: one trillion dollars in transactions. Its position gives it data from both the customer and the merchant sides of the business, from millions of sellers and millions of buyers. As Chao says, one thing American Express is never short of is data. The question is how best to use this data to improve the decisions the company makes.

About five years ago, American Express recognized that traditional databases would not be enough to handle the volume of data and analytics its projects required, and decided that a big data infrastructure was the solution. The company adopted a Hadoop platform for its infrastructure and turned to machine learning experts such as co-presenter Ted Dunning to learn how to get inside the data in order to become “more intelligent”. It began to apply machine learning techniques to a wide range of key interactions.

Data volume is not only increasing; data sources are also changing, as more people do business online or on mobile devices. Chao explained that, as part of its ongoing journey, American Express must keep up with these changing styles of interaction as well as with the growing volume. Part of that involves making a huge number of decisions, millions every day. If American Express can become even slightly smarter in these decisions, the payoff for customers and for the company is large. That is why it is expanding its use of machine learning at large scale. With access to big data, machine learning models can discriminate better between outcomes and thus better understand customer behavior.

Machine learning implemented in production

Chao talked in particular about three classes of big data machine learning use cases that American Express has implemented in production: fraud detection, new customer acquisition, and recommendations for a better customer experience.

Use Case 1: Fraud Detection

In fraud detection and prevention, machine learning has helped improve American Express’s already excellent track record, including for its online business. Modeling methods draw on a variety of data sources, including card membership information, spending details, and merchant information. The goal is to stop fraudulent transactions before substantial loss is incurred while letting normal business transactions proceed promptly. A customer who swipes a card to make a purchase, for instance, expects approval immediately. In addition to finding fraud accurately, the fraud detection system must:

  • Detect suspicious events early
  • Make decisions in a few milliseconds against a vast dataset

Large-scale machine learning techniques, done correctly, can meet these criteria and improve on traditional linear regression methods, raising the precision of predictions.
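To make the millisecond requirement concrete, here is a minimal Python sketch of one common pattern, not American Express’s actual system: per-card aggregates are computed offline in batch, stored in a key-value table (a plain dict stands in for a NoSQL store here), and the online decision is a single lookup plus a few arithmetic operations rather than a scan of transaction history. The features (`mean_amount`, `std_amount`, `home_country`), the logistic weights, and the 0.9 threshold are all hypothetical and were not fitted to any data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CardProfile:
    """Per-card aggregates computed offline in batch (hypothetical features)."""
    mean_amount: float
    std_amount: float
    home_country: str


# Stand-in for a low-latency NoSQL table keyed by card ID (hypothetical data).
PROFILES = {
    "card-001": CardProfile(mean_amount=60.0, std_amount=25.0, home_country="US"),
}

# Illustrative logistic-model weights; NOT fitted to any real data.
BIAS = -4.0
W_AMOUNT_Z = 0.8
W_FOREIGN = 2.0


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fraud_score(card_id: str, amount: float, country: str) -> float:
    """Score one transaction with a single O(1) profile lookup."""
    profile = PROFILES.get(card_id)
    if profile is None:
        return 0.5  # unknown card: neutral score; leave the call to other rules
    amount_z = (amount - profile.mean_amount) / max(profile.std_amount, 1e-6)
    foreign = 1.0 if country != profile.home_country else 0.0
    return _sigmoid(BIAS + W_AMOUNT_Z * amount_z + W_FOREIGN * foreign)


def decide(card_id: str, amount: float, country: str, threshold: float = 0.9) -> str:
    """Approve unless the score reaches the (hypothetical) review threshold."""
    return "review" if fraud_score(card_id, amount, country) >= threshold else "approve"
```

Because the expensive aggregation happens in batch, the cost of the online path does not grow with the size of the transaction history; improving accuracy means retraining offline, not slowing down the approval path.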

Use Case 2: New Customer Acquisition

Finding new customers is a widespread business need, and American Express is no exception. For example, a prospective customer visiting the website can choose from many products (different credit card plans). Previously, around 90% of new customers came from direct mail campaigns. Now, with the web and with targeted marketing driven by machine learning models, the share of new customers acquired through online interactions has risen to 40%. This is especially advantageous because acquiring customers online costs less than direct mail.

Use Case 3: Recommendation for Improved Customer Experience

Chao mentioned that one of his favorite uses of machine learning at American Express is a mobile phone application that provides customized restaurant recommendations. With the customer’s permission, the application uses recent spending histories and card member profile data from a huge number of transactions to train a recommendation model. The model predicts which restaurants a particular customer might enjoy and recommends them. (Co-presenter Ted Dunning explained the technical basis for this approach, as described below.)

The success of this improved customer experience matters not only to the card issuer but also to restaurant merchants, who get feedback on how good a particular offer may be.

What Are the Requirements for the Data Platform?

Running these kinds of large-scale machine learning applications successfully in production places certain requirements on the big data platform that supports them, and those requirements were the main focus of Ted’s presentation. Machine learning applications need to work with large amounts of data from a wide range of sources, prepared and staged in specific ways. The MapR data platform is well suited to storing, streaming, and searching data that is big and needs to move fast.

MapR data platform storage and processing

MapR’s real-time read/write file system, integrated NoSQL database, and large array of Hadoop ecosystem tools meet the needs of large-scale machine learning applications. Three capabilities are especially useful: running legacy code directly, making consistent snapshots for data versioning, and using remote mirroring to keep applications synchronized across multiple data centers.

Ted explained that recommendation systems like the mobile application Chao described can use large amounts of user behavior history to train a machine learning model that predicts which items each user is likely to prefer. The model identifies recommendation indicators from the historical co-occurrence of users and items (or actions). The beauty of this design is that the computational heavy lifting, running the learning algorithm to train the model, can be done offline ahead of time. Conventional tools such as a search engine can then deploy the system easily and deliver recommendations online in real time.
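The offline/online split can be sketched in Python. This is a minimal in-memory illustration under stated assumptions, not MapR’s or American Express’s code: it assumes the log-likelihood ratio (LLR) test, the test Apache Mahout’s item-similarity jobs use, to decide which co-occurrences are strong enough to become indicators. The online step mimics what a search engine does when each item’s indicators are indexed as a field and a user’s recent history is the query, but it uses a simple overlap count instead of a search engine’s relevance weighting. The function names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    """Unnormalized entropy term used by the log-likelihood ratio test."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio (G^2) for a 2x2 co-occurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def build_indicators(histories, max_indicators=10, min_llr=1.0):
    """Offline step: for each item, keep the items it co-occurs with unusually often."""
    n_users = len(histories)
    item_users = Counter()   # item -> number of users who interacted with it
    pair_users = Counter()   # (item_a, item_b) -> number of users with both
    for items in histories.values():
        unique = sorted(set(items))
        item_users.update(unique)
        pair_users.update(combinations(unique, 2))
    scored = defaultdict(list)
    for (a, b), k11 in pair_users.items():
        k12 = item_users[a] - k11          # users with a but not b
        k21 = item_users[b] - k11          # users with b but not a
        k22 = n_users - k11 - k12 - k21    # users with neither
        score = llr(k11, k12, k21, k22)
        if score >= min_llr:
            scored[a].append((score, b))
            scored[b].append((score, a))
    return {
        item: [other for _, other in sorted(pairs, reverse=True)[:max_indicators]]
        for item, pairs in scored.items()
    }


def recommend(indicators, recent_history, top_n=3):
    """Online step: rank unseen items by how many of their indicators
    appear in the user's recent history."""
    history = set(recent_history)
    scores = Counter()
    for item, inds in indicators.items():
        if item in history:
            continue
        overlap = len(history.intersection(inds))
        if overlap:
            scores[item] = overlap
    return [item for item, _ in scores.most_common(top_n)]
```

For example, if three users visited both `sushi_a` and `ramen_b` and three others visited both `pizza_c` and `burger_d`, then `recommend(build_indicators(histories), ["sushi_a"])` returns `["ramen_b"]`. Only `build_indicators` touches the full history; `recommend` needs just the precomputed indicator lists, which is why the online side can be served by ordinary search infrastructure.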

Although the specific design and choice of algorithms differ across types of machine learning, these applications share some common needs from the big data platform that supports them.


The quality of recommendations, as with other machine learning applications, depends in part on the quality and quantity of available data. Models learn from patterns observed across a large number of historical actions, so one requirement for the data platform is scalable storage. This holds for use cases ranging from fraud detection at secure websites to predictive maintenance in large industrial settings.


Another need that machine learning applications place on a data platform is handling large-scale queries fast. Take the example of detecting anomalies in the propagation of telecom data. During special events, large groups of people may suddenly put a localized burden on the telecommunications network, such as tens of thousands of people in a sports arena using their phones to tweet. To keep such a situation from overloading the communication system, it is useful to temporarily activate higher localized bandwidth to serve this “flash mob”. If you can detect these anomalies quickly, you can prioritize service appropriately, including maintaining service for first responders in an emergency. This need for speed resembles the requirement for rapid response when validating a credit card transaction: either way, the machine learning system must be able to query millions of records rapidly and return a response in a second or less.
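A fast anomaly check does not have to be elaborate. As an illustration only, and not the method described in the talk, the Python sketch below flags a cell site whose current load exceeds its own rolling baseline by more than a fixed number of standard deviations (a rolling z-score). The class name, window length, threshold, and minimum history are hypothetical choices.

```python
import math
from collections import defaultdict, deque


class BurstDetector:
    """Flag sudden load spikes per cell site with a rolling z-score (illustrative)."""

    def __init__(self, window: int = 12, threshold: float = 4.0, min_history: int = 6):
        self.threshold = threshold
        self.min_history = min_history
        # Each cell keeps only its last `window` normal observations.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, cell_id: str, count: int) -> bool:
        """Return True if `count` is anomalously high for this cell."""
        past = self.history[cell_id]
        is_anomaly = False
        if len(past) >= self.min_history:
            mean = sum(past) / len(past)
            var = sum((x - mean) ** 2 for x in past) / len(past)
            std = max(math.sqrt(var), 1.0)  # floor avoids blow-ups on flat baselines
            is_anomaly = (count - mean) / std > self.threshold
        if not is_anomaly:
            past.append(count)  # keep spikes out of the baseline
        return is_anomaly
```

Because the state per cell is a short fixed-length window, each check takes constant time, which keeps a sub-second response feasible across many sites. Spikes are kept out of the baseline so that a sustained flash mob keeps being flagged instead of quickly becoming the new normal.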

Compatible non-Hadoop File Access

Hadoop is a bit of a revolution, but to make the best use of existing experience as well as new ideas, a data platform should seamlessly support both Hadoop and non-Hadoop applications and code, including code from other machine learning systems. Ted explained that because MapR has a fully read/write, real-time file system that supports NFS access, legacy code can be used alongside Hadoop applications, a particular advantage in large-scale machine learning projects.

Data Versioning

What separates a machine learning project that is an interesting “arts and crafts” study from one that is a serious work of engineering is the capability for repeatable processes and, in particular, for version control. Version control is a challenge at the scale of terabytes of data and more, because making full copies is too expensive in space and time. What is needed instead are transactionally consistent snapshots for data versioning, such as those available with the MapR data platform. Consistent snapshots let you understand and reference the state of training data at an exact point in time, compare results for old data and new models or vice versa, and ultimately see what is really causing changes in observed behavior.
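Why snapshots avoid full copies can be shown with a toy model in the spirit of copy-on-write. The Python sketch below is an assumption-laden illustration, not MapR’s implementation: data blocks are never modified in place and are shared, and a snapshot copies only the small mapping from files to blocks, so it costs metadata rather than data. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class CowVolume:
    """Toy copy-on-write style volume (hypothetical; not MapR's implementation)."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.blocks: List[bytes] = []          # append-only; blocks are never modified
        self.live: Dict[str, List[int]] = {}   # file name -> ordered block ids
        self.snapshots: Dict[str, Dict[str, List[int]]] = {}

    def append(self, name: str, data: bytes) -> None:
        """Append data to a file by writing new blocks; old blocks stay untouched."""
        ids = self.live.setdefault(name, [])
        for i in range(0, len(data), self.block_size):
            self.blocks.append(data[i:i + self.block_size])
            ids.append(len(self.blocks) - 1)

    def snapshot(self, snap_name: str) -> None:
        """Record the current file-to-block mapping; no data blocks are copied."""
        self.snapshots[snap_name] = {f: list(ids) for f, ids in self.live.items()}

    def read(self, name: str, snap_name: Optional[str] = None) -> bytes:
        """Read a file as it is now, or as it was when `snap_name` was taken."""
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return b"".join(self.blocks[i] for i in table[name])
```

Taking a snapshot before appending a day of new transactions leaves the training set readable exactly as it was at snapshot time (`read("train.csv", "v1")`), while the live file keeps growing and no data blocks are duplicated.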

Federation Across Multiple Data Centers

At some large companies, machine learning projects must be deployed across multiple data centers, either to share results or, more often, to share a consistent collection of data for use in independent development projects. MapR mirroring meets this requirement by creating a consistent remote replica of data, which is useful for geographic distribution and also provides the basis for disaster recovery if needed.

Additional Resources

For more on related topics, download these short O’Reilly eBooks provided as a courtesy by MapR. Two are in the series “Practical Machine Learning”, a third deals with open source tools for building high performance time series databases, and the most recent looks at other Hadoop-based production use cases, based on the experience of MapR customers.

This blog post was published April 03, 2015.
