Discovering Stolen Data with Hadoop

Every week brings reports of new data breaches at organizations ranging from retailers to government agencies to, ahem, “dating services.” The theft of sensitive data costs global industry more than $445 billion each year, and even the most robust security cannot guarantee protection against sophisticated attackers and insider threats. Because the average data breach takes over 200 days to discover, adversaries often have months or even years to exploit a security incident.

Given these scary statistics, if your company holds sensitive data, it’s time to shift from a purely defensive posture to a proactive one. That means planning ahead for the inevitable breach so that you can discover and remedy it as quickly as possible. One company that can help with this daunting task is Terbium Labs, a new security startup that is harnessing Hadoop and big data to counter these threats.

Terbium’s technology, called Matchlight, is quite fascinating to me. It proactively discovers when information stolen from companies shows up on hidden criminal websites. To do this, it creates unique “fingerprints” of a company’s data. By registering fingerprints of a company’s most valuable data and comparing them to fingerprints gathered from across the Internet, Matchlight can detect unexpected appearances of sensitive information and alert the company immediately and automatically when its data shows up in unexpected places on the Internet, the Dark Web, or in competing products.
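
Terbium doesn’t describe in this post how Matchlight actually computes its fingerprints, so what follows is only a hypothetical sketch of the general idea, not Terbium’s method. It assumes a common approach: normalize the text, split it into overlapping runs of k words (“shingles”), and hash each run with SHA-256. A company registers only the hashes of its sensitive data; a crawled page is then fingerprinted the same way, and the overlap between the two sets shows how much of the page matches registered data. The function names (`shingles`, `fingerprints`, `match_score`) and the shingle size k=5 are illustrative assumptions.

```python
import hashlib
import re


def shingles(text, k=5):
    """Split text into overlapping runs of k lowercase words (hypothetical normalization)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle (or nothing, if empty).
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def fingerprints(text, k=5):
    """Hash each shingle so only digests, not the raw text, need to be stored or shared."""
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles(text, k)}


def match_score(registered, crawled_text, k=5):
    """Fraction of a crawled page's fingerprints that appear in the registered set."""
    found = fingerprints(crawled_text, k)
    if not found:
        return 0.0
    return len(found & registered) / len(found)


if __name__ == "__main__":
    # Made-up "sensitive" record that a company registers ahead of time.
    secret = "Customer record: account 4111 owner Jane Doe expires 09 2027 internal use only"
    registered = fingerprints(secret)

    # Made-up page found during a crawl, with extra text and different punctuation.
    leak = "FOR SALE >>> customer record: account 4111 owner Jane Doe expires 09/2027 internal use only"
    print(round(match_score(registered, leak), 3))  # 0.818
```

Because matching happens on hashes, the registered set reveals nothing readable about the original data, and set intersection stays cheap even as the number of fingerprints grows. At Terbium’s scale, that comparison would be distributed across a cluster rather than run in memory like this.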

This solution works on all types of digital assets, from code to images to text and documents, which makes for extremely complex datasets. Terbium’s database currently contains 340 billion data fingerprints and is growing by tens of billions every day.

To process all of this data, Terbium needed a Hadoop platform that is highly efficient, extremely stable, and more reliable than traditional Hadoop distributions. Terbium chose an enterprise-grade Hadoop distribution implemented in native code rather than running on a Java virtual machine, which makes it much more resource efficient. “We are only as good as the data we collect, and our ability to collect more data depends on this key piece of technology,” says co-founder Danny Rogers. I’m sure you can guess which vendor’s Hadoop distribution he was referring to.

The startup is piloting its technology with half a dozen Fortune 500 customers in financial services, healthcare, manufacturing, and technology. Early results from the pilots look promising: in a single day, Terbium Labs identified, in seconds, 30,000 new stolen credit cards and 6,000 newly compromised email addresses for sale on the Dark Web.

By taking advantage of this new type of large-scale, cloud-based automation paired with an enterprise-grade Hadoop distribution, their customers will now be able to discover stolen corporate and customer data in mere seconds instead of weeks or months. Pretty cool.

This blog post was published August 27, 2015.
