“We can query an entire day’s worth of files in minutes instead of hours...There are all kinds of things we can do, now that we have the data in a compact format. It’s a central part of our architecture.”
-- Chris Kiernan, Chief Technology Officer at RiskIQ
RiskIQ’s external threat management platform gives customers a unified way to discover, detect, investigate, and remediate threats that occur outside a company’s firewall. Their security management technology inventories and monitors the entire web from the outside in and can identify events such as someone attempting to impersonate a company’s website, mobile apps, or executive social profiles.
RiskIQ scans, analyzes, and stores entire websites, mobile app stores, and social media outlets across the entire internet. As the volume of collected data and the sophistication of security threats grew, RiskIQ wanted to provide their clients with faster and more comprehensive threat detection. The data had become so large that they needed an entirely new system that could analyze it economically.
RiskIQ has been using the MapR Converged Data Platform as a distributed storage system for several years. “The MapR file system architecture is something we appreciated from the start,” says Chris Kiernan, RiskIQ CTO and cofounder. “We knew we could leverage MapR’s strength for almost anything we wanted to do. With the node management and the way clustering is done in MapR, we always knew it was built in the right way if we needed to do analysis.”
RiskIQ’s 100 web crawlers collect about 10-20 TB of data each day from across the internet, and this number is growing rapidly as RiskIQ adds depth and new datasets to their crawl data. To reduce the size of the data, they developed a technique that converts raw crawl data into Parquet files 10 times smaller than the original. They then send those files via NFS to their warehouse for analysis. “We can query an entire day’s worth of files in minutes instead of hours,” says Kiernan. “There are all kinds of things we can do now that we have the data in a compact format. The MapR Converged Data Platform is a central part of our architecture.” The main technologies in their solution are the MapR Converged Data Platform, Spark, Hive, Parquet, and Oozie.
RiskIQ has cost-effectively added extremely large datasets and developed new products and powerful new analytics for threat detection, all on their existing, reliable MapR cluster.
“The fact that MapR makes sure that everything is compatible has worked really well. If we want to try a new technology, we can install it and it’s ready to use. It’s so much easier than trying to solve it on your own.”
-- Adam Hunt, Chief Data Scientist at RiskIQ
MapR architecture increases efficiency and reduces costs. RiskIQ was able to keep costs down by building the new data analysis use case on top of their existing MapR cluster. “We continue to use the cluster as a production file system while at the same time we’ve built an entire warehouse using the same infrastructure for a very small price point,” says Kiernan. “We have been able to cut Capex and Opex in half. We would have had to pay twice as much to build a vanilla Hadoop cluster. If we had built this in Cloudera, we would have needed separate clusters for production and analytics. It wouldn’t be a dual-purpose system.” With MapR, they also don’t have to worry about the size of the files. “We’ll throw big or little files in there and it’s bulletproof. You can mount from any machine in our cluster,” says Kiernan. “And from the NFS perspective, the most important thing is ingestion. You have to get source data into cluster. We are able to leverage our source machine, transfer data, and push into NFS. That saves a huge amount of time.”
Comprehensive and reliable platform. The completeness and stability of the MapR Converged Data Platform are crucial for RiskIQ’s business. “The fact that MapR makes sure everything is compatible has worked really well. If we want to try a new technology, we can install it and it’s ready to use. It’s so much easier than trying to solve it on your own,” says Adam Hunt, RiskIQ’s Chief Data Scientist. The MapR Converged Data Platform is extremely reliable. “We’ve never had an issue with MapR. We love it. It’s rock solid. We don’t see performance degradation no matter what we do to the cluster, and upgrades are seamless,” says Hunt. They also feel assured knowing that the MapR support team is there to help if they have questions. “MapR support has been awesome. Anytime I call support, I get a knowledgeable person. For our upgrade, a support person stayed on the phone with me while I did the upgrade,” says Hunt.
New capabilities provide competitive advantage. RiskIQ has developed new product offerings that would have been impossible without the MapR Converged Data Platform. They are now able to push new types of data into their application, which helps them understand things about websites they may not have understood before. “We have built all new parts of our products based on this new analysis, so it has been absolutely instrumental to our host reputation service. We can now answer all of the ad hoc questions we could never answer before. We are able to provide even more advanced detection for our clients. It’s improving the way we run the business,” says Kiernan. “You literally could not have built this product without a system like this. Time to market for this new set of features built on analytics was significantly less because everything just worked with MapR,” he adds. “In terms of agility, it’s a huge win. I honestly think we’re just starting to scratch the surface of what’s possible. It leaves our competition in the dust.”