Over the last 5 years of shipping product, we’ve watched our customers get enormous value out of storing and processing big data. The use cases are wide-ranging, from performing predictive maintenance on oil rigs to building fraud and risk models on financial transactions. When we stepped back and looked at what these use cases have in common, one thing jumped out–nearly all “big” data is generated one event at a time. There are many examples of event-based data sources, from IT sources like web logs and application metrics to IoT (Internet of Things) sources like smart devices, biometrics, and sensors.
When data is generated one event at a time, companies can get even more value by collecting and processing it in real time. That’s why we built MapR Event Store. With MapR Event Store, we’re building global, IoT-scale publish-subscribe event streaming directly into our platform–alongside our distributed file and object store (Distributed File and Object Store) and NoSQL database (MapR Database)–creating the industry’s most robust Data Platform.
Why is converging all of these services into a single platform important? Let’s look at two of our customers, comScore and Liaison Technologies, who are particularly eager to build breakthrough applications using a mix of real-time and batch analytics, database, and streaming technologies:
Without a converged platform, companies are forced to deploy these types of applications on at least three data silos–a messaging cluster, a Hadoop cluster, and a NoSQL database cluster. Silos mean independent clusters that need to be provisioned, managed, and secured using different tools and methods, which means more servers and more overhead. Worse, silos require data to be moved constantly, introducing delays, duplication, and inconsistency between systems. With the MapR Platform, data movement and duplication are avoided because MapR Event Store data is available not only to stream-oriented tools, but also to batch-oriented tools like MapReduce and Hive.
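To make the "one copy, two access patterns" idea concrete, here is a minimal sketch in Python. It is not MapR code: the EventLog class and its append, read_from, and scan methods are hypothetical stand-ins for a single shared event stream, assumed only for illustration. A stream-style consumer reads incrementally from its own cursor, while a batch-style job scans the same stored events without any export or copy step.

```python
class EventLog:
    """Hypothetical in-memory stand-in for one shared event stream."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset, max_events=100):
        """Stream-style read: events at or after `offset`, up to `max_events`."""
        return self._events[offset:offset + max_events]

    def scan(self):
        """Batch-style read: iterate over every stored event."""
        return iter(self._events)


log = EventLog()
for amount in (12.5, 40.0, 7.5):
    log.append({"type": "payment", "amount": amount})

# Stream-oriented consumer: handle whatever is new, then advance its cursor.
cursor = 0
new_events = log.read_from(cursor)
cursor += len(new_events)

# Batch-oriented job: aggregate over the same events, with no copy in between.
total = sum(event["amount"] for event in log.scan())
```

In a siloed deployment, the batch job would instead run against a second copy of the data exported from the messaging cluster, which is where the delays and inconsistencies described above come from.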
What is IoT-scale? IoT implies two things–globally distributed endpoints and enormous volumes of data. MapR Event Store scales to billions of events per second thanks to its linear scalability and its ability to handle over 1 million events per second per node in reliable mode. When endpoints are distributed globally, the application infrastructure must be too, to minimize communication delays. MapR Event Store can replicate event data between thousands of geographically distributed clusters interconnected arbitrarily–in a tree, a ring, a star, or a mesh–with built-in loop prevention. Further, event metadata like message offsets and consumer cursors is carried alongside the data, allowing endpoints to move between clusters when appropriate. This is critical, for example, in powering smart city initiatives, where cars need to consume a continuous stream of data from road sensors and other cars, switching between clusters as they move around to minimize latency.
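Why does loop prevention matter in a ring or mesh? Without it, an event replicated from cluster A to B to C would come back to A and circulate forever. The post does not say how MapR Event Store prevents this, so the Python sketch below shows one common approach, assumed here purely for illustration: tag each event with its origin cluster and a sequence number, and have every cluster drop events it has already stored. The Cluster class and the link, accept, and publish names are hypothetical.

```python
from collections import deque


class Cluster:
    """Toy cluster holding one event log (hypothetical model, not MapR's)."""

    def __init__(self, name):
        self.name = name
        self.log = []      # (event_id, payload) pairs in arrival order
        self.seen = set()  # event ids already stored, used to break loops
        self.peers = []    # clusters this one replicates to

    def link(self, other):
        """Create a bidirectional replication link between two clusters."""
        self.peers.append(other)
        other.peers.append(self)

    def accept(self, event_id, payload):
        """Store an event once; return False if it already arrived."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        self.log.append((event_id, payload))
        return True


def publish(origin, seq, payload):
    """Publish at `origin`, replicate outward, and return the delivery count."""
    event_id = (origin.name, seq)  # origin tag plus sequence number
    origin.accept(event_id, payload)
    deliveries = 1
    pending = deque([origin])
    while pending:
        cluster = pending.popleft()
        for peer in cluster.peers:
            # A peer that has already seen the event drops it, so the
            # event never travels around a cycle a second time.
            if peer.accept(event_id, payload):
                deliveries += 1
                pending.append(peer)
    return deliveries


# Three clusters connected in a ring: a - b - c - a.
a, b, c = Cluster("a"), Cluster("b"), Cluster("c")
a.link(b)
b.link(c)
c.link(a)

delivered = publish(a, seq=0, payload={"sensor": "road-17", "speed_kmh": 42})
```

Each cluster in the ring ends up with exactly one copy of the event, and replication stops on its own once every cluster has seen it. The same check works unchanged for a tree, a star, or a mesh.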
How is this possible? Again, the answer is convergence. We’ve spent over six years solving the hard problems of distributed data systems. Initially, we focused on writing data reliably with synchronous replication between multiple nodes, distributing metadata between all nodes in the cluster so there isn’t a single point of failure, recovering and rebalancing after node failures, and replicating data between multiple clusters for disaster recovery. We built a foundation on which we could add new data services. Just two years after releasing version 1.0 of our platform, we added MapR Database, a NoSQL database that leveraged these capabilities, and soon added a new one: master/master real-time table replication. Now, two years later, we are adding a publish-subscribe interface to build the industry’s first global, IoT-scale big data event streaming service.
We can’t wait to see what you build with it.