Introducing MapR Edge: for Internet-of-Things Devices that Create a Ton of Data

Any Internet-of-Things (IoT) environment can be a data management challenge because of the huge volumes of data that are created and the latencies inherent in global distribution. The challenges of aggregating data from consumer-oriented devices, like wearable technologies and smart thermostats, are fairly well understood. For those types of devices, the volume of data comes from the large number of devices; each individual device doesn’t necessarily create much data. However, there is a new set of challenges for IoT “devices” that generate megabytes or gigabytes of data per second. The infrastructure will have to change, because those volumes of data will likely overwhelm the bandwidth available for aggregating the data into a central repository. Vehicles, medical devices, and oil rigs are perfect examples of data sources that need a much more powerful architecture than consumer-oriented devices do.

Act Locally, Learn Globally

Most IoT applications benefit greatly from having data and compute services close to the “things.” Doing so keeps round-trip latencies and time-to-action low for an interactive experience (“act locally”). More importantly, it allows data to be filtered and summarized so that it can practically be delivered to a central location for large-scale, aggregated analytics and machine learning (“learn globally”), with insights and models delivered back to the remote sites to be operationalized. Often these remote sites have space constraints, so you can’t simply deploy a full suite of data management and analytics solutions.
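To make the “filter and summarize before sending” idea concrete, here is a minimal sketch in plain Python. It is not MapR code: the function names and the fixed-size window are assumptions chosen for illustration. It reduces a stream of raw numeric readings to one small summary record per window, which is the kind of payload an edge site might forward instead of the raw data.

```python
from statistics import mean


def summarize_window(readings):
    """Reduce one window of raw numeric readings to a small summary record."""
    if not readings:
        raise ValueError("cannot summarize an empty window")
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


def summarize_stream(readings, window_size):
    """Yield one summary per fixed-size window (hypothetical windowing scheme).

    A trailing partial window is summarized as well, so no reading is dropped.
    """
    window = []
    for value in readings:
        window.append(value)
        if len(window) == window_size:
            yield summarize_window(window)
            window = []
    if window:
        yield summarize_window(window)


# Six raw readings become two summary records.
raw = [20.0, 21.0, 22.0, 23.0, 30.0, 31.0]
summaries = list(summarize_stream(raw, window_size=4))
```

In this sketch the raw readings stay at the edge and only the summaries would travel to the central cluster; how much detail to keep per window is a design choice that depends on the analytics run centrally.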

This is where MapR Edge comes in. MapR Edge is a small-footprint edition of the MapR Data Platform that addresses the need to capture, process, and operationalize IoT data close to the source. It is designed to run on small computers such as mini PCs that are about the size of a small book. Despite the limited size, you still get a lot of computing power in these MapR Edge clusters, and you can also take advantage of the power of your core MapR deployment as part of an overall IoT infrastructure. A MapR Edge cluster has 3 to 5 nodes, requires a minimum of 16GB RAM on each node, and has restrictions on overall disk capacity.
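The sizing limits above can be expressed as a simple pre-deployment check. The sketch below is a hypothetical helper, not a MapR tool: it encodes only the two figures stated in this post (3 to 5 nodes, at least 16GB RAM per node). The disk capacity restriction is left out because no number is given for it.

```python
MIN_NODES = 3
MAX_NODES = 5
MIN_RAM_GB_PER_NODE = 16


def check_edge_cluster(node_ram_gb):
    """Return a list of problems with a planned cluster (hypothetical helper).

    node_ram_gb holds one RAM figure in GB per planned node.
    An empty result means the plan meets the limits encoded above.
    """
    problems = []
    node_count = len(node_ram_gb)
    if not MIN_NODES <= node_count <= MAX_NODES:
        problems.append(
            f"expected {MIN_NODES}-{MAX_NODES} nodes, got {node_count}"
        )
    for index, ram in enumerate(node_ram_gb):
        if ram < MIN_RAM_GB_PER_NODE:
            problems.append(
                f"node {index} has {ram} GB RAM, "
                f"needs at least {MIN_RAM_GB_PER_NODE} GB"
            )
    return problems


ok_plan = check_edge_cluster([16, 32, 16])
bad_plan = check_edge_cluster([16, 8])
```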

Let’s look at some of the key features of MapR Edge:

  • Distributed data aggregation. With clusters close to the data, you can run applications and analytics close to the source to reduce the time-to-action. You also protect data, comply with data location regulations, and reliably deliver data back to a core MapR cluster. All of this is possible in a very small form factor by leveraging mini PCs.
  • Bandwidth-awareness. MapR Edge delivers data reliably over slow or occasionally-connected lines.
  • Global data plane. MapR provides a view of global data with a single namespace.
  • Converged analytics. All data services (MapR XD, Database, and Streams) and analytical capabilities (like Spark) of the MapR Data Platform are available in MapR Edge.
  • Unified security. You can protect your data at the edge and at the core cluster with the strong security controls in MapR. Data transported to and from MapR Edge clusters is encrypted to prevent eavesdropping and tampering.
  • Standards-based. MapR Edge adheres to standards including POSIX and the HDFS API for file access, ANSI SQL for querying, the Kafka API for event streams, and the HBase and OJAI APIs for NoSQL databases.
  • Enterprise-grade reliability. MapR Edge runs as a 3-5 node cluster to ensure high availability at the remote sites.
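As an illustration of what reliable delivery over an occasionally-connected line involves, here is a minimal store-and-forward sketch. It does not show how MapR Edge implements this internally: `StoreAndForwardBuffer` and `FlakyLink` are hypothetical names, and the buffer is held in memory only, whereas a real system would persist it to disk.

```python
from collections import deque


class FlakyLink:
    """Hypothetical stand-in for an uplink that is sometimes down."""

    def __init__(self):
        self.up = False
        self.delivered = []

    def send(self, record):
        """Deliver the record if the link is up; report success or failure."""
        if self.up:
            self.delivered.append(record)
        return self.up


class StoreAndForwardBuffer:
    """Queue records locally and forward them in order when sending succeeds."""

    def __init__(self, send):
        self._send = send      # callable returning True on successful delivery
        self._queue = deque()  # in-memory only in this sketch

    def submit(self, record):
        """Queue a record, then try to drain the queue. Returns records sent."""
        self._queue.append(record)
        return self.flush()

    def flush(self):
        """Send queued records oldest-first; stop at the first failure."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break
            self._queue.popleft()
            sent += 1
        return sent

    def pending(self):
        return len(self._queue)


link = FlakyLink()
buffer = StoreAndForwardBuffer(link.send)
buffer.submit("reading-1")  # link is down, so the record is held locally
buffer.submit("reading-2")
link.up = True
buffer.submit("reading-3")  # link is back, so all three go out in order
```

A record is removed from the queue only after the send call reports success, so a dropped connection delays delivery rather than losing data.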

Some examples of how customers are benefiting from MapR Edge include:

  1. In bandwidth-limited environments, the costly and time-consuming practice of physically traveling to the remote data sources to collect data is no longer necessary. Delays in getting insights at the data sources are also addressed. MapR Edge allows immediate analytics while also automating the data transport by sending summaries or samples of the data to a central analytics cluster.
  2. Patient data such as MRI scans needs to be stored at hospitals, but hospitals don’t want to set up a full computing cluster for analysis. With a MapR Edge cluster, they can efficiently and reliably send data to a high-powered central cluster for analysis and get results back within minutes.
  3. Sensor data from test cars is typically extracted only after long runs, so errors that occur along the way go undetected and can invalidate the results of the entire test run. With a MapR Edge cluster in the car, errors can be detected immediately and engineers alerted that fixes must be made before the test run continues.
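The in-car example comes down to checking readings as they arrive rather than after the run. The following sketch shows that idea with a simple range check. The sensor names, limits, and the `first_violation` function are all hypothetical, and real test-run analytics would be considerably more involved.

```python
def first_violation(readings, limits):
    """Return (index, sensor, value) for the first out-of-range value, or None.

    readings is any iterable of {sensor_name: value} dicts, so it can be a
    live stream; iteration stops as soon as a violation is found.
    limits maps each sensor name to an inclusive (low, high) range.
    """
    for index, reading in enumerate(readings):
        for sensor, value in reading.items():
            low, high = limits[sensor]
            if not low <= value <= high:
                return index, sensor, value
    return None


# Hypothetical sensors and limits, for illustration only.
limits = {"oil_temp_c": (60.0, 130.0), "rpm": (0.0, 7000.0)}
run = [
    {"oil_temp_c": 95.0, "rpm": 3000.0},
    {"oil_temp_c": 141.5, "rpm": 3200.0},  # out of range
    {"oil_temp_c": 96.0, "rpm": 3100.0},
]
violation = first_violation(run, limits)
```

Here the check flags the second reading, at which point an alert could be raised and the run paused instead of continuing with bad data.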

These are just a few examples of how MapR Edge can change the way IoT analytics are performed. If you have an environment where significant volumes of data are generated at remote sites, talk to us about how MapR Edge and the MapR Data Platform can help.

This blog post was published March 14, 2017.
