How Machine Learning Can Help Governments Build Better Solutions for Civilians

This blog post is an excerpt from the MapR Guide for Big Data in Federal Agencies and the Public Sector. To read more, you can download it here.

MapR views the public sector as three separate segments: defense, intelligence, and civil services. This post focuses on civil services and on how government agencies can benefit from data analytics and machine learning technologies.


First, let’s compare the public and private sectors, purely from a data standpoint, and draw out some similarities between the two. Corporations in the private sector exist to generate profits, whereas the public sector’s mandate is to respond to its constituents, provide services, and maintain infrastructure. Given that mandate, it is fair to say that the public sector often trails the private sector in adopting the latest technologies. That said, the volume of data the public sector handles is very similar to what the private sector experiences. So the public sector should benefit just as much from a cutting-edge data platform that helps to acquire, store, process, and analyze data as it happens, which is what private corporations typically look for. The public sector, again very much like the private sector, also deals with a wide variety of datasets – images, text, videos, voice, social media – that it has to store, process, analyze, and derive meaningful insights from.

The picture below provides an overview of a few key federal agencies and some high-level directives given to these agencies.

Federal agency directives

Note that a directive does not necessarily correspond to a single agency or government institution, so the picture above is not meant as a one-to-one mapping. The directives are better thought of as higher-level outcomes that, in many cases, require multiple agencies to work together.

Now let us take some of the directives, think about some use-cases that most of us have witnessed in recent years as they relate to governments and federal agencies, and try to break them down into the following 4 categories:

Federal agency categories

Some Pressing Use Cases Already Gaining Traction

While there are many more use cases in the public sector, let us consider a few examples and look at the key requirements for building a robust solution to address them.

Crime Prediction and Prevention

According to a UNODC (United Nations Office on Drugs and Crime) report, criminals laundered close to $1.6 trillion – or 2.7% of global GDP – in 2009. The Financial Crimes Enforcement Network (FinCEN), a bureau of the U.S. Treasury Department, uses an analytics tool that can collect and analyze large numbers of bank transactions in order to combat domestic and international money laundering, terrorist financing, and other financial crimes. In addition, local agencies, such as police departments, can leverage advanced, real-time analytics to produce actionable intelligence that helps them understand criminal behavior, identify crime and incident patterns, and uncover location-based threats. Electronic surveillance requires video feed analysis, real-time theft detection, and alerting, all while protecting the privacy of citizens’ personal data. The MapR Data Platform provides capabilities such as machine learning and anomaly detection that help identify patterns and, in turn, reduce crime.

Pharmaceutical Drug Evaluation

The McKinsey Global Institute estimates that applying big data strategies to better inform decision-making could generate up to $100 billion in value annually across the U.S. healthcare system by optimizing innovation, improving the efficiency of research and clinical trials, and building new tools for physicians, consumers, insurers, and regulators to meet the promise of more individualized approaches. In addition, researchers can use the MapR Data Platform to analyze a much larger patient population, decide what treatments are most effective, and identify patterns in side effects of drugs.

Traffic Optimization

Public sector agencies need to have the ability to analyze traffic flow data on different roads or in different parts of the city. Reducing traffic congestion requires understanding busy routes, toll plazas, and volume distribution of traffic tickets handed out as well as encouraging citizens to use public transport – all of this while managing the cost of additional infrastructure required. The MapR Data Platform helps in aggregating real-time traffic data, gathered from road sensors, GPS devices, and video cameras, and provides traffic managers with the ability to identify potential problems in a public bus network. Adjusting public transportation routes in real time can prevent these potential traffic problems in dense urban areas.
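
To make the aggregation step concrete, here is a minimal Python sketch of how speed readings from road sensors might be rolled up per road segment to flag likely congestion. It is an illustration only, not MapR code: the `SegmentMonitor` class, the three-reading window, and the 20 km/h threshold are all assumptions made for the example.

```python
from collections import defaultdict, deque


class SegmentMonitor:
    """Keeps the most recent speed readings per road segment (hypothetical helper)."""

    def __init__(self, window=3, slow_kmh=20.0):
        self.window = window        # number of recent readings kept per segment
        self.slow_kmh = slow_kmh    # assumed threshold for "congested"
        self.readings = defaultdict(lambda: deque(maxlen=window))

    def add(self, segment, speed_kmh):
        # Older readings fall out of the deque automatically.
        self.readings[segment].append(speed_kmh)

    def congested(self):
        # A segment is flagged once its window is full and its average speed is low.
        return sorted(
            seg for seg, speeds in self.readings.items()
            if len(speeds) == self.window
            and sum(speeds) / len(speeds) < self.slow_kmh
        )


monitor = SegmentMonitor()
feed = [("A", 45), ("B", 18), ("A", 50), ("B", 15), ("A", 48), ("B", 12)]
for segment, speed in feed:
    monitor.add(segment, speed)

print(monitor.congested())  # ['B']
```

In practice the feed would arrive from a message stream rather than a list, and the flagged segments would be one input among several for a traffic manager deciding whether to adjust a route.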

Connected Smart Cities

City, state, and other local government entities have been swift to improve constituent services and control costs. Specific use cases include automating responses to FAQs that previously required human intervention; identifying infectious diseases early to better contain broader outbreaks; predicting criminal activity in order to optimize police patrol presence; analyzing citizens’ feedback on city, state, and federal budgets as well as on ballot measures as they come up for voting; and anticipating water, electric grid, and gas infrastructure failures so that support staff can be put on alert ahead of time.

Identify Fraudulent Behavior

Fraud is expensive and wasteful. The U.S. Department of Health and Human Services (HHS) is using predictive analytics techniques to spot anomalies in various entitlement programs. Meanwhile, the IRS is constantly looking to combat tax fraud by deploying big data analytics to comb through structured and unstructured data, identify suspicious behavior, and find fraudulent tax filings.
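
As a rough illustration of what "spotting anomalies" can mean, the Python sketch below flags values that sit far from the median of a batch, using the modified z-score (median and median absolute deviation). This is a generic textbook technique chosen for the example, not a description of what HHS or the IRS actually run; the `flag_anomalies` function, the sample claim amounts, and the 3.5 cutoff are assumptions.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread in the data, so nothing stands out
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical claim amounts; the fifth one is far outside the usual range.
claims = [120.0, 135.5, 110.0, 128.0, 9800.0, 131.0, 117.5]
print(flag_anomalies(claims))  # [4]
```

A flagged record is only a candidate for review. Real fraud detection combines many signals and usually keeps a human investigator in the loop.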

Disaster Relief

The government’s various disaster relief agencies are using big data analytics and visualization solutions to speed relief to victims so that they can rebuild homes and businesses. These tools permit faster, more efficient analysis of loan applications and performance data, so funds are distributed more quickly while real-time scrutiny of applications keeps fraud to a minimum.

Key Technology Underpinnings Required to Get the Implementation Right

It can be argued that every use case stated above is complex, given the volume and variety of data involved. If I had to break the solutions down, though, I would arrive at the following essential technologies needed to build them:

  • Storage, processing, and analysis of several varieties of data – voice, video, text, and image – which each of these use cases requires.
  • A data platform that maintains a streaming system of record, equipped with data mirroring capability (in other words, replication) for high availability and disaster recovery, with the ability to go back to a specific data snapshot to allow applications to recover from user errors and data corruptions.
  • Flexibility for practitioners to use the toolkits of their choice to ingest data, create a data pipeline, provide limited volumes of training datasets for machine learning (ML), develop ML models on historical data, and deploy them in-place on real-time data streams in order to produce actionable insights for a government official to act on – all of this without doubling or tripling the hardware required for such advanced implementations, thereby keeping the total cost of ownership (TCO) under control.
  • Flexibility of deployment in the cloud, on-prem, or at the edge, while allowing the data science and IT teams to iterate on the use cases seamlessly and keep refining results in production. For example, a proven edge computing device can sit right next to, say, a surveillance video camera and detect theft or robbery using a neural network deployed at the edge, allowing for much-needed real-time analysis and action by the local police.
  • Data security, data governance, data lineage, and multi-tenancy built into a platform to ensure citizens’ personal data privacy.
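
The "develop on historical data, deploy on real-time streams" requirement in the list above can be sketched in a few lines of Python. This is a deliberately tiny stand-in for a real ML model, under stated assumptions: `ThresholdModel`, `alerts`, the sample values, and the limit of 3 standard deviations are all hypothetical, and a plain list stands in for a live event stream.

```python
from statistics import mean, stdev


class ThresholdModel:
    """Toy model: learns the mean and spread of historical values."""

    def fit(self, history):
        self.mu = mean(history)
        self.sigma = stdev(history)
        return self

    def score(self, value):
        # Distance from the historical mean, in standard deviations.
        return abs(value - self.mu) / self.sigma if self.sigma else 0.0


def alerts(model, stream, limit=3.0):
    """Yield the IDs of events that score above the limit, one event at a time."""
    for event_id, value in stream:
        if model.score(value) > limit:
            yield event_id


# Step 1: train on historical data.
model = ThresholdModel().fit([10, 12, 11, 9, 10, 11, 12, 10])

# Step 2: apply the trained model to incoming events.
stream = [("e1", 11), ("e2", 25), ("e3", 9)]
print(list(alerts(model, stream)))  # ['e2']
```

The point of the split is that training and scoring are separate steps that share one model object, so the scoring loop can run wherever the events arrive without retraining.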

A combination of MapR Database, MapR XD, MapR Analytics and ML Engines, MapR Event Store, and MapR-Edge – all constituted within the MapR Data Platform – is helping our customers implement many of these use cases and achieve real results for their constituents.

MapR Architecture for Governments and Federal Agencies

Lastly, the diagram below describes the architecture of the MapR Platform, tailored to use cases pertaining to federal agencies. This architecture shows how MapR helps civilian agencies achieve three important objectives: (i) augment existing applications with newer data-centric capabilities, (ii) build interactive analytical applications to improve decision-making, and (iii) future-proof IT investments by being able to constantly deploy new(er) intelligent applications – all of these simultaneously, on a single platform for AI and analytics.

MapR Data Platform

This blog post was published May 25, 2018.
