In Search of Data Agility: What It Actually Means and How to Attain It

Editor’s Note: This is an excerpt from the book “A Practical Guide to Microservices and Containers: Mastering the Cloud, Data, and Digital Transformation,” which you can download as an ebook.

Agility, as it relates to digital transformation, is what businesses are searching for. We focus our efforts on three areas of agility: data agility, application agility, and infrastructure agility. The aim here is to separate agility from the many other buzzwords that flood the IT and business worlds and to demonstrate the intimate link between agility, digital transformation, and enterprise success.

Digital transformation is a phrase that managers of all stripes frequently overuse and misuse. Converged infrastructure is a key on-ramp to digital transformation. The most coveted result, or ‘output’, of this newly forming infrastructure is agility, which also happens to be one of the more overused terms in both the IT and business suites. Stripped of everything else, agility describes how quickly an enterprise can respond to new opportunities and new threats. Would you rather steer your business like a cruise ship, or like a speedboat that can turn on a dime?

In a recent major cloud study, respondents were asked why they used cloud solutions for a variety of workloads, including email, analytics, big data, and application development, among others. For virtually every workload, ‘responding faster to changing business needs’ was one of the top two reasons selected. In other words, organizations are seeking greater agility from cloud as well as from other advanced technologies, including big data analytics and containers. Cost savings, which for several years was the top justification offered to the C-level for cloud investments, is fading in importance as those executives grasp the business value of agility.

Data Agility

Organizations have traditionally been hamstrung in their use of data by incompatible formats, rigid database limitations, and the inability to flexibly combine data from multiple sources. Users who needed a new report would submit requirements to the IT organization, which would place them in a queue, where they might sit for a month or more. Worse, users had to know in advance precisely what data they needed. Ad hoc queries were permitted only within the confines of an extract database, which often contained incomplete and outdated information, and queries were limited to structured data.

Data agility encompasses several components:

  • Business users are freed from rigidity and given the freedom to combine data from multiple sources in an ad hoc manner without long cleansing or preparation times.
  • The path between inquiries and answers is shortened so that decisions can be made on current data.
  • Structured and unstructured data can be combined in meaningful ways without extensive transformation procedures.
  • Data can be combined from both operational and analytical (historic) sources to enable immediate comparisons and to highlight anomalies.
  • Data can be combined from both streaming and static data sources in real time.
  • Users can create their own integrations using visual programming tools without relying on time-consuming extract/transform/load procedures.
  • New data sources can be quickly integrated into existing analytical models.
  • Schemaless data is supported in flexible formats like JSON.
  • Complex structures, such as JSON documents, can be combined with simple key-value constructs and tabular formats.
  • Block-level, file-level, and object data can be combined in the same model.
  • Rich visualization tools enable business users to create graphical representations of data that reveal trends and relationships that would be otherwise hidden.
  • Instead of specifying which data they need, users can access all available data for experimentation and discovery.
  • Users can create and share their own analytical models without disturbing production data.
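To make a few of the components above concrete (combining schemaless JSON with tabular data, and comparing operational data against historical baselines to highlight anomalies), here is a minimal sketch in plain Python. The record layouts, field names such as `subscriber_id` and `mb_used`, and the "more than 2x the historical average" anomaly rule are hypothetical and chosen only for illustration; a real platform would do this at scale with a query engine rather than in-process.

```python
import csv
import io
import json

# Hypothetical operational data: schemaless JSON lines, one event per line.
# The second record carries an extra field ("device") the others lack.
OPERATIONAL_JSON = """\
{"subscriber_id": "a1", "mb_used": 120}
{"subscriber_id": "b2", "mb_used": 900, "device": "tablet"}
{"subscriber_id": "c3", "mb_used": 75}
"""

# Hypothetical historical (analytical) data: a CSV of average daily usage.
HISTORICAL_CSV = """\
subscriber_id,avg_mb
a1,110
b2,300
c3,80
"""

def load_json_lines(text):
    """Parse newline-delimited JSON without requiring a fixed schema."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def load_baselines(text):
    """Read the CSV into a dict keyed by subscriber_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["subscriber_id"]: float(row["avg_mb"]) for row in reader}

def flag_anomalies(events, baselines, factor=2.0):
    """Join current events with historical baselines and flag usage above
    `factor` times the subscriber's average (a hypothetical rule)."""
    flagged = []
    for event in events:
        baseline = baselines.get(event["subscriber_id"])
        if baseline is not None and event["mb_used"] > factor * baseline:
            flagged.append({**event, "baseline_mb": baseline})
    return flagged

if __name__ == "__main__":
    events = load_json_lines(OPERATIONAL_JSON)
    baselines = load_baselines(HISTORICAL_CSV)
    for row in flag_anomalies(events, baselines):
        print(row)
```

Note that the extra `device` field simply travels along with the flagged record; nothing had to be declared up front for it to survive the join.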

In a nutshell, data agility is about removing barriers to data usage. The rigid structure of yesterday’s data warehouses made data a precious asset that could cost upwards of $10,000 per terabyte. With Hadoop, those costs fell by more than 90%, which removes many of the cost and technical barriers to enabling data agility.

A major European telecommunications company illustrates data agility. It collects veritable mountains of data from its far-flung network operations. The mountains are not important in and of themselves; rather, as forward-thinking organizations have come to realize, the value lies in the mother lode of information locked within them. According to an IT manager there:

“Our applications allow mobile operators to proactively monitor and improve customer experience. We collect data from the mobile network, then process the collected data to come up with information about individual subscriber experience with mobile services like video usage, web browsing, file transfer, etc. We also create individual subscriber profiles from this data. All of the data is stored in our MapR Data Platform. We want to enable mobile operators to do interactive ad hoc analysis and build reports using BI tools. This is where Apache Drill comes into the picture. We’re using Drill to enable interactive ad hoc analysis using BI tools like Tableau and Spotfire on the data we produce. Currently, we’re building canned reports using Drill to show how the data we produce can be used to derive insights.”
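Drill is queried with SQL. As a rough, self-contained stand-in (not Drill's API), the sketch below uses Python's built-in `sqlite3` module to run the kind of ad hoc aggregate an operator might ask of per-subscriber session data. The table name `sessions`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical per-session records derived from network data:
# (subscriber_id, service, megabytes, duration_seconds)
SESSIONS = [
    ("a1", "video", 250.0, 600),
    ("a1", "web", 12.5, 120),
    ("b2", "video", 900.0, 1800),
    ("b2", "file_transfer", 40.0, 60),
    ("c3", "web", 8.0, 90),
]

def ad_hoc_usage_by_service(rows):
    """Load rows into an in-memory SQLite database and run an ad hoc
    aggregate; SQLite stands in here for a SQL engine such as Drill."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sessions (subscriber_id TEXT, service TEXT, "
            "megabytes REAL, duration_seconds INTEGER)"
        )
        conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT service,
                   COUNT(DISTINCT subscriber_id) AS subscribers,
                   SUM(megabytes) AS total_mb
            FROM sessions
            GROUP BY service
            ORDER BY total_mb DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for service, subscribers, total_mb in ad_hoc_usage_by_service(SESSIONS):
        print(f"{service}: {subscribers} subscribers, {total_mb} MB")
```

The point of the illustration is that the question (usage by service) is posed directly against the data as SQL, rather than being queued as a report request.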

The most formidable barriers to data agility at many organizations aren’t technical but cultural. Functional managers may jealously guard data within their groups to maintain their own little fiefdoms, believing data to be a source of power, or the IT organization may see itself as the data steward and tightly limit access. Appropriate protections should always be applied to sensitive data, of course, but the difference between agility and rigidity often comes down to the organization’s willingness to trust its people to use data responsibly and strategically.

Data Agility Benefits

Data agility lets you get a handle on the broad range of data types and sources that deliver business value.

Data agility refers to the high-scale processing and streaming of converged data assets (files, tables, documents, and streams) for mission-critical applications that integrate operational workloads with analytics and deep learning to impact the business directly. It also requires support for dynamic data models and schemas that grow with fast-moving businesses.

Unified Files, Tables, and Streams

In modern data architectures, using the right tools for the job means leveraging specific technologies for the wide range of data types you have. A converged data platform that unifies data in files, tables, and streams lets you store and manage data for all of your business activities.

For example:

  • Files are popular for large data sets that are imported from other systems
  • Tables are ideal for storing data from operational business applications
  • Streams are ideal for reliably storing event-based data
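The division of labor above can be illustrated with a toy, in-memory model: files as named byte blobs, tables as rows keyed by a primary key, and streams as append-only logs read by offset. This is a hypothetical sketch showing the three access patterns side by side under one namespace; it is not the interface of any particular converged platform.

```python
from dataclasses import dataclass, field

@dataclass
class ToyDataPlatform:
    """Hypothetical in-memory model of one namespace holding files,
    tables, and streams. Illustration only, not a real platform API."""
    files: dict = field(default_factory=dict)    # path -> bytes
    tables: dict = field(default_factory=dict)   # table -> {key: row}
    streams: dict = field(default_factory=dict)  # topic -> list of events

    # Files: bulk data sets imported from other systems.
    def write_file(self, path, data: bytes):
        self.files[path] = data

    def read_file(self, path) -> bytes:
        return self.files[path]

    # Tables: operational records addressed by key and updated in place.
    def upsert(self, table, key, row: dict):
        self.tables.setdefault(table, {})[key] = row

    def get(self, table, key):
        return self.tables.get(table, {}).get(key)

    # Streams: append-only event logs; readers track their own offset.
    def publish(self, topic, event) -> int:
        log = self.streams.setdefault(topic, [])
        log.append(event)
        return len(log) - 1  # offset of the new event

    def read_from(self, topic, offset=0):
        return self.streams.get(topic, [])[offset:]

if __name__ == "__main__":
    platform = ToyDataPlatform()
    platform.write_file("/imports/cdr_2018_01.csv", b"id,mb\na1,120\n")
    platform.upsert("customers", "a1", {"name": "Ana", "plan": "gold"})
    first = platform.publish("usage", {"id": "a1", "mb": 120})
    platform.publish("usage", {"id": "a1", "mb": 30})
    print(platform.get("customers", "a1"))
    print(platform.read_from("usage", first + 1))
```

Tables overwrite a row on each upsert, while streams keep every event; that difference is why tables suit current operational state and streams suit event history.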

Support for schemas that change constantly

Decades ago, when relational databases were first introduced, business data had fixed schemas that were rarely altered. Much has changed since then, and data is now often reshaped into different structures to optimize it for a wide range of processing and analytical workloads. Your platform must handle schema-less data, such as JSON, from both a storage and a querying standpoint, to help you get the most value out of all of your data.
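As a small illustration of handling data whose schema changes over time, the sketch below reads JSON records written by two hypothetical versions of an application: older records store a single `phone` string, while newer ones store a `phones` list and a nested `address`. The field names and version history are invented for illustration; the point is that the reader tolerates both shapes instead of requiring a migration up front.

```python
import json

# Hypothetical records from two application versions with different shapes.
RECORDS = [
    '{"id": 1, "name": "Ana", "phone": "555-0100"}',
    '{"id": 2, "name": "Ben", "phones": ["555-0101", "555-0102"],'
    ' "address": {"city": "Lyon"}}',
]

def phones_of(doc):
    """Return a list of phone numbers regardless of which schema wrote the doc."""
    if "phones" in doc:
        return list(doc["phones"])
    if "phone" in doc:
        return [doc["phone"]]
    return []

def city_of(doc):
    """Nested fields may be absent in older records; default to None."""
    return doc.get("address", {}).get("city")

if __name__ == "__main__":
    for raw in RECORDS:
        doc = json.loads(raw)
        print(doc["id"], phones_of(doc), city_of(doc))
```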

Multi-model support in a DBMS

Business applications store data in a variety of ways, not only in fixed tabular formats. With the widespread use of evolving, hierarchical, and nested data, you need a database management system that can handle these different data structures. Support for complex structures such as JSON documents, simple key-value constructs, and tabular formats is critical because it lets you model your data closely to fit your business operations. This support allows you to focus on business logic, not data structures.
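To show what modeling the same data in more than one form can look like, the sketch below takes one nested, JSON-style document and derives two other views of it: flattened key-value pairs with dotted paths, and a tabular row restricted to chosen columns. The document contents, the dotted-path convention, and the column list are hypothetical; multi-model databases expose such views through their own APIs.

```python
def flatten(doc, prefix=""):
    """Turn a nested dict into {dotted.path: value} key-value pairs.
    List elements are indexed by position, e.g. 'orders.0.sku'."""
    pairs = {}
    if isinstance(doc, dict):
        items = doc.items()
    elif isinstance(doc, list):
        items = enumerate(doc)
    else:
        return {prefix: doc}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        pairs.update(flatten(value, path))
    return pairs

def to_row(doc, columns):
    """Project a document onto fixed tabular columns; missing paths become None."""
    flat = flatten(doc)
    return tuple(flat.get(col) for col in columns)

# Hypothetical customer document with nested and repeated fields.
CUSTOMER = {
    "id": "a1",
    "profile": {"name": "Ana", "plan": "gold"},
    "orders": [{"sku": "X1", "qty": 2}, {"sku": "Y9", "qty": 1}],
}

if __name__ == "__main__":
    for key, value in flatten(CUSTOMER).items():
        print(key, "=", value)
    print(to_row(CUSTOMER, ["id", "profile.name", "profile.plan", "profile.email"]))
```

The document stays the source of truth; the key-value and tabular forms are projections of it, so a missing field such as `profile.email` shows up as an empty cell rather than an error.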

Streams with persistence

Event streaming data lets you capture changes in your environment at a granular level. Not only does this allow you to respond in real time to a rapidly changing environment, but it also helps ensure data consistency across different user groups and applications. A stream-first architecture uses event streaming data as the system of record to drive all outputs, so every output can be traced directly and reliably back to the events that caused it. A microservices architecture also benefits from streams as a lightweight means of communication between applications.
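The stream-first idea can be sketched in a few lines: events are appended to an ordered log that serves as the system of record, and a derived view (here, hypothetical account balances) is rebuilt by replaying that log. Each value in the view remembers the offset of the event that last changed it, which is what lets an output be traced back to its cause. This is a simplified, in-memory illustration with hypothetical event fields; a real deployment would use a durable, persistent streaming system rather than a Python list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    account: str
    amount: int  # positive for deposits, negative for withdrawals

class EventLog:
    """Append-only log: events are never modified, only added."""
    def __init__(self):
        self._events = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset assigned to this event

    def replay(self, start=0):
        """Yield (offset, event) pairs in order, starting at `start`."""
        for offset in range(start, len(self._events)):
            yield offset, self._events[offset]

def build_balances(log: EventLog):
    """Derive a view by replaying the log. Each entry keeps the offset
    of the last event that changed it, so results are traceable."""
    view = {}
    for offset, event in log.replay():
        balance, _ = view.get(event.account, (0, None))
        view[event.account] = (balance + event.amount, offset)
    return view

if __name__ == "__main__":
    log = EventLog()
    log.append(Event("a1", 100))
    log.append(Event("b2", 50))
    log.append(Event("a1", -30))
    for account, (balance, last_offset) in build_balances(log).items():
        print(f"{account}: balance={balance}, last changed by event #{last_offset}")
```

Because the view is derived entirely from the log, it can be discarded and rebuilt at any time, and other consumers replaying the same log arrive at the same state.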

This blog post was published February 05, 2018.
