Editor's Note: This is the first blog post in a 5-part series that describes how modern enterprises struggle to handle data and make it available to applications without creating new silos, and how MapR addresses these challenges with a new layer of abstraction called dataware.
Data is only powerful if it can be understood and used. Otherwise, it's like a book no one reads — information locked away from utility. This is becoming a big problem for enterprises. A recent Wall Street Journal article points out that many CFOs are now facing the problem of having more data about their business than they can actually process, analyze, and put to use, which creates risk. It's understandable. Most companies are struggling to know what data they have and then transform and deliver it to where it is needed. This is the challenge of data availability.
Data lakes do not solve this problem. While they allow companies to store vast amounts of data, they do nothing to help them operationalize it.
This is a difficult problem, and it has become increasingly clear that a new layer of abstraction is needed to make data available in the correct form with the right level of performance. Dataware is that layer of abstraction. Dataware allows data to be managed as a first-class enterprise resource, decoupled from the applications and infrastructure that use it. It allows companies to take data from wherever it exists and put it in a platform that can deliver the data to production workloads. With advanced dataware in place, organizations can even bring operational, historical, and real-time data in files, tables, and streams together in one place that is always available.
Most companies have no problem handling data availability for a single application. But when a company expands and wants data available across a number of applications or within a platform, they inevitably face an integration mess that leads to performance problems.
Every additional application multiplies administrative complexity because multi-tenancy problems emerge when applications try to pull data at the same time. A company that isn't using dataware will eventually end up duplicating data, which creates scalability challenges. There are two possible solutions to this problem of making data operational.
The first option is to stitch together multiple point technologies into a platform, an approach that requires a lot of ongoing attention. The second option is to use dataware to seamlessly manage, integrate, and make data available across all applications.
In most cases, modern applications exist at the end of a data supply chain. The data used by applications comes from a variety of systems of record and other sources and is collected and distilled into a repository made ready for that app. In many ways, you can think of a modern data supply chain as having three layers: data that's landed, data that's made into models and reusable objects, and data that is purpose-built and formed into a typical analytics app.
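To make the three layers concrete, here is a minimal Python sketch. The record fields, function names, and sample values are hypothetical and chosen only for illustration; they do not come from any particular product. It shows raw landed records, a reusable typed model built once from them, and a purpose-built view that an analytics app might consume.

```python
from collections import defaultdict

# Layer 1: landed data, kept as it arrived from the source systems.
# (Hypothetical sample records; amounts are still untyped strings.)
landed = [
    {"order_id": 1, "customer": "acme", "region": "emea", "amount": "120.50"},
    {"order_id": 2, "customer": "globex", "region": "apac", "amount": "80.00"},
    {"order_id": 3, "customer": "acme", "region": "emea", "amount": "39.50"},
]


def build_model(records):
    """Layer 2: a cleaned, typed model that several apps can reuse."""
    return [{**r, "amount": float(r["amount"])} for r in records]


def revenue_by_region(orders):
    """Layer 3: a purpose-built view for one analytics app."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["region"]] += order["amount"]
    return dict(totals)


orders = build_model(landed)
report = revenue_by_region(orders)
```

A second app could build its own view from `orders` without re-parsing the landed records, which is the reuse the middle layer is meant to provide.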
The app needs integrated data from all these sources to work best. The challenge is determining who is going to do that work, where the work gets done, and how the apps benefit from it. In earlier days of app development, data was loaded and transformed separately for each app. In the modern environment, that per-application processing would leave companies overwhelmed by data volume and latency issues. The alternative is an underlying abstraction layer that multiple applications can share. That shared layer gives teams a single place to focus security, protection, and governance activities, while also eliminating delays, data duplication, and inefficient processes.
Dataware solves this by offering a way to capture all this data and support access by applications as they need it. Dataware can deliver data to many apps rather than just one. It can replicate the data across a global network and multiple data centers so that applications worldwide have access to the same data repository. In addition, dataware can support multiple APIs to access the same data.
In other words, applications using standard file access such as NFS, big data applications using the Hadoop Distributed File System API, and new applications using an S3 API can all leverage the same underlying data repository. This eliminates the need to duplicate the data and coordinate updates across three different data stores. Dataware speeds and simplifies the process of making data available without having to make it a huge integration project.
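The idea of several access styles over one repository can be sketched in a few lines of Python. Everything here is hypothetical: `SharedRepository`, `FileView`, and `ObjectView` are illustrative stand-ins, not MapR's implementation and not real NFS or S3 clients. The sketch assumes only that each interface maps its own addressing scheme onto the same stored object.

```python
class SharedRepository:
    """Hypothetical single store that holds each object exactly once."""

    def __init__(self):
        self._objects = {}  # canonical path -> bytes

    def put(self, path, data):
        self._objects[path] = data

    def get(self, path):
        return self._objects[path]

    def count(self):
        return len(self._objects)


class FileView:
    """File-style access by path (stand-in for an NFS-like interface)."""

    def __init__(self, repo):
        self._repo = repo

    def write(self, path, data):
        self._repo.put(path.lstrip("/"), data)

    def read(self, path):
        return self._repo.get(path.lstrip("/"))


class ObjectView:
    """Bucket/key access (stand-in for an S3-like interface)."""

    def __init__(self, repo):
        self._repo = repo

    def put_object(self, bucket, key, body):
        self._repo.put(f"{bucket}/{key}", body)

    def get_object(self, bucket, key):
        return self._repo.get(f"{bucket}/{key}")


repo = SharedRepository()
files = FileView(repo)
objects = ObjectView(repo)

# One application writes through the file-style interface...
files.write("/sales/2019/q1.csv", b"region,revenue\nemea,100\n")
# ...and another reads the same bytes through the object-style interface.
data = objects.get_object("sales", "2019/q1.csv")
```

Because both views resolve to the same canonical path, the repository holds a single copy and there is nothing to synchronize between interfaces.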