For years, organizations have struggled with the performance and scalability shortcomings of conventional data integration, leading many to push heavy data integration workloads down to the data warehouse. As a result, core data integration shifted from extract, transform, and load (ETL) to extract, load, and transform (ELT).
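To make the distinction concrete, here is a minimal, hypothetical sketch of the ELT pattern: raw records are loaded into the warehouse as-is, and the transformation is then pushed down to the database engine as SQL. The table and column names are illustrative only, and sqlite3 simply stands in for the warehouse.

```python
import sqlite3

# ELT sketch: load raw rows first, then let the database do the transform.
# sqlite3 stands in for the data warehouse; tables and columns are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (customer_id TEXT, amount REAL)")

# "Load": raw extracts land in the warehouse untransformed.
raw_rows = [("c1", 10.0), ("c2", 5.5), ("c1", 7.25)]
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

# "Transform": the aggregation runs inside the warehouse engine,
# consuming warehouse CPU and I/O -- exactly the workload ELT pushes down.
conn.execute("""
    CREATE TABLE sales_by_customer AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM raw_sales
    GROUP BY customer_id
""")
print(conn.execute("SELECT * FROM sales_by_customer").fetchall())
```

The point of the sketch is that every transformation like this competes for the same premium warehouse resources as user queries, which is where the problems below begin.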
Although this worked in the short term, it has also created a whole new set of problems for the IT organization with the onset of Big Data. Nightly processing extends far beyond its window, causing resource contention and delaying functional reporting. Data retention periods are drastically cut, costing the business analytical opportunities and operational insights. Even worse, database costs are spiraling out of control merely to keep the lights on. The need to analyze more data, from a more diverse set of sources, in less time, while keeping costs reasonable, is straining existing data integration architectures.
Not only is ELT a Band-Aid solution that costs more and offers less, but it is also a hindrance to creating an effective Data Governance strategy. In a Big Data world, this is no small matter. Organizations’ success – or failure – will be determined by how well they choose technologies and solutions to address data integration, security, data lineage and auditing throughout the lifecycle of data.
Many of our customers are turning to Hadoop to relieve the tension between the evolving needs of the business and the growing costs of IT infrastructure. Hadoop is not only economically feasible, but also provides the required levels of performance and massive scalability. For these organizations, Hadoop is quickly showing its potential as the ideal data hub to store and archive all structured and unstructured data. That data can then be processed directly on Hadoop and distributed to the rest of the IT infrastructure, as sketched below. By offloading data and ELT workloads from the data warehouse into Hadoop, organizations can significantly shorten nightly batch processing, retain data as long as they need, and free up significant data warehouse capacity.
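As a rough illustration of what an offloaded transformation can look like, here is a minimal Hadoop Streaming style mapper and reducer in Python that performs the same per-customer aggregation directly on Hadoop instead of in the warehouse. The input layout and the job invocation shown in the comments are assumptions for illustration, not a DMX-h workflow.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch of an offloaded ELT aggregation.
# Assumed input: tab-separated lines of customer_id <TAB> amount.
# Hypothetical invocation:
#   hadoop jar hadoop-streaming.jar \
#     -mapper "python offload.py map" -reducer "python offload.py reduce" \
#     -input /warehouse/raw_sales -output /warehouse/sales_by_customer
import sys

def mapper():
    # Emit customer_id and amount for every raw record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            print("%s\t%s" % (fields[0], fields[1]))

def reducer():
    # Streaming sorts by key, so all records for one customer arrive together.
    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%.2f" % (current_key, total))
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print("%s\t%.2f" % (current_key, total))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Running the aggregation on the cluster this way moves the CPU and I/O cost off the warehouse, which is the essence of the offload, though as the next section notes, hand-built jobs like this quickly expose the functional gaps between raw Hadoop and enterprise ETL.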
Out of the box, Hadoop offers powerful utilities and massive horizontal scalability, but it does not provide the functionality users need to deliver enterprise ETL capabilities. Offloading data and ELT workloads to Hadoop forces users to find the right tools to close the functional gaps between enterprise ETL and Hadoop. Where do you begin? How do you know which workloads to move? Do you have all the tools necessary to access and move your data and processing? How do you keep up with the ever-evolving Hadoop ecosystem? How can you optimize processing once it’s inside Hadoop? These challenges can get intimidating very quickly.
Now, I don’t think the data warehouse is going away anytime soon. The goal of offloading is to free up database resources to reduce costs, improve query response time, and use premium database resources more wisely. To that end, our customers have addressed the challenges noted above and found success by following a three-step approach.
Syncsort provides targeted solutions to address the challenges of offloading data and workloads from the data warehouse to Hadoop. Our DMX-h enterprise software is deployed with the MapR Enterprise Data Hub to help organizations move data and ELT processing from the data warehouse into Hadoop and optimize it once it is there.
Syncsort is a MapR certified technology partner, with joint customers such as comScore already finding success with Hadoop. To learn more about Syncsort and our solutions, visit us at www.syncsort.com/hadoop.