If you're not familiar with the concept of data warehouse optimization (DWO), it's a strategy for identifying the "right" workloads for your data warehouse. In other words, it's making sure you're not spending data warehouse resources on tasks that other technologies can handle more cost-effectively. It's partly about reducing the cost of expensive data warehouses, but it's also about unlocking new insights more efficiently from the modern data types you can collect. The latter point is particularly important because the rules have changed on which workloads are best suited to which platforms. The choice of technology is no longer a question of "high value interactive analytics" versus "cheap and deep storage" versus "long-running batch processing." You can get more "data warehouse capabilities" from some of today's advanced technologies (especially MapR and Arcadia Data).
If you use a data warehouse but haven't explored DWO strategies, then you might be weighing down your data warehouse and limiting it from providing the most value for the things it does best. This isn't only an IT problem, as this can affect your entire business community if they don't get the self-service, agility, and responsiveness they want because your warehouse is inefficiently utilized.
Part of a DWO strategy should entail moving existing data and workloads to a MapR cluster. The ideal candidates are often the "less frequently used" data sets: older data, archived data, or data that is only used by a small user base. These data sets can't simply be thrown away, because they are needed for historical analysis or regulatory compliance. But you also don't want this data and its associated workloads taking up resources in your data warehouse and slowing it down for the users who run suitable workloads there. After all, not all data is equal in value, so why keep all of it in an expensive platform? Instead of burdening your data warehouse with data that is not frequently accessed, use MapR to move that data to low-cost tiers and free your data warehouse for interactive queries on the more recent data sets. MapR introduced a few key features in the MapR 6.1 release to help IT administrators with exactly that. Policy-based data tiering automatically places data across performance-optimized, capacity-optimized, and cost-optimized tiers, on-premises or in the cloud. In addition, compatibility with the S3 API on top of the MapR Data Platform gives our customers a cost-effective way to keep pace with volatile, fast-growing data and the challenge of managing it.
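To make the idea of a tiering policy concrete, here is a minimal sketch in Python. It is not MapR's API or configuration syntax; the `DataSet` class, the `choose_tier` function, and every threshold are hypothetical and chosen only for illustration. It simply shows how a rule based on access recency and user count might map data sets onto the three tiers named above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSet:
    """Hypothetical description of a data set; not a MapR object."""
    name: str
    last_accessed: date
    monthly_users: int


def choose_tier(ds: DataSet, today: date,
                warm_after_days: int = 90,
                cold_after_days: int = 365,
                min_users: int = 5) -> str:
    """Pick a storage tier from illustrative, assumed thresholds."""
    age_days = (today - ds.last_accessed).days
    # Data untouched for a long time goes to the cheapest tier.
    if age_days >= cold_after_days:
        return "cost-optimized"
    # Aging data, or data with a small user base, goes to the middle tier.
    if age_days >= warm_after_days or ds.monthly_users < min_users:
        return "capacity-optimized"
    # Recent, widely used data stays on the fastest tier.
    return "performance-optimized"


today = date(2019, 1, 1)
catalog = [
    DataSet("orders_2018q4", date(2018, 12, 20), 40),
    DataSet("audit_2012", date(2016, 5, 1), 1),
    DataSet("niche_report", date(2018, 12, 1), 2),
]
placement = {ds.name: choose_tier(ds, today) for ds in catalog}
```

In practice the thresholds would come from your own access statistics and retention requirements, and the actual data movement would be handled by the platform's tiering feature rather than by application code like this.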
Another part of a DWO strategy is making sure you don't put ill-fitting data sets into your data warehouse in the first place. Internet of Things (IoT) data, log file data, and other time-series data are prime examples of data sets that should instead be stored and analyzed in MapR. These types of data grow rapidly and are stored most efficiently in a MapR cluster, which can accommodate the large volume and fast growth, not to mention the unstructured formats.
Keep in mind that DWO is a good starting point with tangible ROI when it comes to your enterprise data architecture, and there are more opportunities for value. For example, you can collect large data sets from numerous sources to build a data lake on MapR, where you can correlate and enrich your data sets. (If you've heard that data lakes carry some baggage and have a history of issues, then hopefully some recent discussions and insights, such as those in this webinar, will provide relief.) You can run machine learning algorithms within MapR to create advanced models that help you get better insights. And you can also gain significant agility by first loading data into MapR and exploring the data without a lot of IT intervention. After you've created curated data sets, you can then continue with BI-style analytics on that data or upload it to a data warehouse for interactive querying. All of this is possible without requiring multiple copies of the same data across clusters.
One capability we alluded to above that has not been significantly exploited in DWO strategies is deploying a data lake that delivers BI-style analytics and production dashboards to a large business user community. This is where a "big data-ready" BI/analytics platform like Arcadia Data can help. When using traditional tools with MapR, you get limited user concurrency because of a lack of in-cluster query acceleration. Most tools in the market today accelerate queries by requiring the movement of data from MapR to a separate, dedicated cluster, which is clearly not a scalable process. The tools that can avoid data movement suffer from the fact that they require significant IT intervention for performance modeling. Arcadia Data is different because it provides the query acceleration services within MapR, and also defines the accelerators automatically, which means no time-consuming data modeling is required to make dashboards run fast.
Arcadia Data provides a full-featured BI platform that runs on MapR. It has a powerful, easy-to-use interface to create compelling visuals and dashboards – and also accelerates dashboards on MapR data to support thousands of users concurrently. In addition, the analytic lifecycle is simplified with capabilities such as the semantic layer in Arcadia Data that lets analysts capture "business meaning" on top of the underlying data tables. It allows analysts to share data sets with other analysts to quickly build different use cases on the data sets without coordinating with IT. And with the recently announced search-based BI, which allows end users to type natural language queries and get an internet-search-like experience on all your MapR data sets, more end users can participate in analyzing data in MapR.
Check out this demo video to see an example of what MapR and Arcadia Data can do for you when it comes to DWO and large-scale analytics. Data warehouses aren't going away any time soon, so as you continue to use them, be sure to use them the best way possible and implement a MapR/Arcadia Data solution to get value from all your data.