Using the Oracle Database for an Analytic Warehouse
Dan and Tim Vlamis, Vlamis Software Solutions, Inc.
November 10, 2016
Data Warehouses have evolved since their heydays in the 1980s. Many people have given up on the dream of capturing and codifying into one place all the internal data for an organization and consider the task to be too difficult. In the quest to capture everything, they have focused almost exclusively on how to get data into a data warehouse through ETL processes. Instead, we believe the value lies not in storing data, but rather in organizing it, using it to calculate new measures, and facilitating its presentation to users. By focusing on results and uses, you can achieve what organizations truly care about: fast, deep, meaningful analyses.
Comparison between Data Warehouse and Analytic Warehouse:
A traditional data warehouse focuses on storing as much business data as possible. We propose identifying the data deemed most useful for affecting business decisions and focusing on analyzing data. We call a purpose-built warehouse of data for analysis an analytic warehouse. Most processing in a traditional data warehouse is done externally in ETL (Extract Load Transform). Most processing in an analytic warehouse is done internally by processes that:
define relationships and structures to organize data elements
create new measures, fields, and structures through analytical processing
Additional differences between traditional data warehouses and analytic warehouse sometimes include the following:
In traditional data warehouses, one of the most important metadata facts about a particular data element is its source or provenance. The data element’s importance was directly related to how recently it was stored and sometimes, to how much it was replicated. In analytic warehouses, the most important information about a particular data element is its use, either in a subsequent analytic process or a call as a raw data element. The more times an element is used in a calculation or is called for presentation to a user, the more important it is.
In the case when traditional data warehouses are significantly larger than their sources, it’s almost always because they are replicating data and storing it multiple times. Analytic warehouses can also be larger than their sources, but when they are, it’s almost always because they are creating new data elements and structures through descriptive, diagnostic, predictive, and prescriptive analytic workflows and machine learning techniques.
Traditional data warehouses were defined by the inclusion of data. Analytic warehouses often make calls on external data stores for analytic workflows. There is often no need to store the data in the warehouse; just use it to produce new data elements and structures. This emphasis on data federation and analytic processing is a key difference between traditional data warehouses and analytic warehouses.
In short, traditional data warehouses are about getting all available data in for storage; analytic warehouses are about getting valuable data out for presentation.
Description of Analytic Warehouse:
Analytic warehouses are characterized by the following:
Organization around logical structures designed for analysis
A distinction between the processing/query engine (which can reach out to many sources) and the storage layer (which contains the most valuable and frequently accessed data)
Lots of derived measures, comparative values, and the generation of new data elements and structures
Emphasis on relationships, hierarchies, and structures (both discovered and assigned) within and between data elements
Emphasis on the fast processing and delivery of queries
Ability to federate data and execute queries and analytic processes in external data storage systems
Ability to perform complex statistical, graphical, and high mathematical processes in parallel
Analytic warehouses often contain dimensional structures that organize data into hierarchical taxonomies and include aggregations at different levels. Facilitating the fast presentation of data at different levels of aggregation is a primary purpose of analytic warehouses. While counting and aggregation are important internal processes, newer machine learning, statistical analytics, and data mining functions are now critical to delivering value to business users.
By focusing on a subset of an organization’s data, analytic warehouses can be surprisingly easier to build than full data warehouses. Compared with a full data warehouse, 80% of the value may be obtained with only 20% of the data. If data elements are not often used for analysis purposes, it’s better to leave them in their sources systems and use them through an external process on the rare occasion they are needed.
Use of Analytic Warehouse:
Whereas a traditional data warehouse supports historic reporting, an analytic warehouse focuses on predictive and prescriptive analytics. Analytic warehouses may be used for historic reporting also, but they are designed to predict the future. Often the emphasis is on additional calculated measures, forecasts, or predictions. Depending on the technology used, this may imply that measures in an analytic warehouse may not have the same precision as data in a data warehouse. For example, a data warehouse may report that a customer has bought $23,753.03 worth of gadgets in the past year, but in an analytic warehouse, we may compute next year’s gadget sales are between $24,000 and $28,000 with a 95% probability associated with that assertion. Since this is not truly known, we may end up calculating this value on the fly instead of storing the actual value. Even though the value is an estimate, the fact that the value is directionally higher than last year may be very valuable.
An important use for analytic warehouse is for comparison purposes. Historic reporting often emphasizes precision and absolute values in isolation. Analytic warehouses emphasize comparisons within a context. Sometimes the context is historical (e.g. sales are trending up) and sometimes the context is in the future (e.g. sales for California are likely to exceed Texas in two years).
Oracle Database and Options for the Analytic Warehouse
An Oracle database is an excellent foundation for an analytic warehouse. In addition to the traditional relational data store, modern Oracle databases have the ability to store spatial data, dimensions, cubes and can calculate analytics directly in the database, eliminating the need for moving data between servers. Specifically, the Advanced Analytics option of the Oracle Enterprise Edition database has a host of embedded analytic algorithms such as clustering, regression, attribute importance, and more. Moreover, Oracle R Enterprise brings the power of the language ‘R’ to the Oracle Database. With the Database In Memory option, tables can be placed in memory and special memory-optimized algorithms can aggregate data as it is being read for extremely fast aggregation on the fly. Analytic Views (new in release 12.2) enable OLAP-style calculations on fact tables, without having to store data in a multi-dimensional format. Oracle Database is capable of processing data in XML, JSON, and through Big Data SQL can interact with external data stores such as Hadoop, NoSQL, and integrate Apache Spark processes. This also allows Oracle Database’s advanced security features (e.g. VPD, redaction) to be used with these data sources. All of these capabilities make the Oracle Database the optimal vehicle for realizing the analytic warehouse.
Analytic Warehouses in the Oracle Cloud
The movement towards storing and processing data in the cloud makes analytic warehouses even more compelling. The High Performance Package of the Oracle Database Cloud Service contains all of the options necessary for an analytic warehouse. The Extreme Performance Package adds the Database In Memory option for additional performance. An Oracle Database in the cloud can reach out to other cloud data sources, as well as external databases, without having to replicate data. In addition, cloud databases enable solutions that can scale to any size without having to pre-determine storage and compute requirements for an analytic warehouse.