Online Analytical Processing And
Oracle
Dr. E. F. (Ted) Codd coined the phrase
online analytical processing (OLAP) in a 1993 white paper called
Providing OLAP (Online Analytical Processing) to User-Analysts: An
IT Mandate. Soon after the publication of this paper, OLAP became
the latest buzzword in the database arena and every IS professional
struggled to understand OLAP and how it fit into the paradigm of
decision support systems (DSS) applications. In addition to defining
OLAP, Dr. Codd also went on to create 12 rules for OLAP which are
similar in form to Codd?s 12 rules for relational databases. Given
the recent popularity of OLAP, it is very easy to view OLAP as a
nascent technology. However, OLAP, as defined by Codd, is not a new
technology, and some products, such as the IRI Express OLAP engine,
have been available for more than 20 years.
As more companies embrace the concept of
creating a historical data repository for OLAP and DSS applications,
new client/server issues are emerging as developers struggle to
create Oracle-based client/server applications. This chapter reviews
OLAP with a focus on the various techniques used to interface OLAP
applications with Oracle databases.
A great deal of interest has surfaced in the
application of data warehousing and multidimensional databases for
advanced systems. Advanced systems, such as expert systems and
decision support systems have been used for decades to solve
semi-structured and even unstructured problems. Traditionally, these
types of systems combine inference engines and relational databases
in order to store the knowledge processing components, and they have
all done these tasks without the benefit of having a huge data
warehouse..
OLAP News
The explosive interest in OLAP has fueled
the creation of a popular new Internet newsgroup[MM89] dedicated to
OLAP issues. You can find the newsgroup at comp.database.olap.
OLAP Fundamentals
Essentially, an OLAP system is any system
that captures summarized information and allows the summaries to be
displayed as cross-tabulations between two variables. In the
following example, we explore OLAP using the Excel spreadsheet pivot
table feature.
The terms OLAP and multidimensional database
have become synonymous, thus adding to the confusion surrounding the
two terms. Essentially, a multidimensional database is a database
architecture that stores summarized information such that all
salient data items (called dimensions) are cross-referenced with
each other (see Figure 5.1). For example, a multidimensional
database might store sales totals cross-referenced by month, product
line, territory, and salesperson. The multidimensional database
could then display the sales totals according to the possible
combinations of cross-referenced tables shown in Table 5.1.
Fact Dimensions
Total Sales Month by product line
Total Sales Month by territory
Total Sales Month by salesperson
Total Sales Product line by territory
Total Sales Product line by salesperson
Table 5.1 Possible plots of facts and
dimensions.
Figure 5.1 A sample OLAP display.
So, if this is a multidimensional database,
then what is OLAP? OLAP is a presentation front end that allows end
users to choose the dimensions and facts that will be
cross-referenced. The data does not necessarily have to come
directly from a multidimensional database. The possible data sources
for an OLAP application include:
* Multidimensional
databases
* Relational
databases (via ROLAP tools)
* Spreadsheet data
(extracted from any database architecture)
To illustrate, let?s take a close look at a
simple OLAP application. Note that while the following example may
seem overly simplistic, many Oracle shops are using this very
technique to deliver OLAP applications to their end users. Excel
spreadsheets can hold very large datasets, and in many cases,
managers can get their data loaded entirely into their spreadsheet,
thereby alleviating the need to create a client/server Oracle
application. Instead of using a client/server application, the
summary data for a spreadsheet is refreshed from Oracle each day,
ensuring that the manager has current data for the OLAP application.
Using Excel Pivot Tables For OLAP
The Microsoft Excel spreadsheet product is
an excellent way to demonstrate the capabilities of an OLAP
application. Many companies extract de-normalized information from
their Oracle databases, FTP the data to a PC LAN, and load the file
into a very large Excel spreadsheet on a client PC (see Figure 5.2).
The manager can then use Excel?s Pivot table wizard to analyze the
data, just as if they were using a multidimensional database. Of
course, this approach requires a very powerful PC, but it is a very
fast, simple, and straightforward approach to simulating a
multidimensional database with an OLAP application.
Figure 5.2 Extraction of summarized
data for PC-based OLAP.
The Oracle data extraction is usually
invoked from a Unix crontab task, which invokes an SQL script to
extract the summarized data from Oracle, format the data, and pipe
it to a flat file. Using NFS, a mount on a PC LAN can be created so
that the extracted data is transferred directly to the PC disk. The
data can then be easily loaded into a spreadsheet, where the wizard
will allow the manager to perform complex cross-tabulations and
decision support queries.
Note: To get the maximum benefit from
this exercise, I recommend that you follow along on your PC. Simply
start Microsoft Excel, and load the file called pivot.xls from the
CD-ROM in the back of this book. This short exercise will only
take about ten minutes, and it has proven of be the best method for
understanding multidimensional databases.
Step 1: Load The Sample Spreadsheet
The first step of this exercise is to load
the sample spreadsheet named pivot.xls. You can find this
spreadsheet on the CD-ROM included with this book.
Please note that the input data that is
provided within pivot.xls is highly de-normalized and ready to use.
Each row represents a line item within an order, and there is a high
degree of duplicated data. Also, note that every field is a
potential key except for qty-sold and total-cost. Getting this file
onto a PC platform involves writing an SQL statement that will
extract the required columns while joining the salient Oracle tables
(see Figure 5.3). The SQL is generally executed by invoking Oracle?s
SQL*Plus from a cron task on Unix. The resulting flat file will then
be transferred to a PC LAN by using FTP, where it will be loaded
into the Excel spreadsheet. Figure 5.3 shows what the spreadsheet
should look like.
Figure 5.3 De-normalized Oracle data
after transfer into Excel spreadsheet.
Step 2: Invoke The Pivot Table Wizard
The second step in this exercise is to
invoke the pivot table wizard. To accomplish this, choose Data|Pivot
Table. The wizard should appear, and it will enable you to define
the fact and dimension columns for the pivot table. Perform the
following steps:
1. In the first wizard
screen, choose Data From Excel List, and press Next.
2. Verify the data range
(a1:l47), then press Next.
3. Finally, specify the
axis names by highlighting your choice, and then drag and drop them
onto the pivot table axis, as shown in Figure 5.4, then press
Finish.
Figure 5.4 Specifying the axis for a
pivot table.
Now, you should have a pivot table defined
and displayed on your screen which looks similar to the one shown
earlier in Figure 5.1.
Because the entire table is in addressable
memory, it is very simple to change the axis of the pivot table and
completely recompute the table. To accomplish this, simply press the
Pivot Table Wizard button, which appears immediately above the pivot
table. You can now redefine the dimensions of the pivot table and
create a new representation of your data, as shown in Figures 5.6
and 5.7.
Figure 5.6 Quantity with year-quarter
against region.
Figure 5.7 Quantity with region
against quarter.
Now, note that we can nest dimensions to
create a hierarchy of dimensions. This is a very powerful feature of
multidimensional databases and relates to our discussion of
attribute hierarchies presented in Chapter 4, Oracle Data Warehouse
Design. Figure 5.8 shows an example of nested dimensions, with the
state dimension nested within the region dimension. It is also
possible to quickly change a displayed fact. For example, in Figure
5.9, the fact is changed from total sales to total cost.
Figure 5.8 Quantity with region-state
against year.
Figure 5.9 Total cost with region
against salesperson.
Figures 5.10 and 5.11 show how you can
quickly rotate dimensions to display total sales against the
customer and item number dimensions. It should be obvious that there
is incredible power in this ability to change dimensions and quickly
recompute facts.
Figure 5.10 Total sales with month
against customer.
Figure 5.11 Total sales with month
against item number.
As you can see, a pivot table gives a
manager the ability to analyze information in any conceivable
combination.
Even data that does not appear directly can
be displayed. For example, let?s say that normally we only display
the sum of cost or the sum of sales, but today, we want to drag a
dimension into the fact area of the pivot table. Dragging a
non-numeric data attribute into the computation area of a pivot
changes the fact from a sum to a count. As you can see in Figure
5.12, the number of orders that have been placed within each city
are now counted and display by month. In this fashion,
and data attribute can be counted, and it is not always necessary to
include only numerical items in the body of the pivot table.
Figure 5.13 shows the resulting pivot table, showing the counts of
the number of orders within each region by month. We could
count the number of any data attributes including the number of
salespersons and customers. For example, we might want to
count the number of customers, plotting this count by region and by
month.
Figure 5.12 Counting within a pivot
table.
Figure 5.13 The resulting pivot table
of counts of orders.
Now that you understand how multidimensional
database front ends function for decision support applications,
let?s take a look at how data warehouses fit into multidimensional
database frameworks.
|
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |