OLAP, ROLAP, And MOLAP
Now that we understand that OLAP is a tool
that displays summarized data, plotting one dimension against
another, let's look at the vendor implementations of this
technology. As you saw in the previous section, pivot tables are an
excellent way to display multidimensional data, but OLAP involves
more than just the multidimensional display of information. OLAP
tools also must be able to extract and summarize requested data
according to the needs of an end user, and there are two approaches
for this data extraction that need to be discussed.
When multidimensional OLAP was first
introduced, data was extracted from the relational engine and loaded
into a proprietary architecture called a multidimensional database.
The data was displayed quickly by accessing the pre-summarized data.
This type of OLAP utilizes a multidimensional database, which has
become known as MOLAP, or multidimensional OLAP. The other approach
to data extraction uses a mapping facility and extracts the raw data
from an operational relational database at runtime, summarizing and
displaying the data. Because this approach does not require a
multidimensional database, it has become know as ROLAP, or
relational OLAP.
There are many different types of OLAP and
MDDB products on the market today. As shown in Table 5.5, OLAP and
MDDB have their own relative advantages and disadvantages, and they
are both fighting to achieve recognition for their strengths.
Trait ROLAP MDDB
Speed Slow Fast
Queries Flexible Fixed
Disk cost Low
High
Table 5.5 OLAP versus MDDB.
Speed Vs. Flexibility
To the end user, ROLAP and MOLAP are
transparent. The front ends for these tools are similar, and the
types of decision support activities are roughly the same. There
are, however, significant differences between the operational
details of ROLAP and MOLAP that are of primary concern to the data
warehouse designer. The biggest difference between ROLAP and MOLAP
involves the tradeoff between speed and flexibility (see Figure
5.16). MOLAP engines, by virtue of their pre-summarization and
loading, have the data ready to display and give the end user
incredibly fast response times. ROLAP engines, by virtue of their ad
hoc data extraction and summarization, give end users incredible
flexibility in their choices of queries. This is the very heart of
the difference between ROLAP and MOLAP--speed versus flexibility.
Figure 5.16 MOLAP speed versus ROLAP
flexibility.
There is also the issue of economics. For
non-relational shops, such as IMS installations, MOLAP can be far
less expensive than ROLAP solutions. Whereas a MOLAP database can be
purchased and configured for as little as $200,000[TZ98], ROLAP
solutions have a much higher expense in terms of human resources for
setup and configuration, as well as increased processing demands on
the computer hardware.
On the other hand, shops that already have a
relational database such as Oracle can quickly extract de-normalized
data from their operational databases for downloading into MS-Excel
pivot tables. This is by far the cheapest approach to OLAP because
there is no investment in either hardware or software. Table 5.6
gives a listing of the most popular ROLAP/MOLAP products.
Vendor Tool Description
Oracle Express Excel spreadsheet
extension, true OO
Oracle Oracle 7.3 STAR query hints,
parallel query, bitmap indexes
Microstrategy DSS Agent MDDB queries
against RDBMS
D&B Pilot Lightship OLAP with custom and
spreadsheet GUI
IBI Focus/Fusion MDDB engine
VMark uniVerse NT-based MDDB engine
Kenan Accumate ES MDDB with PC-OLAP GUI
Arbor OLAP Builder Extracts data from
DW for Essbase
Arbor Essbase MDDB engine with Excel
spreadsheet GUI
Think Systems FYI Planner PC-GUI, with
MDDB and OLAP server
Table 5.6 OLAP/MOLAP product information.
Over the past 10 years, there have been
significant advances in ROLAP and MOLAP, but there remains a chasm
between these technologies. Let's take a look at the most salient
differences between the technologies.
Multidimensional OLAP (MOLAP)
Multidimensional OLAP is generally thought
of as the traditional multidimensional database (MDDB), and many of
the early offerings advertised themselves as ?pure? multidimensional
databases. As we have discussed, a multidimensional database is a
database structure optimized for storing facts categorized along
many dimensions. The MDDBs are far more effective for storing OLAP
data than relational databases because they were designed
exclusively with this purpose in mind. The other major consideration
with multidimensional OLAP is the fact that all of the data is
loaded, summarized, and stored in the MDDB prior to making the
database available to end users. Because all the calculations have
already been performed, multidimensional OLAP offers astounding
response times. For these reasons, multidimensional OLAP is the best
choice for applications with the following characteristics:
* Impatient end users--MOLAP engines
offer end users fast and predictable response times for their
queries. In some cases, end users need to be able to quickly create
new queries based on the responses from previous queries without
losing their train-of-thought. This speed differential is getting
smaller as the speed of relational databases improves, but there
remains a dramatic difference between the retrieval of
pre-summarized data from an MDDB and the runtime extraction and
summarization from a relational back end. It is not uncommon for an
end user to report a system outage when the ROLAP tool takes several
hours to roll-up summaries from a relational database extract.
* Sophisticated data analysis--MOLAP
engines provide a more robust analysis environment than ROLAP tools.
MOLAP engines support budgeting and forecasting functions and tend
to have a much more advanced statistical toolkit than their ROLAP
cousins. ROLAP, on the other hand, has the ability to provide ad hoc
groupings while MOLAP cannot aggregate on the fly.
* Ease of use--MOLAP engines are very
easy for end users to configure and use to set up scenarios for
decision support systems. Because the data is pre-summarized and
stored in the multidimensional database, all an end user needs to do
is specify the dimensions and groupings within dimensions. ROLAP, on
the other hand, requires an end user with knowledge of the mapping
of the operational databases, and it is much more difficult to
configure.
Relational OLAP (ROLAP)
The advent of the multidimensional database
led to an effort by tools vendors to create a method where data
could be extracted from a relational database and presented to end
users as if it were from a multidimensional database (see Figure
5.17). There are several methods for accessing a relational database
and presenting aggregated data as if it were from a multidimensional
database. These alternatives include ROLAP middleware tools and
downloading pre-aggregated data to local pivot tables. Another
common approach is to insert a metadata server between the OLTP
relational database and the query tool.
Figure 5.17 Overview of a ROLAP system.
In order to be considered a ROLAP product, a
tool must extract runtime data from a relational database, present
summarized data in cross-tabular format, and possess a mechanism for
translating the relational design into a multidimensional format.
Examples of popular ROLAP products include:
* DSS Agent by Microstrategies
* Metacube by Stanford Technology Group
* Holos by Holistic Systems
* AXSYS Suite by Information Advantage
* Red Brick Warehouse by Red Brick
Systems
* Prodea Beacon by Platinum Technology
ROLAP systems provide extremely flexible
query engines by making any number of operational data stores
available to end users. These back-end databases are usually
relational databases, but ROLAP tools can also extract from a
variety of different relational database.
ROLAP tools require the definition of the
mapping between the OLAP model and the relational database, and
generate SQL to extract the required data from the operational
databases in a very similar fashion to the SQL generators described
in Chapter 4.
Because the entire enterprise can be made
available to a ROLAP tool, it is not surprising to acknowledge that
ROLAP is far more flexible than its MOLAP cousin. Any data, on any
platform or database, can be mapped into a multidimensional format;
the ROLAP engine will obediently extract and summarize the data
according to the extraction specifications. Because ROLAP is far
more robust in this sense, it is the OLAP tool of choice for data
warehouses that support the following features:
* Data changes frequently--In a data
warehouse where data is very dynamic and end users require
up-to-the-minute summarizations, ROLAP is the only choice. MOLAP
tools must extract and summarize data offline for loading into their
multidimensional databases. To make matters worse, most
multidimensional databases require recalculation of the entire
database when a new dimension is added, an aggregation scheme
changes, or data is added. These overhead factors make MOLAP
inappropriate for decision support systems with highly volatile data
sources. Examples of these types of applications include stock
market DSS and weather forecasting tools.
* Large data volumes--For very large
database warehouses in the terabyte range, the cost of supporting
MOLAP tools can be exorbitant. The pre-summarization of data can
require hundreds of gigabytes of disk storage, and many companies
cannot afford the millions of extra dollars required to provide
sub-second response times for OLAP queries. ROLAP tools allow
companies to leverage their existing investment in OLTP databases
without having to buy a multidimensional engine.
* Unpredictable types of
queries--Because ROLAP engines can allow virtually any operational
data source to be queried and summarized, ROLAP has a clear
advantage for the decision support application that cannot predefine
its query requirements. Of course, this flexibility comes at the
cost of ease-of-use, because the IS department must often get
involved to assist end users in creating the mappings to the
operational databases.
Today, many developers are using relational
databases to build their data warehouses and simulate multiple
dimensions, and specific design techniques are being used for this.
The push toward STAR schema design has been somewhat successful,
especially because designers do not have to buy multidimensional
databases or invest in expensive front end tools.
Several methods can be used to aggregate
data within OLAP servers. As you can see in Figure 5.18, this method
extracts data from the relational engine and summarizes the data for
display. Another popular method pre-aggregates the data and keeps
the summarized data ready for retrieval.
Figure 5.18 Aggregation and OLAP servers.
|
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |