|
Mark Rittman
How Do I Combine Relational And Multidimensional Data In A Single OLAP Cube?
"Everytime I do a web search for
anything to do with OLAP, I always seem to be directed to your
web-site!
So I have a query for you, having
viewed your presentation on BI Beans. The Oracle9i Release 2 OLAP
option appears to allow us to store both relational and
multi-dimensional OLAP data within the Oracle database. Good news.
And typically we may want to store raw data in the relational
element, and aggregated data in the multi-dimensional. When a user
is using a BI Beans application or Discoverer, they may not want to
know about the subtlety of which element the data is stored in, but
just to be able to drill down through the data, maybe starting with
the aggregates, but having to move to the relational raw data when
drilling to the lowest level. Have you come across a way to handle
this in BI Beans and Discoverer?"
The ability to present a layer of abstraction
above either relational or multidimensional data held in the database
has always been presented as one of the major strengths of Oracle OLAP.
By describing OLAP data using general terms such as 'cubes',
'dimensions' and 'measures', query tools such as Oracle Discoverer or
BI Beans can refer to data in the OLAP Catalog without needing to know
in what manner that data is actually physically stored.
In this instance, what we're trying to
achieve is a situation where the base data is held at its lowest level
in Oracle relational tables, and with an analytic workspace then being
used to aggregate and store data at summarised levels. This is in
contrast to the traditional method of summarising data and storing it,
which is either carried out using materialized views or through
separately maintained and populated summary tables. A tool like
Discoverer Administrator can set up this sort of arrangement with a
click of a few buttons, but it doesn't use analytic workspaces, which
is what we're trying to achieve.
An Oracle OLAP cube, and its associated measures can, as far as I'm
aware, only be sourced at present from a single table or
(materialized) view in the database. This table can in fact be a
virtual table defined using the OLAP_TABLE function, with its data
provided from variables and dimensions in an analytic workspace, and
in most cases the base data in the cube will need to be summarised by
using the GROUP BY, or GROUP BY ... ROLLUP feature in SQL. If the base
data for the cube comes from data held in relational tables, the GROUP
BY or GROUP BY ... ROLLUP feature can be rewritten using Query Rewrite
to use a materialized view, or if the base data comes from an analytic
workspace, the summary data will most probably have been preaggregated
and also stored in the analytic workspace.
If, in our case, we want our base data to be held in relational
tables, and our summaries held in an analytic workspace, we'd first
need to expose the analytic workspace summary through using SQL views
via the OLAP_TABLE function. The problem is, that the query rewrite
mechanism won't be able to make the connection between the GROUP BY /
GROUP BY ... ROLLUP used to request the summary on the relational
data, and the SQL used to expose the analytic workspace data through
the OLAP_TABLE function. It'll never qualify for query rewrite as the
SQL defining the base data and the analytic workspace will never
match.
What this means then, is that whilst we can define a common view
over our relational data, and our analytic workspace data, using the
same definition of measures, cubes and dimensions, all we can ever do
is swap out the relational cube for the multidimensional cube when
providing a data source for the BI Beans report - we can't use one
part for the base view and one part for the summary.
The good news, however, is that this is set to change with Oracle
10g OLAP. According to a
recent paper by Bud Endress at Oracleworld, Oracle 10g will come
with a new feature called 'Query Equivalence'. According to Bud's
paper;
"In Oracle10g a new feature, query equivalence, allows query
rewrite to be used with views. With query equivalence, the DBA
indicates to the database what SQL could have been used to create
the view even if the view was created in some other way. For
example, if the application likes to emit SQL with SUM … GROUP BY
but the view was created with entirely different SQL, the DBA could
indicated that the view is equivalent to SUM … GROUP BY. This
feature of the database is extremely useful with the OLAP option
since SQL access is always through views. The DBA can create a view
over an analytic workspace with syntax such as:
SELECT TIME, PRODUCT, CUSTOMER,
SALES
FROM OLAP_TABLE …
And indicate to the database that the view is equivalent to:
SELECT TIME.TIME,
PRODUCT.PRODUCT, CUSTOMER.CUSTOMER, SUM(FACT.SALES) …
GROUP BY …
If the application issues a query that is consistent with the
equivalence of the view, such as the example below, the query will
be automatically rewritten to the view over the analytic workspace.
SELECT TIME.TIME,
PRODUCT.PRODUCT, CUSTOMER.CUSTOMER, SUM(FACT.SALES) …
GROUP BY …
This provides the DBA and application with benefits similar to
those of materialized views – simplified maintenance and improved
query performance."
It looks like this new 10g feature is exactly what is needed. By
using Query Equivalence, the base data for our cube can be stored in
relational tables, whilst the summary data is held in an analytic
workspace. Query Rewrite will work as required, and users do not need
to worry about whether data is stored relationally or
multidimensionally.
|
|
|