 |
|
Oracle data warehouse date transformation
Don Burleson
|
There is a performance problem is with the date
transformation overhead when dealing with a single date column. End
users are demanding the ability to constrain their OLAP dimensions
in an ad-hoc fashion, and the challenge of the warehouse DBA is to
provide super-fast response time while allowing the dimension
filters.
Constrain time:
start day: ____ start month: ____ start year: _____
end day: ____ end month: _____ end year: ______ |
For example, consider this simple warehouse
query:
"Show me the
difference in March sales between 2004 and 2005, by city, by region"
|
New York |
Atlanta |
Napa |
Boston |
|
North |
$123 |
$363 |
$44 |
$233 |
|
South |
$456 |
$342 |
$-200 |
$932 |
|
West |
$994 |
$-23 |
$77 |
$278 |
|
A simple question, but a formidable query
challenge. This type of query presents a special challenge to
the data warehouse because the end-user may wish to constrain basic
OLAP counts by start date and end date. There is a significant
run-time overhead of converting and testing each date in a
multi-million row table while maintaining sub-second response time!
 |
This range comparison query would
require complex date transformation at the SQL level, where
the transaction requires one date group to be compared with
another date range.
This range-based self-join has
notoriously poor performance. The query might look
something like this:
select
sum(sales)
from
transaction_table mar2004,
where
trans_date >= to_date('1-mar-2004')
and
trans_date < to_date('31-mar-2004')
group by trans_date
MINUS
select
sum(sales)
from
transaction_table mar2005,
where
trans_date >= to_date('1-mar-2005')
and
trans_date < to_date('31-mar-2005')
group by trans_date;
Comparing ranges of values within he same table (even
with partitioning) can generate very long-running queries.
|
There are several approaches to this time
constraint issue.
Approach one: Add redundant
day-month-year
One approach is to slice-out the day-month-year
into separate columns in the transaction table:
trans_year
trans_month trans_day trans_date . . . .
2005 04
25 2005-04-25
17:25:43
2005 04
25 2005-04-25
17:25:43
2005 04
25 2005-04-25
17:25:43
This redundancy greatly simplifies range
comparisons and performance. Because the redundant date
slice-off columns (trans_year, trans_month, trans_day) become
separate dimensions, it simplifies the OLAP model.
It also improves query performance (psuedocode below- may not be
syntactically accurate):
select
sum(mar2004.sales) - sum(mar2005.sales)
from
transaction_table mar2004,
transaction_table mar2005
where
mar2004.trans_year = 2004
and
mar2004.trans_month = 3
and
mar2004.trans_year = 2005
and
mar2004.trans_month = 3;
Approach two: Add a
date lookup table
Michael Armstrong-Smith, author of
the bestselling Oracle Press book
Oracle Discoverer Handbook notes his approach to the issue:
What works best is to create a routine that
preloads calendar dates, along with their corresponding fiscal
quarters, months and years into a single master date table. Then
inside Discoverer we would join the transactional dates to this
table.
The secret is to fully de-normalize the master
table such that every date would have corresponding month, quarter
and year data. Thus for any given date, in any table, we could
look up the corresponding month, quarter and year. We should also
add start and end dates.
In Discoverer I would then build fiscal
hierarchies thus allowing the organization to create reports that do
correlations by time period. From the transactional side, all we
need to do is truncate all transaction dates
inbound. Discoverer can do the rest.
Here is an example table definition (called calendar), along what would be showing for
today's
date:
Calender_Date Date 04/25/05 Fiscal_Month Varchar(6) APR-05 Month_Start_Date Date 04/01/05 Month_End_Date Date 04/30/05 Fiscal_Quarter Varchar(5) Q2-05 Quarter_Start_Date Date 04/01/05 Quarter_End_Date Date 06/30/05 Fiscal_Year Varchar(5) FY-05 Year_Start_Date Date 01/01/05 Year_End_Date Date 12/31/05
The primary key would be the Calendar_Date itself. There would be no
need
for a surrogate key because the date itself is unique. By
pre-calculating
all of these in advance all I need to do is read off the
corresponding
fiscal value that I am interested in.
Using
Discoverer for user-constrained OLAP values
Mark Rittman (a Discoverer expert) has yet another take on the issue
of user-defined clusters with Discoverer and suggests that the
Discoverer OLAP may not support
http://www.rittman.net/archives/001238.html
Take an example, where
I want to produce a crosstab that displays the value of product
sales, broken down by product and channel (I'm using the
Global Sample Schema that you can download from OTN). So far, no
problem - but what if I want to include the names of the product
marketing managers in the report, subdividing the products by these
product managers? . . .
I've been thinking about this a
lot recently and I think it comes down to two factors, and this
is disregarding for the moment the cost issue of implementing
the OLAP Option.
- The logical dimensional
model, though powerful, does not lend itself to including
dimension attributes in a report, and
- Discoverer for OLAP, as
it always generates a crosstab report, is not a valid
solution when you need to produce tabular reports.
Because of this, I believe
that the vast majority of Discoverer installations will require
both Discoverer for OLAP and Discoverer Plus to be implemented.
|