"I have a question about Oracle partitioning and a time
dimension. The design we have for the time dimension uses a
numeric surrogate key as a primary key in the time dimension,
and the corresponding foreign key in the fact table is also a
numeric key. However, we currently range partition the fact
table by another column in the table that is an actual date
column. We don't get any partition pruning when queries
involve the time dimension, which I believe is because queries
against the time dimension are joining the fact table on a
numeric column, while that table is actually partitioned by
Should we remove the partitioning on the date column and
instead partition on the numeric key column? The purest in me
can't quite come to grips with the fact that we would be range
partitioning on a numeric "smart" key when what we really want
is the range partition by date."
This is an interesting one. If you've read Kimball's
Data Warehousing Lifecycle Toolkit, you'll know that the
recommendation is to use surrogate keys for all dimension key
lookups, including the date dimension. However, as you say, the
obvious way to partition your fact table is by date, and yet
there now is no "date" column to range partition on, just a date
dimension surrogate key.
What I do in these circumstances is either one of two options:
First is to forget the surrogate key rule for the date
dimension, and just use a regular date datetype for the date
dimension primary key. Dates are stored in an optimised format
anyway (so don't take up as much space as the date spelt out in
individual characters) and by having real dates in the fact
table, it makes range partitioning on load date much more
intuitive. Also, when Kimball came up with the surrogate key
rules, partitioning wasn't in general use and no doubt now he'd
agree with this common sense exception to the rule.
The second option, if we're going for the purist route, is to
use a surrogate key for the date dimension, and still range
partition it, but instead of using start and end dates you'll
just need to use the revelvant time dimension keys that
correspond to the start and end of your partition periods. Works
just as well, although not quite as intuitive to work with.
You'll still get partition pruning on date as the join will be
on the column that is used for the range partitioning.