Oracle OLAP 10gR2
Incremental Load Improvements: The Mystery Solved...
February 13, 2006
Right, I've been meaning to post
this to the blog for a couple of weeks now, but I keep getting
sidetracked. If you followed the debate around my
Incremental Load Improvements in Oracle OLAP 10gR2 posting a
couple of months ago, you'll be interested in a follow-up Scott
Powell did on the OTN OLAP Forum.
Basically neither of us could work out whether the
improvements were real or just some over-enthusiastic marketing
on Oracle's part. In the end, though, an ex-Oracle employee called
Dan Peltier put us out of our misery and posted details of how
these improvements actually work, and of how you can use new
features such as partitioning and global composites to improve
the performance of a cube aggregation. I
won't repeat the bulk of Dan's posting here, but there's a
particularly relevant bit at the end that explains how the
improvements were actually implemented (the quoted text at the
start is a question from me):
"> Interesting point by Dan (Vlamis?) about compressed
composites having a special ability to detect changed
> data, and deal with incremental loads faster than
non-compressed composites. Certainly compression
> itself isn't the answer, as Oracle's slides suggest that
incremental loads are faster in 10gR2 compared
> to 10gR1, and compression was in both, but maybe there's an
improvement in compression in R2 that
> gives the effect that Dan mentions when specifically dealing
with incremental loads. I'll have to check it
There is in fact exactly such an improvement.
If you look at the OLAP DML level, in 10.1 you weren't even
allowed to load data into a compressed variable unless you
cleared all the aggregates first. Thus, every time you wanted
to make a change, you had to clear all the aggregates, change
your detail data, and rebuild from scratch.
In 10.2 you're allowed to make changes to an aggregated,
compressed variable. If you load some new data and then run
the AGGREGATE command, the engine will actually try to do some
tricks to aggregate only the parts of the cube that were
affected by the changes you made since the last time you
aggregated. It's not incredibly good at this, but it can
handle some simple cases. If the engine decides that your
changes are too complicated, then it just nukes all the
aggregates and rebuilds from detail.
Obviously, the slide you saw was for one of the simple cases.
The most trivial case in which the engine can avoid doing a
full rebuild is if you haven't made any changes at all to the
variable _or_ to the hierarchies. Also, I believe that it can
avoid doing a full rebuild if you have added a small number of
dimension values to one dimension but have not changed the
parent of any existing dimension value in any of the cube's
hierarchies - in other words, if you've added a week's worth
of new dimension values to your time dimension.
AWM does aggregation on a partition-by-partition basis, so
there the question actually becomes: have you changed this
_partition_ in such a way as to necessitate a full rebuild?
The answer may be yes for some partitions and no for others.
Note that _none_ of this applies to regular, non-compressed
cubes. However, with uncompressed composites, you can, at the
OLAP DML level, keep track manually of which areas of the
variable need to be recomputed, and then use that information
to set dimension status before running AGGREGATE. Only the
areas that are in status get rebuilt. This technique is not
allowed for a compressed variable, because the compression
technique used generally makes it impossible to recalculate
some arbitrary subset of the data."
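
To make Dan's points a bit more concrete, here are a few OLAP DML sketches. First, the 10.1-style cycle he describes for a compressed cube: clear every aggregate, change the detail data, then rebuild from scratch. All the object names here (SALES, SALES_AGGMAP and the dimension values) are made up for illustration, and the "load" is reduced to a single assignment:

    " 10.1: a compressed variable can't be loaded while aggregates exist,
    " so clear them all first
    CLEAR ALL AGGREGATES FROM sales
    " change the detail-level data
    sales(time 'W05.2006', product 'WIDGET', geography 'UK') = 1500
    " rebuild the whole cube from the detail level
    AGGREGATE sales USING sales_aggmap
    UPDATE
    COMMIT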
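
In 10.2, by contrast, you can load straight into the aggregated, compressed variable and just run AGGREGATE again; the engine itself decides whether it can confine the work to the areas affected by your changes, or whether it has to fall back to a full rebuild. The same hypothetical objects again:

    " 10.2: load new detail data without clearing the existing aggregates
    sales(time 'W05.2006', product 'WIDGET', geography 'UK') = 1500
    " the engine tries to re-aggregate only the affected parts of the
    " cube, and rebuilds from detail if the changes are too complicated
    AGGREGATE sales USING sales_aggmap
    UPDATE
    COMMIT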
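
The "new week of data" case would then look something like this: add the new dimension values with MAINTAIN, set their parents, load against them and re-aggregate. As long as no existing dimension value gets a new parent, the engine should be able to avoid the full rebuild (TIME_PARENTREL is my hypothetical name for the hierarchy's parent relation):

    " add a new week to the time dimension
    MAINTAIN time ADD 'W06.2006'
    " give it a parent in the hierarchy, leaving existing parentage alone
    time_parentrel(time 'W06.2006') = 'FEB2006'
    " load the new week's detail and re-aggregate
    sales(time 'W06.2006', product 'WIDGET', geography 'UK') = 1750
    AGGREGATE sales USING sales_aggmap
    UPDATE
    COMMIT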
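
And finally, the manual technique Dan mentions for uncompressed composites works through dimension status: LIMIT each dimension down to the values you know have changed, plus their ancestors in the rollup, run AGGREGATE, and only the cells in status get recalculated. A rough sketch, assuming SALES is now dimensioned by a regular, non-compressed composite and using the same hypothetical names:

    " restrict status to the slice of the cube that has changed
    LIMIT time TO 'W05.2006'
    LIMIT time ADD ANCESTORS USING time_parentrel
    LIMIT product TO 'WIDGET'
    LIMIT product ADD ANCESTORS USING product_parentrel
    LIMIT geography TO 'UK'
    LIMIT geography ADD ANCESTORS USING geog_parentrel
    " only the areas of SALES that are in status get rebuilt
    AGGREGATE sales USING sales_aggmap
    " put status back to all dimension values afterwards
    ALLSTAT
    UPDATE
    COMMIT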