If you've an interest in the OLAP
Option and you've read some of my
recent
articles on the new features in Oracle 10g OLAP, you've
probably seen a feature called "compression" mentioned. In
this DBAZine article I described compression as "a novel
form of cube compression, which promises to both enhance query
performance (by retrieving fewer blocks of data for a given logical
amount of data) and drastically reduce batch loading and aggregation
times, saving disk space on the way. Compression in Oracle 10g OLAP is
more about improving performance and scalability than saving disk space
(although that’s a nice side effect). The net result of this is that, in
a given batch window, Oracle 10g OLAP can now load and aggregate more
data than before, and for a given amount of disk, can store more
information than before. This, plus big advances in scalability
internally around areas such as very big composites, makes Oracle 10g
OLAP potentially a very effective platform when building particularly
large cubes." So how does this compression feature actually work?
Good question. I actually
found out about the compression feature from talking to a couple of
people in the OLAP Product team, and at the time this was news to me as
I hadn't seen this feature too well signposted in either the recent
(2003) batch of Open World papers or in the online documentation. Indeed
I was
under the impression
that it was only coming with the 10.1.0.3 patch release, but I've now
located this feature in the online documentation
and it's all a bit clearer now.
It appears that what we're
talking about when referring to "compression" in 10g OLAP cubes is
actually something called "Compressed Composites". Composites should be
fairly familiar to anyone working with Express or Oracle OLAP and are
structures that you set up when you've got very sparse cubes. In cases
where there are few actual used combinations of dimensions for your
variable, compared to the potential set of valid combinations, setting
up composites, which only store within them the actual combinations of
dimensions used, together with indexes to the underlying dimensions,
reduces the amount of NA values stored in the variable and results in
more efficient data storage.
So what are compressed
composites? Probably the best description is to just quote
the OLAP DML Reference R1 (10.1) section on compressed composites:
"In some cases, when
you aggregate data in a variable dimensioned by a composite defined with
one or more hierarchical dimension, one parent node may have only one
descendant node — and so on all the way up to the top level. When a
variable has a good deal of this type of sparsity, use a compressed
composite as the dimension of the variable. Dimensioning this type of
variable with a compressed composite creates the smallest possible
variable, composite, and composite index—much smaller than if you
dimension a variable with a b-tree or hash composite.
This reduction in size does not occur at the detail level. Oracle
OLAP creates composite values for detail level the same way for all
composites. A composite contains one composite tuple for each set of
base dimension values that identifies non-NA detail data in the
variables that use it.
The reduction in size occurs for those sets of base dimension values
that identify non-NA data at higher levels of hierarchical dimensions.
Oracle OLAP populates these higher-level values differently depending on
whether a variable is dimensioned by a b-tree, hash, or compressed
composite:
For variables dimensioned by b-tree and hash composites, Oracle OLAP
creates composite tuples for non-NA data at higher levels the same way
that it does for non-NA data at the detail level. There is one composite
tuple (with its own physical position) for each set of base dimension
values that identifies non-NA data. The composite index contains all of
the index entries needed to relate the composite tuple to the base
dimension values.
For variables dimensioned by compressed composites, Oracle OLAP
reduces redundancy in the variable, composite, and composite index by
using the"intelligence" of the AGGREGATE command that populates the
variable. For sets of base dimension values that represent parent nodes,
Oracle OLAP creates a physical position in the composite only for those
tuples that represent a parent with more than one descendant. Oracle
OLAP then creates an index between this composite structure and the base
dimensions and uses this composite structure as the dimension of the
variable. Since the actual structure of a compressed composite is
smaller than that of a b-tree or hash composite, a variable dimensioned
by a compressed composite is also smaller than a variable dimensioned by
a b-tree or hash composite. Also, since the index for a compressed
composite only has nodes for parents with more than one descendant, the
index of a compressed composite has fewer levels and is smaller than the
index of a b-tree composite.
Although performance varies depending on the depth of the hierarchies
and the order of the dimensions in the composite, aggregating variables
defined with compressed composites is typically much faster than
aggregating variables defined with b-tree or hash composites."
You can define a composite as compressed using the syntax:
DEFINE name COMPOSITE <dims...> [AW workspace] COMPRESSED [SESSION]
Alternatively, with the forthcoming 10.1.0.4 patch release and the
accompanying new version of Analytic Workspace Manager, you can turn on
compression for a cube (which translates to one or more variables) using
the GUI.

This presumably tells AWM
to create any composites with the COMPRESSED option as detailed above.
One of the things that was
mentioned to me when I was first told about compression was that in this
initial version, the scope for using it was quite limited. This would
appear to be borne out by the reference in the above documentation that
states:
"...when you aggregate
data in a variable dimensioned by a composite defined with one or more
hierarchical dimension, one parent node may have only one descendant
node — and so on all the way up to the top level. When a variable has a
good deal of this type of sparsity, use a compressed composite as the
dimension of the variable. Dimensioning this type of variable with a
compressed composite creates the smallest possible variable, composite,
and composite index—much smaller than if you dimension a variable with a
b-tree or hash composite"
which would suggest that
compression in this release is only of value when you have the specific
situation where you're finding parent nodes with only one descendent
node. If anyone from the product team is reading, could you add any more
detail for this?
Now that this is a bit
clearer, the next step is to get hold of a migrated OES database that's
going to 10g, identify whether the specific conditions for compressed
composites are met, implement them and do some benchmarking. We've got a
couple of clients who are looking to implement this feature so I'll
report back with results in due course. Once again full details on
compressed composites can be found in the
OLAP DML Reference available on OTN.