 |

Hierarchical attribute design for an Oracle8i warehouse
Jan 18, 2001 -
Donald Burleson |
|
In an Oracle data warehouse, hierarchies are especially difficult
to maintain. Unlike finite attributes, hierarchical
classifications may assign many values when an item is added to
an Oracle data warehouse. The hierarchical classification
displayed in Figure A shows a hierarchical attribute
classification for a product.
| Figure A |
 |
| A sample classification hierarchy |
For example, when a transaction for the purchase of chicken wings
is recorded, attributes made for the transaction need to include
the sale of a chicken product, a meat product, and a food
product. So a mechanism needs to be devised to store the
hierarchy of attributes in a meaningful way and to insert the
proper values each time a row is added to the fact table.
In our example, we have the following two methods for
representing our hierarchy of product attributes:
- Assign an arbitrary level classification for each
level in the hierarchy. Because each level is
mutually exclusive (i.e., the product can't be both food
and nonfood), the following scheme would be the most
efficient use of storage in the fact table:
level_one char(1)
CONSTRAINT level_one_check
CHECK (level_one in ('F','N');
level_two char(1)
CONSTRAINT level_two_check
CHECK (level_two in ('A','V','P','T');
level_three char(1)
CONSTRAINT level_three_check
CHECK (level_three in ('U','B','C');
In our oversimplified example, level_one can be food or
nonfood; level_two can be animal, vegetable, paper, or
toy; and level_three can be 'U' for unclassified (in cases
where a level three classification does not apply), beef, or
chicken. The advantage of this representation of the hierarchy is
that it will add only 3 bytes to each fact row. The downside is
that queries are cryptic for end users. End users will have to
know the valid values for each classification level in their
queries. For example, to query for the total sales of nonfood
items by customers who earn more than $50,000 per year would
require that the user know the proper flag values, as shown
below:
SELECT sum(sale_amount) from FACT
WHERE
level_one = 'N'
AND
yearly_income_class > 3;
- Assign a descriptive name for each level in the
hierarchy. This method assigns descriptive values for
each product class, thereby making it easier for the end
users to query the fact table. For example:
food_flag char(1)
CONSTRAINT food_check
CHECK (food_flag in ('Y','N');
animal_or_vegetable_flag char(1)
CONSTRAINT animal_check
CHECK (animal_or_vegetable_flag in ('A','V','U');
type_of_meat_flag char(1)
CONSTRAINT meat_check
CHECK (type_of_meat_flag in ('F','C','P','B');
. . .
The benefit of this approach is that the queries can be made very
descriptive. For example:
SELECT *
from
FACT
WHERE
food_flag = 'N'
AND
yearly_income_class = 2;
The downside, of course, is that there will be many more flags in
the fact table, and because the values are mutually exclusive at
each level, most of the flags will contain NULL values.
There will also be far more indexes on the fact table, and they
will slow down the nightly batch update process. As data
warehouses grow into the terabyte range, even a few additional
bytes can have a substantial disk cost. In addition, the type of
front end will influence the decision about the type of flag. If
your application hides the SQL behind a front-end query tool,
this type of approach has no real advantage.
Again, these physical attribute representation issues go to the
very heart of the Oracle data warehouse, and intelligent up-front
planning will ensure a sound, robust system.
Rollups and summaries
Oracle8i provides a wealth of tools that assist in the
hierarchical classification of fact attributes, but the DBA must
make special plans to accommodate these hierarchies.
You can use Oracle data warehouse tools such as Oracle Express to
automatically display summary values and classification
hierarchy. Oracle8i also provides materialized views that
allow Oracle SQL to be automatically translated from the
lowest-level facts and popped into the aggregation hierarchies.
Let's take a look at how we can manually summarize information
into classification three.
A supermarket needs to know in advance that prune juice belongs
to the juice category, which in turn participates in the fruit
category, which in turn participates in the edible food groups
category. Once these facts are established, we can write Oracle
SQL that will read classification definitions and summarize the
information based upon the unique idea of the item within the
supermarket.
For example, if we assign a distinct internal category for each
SKU, a lookup table might look like this:
CREATE TABLE CATEGORIES
(
category_name char(80),
category_number number
)
Figure B shows how the table entries would appear.
| Figure B |
 |
Now that we have classified each supermarket item with a unique
category, the classification definition three might look
something like this:
CREATE TABLE category_cross_reference
(
has_category number,
is_a_category number
)
This table represents a recursive many-to-many relationship
between categories. This means that any single category may have
categories and at the same time participate as a smaller unit of
a larger category. For example, the juice category has
subcategories, bottled juices and frozen juices. At the same
time, the juice category participates in the larger fruit
category.
Figure C shows what the rows in this table would look
like.
| Figure C |
 |
From this table, we can re-create the classification hierarchy
for any items we require. For example, we can use the following
SQL to find all items within the Fruit category:
SELECT
Has_categories
FROM
Category_cross_reference a,
Category_cross_reference b
WHERE
a.has_categories = b.is_a_category
AND
a.is_a_category = 'Fruit';
This is an interesting SQL query because it joins the table
against itself. In this fashion, we can "walk" the
recursive relationships in two directions. We can find all
categories beneath the specified category, or we can display all
items above the specified category.
As we can see from this SQL, each classification is preprogrammed
with its relationships to other categories, such that the Oracle
SQL will be able to instantly determine where an individual item
belongs within the classification hierarchy.
Taking this table, we can write an Oracle SQL summary statement
that will automatically compute the sum of all sales within each
category. For example, the SQL below can be used to compute the
sales of all juice products within the supermarket. Furthermore,
if we know that people with certain buyer demographics are
interested in juice products, queries such as this can be used
for target marketing of these products.
CREATE TABLE
Summary_juice_sales
AS
SELECT
Sum(sales)
FROM
Fact
WHERE
Sales.category in
(SELECT
Has_categories
FROM
Category_cross_reference a,
Category_cross_reference b
WHERE
a.has_categories = b.is_a_category
AND
a.is_a_category = 'Juice'
)
;
This type of query is repeated for all relevant categories in the
database. In most Oracle8i warehouses, these rollup
summaries are scheduled to run overnight, so the summary tables
are available during the normal workday.
Using summary tables
Most Oracle8i data warehouses precreate these summary
tables to improve response times. The secret of Oracle8i
data warehousing is to provide the end users with the
"illusion" of instant response time. As fast as Oracle
is, a large data warehouse query involving millions of rows can
run for many hours.
However, we have now created dozens, if not hundreds, of summary
tables. How do we easily direct the end-user queries to use these
queries? Oracle8i provides two mechanisms for using
summary tables. The first is materialized views. In a
materialized view, the Oracle8i SQL parse-searches for any
references to sum functions and rewrites the query automatically
to use the summary table. The Oracle8i documentation
provides extensive information on this technique. Another popular
approach is to use the Oracle Express front end. Within Oracle
Express, the summary tables can be identified and automatically
used for the queries.
Conclusion
As Oracle data warehouses become more complex, the DBA is always
challenged to provide mechanisms to allow end users to have fast
access to summaries of millions of rows of detailed sales data.
Only with the preclassification and intelligent rollup of
information can an Oracle data warehouse provide instantaneous
access to the relevant data the end user requires for making
important business decisions.