Oracle Data Warehouse Design
This chapter focuses on data warehouse
design techniques for creating high-performance Oracle warehouses.
While traditional online systems design involves the design of the
input screens, data structures, and output reports, data warehouse
design places the focus on designing flexibility into the Oracle
table structures and the creation of flexible query input software.
Without an effective database design, no amount of tuning will allow
the Oracle warehouse to achieve optimal performance. Hence, it is
critical to a database design that we properly design the Oracle
table structures and the data input screens.
Normalization And Modeling Theory
Let’s begin by briefly reviewing data
modeling theory from a normalization perspective. It is interesting
to note that Dr. Codd coined the term normalization in reference to
current events of the day. At the time Dr. Codd was developing his
mathematical rules for data redundancy, President Nixon was
normalizing relations with China. Because Nixon was normalizing
relations, Dr. Codd decided that he would also normalize relations
as he refined his rules. (I’m not making this up!)
For database systems, a systems developer
begins by taking raw, de-normalized relations from a systems
analysis. Then, the developer takes the relations to third normal
form and looks at the introduction of redundancy for improved
performance. Of course, data redundancy becomes even more important
for an Oracle warehouse developer than for a traditional OLTP
designer, so we will carefully explore the options of table
de-normalization in this chapter. In addition, we will also design a
method for storing the pre-calculated data summaries that were
defined in our systems analysis. Finally, as pointed out in the last
chapter, we cannot always predict all the possible combinations of
data attributes that will compose aggregate fact tables, so we must
design a method for allowing our end users to dynamically define
aggregation criteria and store the aggregate values into Oracle
tables.
This text does not attempt to fully explore
data normalization because dozens of texts are available for that
purpose. Instead, this text offers a brief discussion of the
normalization process as it applies to our example from Guttbaum’s
Grocery. The processes of normalization was originally
intended to be a method for decomposing data structures into their
smallest components. The process begins with the original data
structures which are called un-normalized relations, and progresses
through first normal for to third normal form. At this stage
the data structures are completely free of redundancy and are at
their most decomposed level. To fully appreciate the process,
let’s take a look at the successive process of normalization.
Unnormalized Form
Essentially, an unnormalized relation is a
relation that contains repeating values. An unnormalized relation
can also contain relations nested within other relations, as well as
all kinds of transitive dependencies. Sometimes unnormalized
relations are signified by 0NF, but an unnormalized relation is not
to be confused with a denormalized relation. The unnormalized
relation is any relation in its raw state, and they commonly contain
repeating vales, and other characteristics that are not found in
denormalized relations. The process of denormalization is a
very deliberate attempt to introduce controlled redundant items into
an already normalized form.
Today, only a handful of database management
systems support repeating values, including UniSQL and some object
databases. The relational database model requires that each column
within a table contains atomic values, and there is no facility for
indexing multiple occurrences of a data item within a table. The
idea of repeating groups was first made popular with the use of the
Cobol language with IS-AM files. For example, a Cobol working
storage definition could allow for repeating occurrences of items,
as follows:
03
order_form.
05 customer_name pic x(80).
05 customer_address pic x(80);
05 ordered_items occurs from 1 to 10 times depending on
number-ordered.
07 item_number
pic s9(8) comp3.
07 item_description pic x(80).
07 quantity_ordered pic 9(4).
In any case, relations with repeating groups
are not supported by Oracle, and they must be moved into new
relations. Here are some relations from Chapter 3 which have
repeating groups:
Transaction =
1
{ trans_type = | GC |, quantity_sold, product_description,
total_price },[TZ55]
999
| WT |
| N/A |
1
total_sale_amount = { total_price },
999
1
grocery_club_discount =
{ trans_type = | GC |, quantity_sold, product_description,
total_discount },
999
1
total_discount_amount = {total_discount },
999
In this case, you can see that a customer
transaction consists of many repeating groups, and each of these
groups will be moved from the transaction relation. Also, notice
that some of the repeating groups, such as total_discount_amount,
are derived from the sum of other values in the transaction
relation. In those cases, we must make a conscious decision whether
to redundantly store these summations or have Oracle compute them at
runtime.
First Normal Form
In essence, a relation is in first normal
form if it does not contain any repeating values. Here, we have
taken our relations with repeating values and moved them to separate
relations. When the new relations are created, we carry the primary
key of the original relation into the new relation.
Second Normal Form
The purpose of the second normal form (2NF)
test is to check for partial key dependencies. Partial key
dependencies are created when we break off an unnormalized relation
into first normal form by carrying the key, thereby creating a
concatenated key with several data items. The formal definition of
second normal form is as follows:
A relation is in second normal form if and
only if the relation is in first normal form and each no-key
attribute is fully functionally dependent on the entire concatenated
key.
However, I prefer the following definition:
A relation is in second normal form if each
attribute depends on the key, the whole key, and nothing but the
key, so help me Codd.
It should be apparent that the second normal
form test only applies to relations that have more than one key
field. A relation in first normal form that only has one key is
automatically in second normal form if each attribute is
functionally dependent on the key.
In the following example, we see partial
dependencies, where an attribute is functionally dependent on only
one key field. In this type of situation, the partial dependency is
moved to another relation.
Report_card = Student_num, course_num, course_name, grade
In this case, we see that grade depends on
both Student_num and course_num, because you must know both the
student and the course to know the grade. course__name, on the other
hand, is fully functionally dependent only on the course_num key and
does not require Student_num.
Third Normal Form
The third normal form (3NF) test refers to
transitive dependencies. A transitive dependency is a circumstance
where one non-key attribute is functionally dependent on another
no-key attribute. Whereas the 2NF test serves to check for
dependencies between key fields and attribute fields, the 3NF test
serves to check for dependencies between no-key attributes.
Entity/Relation Modeling
If we have followed the process for
normalization through third normal form, we will be able to derive
an entity/relation model that is essentially free of redundant
information (see Figure 4.1). The entity/relation model was first
introduced by Professor Emeritus Peter Chen from the University of
Louisiana, and it is sometimes called a Chen diagram. In the 15
years since the introduction of this model, many permutations have
been created, but the basic principles of entity/relations modeling
remain intact.
Figure 4.1 An entity/relation model
for Guttbaum’s Grocery.
But is the E/R model for Guttbaum’s Grocery
truly free from data redundancy? While the model is free of
redundant information, it is impossible to implement the model in a
relational database without introducing redundancy to support the
data relationships. For example, if Guttbaum’s model was implemented
using a pointer-based DBMS, such as IMS, pointers would be used to
establish relationships between entities. For relational databases,
data columns must be copied as foreign keys to establish data
relationships, thereby introducing redundancy.
Also See:
 |
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |