Data Transformation Analysis
Oracle Tips by Burleson Consulting
The Data Warehouse Development Life Cycle
DATA TRANSFORMATION ANALYSIS
In the next chapter, we’ll also
discuss data warehouse design where physical techniques are used to
store unique data attributes within data warehouse tables. For
example, some grocery attributes are specific to a class of items,
such as the shelf life data attribute that applies to calf livers,
but not to paper towels. These are just some of the types of data
issues that must be addressed in a data warehouse design.
This leads us to the second task in
data transformation analysis--data dissection. Let’s start with an
example. Your end users state that they want to analyze sales by
date. Armed with this knowledge, you note that a date column is
necessary, and you don’t give the issue another thought. However,
what if your end users want to compare sales for the same months,
except in different years? What if an end user wants to compare
sales by quarter? In order to quickly service these queries, you’ll
want to decompose the date component into its component parts. With
some thought, you may be able to identify many other data items that
need to be decomposed for representation in your data warehouse.
Data dissection implies that the
data has a functional dependency upon the derived components. As you
may recall from basic data normalization theory, full functional
dependence is required for data models to participate in second
normal form (2NF) and third-normal form (3NF) relations. Functional
dependence implies that one “key” value will be the determinant for
another “non-key” value. For example, if we know the date, we also
know the day, month, quarter, and year. If we know a ZIP code, we
therefore could know the state, city, and metropolitan area.
While this analysis may seem
obvious, the decomposition of these types of column values can
greatly aid the query speed for Oracle databases. Let’s consider a
simple example using a date column. As you may know, a
date column datatype contains the century, year, day, month, and
time. However, getting these values out of a date datatype is
not always easy, especially when groupings such as the quarter of a
year are desired. This type of date transformation also has a nasty
side effect of invalidating the use of date indexes in Oracle,
causing painful full-table scans. However, if we break out the
date datatype into its component year, month, and
day values, we see that SQL queries can be greatly
simplified, as shown in Table 3.3.
This is an excerpt from "High Performance
Data Warehousing", copyright 1997.
If you like Oracle tuning, you may enjoy the book
Oracle Tuning: The Definitive Reference , with over 900 pages of BC's favorite tuning
tips & scripts.
You can buy it directly from the publisher and save 30%, and get
instant access to the code depot of Oracle tuning scripts.