| |
 |
|
Oracle
normalization & Performance
Oracle Tips by Burleson Consulting |
Over-Normalization of Oracle Entities
Some Oracle databases were modeled according to
the rules of normalization that were intended to eliminate
redundancy. However, a pure third-normal form (3NF) database can
cause high overhead on the Oracle database and
over-normalization
is a big problem, especially in databases that were designed when
disk cost $250,000 per gigabyte.
I once met Ted Codd at the Database World
Conference where we both were speaking, and I asked him how he
came-up with the word “Normalization”. He said that Nixon was
normalizing relations with China at the time, and if Nixon could
normalize relations, then so could he!
Obviously, the rules of normalization are
required to understand your relationships and functional
dependencies, but BCNF is just a starting point, not a completed
data model. Legacy systems tended to be more highly normalized than
today's databases because of the high cost of disk in the 1980's and
1990's. But disk is cheap today, RAM-SAN is coming, and
denormalization is a Godsend. If we introduce redundancy to
reduce joins, we can retrieve high-volume queries with far less
runtime overhead. In many OLTP
systems there are usually a small set of queries that account for
90% of the overhead, and these queries are the starting point for a
change in normalization. I've seen databases where a single
query form (show me all items for an order) was 75% of system
traffic.
In addition,
Oracle offers several popular denormalization tools, some that
create non first-normal form structures (0NF):
-
Object tables – Oracle
has nested tables and varray table columns whereby repeating
groups are stored within a row, violating 1NF.
-
Materialized Views -
Tables are pre-joined together, queries are re-written to access
the MV, and a method (Oracle snapshots) keeps the
denormalization in-sync with the normalized representation of
the data.
How and where do we introduce redundancy to
remove table joins? The answer depends on the “redundancy
boundary”, a function of the size and volatility of the
redundant item. Let’s look at a real example where an 6-way table
join was required to display basic information about people.
select
p.tai_person_id,
p.per_first_name,
p.per_last_name,
p.per_date_of_birth,
addr.tai_add_id,
addr.add_address_line1,
i.state_name,
pn.phone_number
from
tai_person p,
(select
pa.tai_person_id, pa.tai_add_id,
a.add_address_line1,
a.state_name, a.add_ctry_cd
from
tai_per_address pa,
tai_address a
where
a.expire_date=active()
and pa.expire_date=active()
and pa.tai_add_id = a.tai_add_id) addr,
tai_island_state_prov i,
tai_person_to_phone_number ppn,
tai_phone_number pn
where
p.tai_person_id = addr.tai_person_id(+)
and
addr.state_name = i.tai_state_cd(+)
and
addr.add_ctry_cd = i.tai_ctry_cd(+)
and
p.tai_person_id = ppn.tai_person_id(+)
and
ppn.tai_phone_number_id = pn.tai_phone_number_id(+)
and
pn.expire_date(+) = active()
and
ppn.expire_date(+) = active()
Is this high-level of normalization required
just to display a person? Optimizing the SQL is time-consuming
and the query would be required to do at least five logical I/O's.
When we plan to introduce redundancy to improve
performance and simplify the data model, we must always remember
that we have to code to go to several tables to update the redundant
data item. This overhead can be huge if we have large data items
that change frequently. In this example system, data was to be
stored with the historical values.
Now, in all fairness, a 3NF design was perfect
in 1986 when disk was expensive. Today, over-normalization adds a
huge burden on a high-performance online transaction processing
(OLTP) database:
-
Complexity to the developers (lots of extra coding for n-way
table joins)
-
Run-time overhead (complex SQL pre-processing by the CBO)
-
Higher disk I/O (many data blocks must be visited to fetch
related data)
This example illustrates the huge problems
associated with fixing a “bad” schema and shows how it can be a huge
and expensive undertaking to denormalize a schema to reduce
unnecessary table joins. Some Oracle professionals use
Materialized Views to de-normalize 3NF
structures, but this only works in cases where the data changes
infrequently.
A real-world
case of over-normalization
This database stored information about
establishments and their addresses, a legitimate many-to-many
relationship that would normally require three tables. In this case
there were five “lookup” tables that linked to two-column tables.
From a pure theoretical perspective we could model this many-to-many
relationship in 3NF with 8 tables; a ludicrous approach causing
unnecessary 8-way tables joins simply to display the basic
information about an establishment. Primary keys are
yellow and foreign keys are
blue:
tai_store_entity_type
|
Column Name |
Data Type |
|
TAI_STORE_TYPE_CD |
VARCHAR2(4) |
|
STORE_TYPE_DESC |
VARCHAR2(20) |
tai_store_entity
|
Column Name |
Data Type |
|
TAI_STORE_ID |
NUMBER |
|
TAI_STORE_TYPE_CD |
VARCHAR2(4) |
|
STORE_NAME |
VARCHAR2(100) |
|
LICENSE_NUMBER |
VARCHAR2(20) |
tai_entity_address
|
Column Name |
Data Type |
|
TAI_STORE_ID |
NUMBER |
|
TAI_ADR_ID |
NUMBER(38,0) |
tai_address
|
Column Name |
Data Type |
|
TAI_ADR_ID |
NUMBER(38,0) |
|
TAI_ADRT_CD |
VARCHAR2(4) |
|
ADR_POSTAL_CD |
VARCHAR2(9) |
|
ADR_DIST_CD |
VARCHAR2(2) |
|
ADR_ISLE_STATE_PROV_CD |
VARCHAR2(10) |
|
ADR_CTRY_CD |
VARCHAR2(3) |
|
ADR_PARISH_CD |
VARCHAR2(5) |
|
ADR_STREET_NUMBER |
VARCHAR2(10) |
|
ADR_STREET |
VARCHAR2(100) |
|
ADR_NEIGHBORHOOD |
VARCHAR2(100) |
|
ADR_PO_BOX |
VARCHAR2(100) |
tai_address_type
|
Column Name |
Data Type |
|
TAI_ADRT_CD |
VARCHAR2(4) |
|
ADRT_DESC |
VARCHAR2(30) |
tai_state
|
Column Name |
Data Type |
|
TAI_STATE_PROV_CD |
VARCHAR2(10) |
|
TAI_CTRY_CD |
VARCHAR2(3) |
|
STATE_NAME |
VARCHAR2(20) |
|
TAI_PARISH_CD |
VARCHAR2(5) |
tai_country
|
Column Name |
Data Type |
|
TAI_CTRY_CD |
VARCHAR2(3) |
|
CTRY_NAME |
VARCHAR2(30) |
tai_parish
|
Column Name |
Data Type |
|
TAI_PARISH_CD |
VARCHAR2(5) |
|
PARISH_DESC |
VARCHAR2(20) |
tai_district
|
Column Name |
Data Type |
|
TAI_DIST_CD |
VARCHAR2(2) |
|
DIST_NAME |
VARCHAR2(20) |
|
DIST_USAGE |
VARCHAR2(2) |
|
TAI_ISLE_STATE_PROV_CD |
VARCHAR2(10) |
|
TAI_CTRY_CD |
VARCHAR2(3) |
We also have “type” lookup tables, in this case
a two-column table to allow for changes to address types.
Unfortunately, using this approach would corrupt historical data,
because the system does not want history information to change.
tai_address_type
|
TAI_ADRT_CD |
ADRT_DESC |
|
HS |
Hospital |
|
CL |
Clinic |
|
RE |
Residence |
|
BU |
Business |
|
PO |
Postal |
Here is another example of an inappropriate
lookup table, doing nothing but creating system overhead. In this
case, a table join is required on the key HOTE in order to get the
full-name HOTEL:
tai_store_type
|
TAI_STORE_TYPE_CD |
STORE_TYPE_DESC |
|
HOSP |
Hospital |
|
CLIN |
Clinic |
|
HOTE |
Hotels |
|
SUPM |
Supermarket |
We can remove the complexity for the developers
be encapsulating the complexity with a view (named
store_details_view), but we still have the run-time overhead
of processing a complex 8-way table join. All of this complex SQL,
just to display a business address!
This is the syntax for a
store_details_view that joins all eight tables. This can be
simplified by removing the incorrect design of the lookup tables.
Here is a view to encapsulate this mess. To Oracle, parsing an
8-way table join is very time-consuming since Oracle must evaluate
8! (eight factorial) possible table join combinations:
create or replace view
store_details_view_bad
as
select
store_name,
responsible_person_id,
license_number,
store_type_desc,
adr_address_line1,
adr_address_line2,
adr_town,
adr_postal_cd,
adr_street_number,
adr_street,
adr_neighborhood,
adr_po_box,
adrt_desc,
isle_state_prov_name,
ctry_name,
parish_desc,
dist_name
from
tai_store_entity est,
tai_store_entity_type ent_type,
tai_entity_address ei,
tai_address addr,
tai_address_type addr_type,
tai_island_state_prov island,
tai_country cntry,
tai_parish par,
tai_district dist
where
ent_type.tai_store_type_cd = est.tai_store_type_cd
and
est.tai_store_id = ei.tai_store_id
and
ei.tai_adr_id = addr.tai_adr_id
and
addr_type.tai_adrt_cd = addr.tai_adrt_cd
and
island.tai_isle_state_prov_cd = addr.adr_isle_state_prov_cd
and
cntry.tai_ctry_cd = addr.adr_ctry_cd
and
par.tai_parish_cd = addr.adr_parish_cd
and
dist.tai_dist_cd = addr.adr_dist_cd
;
Towards a More Appropriate Model
We must model the many-to-many between people
and previous addresses and between establishments and address
types. The unnecessary joins include the five lookup tables and the
surrogate foreign keys must be removed and the values inserted into
the main tables. To minimize code re-writing, the lookup table
remain as-is with an unused surrogate key.
This will reduce the 8-way join into a
manageable (and logically correct) 3-way join to cross the
many-to-many relationship. This structure removes the five of the
lookup tables without breaking any existing system code, all by
adding the descriptive column to the table.
-
Alter table xxx add descriptive_column
-
Update xxx set descriptive_column (select description from
lookup_table)
-
Alter table xxx drop column surrogate_lookup_key
|
|
Need an Oracle Health Check?
- Do you have
bad performance after an upgrade?
- Need to
certify that your database follows best practices?
BC Oracle performance gurus can quickly
certify every aspect of your
Oracle database and provide a complete verification that your database
is fully optimized. |

|
|