Home
E-mail Us
Oracle Articles
New Oracle Articles

Oracle Training
Oracle Tips
Oracle Forum
Class Catalog

Remote DBA
Oracle Tuning
Emergency 911
RAC Support
Apps Support
Analysis
Design
Implementation
Oracle Support

SQL Tuning
Security
Oracle UNIX
Oracle Linux
Monitoring
Remote support
Remote plans
Remote services
Application Server
Applications
Oracle Forms
Oracle Portal
App Upgrades
SQL Server
Oracle Concepts
Software Support
Remote Support
Development
Implementation

Consulting Staff
Consulting Prices
Help Wanted!

Oracle Posters
Oracle Books
Oracle Scripts
Ion
Excel-DB

Don Burleson Blog

Oracle normalization tips

Oracle Database Tips by Donald Burleson

Over-Normalization of Oracle Entities

Some Oracle databases were modeled according to the rules of normalization that were intended to eliminate redundancy. However, a pure third-normal form (3NF) database can cause high overhead on the Oracle database and over-normalization is a big problem, especially in databases that were designed when disk cost was $250,000 per gigabyte.

I once met Ted Codd at the Database World Conference where we both were speaking, and I asked him how he came-up with the word "Normalization". He said that Nixon was normalizing relations with China at the time, and if Nixon could normalize relations, then so could he!

Obviously, the rules of Oracle normalization are required to understand your relationships and functional dependencies, but BCNF is just a starting point, not a completed data model. Legacy systems tended to be more highly normalized than today's databases because of the high cost of disk in the 1980's and 1990's. But disk is cheap today, RAM-SAN is coming, and denormalization is a Godsend.

If we introduce redundancy to reduce joins, we can retrieve high-volume queries with far less runtime overhead. In many OLTP systems there are usually a small set of queries that account for 90% of the overhead, and these queries are the starting point for a change in Oracle normalization. I've seen databases where a single query form (show me all items for an order) was 75% of system traffic.

In addition, Oracle offers several popular denormalization tools, some that create non first-normal form structures (0NF):

Normalization & Object tables - Oracle has nested tables and varray table columns whereby repeating groups are stored within a row, violating 1NF.
Denormalization & Materialized Views - Tables are pre-joined together, queries are re-written to access the MV, and a method (Oracle snapshots) keeps the denormalization in-sync with the normalized representation of the data.

How and where do we introduce redundancy to remove table joins? The answer depends on the "redundancy boundary", a function of the size and volatility of the redundant item. Let's look at a real example where an 6-way table join was required to display basic information about people.

Is this high-level of normalization required just to display a person? Optimizing the SQL is time-consuming and the query would be required to do at least five logical I/O's.

When we de-normalize to introduce redundancy to improve performance and simplify the data model, we must always remember that we have to code to go to several tables to update the redundant data item. This overhead can be huge if we have large data items that change frequently. In this example system, data was to be stored with the historical values.

Now, in all fairness, a fully normalized Oracle design (3NF) design was perfect in 1986 when disk was expensive. Today, over-normalization of Oracle databases adds a huge burden on a high-performance online transaction processing (OLTP) database:

Complexity to the developers (lots of extra coding for n-way table joins)
Run-time overhead (complex SQL pre-processing by the CBO)
Higher disk I/O (many data blocks must be visited to fetch related data)

This example illustrates the huge problems associated with fixing a "bad" schema and shows how it can be a huge and expensive undertaking to denormalize a schema in Oracle to reduce unnecessary table joins. Some Oracle professionals use Materialized Views to de-normalize 3NF structures, but this only works in cases where the data changes infrequently.

This is an excerpt from the book PL/SQL: The Definitive Reference by Boobal Ganesan.

Relational Model and Normal Form

In this section, the inadequacies of the existing non-inferential models and the concept of universal data sublanguage are discussed. The relational view of data provides the means of describing the data with its natural structure only without imposing any additional structure for the machine representation. This model also provides the overall idea of creating a superior data language which results in maximum independence between the programs and the machine representation.

Further advantages of the relational view are that it forms a strong base for consistency, derivability and redundancy of the relations, whereas on the other hand, the graphical model spawned confusions on these terminologies.

Finally, the relational view clearly evaluates the scope and the logical limitations of the present system and also the merits of competing the representations of data within a single system.

Data Dependencies in the Present Systems

The provision of data description tables in the modern system represents the major advancements in the data independently. These tables help in changing the characteristics of data representation preserved in the databases. There are three major kinds of data dependencies which are needed to be removed are ordering dependence, indexing dependence and access path dependence.

Ordering Dependence

There are multiple different ways of storing the elements of data in the databases.

· Some data are not concerned about ordering.

· Some data are concerned with a particular ordering.

· Some data are concerned in several ordering.

The graphical models normally permit application programs to assume that the order of representation of the data from the file is similar to the data stored order. These kind of systems which take advantage of the stored ordering are likely to fail if the ordering is replaced by a different one. These kind of ordering dependency can be avoided by solving significant implementation.

Indexing Dependence

In the context of formatted data, an index is usually thought to be a performance oriented component of a database and it tends to improve performance when an update is performed and at the same time, lose performance when insertions or deletions are performed.

Different data systems take widely different approach towards indexing. Some data systems provide indexing of all attributes and other provides users with a choice of no indexing at all or indexing only on the primary keys. Application programs enjoying the advantage of these indexing chains must refer to them by their names. These programs tend to fail when the chains are later removed.

Access Path Dependence

Many of the data systems provide users with data of tree structured files or slightly more general network models of the data. If these trees or networks changes, the applications developed to work with these systems tend to be logically impaired.

A Relational View of Data

This section urges the users to interact with a relational model of the data consisting of a collection of time varying relationships than relations.

The term relation is used in its accepted mathematical sense. Given sets A1 , A2 , . . . , A (not necessarily distinct), B is a relation on these n sets if it is a set of n-tuples, each of which has its first element from A1, its second element from A2, and so on.' We shall refer to a as the jet domain of B. As defined above, B is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary.

The totality of the data in a database may be considered as a collection of time varying relations. These relations are of varied degrees and as time progresses, each n-nary relations may result in the insertion of new n-tuples, deletion/ modification of any of its existing n-tuples.

Normal Form

A relation whose simple domains can be stored in a two dimensional column arrays and non-simple domains can be stored in a complicated data structure. The procedure for eliminating the non-simple domains is called as normalization.

If the normalization as described above is considered to be applicable, the un-normalized relations must satisfy the below conditions.

1. The graph of interrelationships of the non-simple domains is a collection of time.

2. There should be no primary key with a non-simple domain component.

Some Linguistic Aspects

The acceptance of the relational model permits the creation of a universal sub language based on a relational calculus. Such a language will provide immense power to all other proposed data languages and it will be a strong choice for embedding with a varied host language.

Expressible, Named and Stored Relations

The named set is the total collection of the relations that a data language can identify it by a simple name. Examples of named sets are declarations and identifiers.

The expressible set is a collection of all relations that can be identified by expressions in a community. Examples of expressible sets are < and =.

Redundancy and Consistency

Operations on Relations

This section of the paper describes the manipulative part of the relational data model. The below operations are defined in this section.

If a permutation is applied to the columns of an n-nary relation, the resulting relation is said to be a permutation of the given relation.

If a certain number of columns are selected from a relation and then if the duplicated rows are removed from the resulting array, the final relational array is said to be a projection of the given relation.

The circumstances under which two relations having some domain in common can be combined together to form a relationship to preserve all the information in the given relation is said to be a join.

The two relations are composed only if there exists a join between them. If there are more than one join possible between the given two relations, it does not comply that there is a possibility of more than one composition between them.

The relation A acting upon the relation B to generate a subset of B is through the operation restriction of B by A.

Redundancy

This section explains about the strong and weak redundancies.

A collection of relations are is to be strongly redundant if it holds at least one relation that contains a projection which is derivable from the other projections of relations from the set.

A set of relations is said to be weakly redundant if it has a relation that contains a projection which is not derivable from the other members in the collection but is always a projection of some join of other projections of relations in that collection.

Consistency

Considering a set of relations, the system should be provided with the information if there are any associated redundancies to this set, so that the set can enforce consistency. The set is said to be consistent only if it confirms to the provided redundancies.

��