 |
|
Oracle
normalization
tips
Oracle Database Tips by Donald Burleson |
Over-Normalization of Oracle Entities
Some Oracle databases were modeled according to
the rules of normalization that were intended to eliminate
redundancy. However, a pure third-normal form (3NF) database can
cause high overhead on the Oracle database and
over-normalization
is a big problem, especially in databases that were designed when
disk cost was $250,000 per gigabyte.
I once met Ted Codd at the Database World
Conference where we both were speaking, and I asked him how he
came-up with the word "Normalization". He said that Nixon was
normalizing relations with China at the time, and if Nixon could
normalize relations, then so could he!
Obviously, the rules of Oracle normalization are
required to understand your relationships and functional
dependencies, but BCNF is just a starting point, not a completed
data model. Legacy systems tended to be more highly normalized than
today's databases because of the high cost of disk in the 1980's and
1990's. But disk is cheap today, RAM-SAN is coming, and denormalization is a Godsend.
If we introduce redundancy to
reduce joins, we can retrieve high-volume queries with far less
runtime overhead. In many OLTP
systems there are usually a small set of queries that account for
90% of the overhead, and these queries are the starting point for a
change in Oracle normalization. I've seen databases where a single
query form (show me all items for an order) was 75% of system
traffic.
In addition,
Oracle offers several popular denormalization tools, some that
create non first-normal form structures (0NF):
-
Normalization & Object tables - Oracle
has nested tables and varray table columns whereby repeating
groups are stored within a row, violating 1NF.
-
Denormalization & Materialized Views -
Tables are pre-joined together, queries are re-written to access
the MV, and a method (Oracle snapshots) keeps the
denormalization in-sync with the normalized representation of
the data.
How and where do we introduce redundancy to
remove table joins? The answer depends on the "redundancy
boundary", a function of the size and volatility of the
redundant item. Let's look at a real example where an 6-way table
join was required to display basic information about people.

Is this high-level of normalization required
just to display a person? Optimizing the SQL is time-consuming
and the query would be required to do at least five logical I/O's.
When we de-normalize to introduce redundancy to improve
performance and simplify the data model, we must always remember
that we have to code to go to several tables to update the redundant
data item. This overhead can be huge if we have large data items
that change frequently. In this example system, data was to be
stored with the historical values.
Now, in all fairness, a fully normalized Oracle
design (3NF) design was perfect
in 1986 when disk was expensive. Today, over-normalization of Oracle
databases adds a
huge burden on a high-performance online transaction processing
(OLTP) database:
-
Complexity to the developers (lots of extra coding for n-way
table joins)
-
Run-time overhead (complex SQL pre-processing by the CBO)
-
Higher disk I/O (many data blocks must be visited to fetch
related data)
This example illustrates the huge problems
associated with fixing a "bad" schema and shows how it can be a huge
and expensive undertaking to denormalize a schema in Oracle to reduce
unnecessary table joins. Some Oracle professionals use
Materialized Views to de-normalize 3NF
structures, but this only works in cases where the data changes
infrequently.


This is an excerpt from the book PL/SQL: The Definitive Reference by Boobal Ganesan.
Relational Model and Normal Form In this section, the inadequacies of the existing non-inferential models and the concept of universal data sublanguage are discussed. The relational view of data provides the means of describing the data with its natural structure only without imposing any additional structure for the machine representation. This model also provides the overall idea of creating a superior data language which results in maximum independence between the programs and the machine representation. Further advantages of the relational view are that it forms a strong base for consistency, derivability and redundancy of the relations, whereas on the other hand, the graphical model spawned confusions on these terminologies. Finally, the relational view clearly evaluates the scope and the logical limitations of the present system and also the merits of competing the representations of data within a single system. Data Dependencies in the Present Systems The provision of data description tables in the modern system represents the major advancements in the data independently. These tables help in changing the characteristics of data representation preserved in the databases. There are three major kinds of data dependencies which are needed to be removed are ordering dependence, indexing dependence and access path dependence. Ordering Dependence There are multiple different ways of storing the elements of data in the databases. · Some data are not concerned about ordering. · Some data are concerned with a particular ordering. · Some data are concerned in several ordering. The graphical models normally permit application programs to assume that the order of representation of the data from the file is similar to the data stored order. These kind of systems which take advantage of the stored ordering are likely to fail if the ordering is replaced by a different one. These kind of ordering dependency can be avoided by solving significant implementation. Indexing Dependence In the context of formatted data, an index is usually thought to be a performance oriented component of a database and it tends to improve performance when an update is performed and at the same time, lose performance when insertions or deletions are performed. Different data systems take widely different approach towards indexing. Some data systems provide indexing of all attributes and other provides users with a choice of no indexing at all or indexing only on the primary keys. Application programs enjoying the advantage of these indexing chains must refer to them by their names. These programs tend to fail when the chains are later removed. Access Path Dependence Many of the data systems provide users with data of tree structured files or slightly more general network models of the data. If these trees or networks changes, the applications developed to work with these systems tend to be logically impaired. A Relational View of Data This section urges the users to interact with a relational model of the data consisting of a collection of time varying relationships than relations. The term relation is used in its accepted mathematical sense. Given sets A1 , A2 , . . . , A (not necessarily distinct), B is a relation on these n sets if it is a set of n-tuples, each of which has its first element from A1, its second element from A2, and so on.' We shall refer to a as the jet domain of B. As defined above, B is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary. The totality of the data in a database may be considered as a collection of time varying relations. These relations are of varied degrees and as time progresses, each n-nary relations may result in the insertion of new n-tuples, deletion/ modification of any of its existing n-tuples. Normal Form A relation whose simple domains can be stored in a two dimensional column arrays and non-simple domains can be stored in a complicated data structure. The procedure for eliminating the non-simple domains is called as normalization. If the normalization as described above is considered to be applicable, the un-normalized relations must satisfy the below conditions. 1. The graph of interrelationships of the non-simple domains is a collection of time. 2. There should be no primary key with a non-simple domain component. Some Linguistic Aspects The acceptance of the relational model permits the creation of a universal sub language based on a relational calculus. Such a language will provide immense power to all other proposed data languages and it will be a strong choice for embedding with a varied host language. Expressible, Named and Stored Relations The named set is the total collection of the relations that a data language can identify it by a simple name. Examples of named sets are declarations and identifiers. The expressible set is a collection of all relations that can be identified by expressions in a community. Examples of expressible sets are < and =. Redundancy and Consistency Operations on Relations This section of the paper describes the manipulative part of the relational data model. The below operations are defined in this section. If a permutation is applied to the columns of an n-nary relation, the resulting relation is said to be a permutation of the given relation. If a certain number of columns are selected from a relation and then if the duplicated rows are removed from the resulting array, the final relational array is said to be a projection of the given relation. The circumstances under which two relations having some domain in common can be combined together to form a relationship to preserve all the information in the given relation is said to be a join. The two relations are composed only if there exists a join between them. If there are more than one join possible between the given two relations, it does not comply that there is a possibility of more than one composition between them. The relation A acting upon the relation B to generate a subset of B is through the operation restriction of B by A. Redundancy This section explains about the strong and weak redundancies. A collection of relations are is to be strongly redundant if it holds at least one relation that contains a projection which is derivable from the other projections of relations from the set. A set of relations is said to be weakly redundant if it has a relation that contains a projection which is not derivable from the other members in the collection but is always a projection of some join of other projections of relations in that collection. Consistency Considering a set of relations, the system should be provided with the information if there are any associated redundancies to this set, so that the set can enforce consistency. The set is said to be consistent only if it confirms to the provided redundancies.
|