The
cost-based optimizer (CBO) improves with each new
release of Oracle, and the most current enhancement with
Oracle9i is the consideration of external
influences (CPU cost and I/O cost) when formulating an
execution plan. As Oracle evolves into Oracle10g we may
see even more improvements in the ability of the CBO to
always get the optimal execution plan for a query, but
in the meantime, every Oracle developer must understand
these mechanisms to properly tune her SQL.
Rules for
Oracle indexing
To understand how Oracle chooses the execution plan for
a query, you need to first learn the rules Oracle uses
when it decides whether or not to use an index.
While important characteristics of column data within
tables are known to the CBO, the most important
characteristics are the clustering factor for the column
and the selectivity of column values. Oracle provides a
column called clustering_factor in the
dba_indexes view that provides information on how
the table rows are synchronized with the index. The
table rows are synchronized with the index when the
clustering factor is close to the number of data blocks
and the column value is not row-ordered when the
clustering_factor approaches the number of rows in
the table.
For queries that access
common rows with a table (e.g. get all items in order
123), unordered tables can experience huge I/O as the
index retrieves a separate data block for each row
requested.

If we group like rows
together (as measured by the clustering_factor in
dba_indexes) we can get all of the row with a single
block read because the rows are together. You can
use 10g hash cluster tables, single table clusters, or
manual row re-sequencing (CTAS with ORDER BY) to achieve
this goal:

To illustrate, consider this query that filters the
result set using a column value:
select
customer_name
from
customer
where
ustomer_state = ‘New Mexico’;
Here, the decision to use an index vs. a full-table scan
is at least partially determined by the percentage of
customers in New Mexico. An index scan is faster for
this query if the percentage of customers in New Mexico
is small and the values are clustered on the data
blocks.
Why, then, would a CBO choose to perform a full-table
scan when only a small number of rows are retrieved?
Perhaps it is because the CBO is considering the
clustering of column values within the table.
Four factors work together to help the CBO decide
whether to use an index or a full-table scan: the
selectivity of a column value, the db_block_size,
the avg_row_len, and the cardinality. An index
scan is usually faster if a data column has high
selectivity and a low clustering_factor (Figure
A).
| Figure A |
 |
| This column has small rows,
large blocks, and a low clustering factor.
In the real-world, many Oracle database use the
same index for the vast majority of queries.
If these queries always to an index range scan
(e.g. select all orders for a customer), them
row re-sequencing can greatly reduce Oracle
overhead:
|

Oracle provides several storage mechanisms to fetch a
customer row and all related orders with just a few row
touches:
- Sorted hash clusters - New in 10g, a
great way to sequence rows for super-fast SQL
- Multi-table hash cluster tables - This will
cluster the customer rows with the order rows, often
on a single data block.
- Periodic reorgs in primary index order -
You can use the dbms_redefinition utility to
periodically re-sequence rows into index order.
To maintain row order, the DBA will periodically
re-sequence table rows (or use a
single-table,
or multi-table cluster) in
cases where a majority of the SQL references a
column with a high clustering_factor, a large
db_block_size, and a small avg_row_len. This
removes the full-table scan, places all adjacent rows in
the same data block, and makes the query up to thirty
times faster.
On the other hand, as the clustering_factor nears
the number of rows in the table, the rows fall out of
sync with the index. This high clustering_factor,
where the value is close to the number of rows in the
table (num_rows), indicates that the rows are out
of sequence with the index and an additional I/O may be
required for index range scans.
Even when a column has high selectivity, a high
clustering_factor, and small avg_row_len,
there is still indication that column values are
randomly distributed in the table, and an additional I/O
will be required to obtain the rows. An index range scan
would cause a huge amount of unnecessary I/O as shown in
Figure B, thus making a full-table scan more
efficient.
| Figure B |
 |
| This column has large rows,
small blocks, and a high clustering factor. |
In sum, the CBOs decision to perform a full-table vs.
an index range scan is influenced by the
clustering_factor, db_block_size, and
avg_row_len. It is important to understand how the
CBO uses these statistics to determine the fastest way
to deliver the desired rows.
For more information on
table clustering and row-re-sequencing techniques (and
scripts), please see my latest book "Oracle
Tuning: The Definitive Reference". You can get
it direct from the publisher for 30%-off at this link:
http://www.rampant-books.com/book_2005_1_awr_proactive_tuning.htm