I have heard it said that since
Oracle will reorder the WHERE clause statements to ensure that the
leading edge of an index will be used if possible, and, since the
new skip-scan index search method is available, it is not required
to properly order the columns in a concatenated index. In tests
against an Oracledatabase, I found this assertion to be false,
and it can result in poorly performing queries. This tip will show
the results from some basic tests to disprove this statement.
For
queries that access common rows with a table (e.g. get all items in
order 123), unordered tables can experience huge I/O as the index
retrieves a separate data block for each row requested.

If we
group like rows together (as measured by the clustering_factor in
dba_indexes) we can get all of the row with a single block read
because the rows are together. You can use 10g hash cluster
tables, single table clusters, or manual row re-sequencing (CTAS
with ORDER BY) to achieve this goal:

INDEX ORDER AND STRUCTURE
In a standard concatenated B-tree
index, the first level will be based on the initial column values
and subsequent levels on the following columns. If the index is
properly built, then the mapping from these levels into the source
table will be fairly linear; that is, the index clustering factor
will be close to the number of dirty blocks in the table.
Figure 1 shows a properly ordered index in relation to its
table.

Figure 1. A Well-Ordered Index
If, on the other hand, an improper
order is selected for the columns in the index, then the index will
not be a linear match to the source table, and the result will be a
large clustering factor that is closer to the number of rows in the
table. A large clustering factor requires more table reads to
process the same amount of data as a small clustering factor. A
poorly ordered index is shown in
Figure 2.

Figure 2. A Poorly Ordered Index
A clustering factor can be equated
to the number of table block reads that would be required to perform
a full table scan based on reads from the index. A large clustering
factor indicates that table blocks would be required to be accessed
more than once for a particular index scan operation.
The order in which a table's
columns are specified in a concatenated B-tree index can make a
significant difference in the indexes' ordering, hence in its
efficiency. The next section contains examples.
EXAMPLES USING VARIOUS COLUMN
ORDERS IN AN INDEX
Before we can create indexes, we
must have a test table that can be easily created and populated with
data. I use the DBA series of views to provide input to a create
table as select (CTAS). For this example, we will use the
DBA_OBJECTS view since it has an easy structure to use for indexing.
Example 1 shows the CTAS and various statistics for the table TEST
created as an image of the DBA_OBJECTS view.
SQL> create table test as select *
from dba_objects;
Table created.
SQL> select count(*) from test;
COUNT(*)
----------
31472
SQL> SELECT 'test',COUNT( DISTINCT( SUBSTR(
dbms_rowid.rowid_to_restricted(ROWID,1),1,8))) blocks
2 FROM test;
TEST
BLOCKS
---- ----------
test
434
Example 1. CTAS to Create
TEST Table
Notice the number of blocks (434)
and the number of rows (31472). These will be important once we have
our index clustering factors. Now that we have our table, we will
create several indexes using the same columns, but placing them in
different orders. The columns we will use are the object_type,
owner, and object_name. The possible orders for these columns are:
-
Object_type, owner, object_name
-
Object_type, object_name, owner
-
Object_name, owner, object_type
-
Object_name, object_type, owner
-
Owner, object_name, object_type
-
Owner, object_type, object_name
Now, we will select a couple of
these orders and create some example indexes. Example 2 shows the
create statements for these indexes.
SQL> create index test_ord1 on
test(object_type,owner,object_name);
SQL> analyze index test_ord1
compute statistics;
SQL> create index test_ord2 on
test(object_name,owner,object_type);
SQL> analyze index test_ord2
compute statistics;
SQL> create index test_ord3 on
test(owner,object_name,object_type);
SQL> create index test_ord4 on
test(owner,object_type,object_name);
SQL> analyze index test_ord4
compute statistics;
Example 2. Creation of
Various Indexes with Different Column Orders
Now we will pull the example indx
clustering factors all together in one place and compare them in
Example 3.
SQL> select
index_name,clustering_factor from user_indexes where index_name like
'TEST%';
INDEX_NAME
CLUSTERING_FACTOR
------------------------------
-----------------
TEST_ORD1
925
TEST_ORD2
25289
TEST_ORD3
451
TEST_ORD4
908
Example 3. The Clustering
Factors for the Indexes
As you can see, the clustering
factors range from a low of 451 for TEST_ORD3 to a high of 25289 for
TEST_ORD2. So, based on the clustering factor, TEST_ORD3 is the
best-ordered index and TEST_ORD2 is the worst-ordered index. Now we
will do some sample queries to see how the various indexes fare as
far as cost in the cost-based optimizer (CBO).
First, we must analyze the base
table (see Example 4).
SQL> analyze table test compute
statistics;
Table analyzed.
Example 4. Analysis of the
Base Table
Next, we will issue a SELECT that
will use one of the indexes, the one chosen by the CBO:
SQL> select count(*) from test
2 where
3 owner='DBAUTIL' and
object_type='TABLE';
COUNT(*)
----------
338
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13)
1 0
SORT (AGGREGATE)
2 1
INDEX (RANGE SCAN) OF 'TEST_ORD1' (NON-UNIQUE) (Cost=2 C
ard=65 Bytes=845)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 5.
As you can see in Example 5, the
CBO used the index with the higher clustering factor based on the
right-to-left read of the WHERE clause leading column rule. The
overall cost of the operation was 2. Now we will force the optimizer
to use a different index and see what the cost becomes. First, let's
use the lowest clustering factor index, TEST_ORD3. We will force the
query to use the index through use of the INDEX hint (see Example
6).
SQL> select /*+ index(test
test_ord3) */ count(*) from test
2 where
3 owner='DBAUTIL' and
object_type='TABLE';
COUNT(*)
----------
338
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=13)
1 0
SORT (AGGREGATE)
2 1
INDEX (RANGE SCAN) OF 'TEST_ORD3' (NON-UNIQUE) (Cost=14
Card=65 Bytes=845)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1
rows processed
Example 6.
Note that the cost (14) was seven
times the cost of our first query (2). The use of the TEST_ORD3
index resulted in a higher cost because the query had to traverse
more of the index since the OWNER column was not the leading column.
Now we will examine the results form using the highest clustering
cost index, TEST_ORD2 (see Example 7).
SQL> select /*+ test(test_ord2) */
count(*) from test
2 where
3* owner='DBAUTIL' and
object_type='TABLE'
SQL> /
COUNT(*)
------------
338
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=221 Card=1 Bytes=13)
1 0
SORT (AGGREGATE)
2 1
INDEX (FULL SCAN) OF 'TEST_ORD2' (NON-UNIQUE) (Cost=221
Card=65 Bytes=845)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
221 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 7.
Wow! A whopping 110 times higher
cost (221) over the lowest cost (2) so far. Have you noticed a
quasi-correlation between the consistent gets statistics and the
cost factors? Let us look at TEST_ORD4 as shown in Example 8.
aultdb1>select /*+ index(test
test_ord4) */ count(*) from test
2 where
3 owner='DBAUTIL' and
object_type='TABLE';
COUNT(*)
----------
338
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13)
1 0
SORT (AGGREGATE)
2 1
INDEX (RANGE SCAN) OF 'TEST_ORD1' (NON-UNIQUE) (Cost=2 C
ard=65 Bytes=845)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 8.
TEST_ORD4 performs as well as the other two low cost indexes.
Now we will try the query with no index by using the FULL hint as
shown in Example 9.
SQL> select /*+ full(test) */
count(*) from test
2 where
3* owner='DBAUTIL' and
object_type='TABLE'
SQL> /
COUNT(*)
----------
338
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=67 Card=1 Bytes=13)
1 0
SORT (AGGREGATE)
2 1
TABLE ACCESS (FULL) OF 'TEST' (Cost=67 Card=65 Bytes=845
)
Statistics
----------------------------------------------------------
0 recursive calls
36 db block gets
446 consistent gets
0 physical reads
0 redo size
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 9.
As you can see, the indexes are
making a difference as far as cost. However, if you improperly
arranged the index columns as in TEST_ORD2 and you had no other
column orders to compare to, from these statistics you would choose
to use a full table scan, increasing your cost by up to a factor of
30 based on the best performing index.
What about a situation that
requires and index scan? Let us change our query to use a LIKE
operation and see what the CBO does with it in Example 10.
SQL> select count(*) from test
2 where
3* owner='DBAUTIL' and
object_name like 'DBA%'
SQL> /
COUNT(*)
----------
6
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=29)
1 0
SORT (AGGREGATE)
2 1
INDEX (RANGE SCAN) OF 'TEST_ORD2' (NON-UNIQUE) (Cost=2 C
ard=1 Bytes=29)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 10.
It is not what we expected, is it?
In this case, it looks like the CBO chose TEST_ORD2 even though it
has the highest clustering factor. Let's look at some of the other
indexes and see why. First, our lowest clustering factor index,
TEST_ORD3 (see Example 11).
SQL> select /*+ index(test
test_ord3) */ count(*) from test
2 where
3* owner='DBAUTIL' and
object_name like 'DBA%'
SQL> /
COUNT(*)
----------
6
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=29)
1 0
SORT (AGGREGATE)
2 1
INDEX (RANGE SCAN) OF 'TEST_ORD3' (NON-UNIQUE) (Cost=2 C
ard=1 Bytes=29)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 11.
As you can see, this index also
produced a cost of 2, so the optimizer probably chose the TEST_ORD2
index based on column order. Now we will look at our previous
queries' high performer, TEST_ORD1 shown in Example 12.
SQL> select /*+ index(test
test_ord1) */ count(*) from test
2 where
3* owner='DBAUTIL' and
object_name like 'DBA%'
SQL> /
COUNT(*)
----------
6
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=221 Card=1 Bytes=29)
1 0
SORT (AGGREGATE)
2 1
INDEX (FULL SCAN) OF 'TEST_ORD1' (NON-UNIQUE) (Cost=221
Card=1 Bytes=29)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
221 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 12.
There is our 221 cost again, and
221 consistent gets. Obviously, the TEST_ORD1 index would not be a
good candidate for this query. How about TEST_ORD4? (See Example
13.)
SQL> select /*+ index(test
test_ord4) */ count(*) from test
2 where
3* owner='DBAUTIL' and
object_name like 'DBA%'
SQL> /
COUNT(*)
-------------
6
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=29)
1 0
SORT (AGGREGATE)
2 1
INDEX (RANGE SCAN) OF 'TEST_ORD4' (NON-UNIQUE) (Cost=14
Card=1 Bytes=29)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 13.
Again, this demonstrates that by
properly ordering columns, we can reduce the cost of processing. By
using the column order in TEST_ORD4, we would have increased our
cost by a factor of 7 over TEST_ORD2 or TEST_ORD3, and by a factor
of 110 using the order from TEST_ORD1. Example 14 shows what the
cost would be for a full table scan.
SQL> select /*+ full(test) */
count(*) from test
2 where
3* owner='DBAUTIL' and
object_name like 'DBA%'
SQL> /
COUNT(*)
------------
6
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=67 Card=1 Bytes=29)
1 0
SORT (AGGREGATE)
2 1
TABLE ACCESS (FULL) OF 'TEST' (Cost=67 Card=1 Bytes=29)
Statistics
----------------------------------------------------------
0 recursive calls
36 db block gets
446 consistent gets
0 physical reads
0 sorts (memory)
0 sorts (disk)
1 rows processed
Example 14.
As we might have anticipated, it
is the same as with the query in Example 9. But, had we used the
column order in TEST_ORD1 for our index, based solely on the results
from our analysis of the first query, we would choose a full table
scan for this query and again increase our cost by up to a factor of
30 over the best performing indexes.
The examples in
Table 1 show that the index column order should be analyzed for
all of the major queries, and a matrix showing cost should be
prepared. Using the matrix, the index with the lowest overall
average cost for all queries should be used instead of just blindly
choosing a column order.
Query |
TEST_ORD1 |
TEST_ORD2 |
TEST_ORD3 |
TEST_ORD4 |
FTS |
Q1 |
2 |
221 |
14 |
2 |
67 |
Q2 |
221 |
2 |
2 |
14 |
67 |
Average |
111.5 |
111.5 |
8 |
8 |
67 |
Table 1.
So, for our indexes, the TEST_ORD3
or TEST_ORD4 column orders are the best choices, assuming that the
queries have equal weight in our applications. However, if Q1 was
performed thousands of times a day and Q2 only a few, then TEST_ORD1
would be a better choice; if Q2 were the predominate query, then
TEST_ORD2 would be a better choice.
SUMMARY
In this article, we have examined
the effects of concatenated index column order on clustering factor
and query performance for some simple test cases. These tests show
that in your environment, preparation of a query/index performance
matrix for concatenated indexes and your predominate queries is a
must, in order to ensure that you choose the proper index column
order.
Got Scripts?
Mike Ault, one of the
world's most widely-read Oracle experts, has finally consented to release his
complete collection of more than 450 Oracle scripts.
Mike has priced his
collection of 465 scripts at $39.95, less than a dime per script.
You can download them immediately at this link:
http://www.rampant-books.com/download_adv_mon_tuning.htm
|