Free Oracle Tips

HTML Text
Oracle Consulting Oracle Support
Oracle Training
SQL Tuning Consulting
Oracle Tuning Consulting
Data Warehouse Consulting
Oracle Project Management
Oracle Security Assessment
Unix Consulting
Burleson Books
Burleson Articles
Burleson Web Courses
Burleson Qualifications
Oracle Internals Magazine
Oracle Links
Oracle Monitoring
Remote Support Benefits
Remote Plans & Prices
Our Automation Strategy
What We Monitor
Oracle Apps Support
Print Our Brochure
Contact Us (e-mail)
Oracle Job Opportunities
Oracle Consulting Prices
 

Free Oracle Tips


 
HTML Text AOL
 
 

Index Column Order Does Matter

Mike Ault
 

I have heard it said that since Oracle will reorder the WHERE clause statements to ensure that the leading edge of an index will be used if possible, and, since the new skip-scan index search method is available, it is not required to properly order the columns in a concatenated index. In tests against an Oracledatabase, I found this assertion to be false, and it can result in poorly performing queries. This tip will show the results from some basic tests to disprove this statement.

For queries that access common rows with a table (e.g. get all items in order 123), unordered tables can experience huge I/O as the index retrieves a separate data block for each row requested.

If we group like rows together (as measured by the clustering_factor in dba_indexes) we can get all of the row with a single block read because the rows are together.  You can use 10g hash cluster tables, single table clusters, or manual row re-sequencing (CTAS with ORDER BY) to achieve this goal:

INDEX ORDER AND STRUCTURE

In a standard concatenated B-tree index, the first level will be based on the initial column values and subsequent levels on the following columns. If the index is properly built, then the mapping from these levels into the source table will be fairly linear; that is, the index clustering factor will be close to the number of dirty blocks in the table. Figure 1 shows a properly ordered index in relation to its table.

 

Figure 1. A Well-Ordered Index

 

If, on the other hand, an improper order is selected for the columns in the index, then the index will not be a linear match to the source table, and the result will be a large clustering factor that is closer to the number of rows in the table. A large clustering factor requires more table reads to process the same amount of data as a small clustering factor. A poorly ordered index is shown in Figure 2.

 

Figure 2. A Poorly Ordered Index

 

A clustering factor can be equated to the number of table block reads that would be required to perform a full table scan based on reads from the index. A large clustering factor indicates that table blocks would be required to be accessed more than once for a particular index scan operation.

The order in which a table's columns are specified in a concatenated B-tree index can make a significant difference in the indexes' ordering, hence in its efficiency. The next section contains examples.

EXAMPLES USING VARIOUS COLUMN ORDERS IN AN INDEX

Before we can create indexes, we must have a test table that can be easily created and populated with data. I use the DBA series of views to provide input to a create table as select (CTAS). For this example, we will use the DBA_OBJECTS view since it has an easy structure to use for indexing. Example 1 shows the CTAS and various statistics for the table TEST created as an image of the DBA_OBJECTS view.

 

SQL> create table test as select * from dba_objects;

Table created.

SQL> select count(*) from test;

 COUNT(*)
----------
  31472

 
SQL> SELECT 'test',COUNT( DISTINCT( SUBSTR( dbms_rowid.rowid_to_restricted(ROWID,1),1,8))) blocks

  2  FROM test;

 

TEST     BLOCKS
----        ----------
test            434

 

Example 1. CTAS to Create TEST Table

Notice the number of blocks (434) and the number of rows (31472). These will be important once we have our index clustering factors. Now that we have our table, we will create several indexes using the same columns, but placing them in different orders. The columns we will use are the object_type, owner, and object_name. The possible orders for these columns are:

  • Object_type, owner, object_name

  • Object_type, object_name, owner

  • Object_name, owner, object_type

  • Object_name, object_type, owner

  • Owner, object_name, object_type

  • Owner, object_type, object_name

Now, we will select a couple of these orders and create some example indexes. Example 2 shows the create statements for these indexes.

SQL> create index test_ord1 on test(object_type,owner,object_name);

SQL> analyze index test_ord1 compute statistics;

SQL> create index test_ord2 on test(object_name,owner,object_type);

SQL> analyze index test_ord2 compute statistics;

SQL> create index test_ord3 on test(owner,object_name,object_type);

SQL> create index test_ord4 on test(owner,object_type,object_name);

SQL> analyze index test_ord4 compute statistics;

 

Example 2. Creation of Various Indexes with Different Column Orders

Now we will pull the example indx clustering factors all together in one place and compare them in Example 3.

SQL> select index_name,clustering_factor from user_indexes where index_name like 'TEST%';

INDEX_NAME                     CLUSTERING_FACTOR

------------------------------ -----------------

TEST_ORD1                                    925

TEST_ORD2                                  25289

TEST_ORD3                                    451

TEST_ORD4                                    908

 

Example 3. The Clustering Factors for the Indexes

As you can see, the clustering factors range from a low of 451 for TEST_ORD3 to a high of 25289 for TEST_ORD2. So, based on the clustering factor, TEST_ORD3 is the best-ordered index and TEST_ORD2 is the worst-ordered index. Now we will do some sample queries to see how the various indexes fare as far as cost in the cost-based optimizer (CBO).

First, we must analyze the base table (see Example 4).

SQL> analyze table test compute statistics;

Table analyzed.

 

Example 4.  Analysis of the Base Table

Next, we will issue a SELECT that will use one of the indexes, the one chosen by the CBO:

SQL> select count(*) from test

  2  where

  3  owner='DBAUTIL' and object_type='TABLE';

 COUNT(*)
----------
    338

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (RANGE SCAN) OF 'TEST_ORD1' (NON-UNIQUE) (Cost=2 C

          ard=65 Bytes=845)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          4  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed


Example 5.

As you can see in Example 5, the CBO used the index with the higher clustering factor based on the right-to-left read of the WHERE clause leading column rule. The overall cost of the operation was 2. Now we will force the optimizer to use a different index and see what the cost becomes. First, let's use the lowest clustering factor index, TEST_ORD3. We will force the query to use the index through use of the INDEX hint (see Example 6).

SQL> select /*+ index(test test_ord3) */ count(*) from test

  2  where

  3  owner='DBAUTIL' and object_type='TABLE';

 COUNT(*)
----------
     338

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=13)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (RANGE SCAN) OF 'TEST_ORD3' (NON-UNIQUE) (Cost=14

          Card=65 Bytes=845)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          8  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

1          rows processed


Example 6.

Note that the cost (14) was seven times the cost of our first query (2). The use of the TEST_ORD3 index resulted in a higher cost because the query had to traverse more of the index since the OWNER column was not the leading column. Now we will examine the results form using the highest clustering cost index, TEST_ORD2 (see Example 7).

SQL> select /*+ test(test_ord2) */ count(*) from test

  2  where

  3* owner='DBAUTIL' and object_type='TABLE'

SQL> /

 
COUNT(*)
------------
      338

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=221 Card=1 Bytes=13)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (FULL SCAN) OF 'TEST_ORD2' (NON-UNIQUE) (Cost=221

          Card=65 Bytes=845)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

        221  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

1  rows processed


Example 7.

Wow! A whopping 110 times higher cost (221) over the lowest cost (2) so far. Have you noticed a quasi-correlation between the consistent gets statistics and the cost factors?  Let us look at TEST_ORD4 as shown in Example 8.

aultdb1>select /*+ index(test test_ord4) */ count(*) from test

  2  where

  3  owner='DBAUTIL' and object_type='TABLE';

 
COUNT(*)
----------
     338

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (RANGE SCAN) OF 'TEST_ORD1' (NON-UNIQUE) (Cost=2 C

          ard=65 Bytes=845)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          4  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

 

Example 8.

TEST_ORD4 performs as well as the other two low cost indexes. Now we will try the query with no index by using the FULL hint as shown in Example 9.

SQL> select /*+ full(test) */ count(*) from test

  2  where

  3* owner='DBAUTIL' and object_type='TABLE'

SQL> /


COUNT(*)
----------
   338

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=67 Card=1 Bytes=13)

   1    0   SORT (AGGREGATE)

   2    1     TABLE ACCESS (FULL) OF 'TEST' (Cost=67 Card=65 Bytes=845

          )

Statistics

----------------------------------------------------------

          0  recursive calls

         36  db block gets

        446  consistent gets

          0  physical reads

          0  redo size

          0  sorts (memory)

          0  sorts (disk)

1  rows processed


Example 9.

As you can see, the indexes are making a difference as far as cost. However, if you improperly arranged the index columns as in TEST_ORD2 and you had no other column orders to compare to, from these statistics you would choose to use a full table scan, increasing your cost by up to a factor of 30 based on the best performing index.

What about a situation that requires and index scan? Let us change our query to use a LIKE operation and see what the CBO does with it in Example 10.

SQL> select count(*) from test

  2  where

  3* owner='DBAUTIL' and object_name like 'DBA%'

SQL> /


COUNT(*)
----------
       6

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=29)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (RANGE SCAN) OF 'TEST_ORD2' (NON-UNIQUE) (Cost=2 C

          ard=1 Bytes=29)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          5  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed


Example 10.

It is not what we expected, is it? In this case, it looks like the CBO chose TEST_ORD2 even though it has the highest clustering factor. Let's look at some of the other indexes and see why. First, our lowest clustering factor index, TEST_ORD3 (see Example 11).

 SQL> select /*+ index(test test_ord3) */ count(*) from test

  2  where

  3* owner='DBAUTIL' and object_name like 'DBA%'

SQL> /

 
COUNT(*)
----------
      6

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=29)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (RANGE SCAN) OF 'TEST_ORD3' (NON-UNIQUE) (Cost=2 C

          ard=1 Bytes=29)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          2  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

1  rows processed


Example 11.

As you can see, this index also produced a cost of 2, so the optimizer probably chose the TEST_ORD2 index based on column order. Now we will look at our previous queries' high performer, TEST_ORD1 shown in Example 12.

SQL> select /*+ index(test test_ord1) */ count(*) from test

  2  where

  3* owner='DBAUTIL' and object_name like 'DBA%'

SQL> /

 
COUNT(*)
----------
     6

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=221 Card=1 Bytes=29)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (FULL SCAN) OF 'TEST_ORD1' (NON-UNIQUE) (Cost=221

          Card=1 Bytes=29)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

        221  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

 

Example 12.

There is our 221 cost again, and 221 consistent gets. Obviously, the TEST_ORD1 index would not be a good candidate for this query. How about TEST_ORD4? (See Example 13.)

SQL> select /*+ index(test test_ord4) */ count(*) from test

  2  where

  3* owner='DBAUTIL' and object_name like 'DBA%'

SQL> /


COUNT(*)
-------------
        6

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=29)

   1    0   SORT (AGGREGATE)

   2    1     INDEX (RANGE SCAN) OF 'TEST_ORD4' (NON-UNIQUE) (Cost=14

          Card=1 Bytes=29)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          8  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed


Example 13.

Again, this demonstrates that by properly ordering columns, we can reduce the cost of processing. By using the column order in TEST_ORD4, we would have increased our cost by a factor of 7 over TEST_ORD2 or TEST_ORD3, and by a factor of 110 using the order from TEST_ORD1. Example 14 shows what the cost would be for a full table scan.

SQL> select /*+ full(test) */ count(*) from test

  2  where

  3* owner='DBAUTIL' and object_name like 'DBA%'

SQL> /


COUNT(*)
------------
        6

Elapsed: 00:00:00.02

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=67 Card=1 Bytes=29)

   1    0   SORT (AGGREGATE)

   2    1     TABLE ACCESS (FULL) OF 'TEST' (Cost=67 Card=1 Bytes=29)

Statistics

----------------------------------------------------------

          0  recursive calls

         36  db block gets

        446  consistent gets

          0  physical reads

          0  sorts (memory)

          0  sorts (disk)

1  rows processed


Example 14.

As we might have anticipated, it is the same as with the query in Example 9. But, had we used the column order in TEST_ORD1 for our index, based solely on the results from our analysis of the first query, we would choose a full table scan for this query and again increase our cost by up to a factor of 30 over the best performing indexes.

The examples in Table 1 show that the index column order should be analyzed for all of the major queries, and a matrix showing cost should be prepared. Using the matrix, the index with the lowest overall average cost for all queries should be used instead of just blindly choosing a column order.

Query TEST_ORD1 TEST_ORD2 TEST_ORD3 TEST_ORD4 FTS
Q1 2 221 14 2 67
Q2 221 2 2 14 67
Average 111.5 111.5 8 8 67

Table 1.

So, for our indexes, the TEST_ORD3 or TEST_ORD4 column orders are the best choices, assuming that the queries have equal weight in our applications. However, if Q1 was performed thousands of times a day and Q2 only a few, then TEST_ORD1 would be a better choice; if Q2 were the predominate query, then TEST_ORD2 would be a better choice.


 

SUMMARY

In this article, we have examined the effects of concatenated index column order on clustering factor and query performance for some simple test cases. These tests show that in your environment, preparation of a query/index performance matrix for concatenated indexes and your predominate queries is a must, in order to ensure that you choose the proper index column order.



 


Got Scripts?

Mike Ault, one of the world's most widely-read Oracle experts, has finally consented to release his complete collection of more than 450 Oracle scripts.

Mike has priced his collection of 465 scripts at $39.95, less than a dime per script.  You can download them immediately at this link:

http://www.rampant-books.com/download_adv_mon_tuning.htm


Need Oracle Tuning Support?

Burleson Consulting now offers a one-day performance review for your Oracle database.  Working with top experts who tune hundreds of databases each year, you can get fast expert tuning advice to hypercharge your Oracle database.

We also provide expert upgrades to Oracleand Oracle10g, and our DBAs can quickly show you how to implement the important new features of new Oracle releases.

Call now for remote Oracle support.

Regards,


Don Burleson

Kittrell, NC, USA, 27544

www.dba-oracle.com
www.remote-dba.net 

 

Burleson Consulting
 Kittrell, NC, 27544

Email: info@remote-dba.net • Phone (800) 766-1884

Copyright © 1996 - 2015 by Burleson , Inc. All rights reserved.