Manipulating the CBO in an OLTP Environment

Oracle Tips by
Mladen Gogala

Mladen Gogala, author of the exciting book on dynamic web pages with Oracle, "Easy Oracle PHP", has published a great article about Oracle CBO internals for OLTP systems:


Part 1 - Oracle RBO and CBO for OLTP

Part 2: Painkillers

Tools to mitigate the separation anxiety are many: hints, outlines, fiddling with the "OPTIMIZER_INDEX" parameters, manipulating index statistics directly with DBMS_STATS.SET_INDEX_STATS and, last but not least, proper application design.

Actually, proper design is the most important method. If your database is designed well, your SQL is going to be natural and simple. The untold secret of the trade is that bad SQL comes from trying to access data in ways the schema was not designed to support. That results in N-way joins using at least one non-indexed column, and in massive DECODE expressions and PL/SQL functions created just to be able to retrieve the data. There are a few universally known rules here:

       Every table must have a primary key.

       Every table must contain, as a bare minimum, a timestamp, the type of action performed and who performed it. Other identifying data, like the client id, contract id or the like, should also be present.

       Columns frequently used in queries must be indexed.

       The same item must be named the same way in every table.

       Business rules should be enforced by the database (triggers) and not by the application. This is usually a hotly debated topic, as application developers feel that the application should be the place where business rules are enforced. The problem arises when there is more than one application handling the data. There is no way to make sure that two programmers will not differ in their implementations, which would lead to inconsistencies. A minimal example of such a trigger is shown right after this list.
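As an illustration, here is a minimal sketch of a business rule enforced directly in the database. The trigger name and the 10,000 salary cap are purely hypothetical; SCOTT.EMP is the standard demo table used later in this article:

CREATE OR REPLACE TRIGGER EMP_SAL_CAP
BEFORE INSERT OR UPDATE OF SAL ON SCOTT.EMP
FOR EACH ROW
BEGIN
   -- Hypothetical business rule: no salary may exceed 10000,
   -- enforced once in the database instead of in every application.
   IF :NEW.SAL > 10000 THEN
      RAISE_APPLICATION_ERROR(-20001, 'Salary exceeds the allowed maximum');
   END IF;
END;
/

With the rule in the database, every application touching SCOTT.EMP is subject to it, no matter who wrote the application.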

We have all seen applications breaking at least one of these rules. I've also seen applications breaking them all. Especially the so-called "legacy applications", precisely the applications designed for the RBO, are prone to breaking those rules. When this is the case, one needs to consider redesigning the application. Applications have their life cycle, and after 6-8 years they have to go through a complete overhaul to remain useful. If your application was designed for the RBO, it was likely designed before 2002 and a major face lift is in order. Sometimes that is not an option, mainly for business reasons. So, now we have a problem: we have an old application, designed for the RBO, which we have to move to Oracle 10g. We cannot redesign it; the most we can do is export/import the application from an old version of the database.

The next best option, after redesigning the application, is to somehow make the CBO behave like RBO on steroids. That is achieved by manipulating two parameters, OPTIMIZER_INDEX_CACHING and OPTIMIZER_INDEX_COST_ADJ. Those two parameters regulate the following things:

       OPTIMIZER_INDEX_CACHING makes the CBO assume that, for any index, a certain percentage of blocks is cached. If the value of the parameter is 80, the CBO assumes that 80% of the index blocks are cached. According to Jonathan Lewis's book and private communication, this parameter has no effect on single table access paths; it only affects in-list iterators and nested loop joins.

       The OPTIMIZER_INDEX_COST_ADJ parameter defines the cost of a single-block I/O as a percentage of the cost of a multi-block I/O request. Single-block I/O is most frequently used when retrieving data from an index, while multi-block I/O is normally used when retrieving data from a table. The reference manual claims that this parameter determines the cost of an I/O against an index as a percentage of the cost of an I/O against tables. I have to thank Jonathan Lewis again for this clarification.

These two parameters can be manipulated both at the instance level, with ALTER SYSTEM, and at the session level, with ALTER SESSION. Those two parameters will make the CBO behave like the RBO: if an index is there, it will be used. Unfortunately, this approach suffers from the same shortcoming as the RBO: if an index is there and we don't want it used, we have to explicitly disable it, by using hints or otherwise. That is something we can live with in the OLTP world. Of course, this is the real world, so databases are almost never pure OLTP databases.
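As a minimal sketch, the two settings could look like this; the values 80 and 25 are just the illustrative values used later in this article, not universal recommendations:

-- Instance-wide, affecting every session:
ALTER SYSTEM SET OPTIMIZER_INDEX_CACHING=80;
ALTER SYSTEM SET OPTIMIZER_INDEX_COST_ADJ=25;

-- Or for the current session only:
ALTER SESSION SET OPTIMIZER_INDEX_CACHING=80;
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=25;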

Every OLTP database that I've ever seen runs some reports, processes that supply data to the data warehouse, periodic billing or some other kind of batch job, which is usually ill suited to an OLTP environment. The owners of those jobs are happy with the CBO as it is and don't want it skewed toward index usage. In such cases we have to manipulate those parameters at the session level. Applications written in Java, using EJB and an application server like WebLogic or WebSphere, can help us immensely here. Such applications make use of application server connection pooling and all connect to the database using the same user id.

This is, of course, a pain to debug and monitor, as it is almost impossible to connect a physical user to his database session, but it can be helpful if we want to fiddle with the CBO. If we cannot alter the OPTIMIZER_INDEX parameters at the instance level, we can do it at the session level. Having only a single user to do it for makes things easier. The right way of doing it, without having to ask the application server administrator, is through a database trigger, like this one:

CREATE TRIGGER OIPARAM
AFTER LOGON ON SCOTT.SCHEMA
DECLARE
   OIC  VARCHAR2(128) := 'ALTER SESSION SET OPTIMIZER_INDEX_CACHING=80';
   OICA VARCHAR2(128) := 'ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=25';
BEGIN
   EXECUTE IMMEDIATE OIC;
   EXECUTE IMMEDIATE OICA;
END;
/

This is a simple database event trigger which sets the OPTIMIZER_INDEX parameters to values that guarantee "rule based" behavior, using the EXECUTE IMMEDIATE command. On a well-oiled OLTP system, one can reasonably expect a large part of the index blocks to be cached. Together with CPU costing (enabled by gathering system statistics), this usually provides a good enough emulation of the RBO.
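If system statistics have not been gathered yet, CPU costing can be enabled with DBMS_STATS. This is just a sketch; the START/STOP workload window is an arbitrary example:

-- Start collecting workload system statistics, let a representative
-- OLTP workload run, then stop; the CBO will use the resulting
-- CPU and I/O figures for costing.
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('START');
-- ... representative workload runs here ...
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('STOP');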

In the case of a mixed database with very many users, creating a trigger for each one is not practical. In that case, we have to manipulate the statistics directly. The statistic that we can manipulate is called the clustering factor; it measures the "degree of disorder" of the table with respect to the given index. The clustering factor of an index is calculated by inspecting all keys in the index sequentially and adding one whenever a block change is encountered. The clustering factor of a single-column index on column COL is always between the following two values:

SELECT COUNT(DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
FROM OUR_TABLE
WHERE COL IS NOT NULL

and

SELECT COUNT(ROWID) FROM OUR_TABLE WHERE COL IS NOT NULL

 

The clustering factor is a very important attribute, significantly influencing the optimizer's decision whether to use the index or not. The higher the clustering factor, the less likely it is that the index will be chosen by the CBO. By setting the clustering factor low or high, we can suggest to the optimizer whether to use the index or not. The clustering factor seems trivial but is actually more important than it seems.

For queries that access related rows in a table (e.g. get all items in order 123), unordered tables can cause huge I/O, as the index retrieves a separate data block for each row requested.

If we group like rows together (as measured by the clustering_factor in dba_indexes), we can get all of the rows with a single block read because the rows are stored together. You can use 10g hash cluster tables, single table clusters, or manual row re-sequencing (CTAS with ORDER BY) to achieve this goal.
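As an example of the last option, here is a minimal CTAS sketch; the ORDER_LINES table and the ORDER_ID column are hypothetical names used only for illustration:

-- Rebuild the table with rows physically ordered by the key
-- the OLTP queries use, lowering the clustering factor of the
-- index on that key.
CREATE TABLE ORDER_LINES_SORTED AS
SELECT * FROM ORDER_LINES ORDER BY ORDER_ID;

-- After verifying the copy: drop or rename the old table, rename the
-- new one, and recreate the indexes, constraints and grants.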

For instance, in Oracle 10.2 all tablespaces use automatic segment space management by default. That usually results in more blocks in the table and a higher clustering factor. Some plans are likely to go awry after an upgrade to 10.2. So, how do we set statistics?

Let's create an additional index on the table SCOTT.EMP:

 

CREATE INDEX "SCOTT"."EMP_ENAME_I" ON "SCOTT"."EMP" ("ENAME")

 

When this index is analyzed, we get the following from DBA_INDEXES:

  1  SELECT CLUSTERING_FACTOR
  2  FROM DBA_INDEXES
  3* WHERE OWNER='SCOTT' AND INDEX_NAME='EMP_ENAME_I'
SQL> /

CLUSTERING_FACTOR
-----------------
                1

OK, let's now execute a query with AUTOTRACE on and pay attention to the overall cost:

  1* select empno,ename,job from scott.emp where ename='CLARK'
SQL> /
 
     EMPNO ENAME      JOB
---------- ---------- ---------
      7782 CLARK      MANAGER
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 549418132
 
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    18 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP         |     1 |    18 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_ENAME_I |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
 

 

 

 

So, the index is used and the cost is 2. Let's now "fix" the statistics for that index:

begin
   DBMS_STATS.SET_INDEX_STATS (
      ownname => 'SCOTT',
      indname => 'EMP_ENAME_I',
      clstfct => 100);
end;
/

 

 

Now, let's check the result from DBA_INDEXES:

  1  SELECT CLUSTERING_FACTOR
  2  FROM DBA_INDEXES
  3* WHERE OWNER='SCOTT' AND INDEX_NAME='EMP_ENAME_I'
SQL> /

CLUSTERING_FACTOR
-----------------
              100

 

Now, we will re-execute our simple query, asking for ENAME='CLARK', with AUTOTRACE on:

 

SQL> select empno,ename,job from scott.emp where ename='CLARK';
 
     EMPNO ENAME      JOB
---------- ---------- ---------
      7782 CLARK      MANAGER
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    18 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |    18 |     2   (0)| 00:00:01 |

 

Suddenly, the index is no longer used, as the new clustering factor makes it much more expensive. Let's force the issue and make the query use the index by using a hint:

 

  1  select /*+ index(e emp_ename_i) */
  2         empno,ename,job
  3  from scott.emp e
  4* where ename='CLARK'
SQL> /
 
     EMPNO ENAME      JOB
---------- ---------- ---------
      7782 CLARK      MANAGER
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 549418132
 
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    18 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP         |     1 |    18 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_ENAME_I |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

 

 

The cost is now 5. We didn't change the table, we didn't change the index; the only thing that was hacked was the index statistics. It works both ways: we can make an index "good" or "bad", depending on our intentions.

What are the problems with this approach? First, the next time that the index is analyzed, the clustering factor will be reset:

SQL> analyze index scott.emp_ename_i compute statistics;
 
Index analyzed.
 
SQL> select clustering_factor from dba_indexes
  2  where owner='SCOTT' and index_name='EMP_ENAME_I';
 
CLUSTERING_FACTOR
-----------------
                1
 
 

Every database that I have worked with has some kind of statistics computation job, executed at regular intervals. In other words, our clever hack will be gone after a certain period of time, changing the execution plans of the SQL statements without warning. That can be counteracted by using the DBMS_STATS.LOCK_TABLE_STATS procedure, but that defeats the purpose of having the data regularly analyzed. It is very likely that the statistics will not have to be manipulated for just a single table and a single index; we will have to manipulate them for a set of tables comprising a significant part of the schema, therefore making periodic statistics computation an exercise in futility.
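Locking the statistics for a single table is a one-liner; SCOTT.EMP is just the demo table used in the examples above:

EXEC DBMS_STATS.LOCK_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP');

-- and, when the hand-crafted statistics are no longer needed:
EXEC DBMS_STATS.UNLOCK_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP');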

So, how necessary is periodic statistics computation, really? To make a long story short, it isn't. Every statistics re-computation will change execution plans, sometimes quite unpredictably. What is the purpose of having optimizer statistics? The purpose of statistics is to give the CBO good input values so that it can come up with intelligent plans. As long as the statistics reflect realistic relationships among the object sizes, they don't need to be changed.

The so-called "Ensor's paradox" (named after Dave Ensor, famous Oracle DBA and an Oak Table member) states that the only time when it is safe to collect statistics is when the statistics are not needed. Essentially, following the usual business logic, after each statistics computation the system should undergo an extensive QA test to make sure that the overall performance is still acceptable. Collecting statistics is something that should be done rarely, only when the data volume changes significantly.

Such a "lazy" model should also be adopted  to allow for a normal development process.  The DBMS_STATS package can also be used to transfer statistics between systems, making sure that the development, staging, QA and production systems all have the same statistics, making those EXPLAIN PLAN commands done by the developers relevant in production. 

Having stable and dependable statistics should be paramount in a business environment, and yet I find it very hard to convince production DBA people not to collect statistics on a daily, weekly or monthly basis. Collecting up-to-date statistics has become some kind of religious ritual which has exactly the effect it is supposed to alleviate: it introduces ghosts, spirits and daemons into your databases. How many times have I seen questions like "It worked fine yesterday, but it is extremely slow today. Nothing was changed. Please, help me"? More often than not, the root of the evil is in the statistics. To add insult to injury, Oracle 10g has an automated job that collects statistics on a regular schedule, so even if the DBA didn't create the job it still exists and runs, unless explicitly disabled:

SQL> connect sys@local as sysdba
Enter password:
Connected.
SQL> call dbms_scheduler.disable('GATHER_STATS_JOB');
 
Call completed.

 

Finally, what are the shortcomings of fiddling with statistics? Well, for one, it's global in scope. By hacking statistics and converting a "bad index" into a "good index", you can inadvertently affect any application using the underlying table, not just the OLTP ones. If a table is used mainly for reporting purposes and its indexes have to be referenced by hints to be used, then hacking the clustering factor will mess up the access paths for the reports accessing the table.

Conclusion

There are several methods for achieving acceptable performance and stability in an OLTP environment, with varying degrees of sophistication. Each method has its scope, the environment for which it is appropriate, its advantages and its shortcomings. The last method, fiddling with statistics, requires quite a bit of knowledge and understanding and is usually not recommended. A DBA will usually have a hard time convincing management that statistics should not be collected on a regular basis, or that "regular basis" should mean "once or twice a year".

Therefore, I find the parameter-based "silver bullet" approach the most effective and the easiest to implement. None of these methods is a definitive solution. The CBO is still very much in development, and occasional excitement is more or less guaranteed, from version to version and even from patchset to patchset. A combination of logic and common sense can achieve acceptable performance for an OLTP system, even with all the changes that the optimizer is undergoing. It is also possible to organize a stable and dependable development process around the CBO, although it takes a little bit more planning than with the RBO.

 