Optimizing Oracle Optimizer Statistics
Oracle Tips by Donald K. Burleson
Updated March 3, 2015
In Oracle we now see
extended optimizer statistics, an
alternative to dynamic_sampling for estimating result set
The default "For all columns size auto"
It is wasteful to create a histogram on a column that is never
referenced by a SQL statement. The default value of
"for all columns size auto" is the Oracle default and
this option will analyze histograms on what the optimizer
considers to be "important columns". If your
database uses all bind variables, it is best to remove and then
disable all histograms.
The optimizer does this by examining your current workload when
you run the dbms_stats command, interrogating all
current SQL in the library cache to see which columns might
benefit from a histogram.
As a review, the CBO gathers information from many sources, and he has
the lofty goal of using DBA-provided metadata to always make the "best"
execution plan decision:
Gathering Optimizer Statistics
Oracle also wants you to
generate system statistics and statistics on fixed tables. In this
section we will look at how you do this. First we will look at
gathering system statistics, and then we will look at how you gather
statistics on fixed tables.
The optimizer depends on various inputs, and
one of these is system derived information. This information
includes such things as disk response times and CPU response times.
Oracle Database 10g uses the dbms_stats.gather_system_stats command
to generate system statistics. You generate system statistics under
normal system loads. Generating the statistics is a three stage
* Start collecting statistics
* Run a representative load during the
* Stop collecting statistics
Here is an example of this operation (note
that the -- is a comment):
the next statement before you start running the workload
a typical workload here
the next statement to stop gathering statistics.
You should only need to run this when things
change on your system, such as when you add new disks, adds CPU's or
if the system load changes.
To fully appreciate dbms_stats, you need to
examine each of the major directives. Let's take a close look at
each directive and see how it is used to gather top-quality
statistics for the cost-based SQL optimizer.
Using one of the four provided methods, this
option governs the way Oracle statistics are refreshed:
* gather—Reanalyzes the whole schema
* gather empty—Only analyzes tables that
have no existing statistics
* gather stale—Only reanalyzes tables with
more than 10% modifications (inserts, updates, deletes).
* gather auto—Reanalyzes objects which
currently have no statistics and objects with stale statistics
(Using gather auto is like combining gather stale and gather empty.)
Note that both gather stale and gather auto
require monitoring. If you issue the alter table xxx monitoring
command, Oracle tracks changed tables with the dba_tab_modifications
view, which allows you to see the exact number of inserts, updates,
and deletes tracked since the last analysis of statistics.
estimate_percent argument is a
new way to allow Oracle's dbms_stats to automatically estimate the
best percentage of a segment to sample when gathering statistics:
You can verify the accuracy of the automatic
statistics sampling by looking at the dba_tables sample_size column.
It is interesting to note that Oracle chooses between 5 and 20
percent for a sample size when using automatic sampling. Remember,
the better the quality of your statistics, the better the decision
of the CBO.
In 11g and beyond you should not use estimate percent" and let
Oracle default to the automatic sample size. This ensures the
"best" sample size with a minimum amount of time spent analyzing
tables and indexes. See these updated notes on
Oracle estimate percent.
The method_opt parameter for
very useful for refreshing statistics when the table and index data
change. The method_opt parameter is also very useful for determining
which columns require histograms.
In some cases, the distribution of values
within an index will effect the CBOs decision to use an index versus
perform a full-table scan. This happens when a where clause has a
disproportional amount of values, making a full-table scan cheaper
than index access.
Oracle histograms statistics can be created
when you have a highly skewed index, where some values have a
disproportional number of rows. In the real world, this is quite
rare, and one of the most common mistakes with the CBO is the
unnecessary introduction of histograms in the CBO statistics. As a
general rule, histograms are used when a column's values warrant a
change to the execution plan.
To aid in intelligent histogram generation,
Oracle uses the method_opt parameter of dbms_stats. There are also
important new options within the method_opt clause, namely skewonly,
repeat and auto:
method_opt=>'for all indexed columns size
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto' -- the 10g default
Oracle allows for parallelism when
collecting CBO statistics, which can greatly speed up the time
required to collect statistics. A parallel statistics collection
requires an SMP server with multiple CPUs.
The dbms_stats utility is a great way to
improve SQL execution speed. By using dbms_stats to collect
top-quality statistics, the CBO will usually make an intelligent
decision about the fastest way to execute any SQL query. The
dbms_stats utility continues to improve and the exciting new
features of automatic sample size and automatic histogram generation
greatly simplify the job of the Oracle professional.
Many Oracle professional do not realize the important of having
good statistics for the Oracle cost-based SQL optimizer (CBO). The CBO
has received a "bum-rap" from some Oracle professionals who wonder why
the CBO is generating sub-optimal execution plans for their SQL
If you provide Oracle with good statistics about the schema the CBO
will almost always generate an optimal execution plan. The areas of
schema analysis include:
- Object statistics - Statistics for all tables,
partitions, IOTs, etc should be sampled with a "deep" and
statistically valid sample size.
- Critical columns - Those columns that are
regularly-referenced in SQL statements that are:
- Heavily skewed columns - This helps the CBO properly
choose between an index range scan and a full table scan
- Foreign key columns - For n-way table joins, the CBO
needs to determine the optimal table join order and knowing the
cardinality of the intermediate results sets is critical.
- External statistics - Oracle will sample the CPU cost and
I/O cost during statistics collection and use this information to
determine the optimal execution plan, based on your
optimizer_mode. External statistics are most useful for SQL
running in the all_rows optimizer mode.
Frequency of re-analyze
It astonishes me how many shops prohibit any un-approved production
changes and yet re-analyze schema stats weekly. Evidently, they do not
understand that the purpose of schema re-analysis is to change their
production SQL execution plans, and they act surprised when
Most Oracle experts only recommend scheduled re-analysis for highly
dynamic databases, and most shops save one very-deep sample (with
histograms), storing the statistic with the
dbms_stats.export_schema_stats procedure. The only exceptions are
highly-volatile systems (i.e. lab research systems) where a table is
huge one-day and small the next.
For periodic re-analysis, many shops us the table "monitoring"
option and also method_opt "auto" after they are confident that all
histograms are in-place.
Saving and re-using stats
For data warehouses and database using the
optimizer_mode, Oracle9i release 2 has the ability to collect the
external cpu_cost and
io_cost metrics. The ability to
save and re-use schema statistics is important for several types of
- Bi-Modal shops - Many shops get huge benefits from using
two sets of stats, one for OLTP (daytime), and another for batch
- Test databases - Many Oracle professionals will export
their production statistics into the development instances so that
the test execution plans more closely resemble the production
Getting top-quality stats
Because Oracle schema statistics work best with external system
load, I like to schedule a valid sample (using
dbms_stats.auto_sample_size) during regular working hours. For
example, here we refresh statistics using the "auto" option which
works with the table monitoring facility to only re-analyze those
Oracle tables that have experienced more than a 10% change in row
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
Tips for optimizing the CBO with statistics
There are several tips for optimizing your CBO with good
- Find skewed columns that are referenced in SQL - Many
shops do not use method_opt=skewonly and suffer from poor execution
plans on skewed column access.
- Find histograms for foreign key columns - Many DBAs
forget that the CBO must have foreign-key histograms in order to
determine the optimal table join order (i.e. the ORDERED hint).
- Fix the cause, not the symptom - For example, whenever I
see a sub-optimal order for table joins, I resist the temptation to
add the ORDERED hint, and instead create histograms on the foreign
keys of the join to force the CBO to make the best decision.
For new features, explore the Oracle10g automatic histogram
collection mechanism that interrogates v$sql_plan to see where
the foreign keys are used. It claims to generate histograms when
appropriate, all automatically.
Oracle guru Guy
Harrison also offers this advice for 11g statistics collection:
"In 11g, I think there are two other ways to get
statistics collected for indexed expressions:
Collect extended statistics directly on the expression.
So for instance, if we had a function SALES_CATEGORY, we might do
2) Create a virtual column on the
expression, then index that column. So for the same
example as above we might create the following virtual column,
then index the column and collect stats as usual:
(ownname => USER,
tabname => ‘SALES',
method_opt => ‘FOR ALL
COLUMNS FOR COLUMNS
I think I like the first method better,
because the statistics will still exist even if the index is
dropped and – unlike the second approach – it doesn't change the
logical structure of the table.
Arup Nanda has a great article on extended statistics with dbms_stats,
specialty histogram analysis using function-based columnar data:
Next, re-gather statistics on the table and collect the
extended statistics on the expression upper(cust_name).
ownname => 'ARUP',
tabname => 'CUSTOMERS',
method_opt => 'for all columns size skewonly for columns (upper(cust_name))'
Alternatively you can define
the column group as part of the gather statistics command.
You do that by placing
these columns in the method_opt parameter of the gather_table_stats procedure in
dbms_stats as shown below:
ownname => 'ARUP',
tabname => 'BOOKINGS',
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS(HOTEL_ID,RATE_CATEGORY)',
cascade => true