 |
|
Cost Control: Inside the Oracle Optimizer
Oracle Tips by Burleson Consulting
|
2008 Updates:
This article has the following sections:
PART 4 - Considering the CBO
While we have gone into great detail on the
CBO, there is always more to learn as the CBO becomes more powerful
(and complex) with each new release of Oracle. The main points of this
article include general guidelines for adjusting the behavior of the CBO:
- The DBA can control the overall
behavior of the CBO with several Oracle parameters, but they should
only be changed under limited circumstances.
- The CBO relies on statistics to
determine the optimal execution plan for SQL statements, and
statistics should always be collected with the dbms_stats
package.
- An important job of the Oracle DBA
is the collection and management of statistics for the CBO. CBO
statistics can be collected, stored, and migrated to other related
instances to ensure consistency of execution plans.
- Re-analyzing schema statistics
without exporting the old statistics (using export_system_stats) is dangerous
because the execution plans for thousands of SQL statements may
change, and you cannot get back to previous SQL performance.
Re-analyze a schema only when there are significant changes to the
data.
In Part 1 of the series, we discussed the basic mechanisms
of Oracle's cost-based SQL optimizer for making
the best decisions about the access paths to data. In this
concluding installment, we'll address the use of histograms,
external costing features, SQL hints for changing execution plans,
and techniques for locating and tuning suboptimal SQL.
Using Histograms
In some cases, the distribution of values
within a column of a table will affect the optimizer's decision to use
an index vs. perform a full-table scan. This scenario occurs when the
value with a where clause has a
disproportional amount of values, making a full-table scan cheaper
than index access.
A column histogram should only be created
when we have data skew exists or is suspected. In the real world,
that happens rarely, and one of the most common mistakes with the
optimizer is the unnecessary introduction
of histograms into optimizer statistics. The histograms signals the
optimizer that the column is not linearly distributed, and the
optimizer will peek into the literal value in the SQL where clause and
compare that value to the histogram buckets in the
histogram statistics (see Figure
3).
Many Oracle professionals misunderstand the
purpose of histograms. While they are used to make a yes-or-no decision about the use of an index to access the table,
histograms are most commonly used to predict the size of the intermediate result set from a multi-way table
join.
For example, assume that we have a five-way
table join whose result set will be only 10 rows. Oracle will
want to join the tables together in such a way as to make the result
set (cardinality) of the first join as small as possible. By carrying
less baggage in the intermediate result sets, the query will run
faster. To minimize intermediate results, the optimizer attempts to
estimate the cardinality of each result set during the parse phase of SQL execution. Having histograms on skewed column will
greatly aid the optimizer in making a proper decision. (Remember, you can create a histogram even if the
column does not have an index and does not participate as a join key.)
Because a complex schema might have tens of
thousands of columns, it is impractical to evaluate each column for skew and thus Oracle provides an automated method for
building histograms as part of the dbms_stats
utility. By using the method_opt=>'for all columns size
skewonly' option of dbms_stats,
you can direct Oracle to automatically create histograms for those
columns whose values are heavily skewed. We'll take a look at this
option in more detail later.
As a general rule, histograms are used to
predict the cardinality and the number of rows returned in the result
set. For example, assume that we have a product_type
index and 70% of the values are for the HARDWARE
type. Whenever SQL with where product_type='HARDWARE'is
specified, a full-table scan is the fastest execution plan, while a
query with where product_type='SOFTWARE'
would be fastest using index access.
Because histograms add additional overhead
to the parsing phase of SQL, you should avoid them unless they are
required for a faster optimizer execution plan. But there are several
conditions where creating histograms is advised:
- When the column is referenced in
a query — Remember, there is no
point in creating histograms if the queries do not reference the column. This mistake is common,
and many DBAs will create histograms on a
skewed column, even though it is not referenced by any queries.
- When there is a significant skew
in the distribution of columns values
—This skew should be sufficiently significant
that the value in the WHERE clause will make the optimizer choose a
different execution plan.
- When the column values cause an
incorrect assumption — If the
optimizer makes an incorrect guess about the
size of an intermediate result set it may choose a sub-optimal table
join method. Adding
a histogram to this column will often provide the information
required for the optimizer to use the best join method.
So how do we find those columns that are
appropriate for histograms? One exciting feature of
dbms_stats is the ability to automatically
look for columns that should have histograms, and create the
histograms. Again, remember that multi-bucket histograms add a huge
parsing overhead to SQL statements, and histograms should only be used
when the SQL will choose a different execution plan based upon the
column value.
To aid in intelligent histogram generation,
Oracle uses the method_opt
parameter of dbms_stats. There
are also important new options within the method_opt
clause, namely skewonly, repeat
and auto.
method_opt=>'for all columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto'
Let's take a close look at each method
option.
The first is the "skewonly" option, which is
very time-intensive because it examines the distribution of values for every column within every index. If dbms_stats
discovers an index with columns that are unevenly distributed, it will
create histograms for that index to aid the cost-based SQL optimizer
in making a decision about index vs. full-table scan
access. For example, if an index has one column that is in 50% of the
rows, a full-table scan is faster than
and index scan to retrieve these rows.
Histograms are also used with SQL that has
bind variables and SQL with cursor_sharing
enabled. In these cases, the
optimizer determines if the column value could affect the execution
plan, and if so, replaced the bind
variable with a literal and performs a hard parse:
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
degree => 7
);
end;
/
The auto option
is used when monitoring is implemented (alter table xxx
monitoring;) and creates histograms based upon data distribution and the manner in
which the column is accessed by the application (e.g. the
workload on the column as determined by monitoring). Using
method_opt=>'auto'is similar to using the gather
auto in the option
parameter of dbms_stats:
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
);
end;
/
Our tools:

Our Ion tool is
the easiest way to analyze Oracle performance and Ion
allows you to spot hidden performance trends.
|
|
Get the Complete
Oracle SQL Tuning Information
The landmark book
"Advanced Oracle
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
buy it
for 30% off directly from the publisher.
|
|