What are the most
common root-causes of poor Oracle performance? Every expert will
give you a different opinion, but I’ve prepared my list based on our
hundreds of Oracle tuning engagements:
-
Bad Design – The number one offender to poor performance is
over-normalization of Oracle tables, excessive (unused indexes)
and 15-way table joins for what should be a simple fetch.
-
Poor server optimization – Setting the server kernel
parameters and I/O configuration (e.g. direct I/O) has a
profound impact on Oracle performance
-
Bad disk I/O configuration – Inappropriate use of RAID5,
disk channel bottlenecks and poor disk striping.
-
Poor Optimizer Statistics – Prior to Oracle 10g (automatic
statistics), a common cause of poor SQL performance was
missing/stale CBO statistics and missing histograms.
-
Object contention – Failing to set ASSM, freelists or
freelist_groups for DML-active tables and indexes can cause very
slow DML performance.
-
Under-allocated RAM regions – Not allocating enough RAM for
shared_pool_size, pga_aggregate_target and db_cache_size can
cause an I/O-bound database.
-
Non-reentrant SQL – All SQL should use host
variables/cursor_sharing=force to make SQL reusable within the
library cache.
-
Un-set initialization parameters – Many of the
initialization parameters are made to be set by the DBA
(db_file_multiblock_read_count, optimizer_index_caching) and
failing to set these parameters properly results in poorly
optimized execution plans.
-
Excessive nested loop joins – In 64-bit Oracle systems we
have gigabytes available for RAM sorts and hash joins. Failing
to set pga_aggregate_target to allow the CBO to choose hash
joins can result in very slow SQL performance.
-
Human Misfeasance – The DBA’s failure to monitor their
database (STATSPACK/AWR), set-up exception reporting alerts
(OEM) and adjusting their instance to match changing workloads
is a major cause of poor performance.
The
Oracle Documentation lists these ten reasons for poor
performance. The BC list is similar, but our top-10 list is based
on what we see with our clients:
-
Bad Connection Management
The application connects and disconnects for each database
interaction. This problem is common with stateless middleware in
application servers. It has over two orders of magnitude impact on
performance, and it is totally unscalable.
-
Bad Use of Cursors and the Shared Pool
Not using cursors results in repeated parses. If bind variables are
not used, then there is hard parsing of all SQL statements. This has
an order of magnitude impact in performance, and it is totally
unscalable. Use cursors with bind variables that open the cursor and
execute it many times. Be suspicious of applications generating
dynamic SQL.
-
Getting Database I/O Wrong
Many sites lay out their databases poorly over the available disks.
Other sites specify the number of disks incorrectly, because they
configure disks by disk space and not I/O bandwidth.
-
Redo Log Setup Problems
Many sites run with too few redo logs that are too small. Small redo
logs cause system checkpoints to continuously put a high load on the
buffer cache and I/O system. If there are too few redo logs, then
the archive cannot keep up, and the database will wait for the
archive process to catch up.
-
Serialization of data blocks in the buffer cache due to lack
of free lists, free list groups, transaction slots (
INITRANS),
or shortage of rollback segments.
This is particularly common on
INSERT-heavy
applications, in applications that have raised the block size to 8K
or 16K, or in applications with large numbers of active users and
few rollback segments.
-
Long Full Table Scans
Long full table scans for high-volume or interactive online
operations could indicate poor transaction design, missing indexes,
or poor SQL optimization. Long table scans, by nature, are I/O
intensive and unscalable.
-
In Disk Sorting
In disk sorts for online operations could indicate poor transaction
design, missing indexes, or poor SQL optimization. Disk sorts, by
nature, are I/O-intensive and unscalable.
-
High Amounts of Recursive (
SYS)
SQL
Large amounts of recursive SQL executed by
SYS could
indicate space management activities, such as extent allocations,
taking place. This is unscalable and impacts user response time.
Recursive SQL executed under another user ID is probably SQL and
PL/SQL, and this is not a problem.
-
Schema Errors and Optimizer Problems
In many cases, an application uses too many resources because the
schema owning the tables has not been successfully migrated from the
development environment or from an older implementation. Examples of
this are missing indexes or incorrect statistics. These errors can
lead to sub-optimal execution plans and poor interactive user
performance. When migrating applications of known performance,
export the schema statistics to maintain plan stability using the
DBMS_STATS
package.
Likewise, optimizer parameters set in the initialization parameter
file can override proven optimal execution plans. For these reasons,
schemas, schema statistics, and optimizer settings should be managed
together as a group to ensure consistency of performance.
-
Use of Nonstandard Initialization Parameters
These might have been implemented based on poor advice or incorrect
assumptions. In particular, parameters associated with
SPIN_COUNT on
latches and undocumented optimizer features can cause a great deal
of problems that can require considerable investigation.