Tuning distributed Oracle
SQL is fundamentally different than tuning ordinary SQL. Because a
single SQL query may touch many distributed instances, the optimizer
resolution becomes more complex, and this reflects in the execution plan for
the distributed query. Tuning any SQL statement involves extracting
the execution plan for the SQL, and getting a real execution plan is much
more difficult task in a distributed environment.
Let's start with a
discussion of the performance issues that happen when an SQL query spans
several Oracle databases.
distributed SQL execution plans
The phenomenal challenge
of optimizing SQL gets even harder when we see the explain plan for a
distributed query. The Oracle database is forced to "break-up" a
distributed SQL statement into several components and ship each query to the
remote Oracle database for execution. The Database must then gather
the result set from each remote database and prepare a single result from
The physical challenge for
optimizing cross-database SQL is compounded by these issues:
database characteristics - The database involved in a distributed query may be vastly different
from each other. For example, you can join an EMP table on a giant
IBM Mainframe with a SALARY table that resides on a tiny Windows server.
The optimizer cannot always compute the relative expense of an operation
on a remote server.
Hard to see the
entire distributed execution plan
- As we shall see, a remote execution plan display will only show the
SQL that was sent to the remote instance for execution. To see the
execution plan for the remote instance, we must go to that instance and
execute the sub-component to get the execution plan.
Oracle*Net has no
knowledge of network latency
- Oracle cannot factor-in the costs of network transmission time between
the instances. Hence, Oracle may choose a sub-optimal execution
plan, resulting in excessive network traffic and slow results. However,
Oracle has some safeguards against this problem. For example, if
all the objects in a query are remote (and use the same database link),
Oracle will attempt to send the complete query to the remote database
No cohesive SQL
execution - If some tables in
an SQL query are remote and some are local, then Oracle will break up
the query into individual SQL statements and pass the SQL to the remote
database for independent execution. If one database is very slow,
the whole query will wait.
Creating cross-databases execution plans
we know, Oracle develops an execution plan for every SQL statement prior to
starting the execution. In a single Oracle database, determining the
execution plan is straightforward because a single Oracle data dictionary
contains the object statistics.
Execution plan for distributed queries are different. The "set
autotrace on" command does not work for distributed queries, and you must
run a special script to display the remote execution plan.
set long 2000
set arraysize 1
col operation format a22
col options format a8
col object_name format a10
col object_node format a5
col other format a20
col position format 99999
col optimizer format a10
lpad(' ',2*(level-1))||operation operation,
start with id=0
connect by prior id=parent_id
Note that this differs from the standard SQL execution plan display because
of the inclusion of the other column in the query.
For distributed queries, Oracle uses plan_table.other column to show
the remote queries that were sent to the remote instance for execution.
truncate table plan_table;
explain plan for
Here we see an interesting execution plan. In this case we are doing a
remote hash join to the book_details table in the newpubs
database. It is very important to note that Oracle has "split" this
Note that Oracle sent the select statement for the book_details table
over Oracle*Net to be executed on the remote instance (newpubs), and
we do not see the execution plan for this piece of the distributed query.
OPTIONS OBJECT_NAM OPTIMIZER OBJEC OTHER
-------- ---------- ---------- ----- -----------
TABLE ACCESS FULL BOOK
LE.CO _DETAILS" M
Now that we see how to get remote execution plans, let's look at how to
control the site for the remote execution.
Determining the driving site and driving table for cross-database queries
Just like an SQL query has a "driving table", a distributed query has a
"driving site". The choice of the driving site can be
manipulated with the driving_site SQL hint. Here is an example
of a remote query with a driving_site hint:
In this example, we see that we have directed Oracle to make the r1 site
(PUBS) the driving site for this query.
Let's get the distributed execution plan for the following SQL. Note
that 3 of the four database tables are remote, and only the book_details
table is local to this instance.
truncate table plan_table;
explain plan for
Here is the execution plan. In the output we see
the remote notation.
OPTIONS OBJECT_NAM OPTIMIZER OBJEC OTHER
---------------------- -------- ---------- ---------- -----
F EWPUB ROM "AUTHOR"
NES.N "BOOK_KEY" FROM
"BOO EWPUB K_AUTHOR"
EWPUB AIL_PRICE" FROM
"BOO S K" "BOOK"
Let's take a closer look at what is happening inside a distributed SQL
statement. Note that Oracle has broken-up the query into many separate
- Independent SQL is sent to each remote Oracle database
- Result sets are sent back from each remote Oracle database. The
query cannot continue until all remote Oracle databases have sent their
results back over Oracle*Net.
The independent result from each remote request are collected inside the SGA
RAM memory. Only after all results are collected can the SQL progress
to join the tables.
- Oracle performs hash joins on the remote result sets, creating a single
result set for the query.
- Oracle then sorts the result set (for the ORDER BY clause) in on the
originating Oracle database.
- Oracle passes the completed result back to the originating query.
Tips for distributed
SQL execution plans
Given the challenges of distributed SQL execution plans, there are several
things that you can do to improve the performance of distributed SQL query
Choose the right driving table
- Always execute the join remotely when the remote table is much larger than
the local one!
Pull vs. Push
- In general, performance can be faster if you "pull" the data (calling the
remote table from the master instance), as opposed to a "push" where you
perform the join on the remote table. This is especially true of you
have a large sort, because the rows may be transferred to the remote host
for sorting, and then back again afterwards.
Pre-join remote tables with views
- If you want to join the remote tables, the best way to achieve this is by
building a view of these tables on the remote site. Remote views allow
join operations to happen on the remote Oracle database, thereby reducing
the amount of Oracle*Net traffic. Note that the predicates do not have to be
created within the view as they should be passed within the SQL statement
sent from the local to remote node.
- While correlated subqueries should be avoided in favor of standard joins,
there are cases when a distributed SQL that only returns a small rowset from
a remote database will run faster if you replace a standard join with a
correlated subquery. Using this technique, only the subquery is passed
to the remote database and the remote database is not aware of the
distributed join, making optimization easier to manage.
Make the largest table local to your SQL -
If possible, always try to make the largest table the local table, to
minimize network overhead. You can use the driving_site hint
for this purpose.
Use the ordered hint
- Joining tables across database links is always challenging, and a savvy
developer will use the ordered hint to tip-off the optimizer about
the optimal table join order. This ensures fast SQL response time, and
also reduces the amount of work parsing the SQL statement.
Consider using views
- The ensure optimal performance, consider encapsulating a distributed join
into a remote view. You define the view on the remote database, and
then reference the view from the local database.
Replicate the tables locally
- If the tables are small and non-volatile, using Oracle replication to keep
a local copy of the tables is the fastest alternative to distributed SQL.
If you can reduce network overhead by replicating a read-only table, you
should do so.
- By default all sorting imposed by order by and group by
operations should be done on the local database to minimize network traffic.
Get the Complete
Oracle SQL Tuning Information
The landmark book
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
for 30% off directly from the publisher.