|
 |
|
Tuning SQL with "rownum" filters
Oracle Database Tips by Donald BurlesonJuly 24, 2015
|
Many people forget that the Oracle SQL optimizer is an
in-house computer program, written by software engineers, and it's not
something where the rules of scientific endeavor apply.
Oracle SQL has lots of special nuances and there are
many counterintuitive tips and tricks for tuning Oracle SQL. The
rownum pseudo column is interesting because there are a few legitimate uses
for it:
-
Top-n queries - The rownum can be
used to materialize an in-line view. Beware,
there are
reports that using where rownum<n for top-n queries has the
same net effect as using a first_rows_n hint in the query,
changing the optimizer mode. This is not always a good thing, as adding
the where rownum<n will invalidate the all_rows
optimization, which may make a query run far slower. It's far
better to materialize a subquery using the
WITH clause.
-
Range-bounded queries - In the
discussion below, there are special cases where it may be safe to use
rownum to speed-up special cases of range-bounded queries.
-
DML - In some cases you can use
rownum with DML, but only if you force the update to use an index with
an index hint. See
example here.
However, before we go into more details, note that
complex subqueries can be tuned more efficiently by decomposing the subquery
into separate queries by using the
WITH clause.
Using rownum for top-n queries
As noted, rownum can be used to filter query results, but
it may be done at the expense of poor performance.
Yoni Sade notes that using the where rownum<n caused a query
to run 20x longer, an issue that he claims happens because his predicate (alert_level=3)
isn't being pushed into the view when the rownum is used :
select * from (
select * from all_alerts where alert_level=3 order by alert_time desc);
It takes 5 seconds to get 1000 rows.
When I query to get the last
10 alerts:
select * from (
select * from my_view where alert_level=3 order by alert_time desc)
where rownum<=10;
It takes 2 minutes (!)
In this case, with rownum filter, Oracle must fully
execute the subquery and sort the entire set before returning
the top rows, in some cases, processing thousands of rows.
Beware! This use of
rownum< can cause performance problems.
Using rownum may change the all_rows optimizer mode for a query to
first_rows, causing unexpected sub-optimal execution plans.
One solution is to always include an all_rows hint when using
rownum to perform a top-n query.
For top-n SQL queries, it's more efficient form of the query using the
rank and
over functions may perform faster, with the WINDOW SORT PUSHED RANK
execution plan:
select *
from (select empno, sal, rank()
over (order by sal desc ) rnk from emp) where rnk
<= 5;
Also, note that for tuning top-n queries we need to
ensure that the subquery is using an index to minimize I/O.
Next, let's examine how using rownum might be used to
speed-up range bounded SQL queries.
Using rownum with range bound queries
The "hack" for SQL tuning is used to force a subquery
to use an index and invoke the COUNT STOPKEY execution plan. For a
simple example, consider a query to display all rows with a MAX date:
select
emp_name,
hire_date from emp where hire_date =
(select max(hire_date) from emp);
In this case, it has been suggested that Oracle will
perform an expensive full-scan operation to get the max hire date in the
subquery.
William Robertson explains how the "rownum=1" might be used to make this
query run faster by forcing the subquery to use an index:
"It has long been possible (even before Dan Tow wrote the
article five years ago) to get a subquery or inline view to use an index,
and then filter it to get the first row it returns, instead of using
something like
WHERE somedate = ( SELECT MAX(somedate) FROM
sametable WHERE... )
The idea is you
access the values via an index that you know is internally sorted a
particular way and stop after the first row, avoiding scanning multiple rows
to find the MAX.
Of course, unlike other tuning hacks (like using a
rownum expression in a subquery or inline view to prevent merging or
unnesting), if it doesn't work as expected you don't just get an unexpected
execution plan, you start getting wrong results."
Here is an example of tuning with rownum=1.
select
emp_name,
hire_date from emp where hire_date =
(select max(hire_date) from emp) and
rownum=1;
Another approach might be to retrieve the subquery rows
in pre-sorted order with an index hint:
select
emp_name,
hire_date from emp where hire_date =
(select /+* hire_idx */ max(hire_date) from
emp);
Oracle guru Dan Tow published
this great note on how to make range-bound queries run faster by
including the clause "where rownum=1" to the query.
Dan notes a SQL "trick" notes that SQL can run faster if you add a "where
rownum=1" to the SQL:
"Although I created this trick for use on Oracle, it
should work on MySQL;
use the clause "LIMIT 1" at the end of the query (or at
the subquery inside the FROM clause), in place of the condition "rownum=1",
and
use the hint USE INDEX() (or FORCE INDEX()
) at the end of the table-reference, and the hint /*! STRAIGHT JOIN */ to
force the join order (in place of ORDERED), if necessary. . .
Similar tricks are likely possible in other open
databases, as long as there is a way to force a join order and to stop a
query, especially a subquery in a FROM clause, at the first row."
Oracle guru
Mark Bobak notes that the rownum=1 is used to "short circuit" an index
range scan, useful to speed up range-bounded queries that have overlapping
end points:
"As to the rownum=1, the whole idea there is to short circuit the range
scan, as soon as you have a single match. It's a way of implementing the
idea behind the "least-greater-than-or-equal-to" and
"greatest-less-than-or-equal-to" operators which he discusses in the
article.
To answer your questions, the code that Dan provided went
into production shortly after he provided me that solution, and I believe
it's still going strong, 4 years later."
However, this "trick" appears to be specific to cases where this
solution a result set has overlapping endpoints.
How does adding "where rownum" change the
SQL execution plan?
An Oracle ACE notes
that one reason for the faster performance is the "COUNT STOPKEY" execution
plan step:
"Using rownum only changes the plan slightly. With a rownum
qualification, you should see a 'count stopkey' line in the plan.
Which I believe means that Oracle counts the number of records it finds, and
then stops searching when it reaches the rownum value:
select visit_id
from pat_visit;
ID PID Operation
Name
Rows Bytes Cost
CPU Cost IO Cost
SELECT STATEMENT 5000K
28M 4090
684M 4049 1 0
INDEX FAST FULL SCAN XPKPAT_VISIT
5000K 28M 4090
684M 4049
select visit_id from
pat_visit where rownum <=100;
ID PID
Operation
Name
Rows Bytes Cost CPU
Cost IO Cost 0
SELECT STATEMENT
100 600
2 18881
2 1 0 COUNT STOPKEY
2 1 INDEX
FAST FULL SCAN XPKPAT_VISIT 101
606 2
18881 2
I have seen valid uses for "where
rownum=1" in cases where the existence of any record automatically
required certain processes take place. For example, a patient where
the result of any test at all having a result value in the critical range is
likely to automatically require emergency notifications to hospital
personnel."
Internally, it is not complete clear how this rownum=1 trick work,
especially since it has been noted that the Oracle optimizer changes the
overall optimizer mode behind the curtains.
Rownum and release
level
It's important to note that rownum in SQL behavior has changed
between releases and it behaved quite differently before Oracle10g.
See
Bug 6845871 - Suboptimal plan from ROWNUM predicate. An Oracle ACE
offers these notes:
In earlier versions of
Oracle, the cost calculation did not change when the stopkey was added to
the plan:
alter session set optimizer_features_enable='9.2.0';
select visit_id
from pat_visit v, patient p where p.pat_id=v.pat_id
and rownum <=1000000
ID PID Operation
Name Rows Bytes Cost IO Cost
Temp space 0 SELECT STATEMENT
1000K 15M 4890 4890 1 0 COUNT
STOPKEY 2 1 HASH JOIN
4990K 76M 4890
4890 16M 3 2 INDEX FAST FULL SCAN XPKPATIENT
1000K 4882K 286 286 4 2 TABLE ACCESS FULL
PAT_VISIT 5000K 52M 2598 2598
alter session set optimizer_features_enable='9.2.0';
select visit_id from pat_visit v, patient p where p.pat_id=v.pat_id;
ID PID Operation
Name Rows Bytes Cost CPU Cost
IO Cost 0 SELECT STATEMENT
4990K 76M 4890 4890
1 0 HASH JOIN
4990K 76M 4890 4890 16M
2 1 INDEX FAST FULL SCAN XPKPATIENT 1000K 4882K
286 286 3 1 TABLE ACCESS
FULL PAT_VISIT 5000K 52M 2598 2598
Note the different behavior in 10.2.0.4. Also, note that the optimizer
is smart enough in 10.2.0 to realize it doesn't even need to look at the
patient table. In version prior to 10.2.0, the cost calculation will
remain the same with or without the rownum qualifier.
However, in the execution phase, the version with the rownum qualification
will run faster, if rownum is set to significantly fewer rows than the query
would otherwise retrieve.
In 10.2.0 and later, the rownum value will
affect the actual cost calculation for the plan, as well as reduce the run
time for the query.
alter session set optimizer_features_enable='10.2.0.4';
select visit_id from pat_visit v, patient p where p.pat_id=v.pat_id
and rownum <= 100000;
ID PID Operation
Name Rows Bytes Cost CPU Cost IO Cost Temp
space IN-OUT PQ Dist PStart PStop 0 SELECT
STATEMENT 100K 1074K 96 21M
95 1 0 COUNT STOPKEY
2 1 TABLE
ACCESS FULL PAT_VISIT 100K 1074K 96 21M 95
select visit_id from pat_visit v, patient p where
p.pat_id=v.pat_id ;
ID PID Operation
Name Rows Bytes Cost CPU Cost IO Cost
Temp space IN-OUT PQ Dist PStart PStop 0 SELECT
STATEMENT 5000K 52M 4699 1G 4635
1 0 TABLE ACCESS FULL PAT_VISIT
5000K 52M 4699 1G 4635
Alternatives to rownum
Using rownum is very
dangerous, especially in the hands of beginners, and there are always alternatives to using
rownum:
-
The WITH clause - Another approach to tuning
rownum queries is to
separate-out the subquery using the powerful
WITH clause.
Another benefit of separating-out the sorted subquery is that you can
easily apply either a parallel hint or an index hint,
if it's faster to retrieve the rows in
pre-sorted order.
-
The rank or
row_number analytics - You can replace
rownum in top-n queries with analytics
functions, using rank() or row_number()
instead, getting the same top-in result,
but with much faster response time.
-
Optimizer goal hint - In cases where
rownum is used to change the optimizer
mode to first_rows_n, it may be possible to negate this effect
by using an all_rows hint.
-
Index hint - In cases where
rownum is used to force an index in a subquery, again, deploy an
index hint to duplicate the faster execution
plan.
|
|
Get the Complete
Oracle SQL Tuning Information
The landmark book
"Advanced Oracle
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
buy it
for 30% off directly from the publisher.
|
|