Hypercharging SQL by restructuring queries
I am now about one year into writing my next book “Oracle
SQL Tuning: The Definitive Reference” and I’ve discovered some
fascinating new principles of SQL tuning, counterintuitive
approaches that can make your SQL: run faster than ever before.
Some guru's rightfully say that you should always
solve a problem using SQL without using PL/SQL unless absolutely
necessary, but there is a limit to that technology. Once a SQL
query has subqueries nested within subqueries, subqueries in the
select clause (a scalar subquery), subqueries in the from
clause (an in-line view), the SQL becomes difficult to read, hard to
maintain, and difficult to optimize. In these cases, we can use
some powerful Oracle tools to divide and conquer complex SQL statements.
Because SQL is a declarative language we can
formulate equivalent queries with numerous techniques.
SQL is a “state space” query language where you simply request
the desired rows, there are many ways to write the same query.
Back in the 1970’s, programming competitions were
not concerned with who could solve a problem the fastest but with who
could write the solution that ran the most efficiently.
Today that has changed, and SQL developers are charged with
getting the right rows back as quickly as possible, regardless of the
internal execution plan or the time required to execute the query!
In the examples below we will see a simple two-way
table query that can be written in these forms, each returning identical
results, but with vastly different performance:
-
Outer join with NOT NULL test
-
Non-correlated NOT IN subquery
-
Correlated NOT EXISTS subquery
-
NOT IN subquery using the MINUS clause
-
Standard correlated subquery
Complex SQL behaves in the same fashion, and you
can re-write SQL to improve performance. With the introduction of
advanced analytical SQL functions such as the PIVOT operator in 11g, SQL
coders can now solve complex programming problems without using PL/SQL.
SQL was never designed to perform complex process
logic, and the performance of complex SQL will often be far worse than
when the SQL is decomposed into manageable simple queries. Complex
queries can be re-written in many ways, all with the same results and
different performance:
-
Complex subqueries - Queries
with nested subqueries, scalar subqueries and in-line views can
often perform poorly.
-
Global temporary tables -
Global temporary tables allow you to "divide and conquer", breaking
the SQL into many simple queries.
-
The WITH clause - The SQL-99
WITH clause allows you to execute subqueries independently of the
outer query.
Let's take a closer look at these powerful SQL
re-writing techniques.
One result, many forms
We can see this phenomenon of identical SQL with
different performance very easily. Consider this simple example of a
two-way table join. Each of these queries returns the exact same
results, in this case, all authors who have not yet written a book:
--*******************************************************
--
Outer Join
--*******************************************************
select
author_last_name
from
author a
full outer join
book_author ba
on
a.author_key = ba.author_key
where
royalty is NULL;
--******************************************
-- NOT IN
--******************************************
select
author_last_name
from
author
where
author_key not in
(select author_key from
book_author);
--******************************************
-- NOT EXISTS correlated subquery
--******************************************
select
author_last_name
from
author a
where not exists
(select 1 from book_author ba where ba.author_key =
a.author_key);
--******************************************
-- NOT IN with minus
--******************************************
select
author_last_name
from
author
where
author_key in
(select author_key from author minus select
author_key from book_author);
--******************************************
-- Correlated subquery
--******************************************
select
author_last_name
from
author a
where not exists
(select 1 from book_author ba where ba.author_key =
a.author_key);
While these SQL queries return equal results,
internally, they have vastly different execution plans, with vastly
different performance.
SQL> --******************************************
SQL> --
Outer join
SQL> --******************************************
SQL> select
2
author_last_name
3
from
4
author
a
5
full outer join
6
book_author
ba
7
on
8
a.author_key = ba.author_key
9
where
10
royalty is NULL;
-----------------------------------------------------------------------------------------------
| Id
| Operation
| Name
| Rows
| Bytes | Cost (%CPU)| Time
|
-----------------------------------------------------------------------------------------------
|
0 | SELECT STATEMENT
|
|
26 |
572 |
6
(0)| 00:00:01 |
|
1 |
VIEW
|
|
26 |
572 |
6
(0)| 00:00:01 |
|
2 |
UNION-ALL
|
|
|
|
|
|
|*
3 |
FILTER
|
|
|
|
|
|
|
4 |
NESTED LOOPS OUTER
|
|
25 |
1225 |
4
(0)| 00:00:01 |
|
5 |
TABLE ACCESS FULL
| AUTHOR
|
10 |
360 |
2
(0)| 00:00:01 |
|
6 |
TABLE ACCESS BY INDEX ROWID| BOOK_AUTHOR
|
3 |
39 |
2
(0)| 00:00:01 |
|*
7 |
INDEX RANGE SCAN
| SYS_C004000 |
3
|
|
1
(0)| 00:00:01 |
|
8 |
NESTED LOOPS ANTI
|
|
1 |
13 |
2
(0)| 00:00:01 |
|*
9 |
TABLE ACCESS FULL
| BOOK_AUTHOR |
1 |
8 |
2
(0)| 00:00:01 |
|* 10 |
INDEX UNIQUE SCAN
| SYS_C003994 |
10 |
50 |
0
(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
127
consistent gets
SQL> --******************************************
SQL> --
NOT IN
SQL> --******************************************
SQL>
SQL> select
2
author_last_name
3
from
4
author
5
where
6
author_key not in
7
(select
author_key from book_author);
----------------------------------------------------------------------------------
| Id
| Operation
| Name
| Rows
| Bytes | Cost (%CPU)| Time
|
----------------------------------------------------------------------------------
|
0 | SELECT STATEMENT
|
|
2 |
34 |
3
(0)| 00:00:01 |
|
1 |
NESTED LOOPS ANTI |
|
2 |
34 |
3
(0)| 00:00:01 |
|
2 |
TABLE ACCESS FULL| AUTHOR
|
10 |
120 |
2
(0)| 00:00:01 |
|*
3 |
INDEX RANGE SCAN | SYS_C004000 |
20 |
100 |
1
(0)| 00:00:01 |
----------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
7
consistent gets
SQL> --******************************************
SQL> --
NOT EXISTS correlated subquery
SQL> --******************************************
SQL>
SQL>
SQL> select
2
author_last_name
3
from
4
author a
5
where not exists
6
(select 1 from book_author ba where
ba.author_key = a.author_key);
----------------------------------------------------------------------------------
| Id
| Operation
| Name
| Rows
| Bytes | Cost (%CPU)| Time
|
----------------------------------------------------------------------------------
|
0 | SELECT STATEMENT
|
|
2 |
34 |
3
(0)| 00:00:01 |
|
1 |
NESTED LOOPS ANTI |
|
2 |
34 |
3
(0)| 00:00:01 |
|
2 |
TABLE ACCESS FULL| AUTHOR
|
10 |
120 |
2
(0)| 00:00:01 |
|*
3 |
INDEX RANGE SCAN | SYS_C004000 |
20 |
100 |
1
(0)| 00:00:01 |
----------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
7
consistent gets
SQL> --******************************************
SQL> --
NOT IN with minus
SQL> --******************************************
SQL>
SQL>
SQL> select
2
author_last_name
3
from
4
author
5
where
6
author_key in
7
(select
author_key from author minus select author_key from book_author);
--------------------------------------------------------------------------------------------
| Id
| Operation
| Name
| Rows
| Bytes | Cost (%CPU)| Time
|
--------------------------------------------------------------------------------------------
|
0 | SELECT STATEMENT
|
|
10 |
190 |
5
(40)| 00:00:01 |
|
1 |
NESTED LOOPS
|
|
10 |
190 |
5
(40)| 00:00:01 |
|
2 |
VIEW
| VW_NSO_1
|
10 |
70 |
4
(50)| 00:00:01 |
|
3 |
MINUS
|
|
|
|
|
|
|
4 |
SORT UNIQUE
|
|
10 |
50 |
|
|
|
5 |
INDEX FULL SCAN
| SYS_C003994 |
10 |
50 |
1
(0)| 00:00:01 |
|
6 |
SORT UNIQUE
|
|
25 |
125 |
|
|
|
7 |
INDEX FULL SCAN
| SYS_C004000 |
25 |
125 |
1
(0)| 00:00:01 |
|
8 |
TABLE ACCESS BY INDEX ROWID| AUTHOR
|
1 |
12 |
1
(0)| 00:00:01 |
|*
9 |
INDEX UNIQUE SCAN
| SYS_C003994 |
1 |
|
0
(0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
9
consistent gets
SQL> --******************************************
SQL> --
Correlated subquery
SQL> --******************************************
SQL>
SQL> select
2
author_last_name
3
from
4
author a
5
where not exists
6
(select 1 from book_author ba where
ba.author_key = a.author_key);
----------------------------------------------------------------------------------
| Id
| Operation
| Name
| Rows
| Bytes | Cost (%CPU)| Time
|
----------------------------------------------------------------------------------
|
0 | SELECT STATEMENT
|
|
2 |
34 |
3
(0)| 00:00:01 |
|
1 |
NESTED LOOPS ANTI |
|
2 |
34 |
3
(0)| 00:00:01 |
|
2 |
TABLE ACCESS FULL| AUTHOR
|
10 |
120 |
2
(0)| 00:00:01 |
|*
3 |
INDEX RANGE SCAN | SYS_C004000 |
20 |
100 |
1
(0)| 00:00:01 |
----------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
7
consistent gets
This idea of many ways to write an equivalent query
is especially important when se write complex SQL with subqueries.
In some cases, using global temporary tables and the WITH clause can
hypercharge SQL by simplifying the queries by decomposing them into
individual pieces.
Using the
WITH clause to simplify complex SQL
This same principle holds true for complex queries.
Consider this example of a query to list all stores with above
average sales:
To keep it simple, the following example only
references the aggregations once, where the SQL WITH clause is normally
used when an aggregation is referenced multiple times in a query.
The following is an example of a request to see the
names of all stores with above average sales. For each store, the
average sales must be compared to the average sales for all stores:.

Essentially, the query below accesses the STORE and
SALES tables, comparing the sales for each store with the average sales
for all stores. To answer this query, the following information
must be available:
§
The total sales for all stores.
§
The number of stores.
§
The sum of sales for each store.
To answer this in a single SQL statement, inline
views will be employed along with a subquery inside a HAVING clause:
select
store_name,
sum(quantity)
store_sales,
(select sum(quantity) from sales)/(select count(*) from store)
avg_sales
from
store s,
sales sl
where
s.store_key = sl.store_key
having
sum(quantity) > (select sum(quantity) from sales)/(select count(*)
from store)
group by
store_name
;
While this query provides the correct answer, it is
difficult to read and complex to execute as it is recomputing the sum of
sales multiple times. We can also specify this query using temporary
tables:
create table t1 as
select sum(quantity) all_sales from stores;
create table t2 as
select count(*) nbr_stores from stores
create table t3 as
select store_name, sum(quantity) store_sales from store natural
join sales;
select
store_name
from
t1,
t2,
t3
where
store_sales > (all_sales / nbr_stores)
;
Next, we can re-write this query yet again, using the
SQL-99 WITH clause:
with
number_stores as
(select count(*) nbr_stores from store),
total_sales as
(select sum(quantity) all_sales from sales),
store_sales as
(select store_name, sum(quantity) sales from store natural join
sales
group by
store_name)
select
store_name
from
number_stores,
total_sales,
store_sales
where
sales > (all_sales
/ nbr_stores);
Again, each of these queries returns identical
results, but very different ways.
For the above queries, see this benchmark showing the difference in
performance of complex SQL subqueries rewritten with intermediate tables