Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 
 

SQL to compare rows within two tables

Oracle Tips by Burleson Consulting

Oracle Corporation's developer Vadim Tropashko has some interesting notes on tuning Oracle SQL queries that compare the contents of two tables, showing several SQL solutions and their performance within the Oracle cost-based optimizer. 

Vadim Tropashko's  is the author of "SQL Design Patterns: The Expert Guide to SQL Programming", the definitive reference for SQL design patterns, a critical knowledge area for any Oracle developer.

Vadim also shows a great example of using the hidden parameter _convert_set_to_join to improve SQL execution speed for queries that find the semantic difference between two tables, and he shows how to compare the rows and columns of two tables with Oracle SQL syntax. 

(Note: Hidden parameters are extremely powerful and you should always thoroughly test all use of hidden SQL tuning parameters and only use hidden parms like _coinvert_set_to_join at the direction of Oracle technical support).

Vadim Tropashko's new book "SQL Design Patterns: The Expert Guide to SQL Programming", is the very first book to apply the mathematical foundation of SQL in-terms of design patterns, a must-own book for any professional SQL developer.

Computing table content differences with Oracle SQL

The question "what is the difference between two tables? is simple to ask, but difficult to code.  It's like an "anti-union" where we seek only the unique rows within each table.

Tropashko has a great analysis of this common SQL problem from both a mathematical perspective and a practical Oracle perspective. It also shows an undocumented technique for improving execution speed for querying the non-intersection of two tables. 

Vadim notes that the _convert_set_to_join parameter can be used when comparing the contents of two tables.  This type of SQL query is called a "semantic difference", or an "anti-union" operation.  Tropashko depicts the table comparison problem below:

In the above figure we read the "/" as "not in" and the "∪" as "union".   In sum, we want the table comparison SQL to find:

Rows in table A that are not in table B
plus
Rows in table B that are not in table A

Vadim notes that the obvious solution van have poor performance:

The figure 1 with the expressions (A \ B) (B \ A) and (A B) \ (A B) pretty much exhaust all the theory involved. It is straightforward to translate them into SQL queries. Here is the first one:

 
(select * from A 
 minus
 select * from B) -- Rows in A that are not in B
union all
 select * from B 
 minus
 select * from A
) -- rows in B that are not in A

In practice, however, this query is a sluggish performer. With a na?e evaluation strategy, the execution flow and the operators are derived verbatim from the SQL which we have written. First, each table has to be scanned twice. Then, four sort operators are applied in order to exclude duplicates. Next, the two set differences are computed, and, finally, the two results are combined together with the union operator. . .

After executing the symmetric difference query, and capturing the row-source execution statistics (from the v$sql_plan_statistics view) we get the following result:


Execution plan for comparing the contents of two tables with MINUS operator

Rewriting Oracle SQL MINUS operator with a NOT IN subquery

Vadim notes that the Oracle query rewrite might come into play where a "set operation" (the minus operator) might be converted into a join by the CBO:

It confirms our intuition on the number of table scans and sorts. It also reveals that the time scanning a single table (about 100 msec) is small compared to the time spent when sorting each table (about 450 msec). Therefore we could try to aim lower than 2.4 sec of total execution time.

This is not the only possible execution plan for the symmetric difference query. Oracle has quite sophisticated query rewrite capabilities, and transforming set operation into join is one of them.

In our example, both minus operators could be rewritten as anti-joins. As of version 10.2, however, this transformation is not enabled by default, and used as part of view merging/ query unnesting framework only. It can be enabled with

alter session set "_convert_set_to_join"= true;

The other alternative is to rewrite the SQL query manually [replacing the minus operator with a NOT IN subquery] evidences about 30% improvement in execution time . . .

select *
from A
where (col1,col2,?) not in
   (select col1,col2,? from B)
   union all
select * from B
where (col1,col2,?) not in
   (select col1,col2,? from A);


The new execution statistics are very different and faster, using hash joins:


Execution plan for comparing the contents of two tables with NOT IN subquery

Disabling sort merge and hash joins

Tropashko notes that the hash joins may not be the fastest table join method and he removes them  by unsetting the hash_join_enabled parameter and reviews the resulting nested loops table join method:

alter session set "_hash_join_enabled" = false;
alter session set "_optimizer_sortmerge_join_enabled" = false;

The particular query that we are studying in this section has 5 query blocks: 2 pairs of selects from the tables A and B, and one set operation. After this query is transformed, it would consist of only 3 query blocks which we might witness from the QBLOCK_NAME column of the PLAN_TABLE.


Execution plan for comparing the contents of two tables  with sort merge disabled

Forcing nested loop joins

Vadim next tries forcing nested loop joins with the use_nl hint and discovers that the execution speed gets worse, not faster:

Those query block names allow directing the hints to the query blocks where they are supposed to be applied. In our case, the joint method can be hinted as

/*+ use_nl(@"SEL$74086987" A)use_nl(@"SET$D8486D66" B)*/

which produces the following rowsource level execution statistics:


Execution plan for comparing the contents of two tables with nested loop joins

From the statistics output we see significant increase in buffer gets which is not offset by any noticeable improvement in the execution time. It is fair to conclude that indexed nested loops didn't meet our performance expectations in this case.

Using SQL aggregation to compute the difference between two tables

Tropashko notes that this anti-union semantic difference problem can also be solved using the SQL CASE operator with in-line views (select statements in the FROM clause).  He notes a decrease in buffer gets, but no change in overall execution time:

select * from (
   select id, name,
   sum(case when src=1 then 1 else 0 end) cnt1,
   sum(case when src=2 then 1 else 0 end) cnt2
from
   (
   select id, name, 1 src from A
   union all
   select id, name, 2 src from B
   )
group by id, name
)
where cnt1 <> cnt2;


This is appeared as rather elegant solution where each table has to be scanned once only, until we capture execution statistics.  Although we succeeded decreasing buffer gets count in half, the execution time still remained around 2 sec.


Execution plan for comparing the contents of two tables with SQL aggregation

Hash Value based Table Comparison SQL Syntax

 

Tropashko notes yet another technique for using SQL to compare the contents of two tables using a hash-based method:

When comparing data in two tables with identical signatures there actually are two questions that one might want to ask:
 

  • Is there any difference? The expected answer is Boolean.

  • What are the rows that one table contain, and the other doesn't.


Answering the second question implies that we can answer the first, yet, it is possible that the first query might have better performance. We?ll approach it with the hash-based technique in mind.

 

The standard disclaimer of any hash-based method is that it is theoretically possible to get a wrong result. The rhetorical question, however, is how often did the reader experience a problem due to hash value collision? I never did. Would a user accept a tradeoff: significant performance boost of the table difference query at the expense of remote possibility of getting an incorrect result?

 

The idea of hash based method is associating a single hash value with a table. This hash value behaves like aggregate, and therefore, it can be calculated incrementally: if a row is added into a table, then a new hash value is a function of the old hash value and the added row. Incremental evaluation means good performance, and the two hash values comparison is done momentarily.


Comparing the contents of two tables with SQL hash weighting

Figure 2 is a method to ?weight? a table into a single value involves concatenating all the fields per each row first. Then a concatenated string is mapped into a number -- hash value. Finally all hash values are aggregated into a single one. Had a single field in a single row have change, the table weight would change as well.

 

A naive implementation is as simple as this SQL:

select sum(DBMS_UTILITY.get_hash_value(id||'|'||name,
                                       1,
                                       POWER(2,16)-1))
from A
 union all
select sum(DBMS_UTILITY.get_hash_value(id||'|'||name,
                                       1,
                                       POWER(2,16)-1))
from B;

We concatenate all the row fields (with a distinct separator, in order to guarantee uniqueness), then translate this string into a hash value.

 

Using get_hash_values to computer table row differences

 

Tropashko notes that you can use the dbms_utility packages get_hash_value procedure to rewrite this query with the POWER clause:

Alternatively, we could have calculated hash values for each field, and then apply some asymmetric function in order for the resulting hash value to be sensitive to column permutations:

select sum(DBMS_UTILITY.get_hash_value(id
                                       1,
                                       POWER(2,16)-1)
       + 2*DBMS_UTILITY.get_hash_value(name
                                       1,
                                       POWER(2,16)-1))
from A
 union all
select sum(DBMS_UTILITY.get_hash_value(id
                                       1,
                                       POWER(2,16)-1)
       + 2*DBMS_UTILITY.get_hash_value(name
                                       1,
                                       POWER(2,16)-1))
from B;

Row hash values are added together with ordinary sum aggregate, but we could have written a modulo 216-1 user-defined aggregate hash_sum in the spirit of CRC (Cyclic Redundancy Check) technique.

For more on advanced Oracle SQL tuning, see Vadim Tropashko's  book "SQL Design Patterns: The Expert Guide to SQL Programming"

 
Get the Complete
Oracle SQL Tuning Information 

The landmark book "Advanced Oracle SQL Tuning  The Definitive Reference"  is filled with valuable information on Oracle SQL Tuning. This book includes scripts and tools to hypercharge Oracle 11g performance and you can buy it for 30% off directly from the publisher.


 

 

��  
 
 

 
 
 
 
oracle dba poster
 

 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright ? 1996 -  2014

All rights reserved by Burleson

Oracle ? is the registered trademark of Oracle Corporation.