Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Recursive SQL

SQL Tips by Donald Burleson

DB2 and SQL Server 2005 support the ANSI SQL standard recursive SQL, which renders the transitive closure effortlessly:

with TransClosedEdges (tail, head) as
( select tail, head from Edges
  union all
  select e.tail, ee.head from Edges e, TransClosedEdges ee

  where e.head = ee.tail
)
select distinct * from TransClosedEdges

This query looks artificial at first. It requires a certain educational background to fully appreciate it.

Consider the adjacency matrix of a graph. It is a square matrix with dimensions equal to the number of nodes. It is conventional to enumerate graph nodes with numbers from 1 to N, therefore, matching nodes with matrix columns and rows. With this arrangement matrix, entry aij naturally correspond to an edge from node i to node j. If there is indeed such an edge in a graph, then we define aij=1; otherwise, aij=0.

For our purposes the powers of adjacency matrix are especially interesting. The entry in column i and row j of the adjacency matrix raised in the n-th power is the number of paths of length n from node i to node j.

Although the square of the adjacency matrix in Figure 6.7 is the adjacency matrix of the graph shown in Figure 6.8, in general, the power of the adjacency matrix does not have be an adjacency matrix anymore.

There are various ways to fix this problem, and these ways will be explained  later. For now, given the adjacency matrix A, consider a formal series:

By adding the matrix powers An what kind of matrix TA is produced? The reader can easily convince himself that an entry in column i, row j of matrix An is the number of paths of length n from node i to node j in the original graph. Then, an entry in matrix TA is the number of paths of any length from node i to node j.

For adjacency matrices corresponding to directed acyclic graphs the powers would evaluate to 0 for a sufficiently large n, and the formal series TA is finite. In other words, all the paths in a directed acyclic graph have their length bounded.

Given a transitive closure series matrix TA, if any nonzero number is changed into 1, then TA can be converted and obtained into an adjacency matrix!

Next, the matrix powers sum is arranged a little bit differently:

The series in the parenthesis is again TA, hence:

It is this expression that is compared against the recursive with query in the beginning of the section. First, consider the product of matrices A and TA. Matrix multiplication in SQL can be expressed as:

table A (
   i   integer, -- column i
   j   integer, -- column j
   val number   -- value of A(i,j)
);
table T_A (
   i   integer,
   j   integer,
   val number
);  

select A.i AS i, T_A.j AS j, sum(A.val*T_A.val) AS val
from A, T
where A.j=T_A.i
group by A.i, T_A.j


The join between A and TA looks similar to the join between Edges and TransClosedEdges in the recursive SQL query that was introduced in the beginning of the section. It is the aggregation part that appears to make them look alike.

This is not the only way to multiply the matrices in SQL, however. Remember that products and sums of adjacency matrices have nonnegative integer entries. Therefore, instead of having a single row (i,j,val), we can have a bag of val identical rows of (i,j). Schema design, once again, is important!

Counting with Bags

SQL operates with bags of values, which are not much different from numbers in base-1 number system. Any record with a nonnegative count field can be converted into a bag of identical records without this field. Bags are added with the union all operator, e.g.

select head, tail from Edges1

union all

select head, tail from Edges2

we multiply them with Cartesian product, e.g.

select e1.head, e1.tail, e2.head, e2.tail

from Edges1 e1, Edges2 e2

The bag result could be aggregated back into a set of records (by counting).

In a bag design matrix, multiplication reduces to:

table A (
   i   integer, -- column i
   j   integer  -- column j
);
table T_A (
   i   integer,
   j   integer
);  

select A.i AS i, T_A.j AS j
from A, T
where A.j=T_A.i


The right side of the equation that defines a transitive closure

could be written in full in SQL as:

select i, j
from A
union all
select A.i AS i, T_A.j AS j
from A, T
where A.j=T_A.i

After renaming variables appropriately, it becomes indistinguishable from the recursive view TransClosedEdges definition in the query:

with TransClosedEdges (tail, head) as
( select tail, head from Edges
  union all
  select e.tail, ee.head from Edges e, TransClosedEdges ee

  where e.head = ee.tail
)
select distinct * from TransClosedEdges

The outer query transforms a bag back into a set because it is of interest to know if there are paths from node tail to node head, rather than their exact number.

Given the matrix interpretation of the recursive transitive closure query, the matrix evaluation can be translated into the SQL execution steps as follows:

A reader who is already familiar with the recursive SQL from another source may have already learned an entirely different algorithm. SQL, however, is about declarative programming and not about the algorithms. Any algorithm would do as long as it produces the correct result set. Recursive SQL is known to have many execution strategies: na?e, incremental (semi-naive), etc. Our algorithm could be coined as matrix evaluation.

There is a pitfall, however. The query would never terminate on graphs with cycles. Indeed, a cycle that passes through node i would eventually produce the edge (tail=i, head=i) in the TransClosedEdges relation. The TransClosedEdges relation never shrinks; therefore this record would remain in the TransClosedEdges. Then, the join at step 2 would never be empty.

One of the solutions to the cycle problem is suggested in the DB2 Cookbook by Graeme Birchall. Recursive with construction is very powerful since extra columns may be introduced and those columns would be calculated recursively! One such column fits naturally into our query - the path expression:

with TransClosedEdges (tail, head) as
( select tail, head, tail||?.?||head AS path

  from Edges
  union all
  select e.tail, ee.head, e.tail||?.?||ee.path AS path

  from Edges e, TransClosedEdges ee

  where e.head = ee.tail
)
select distinct * from TransClosedEdges

Then Graeme goes on introducing a function called LOCATE_BLOCK, which could be used in the where clause as an indicator that the current node is in the path already:

with TransClosedEdges (tail, head) as
( select tail, head, tail||?.?||head AS path

  from Edges
  union all
  select e.tail, ee.head, e.tail||?.?||ee.path AS path

  from Edges e, TransClosedEdges ee

where e.head = ee.tail

and LOCATE_BLOCK(e.head, path) = 0
)
select distinct * from TransClosedEdges

There is a more satisfactory solution to the cycle problem, though. Both DB2 and SQL Server slightly deviated from the ANSI SQL standard which demands a union in the recursive query definition. With the set semantics, the transitive closure relation TransClosedEdges cannot grow larger than a complete graph -- a graph in which every pair of nodes is connected.

Let's complete the section with simple wisdom about performance. Clearly, the Edges.head column has to be indexed if this query is to be scaled to a hierarchy of any significant size. The same comment applies to the Oracle solution discussed next.

Connect By

There are several ways to query a transitive closure in Oracle, beginning with parsing the sys_connect_by_path pseudo column and ending with firing the corellated connect by subquery:

select a.tail, b.head from Edges a, Edges b
where b.head in (
      select head from Edges
      connect by prior head = tail     
      start with tail = a.tail
)

Without question the most satisfactory method both aesthetically and performance-wise is:

select connect_by_root(tail), tail
from Edges
connect by prior head = tail


This transitive closure query is succinct, but still the syntax could be improved. Apparently, the designers were preoccupied with tree problems, hence the connect_by_root pseudo column. There is no concept of the root node anywhere in the transitive closure problem scope.  Likewise, the concept of the prior edge belongs to the solution space rather than is inherent to the transitive closure problem. Overall, the transitive closure is defined symmetrically in terms of the head and tail, but the query above is skewed.

The cycle issue is no-brainer, as there is a dedicated keyword to take care of it:

select connect_by_root(tail), tail
from Edges
connect by nocycle prior head = tail

 

This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim Tropashko

You can buy it direct from the publisher for 30%-off.


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.