Search BC Oracle Sites

Recursive SQL

SQL Tips by Donald Burleson

DB2 and SQL Server 2005 support the ANSI SQL standard recursive SQL, which renders the transitive closure effortlessly:

( select tail, head from Edges
union all
select e.tail, ee.head from Edges e, TransClosedEdges ee

)
select distinct * from TransClosedEdges

This query looks artificial at first. It requires a certain educational background to fully appreciate it.

Consider the adjacency matrix of a graph. It is a square matrix with dimensions equal to the number of nodes. It is conventional to enumerate graph nodes with numbers from 1 to N, therefore, matching nodes with matrix columns and rows. With this arrangement matrix, entry aij naturally correspond to an edge from node i to node j. If there is indeed such an edge in a graph, then we define aij=1; otherwise, aij=0.

For our purposes the powers of adjacency matrix are especially interesting. The entry in column i and row j of the adjacency matrix raised in the n-th power is the number of paths of length n from node i to node j.

Although the square of the adjacency matrix in Figure 6.7 is the adjacency matrix of the graph shown in Figure 6.8, in general, the power of the adjacency matrix does not have be an adjacency matrix anymore.

There are various ways to fix this problem, and these ways will be explained  later. For now, given the adjacency matrix A, consider a formal series:

By adding the matrix powers An what kind of matrix TA is produced? The reader can easily convince himself that an entry in column i, row j of matrix An is the number of paths of length n from node i to node j in the original graph. Then, an entry in matrix TA is the number of paths of any length from node i to node j.

For adjacency matrices corresponding to directed acyclic graphs the powers would evaluate to 0 for a sufficiently large n, and the formal series TA is finite. In other words, all the paths in a directed acyclic graph have their length bounded.

Given a transitive closure series matrix TA, if any nonzero number is changed into 1, then TA can be converted and obtained into an adjacency matrix!

Next, the matrix powers sum is arranged a little bit differently:

The series in the parenthesis is again TA, hence:

It is this expression that is compared against the recursive with query in the beginning of the section. First, consider the product of matrices A and TA. Matrix multiplication in SQL can be expressed as:

table A (
i   integer, -- column i
j   integer, -- column j
val number   -- value of A(i,j)
);
table T_A (
i   integer,
j   integer,
val number
);

select A.i AS i, T_A.j AS j, sum(A.val*T_A.val) AS val
from A, T
where A.j=T_A.i
group by A.i, T_A.j

The join between A and TA looks similar to the join between Edges and TransClosedEdges in the recursive SQL query that was introduced in the beginning of the section. It is the aggregation part that appears to make them look alike.

This is not the only way to multiply the matrices in SQL, however. Remember that products and sums of adjacency matrices have nonnegative integer entries. Therefore, instead of having a single row (i,j,val), we can have a bag of val identical rows of (i,j). Schema design, once again, is important!

Counting with Bags

SQL operates with bags of values, which are not much different from numbers in base-1 number system. Any record with a nonnegative count field can be converted into a bag of identical records without this field. Bags are added with the union all operator, e.g.

union all

we multiply them with Cartesian product, e.g.

from Edges1 e1, Edges2 e2

The bag result could be aggregated back into a set of records (by counting).

In a bag design matrix, multiplication reduces to:

table A (
i   integer, -- column i
j   integer  -- column j
);
table T_A (
i   integer,
j   integer
);

select A.i AS i, T_A.j AS j
from A, T
where A.j=T_A.i

The right side of the equation that defines a transitive closure

could be written in full in SQL as:

select i, j
from A
union all
select A.i AS i, T_A.j AS j
from A, T
where A.j=T_A.i

After renaming variables appropriately, it becomes indistinguishable from the recursive view TransClosedEdges definition in the query:

( select tail, head from Edges
union all
select e.tail, ee.head from Edges e, TransClosedEdges ee

)
select distinct * from TransClosedEdges

The outer query transforms a bag back into a set because it is of interest to know if there are paths from node tail to node head, rather than their exact number.

Given the matrix interpretation of the recursive transitive closure query, the matrix evaluation can be translated into the SQL execution steps as follows:

A reader who is already familiar with the recursive SQL from another source may have already learned an entirely different algorithm. SQL, however, is about declarative programming and not about the algorithms. Any algorithm would do as long as it produces the correct result set. Recursive SQL is known to have many execution strategies: na?e, incremental (semi-naive), etc. Our algorithm could be coined as matrix evaluation.

There is a pitfall, however. The query would never terminate on graphs with cycles. Indeed, a cycle that passes through node i would eventually produce the edge (tail=i, head=i) in the TransClosedEdges relation. The TransClosedEdges relation never shrinks; therefore this record would remain in the TransClosedEdges. Then, the join at step 2 would never be empty.

One of the solutions to the cycle problem is suggested in the DB2 Cookbook by Graeme Birchall. Recursive with construction is very powerful since extra columns may be introduced and those columns would be calculated recursively! One such column fits naturally into our query - the path expression:

from Edges
union all
select e.tail, ee.head, e.tail||?.?||ee.path AS path

from Edges e, TransClosedEdges ee

)
select distinct * from TransClosedEdges

Then Graeme goes on introducing a function called LOCATE_BLOCK, which could be used in the where clause as an indicator that the current node is in the path already:

from Edges
union all
select e.tail, ee.head, e.tail||?.?||ee.path AS path

from Edges e, TransClosedEdges ee

)
select distinct * from TransClosedEdges

There is a more satisfactory solution to the cycle problem, though. Both DB2 and SQL Server slightly deviated from the ANSI SQL standard which demands a union in the recursive query definition. With the set semantics, the transitive closure relation TransClosedEdges cannot grow larger than a complete graph -- a graph in which every pair of nodes is connected.

Let's complete the section with simple wisdom about performance. Clearly, the Edges.head column has to be indexed if this query is to be scaled to a hierarchy of any significant size. The same comment applies to the Oracle solution discussed next.

Connect By

There are several ways to query a transitive closure in Oracle, beginning with parsing the sys_connect_by_path pseudo column and ending with firing the corellated connect by subquery:

select a.tail, b.head from Edges a, Edges b
connect by prior head = tail
)

Without question the most satisfactory method both aesthetically and performance-wise is:

select connect_by_root(tail), tail
from Edges
connect by prior head = tail

This transitive closure query is succinct, but still the syntax could be improved. Apparently, the designers were preoccupied with tree problems, hence the connect_by_root pseudo column. There is no concept of the root node anywhere in the transitive closure problem scope.  Likewise, the concept of the prior edge belongs to the solution space rather than is inherent to the transitive closure problem. Overall, the transitive closure is defined symmetrically in terms of the head and tail, but the query above is skewed.

The cycle issue is no-brainer, as there is a dedicated keyword to take care of it:

select connect_by_root(tail), tail
from Edges
connect by nocycle prior head = tail

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning