Search BC Oracle Sites

# Incremental Evaluation

SQL Tips by Donald Burleson

Transitive closure enjoys a lot of attention in the database research community. Proving the impossibility of doing things in a certain way is one of the favorite theoretical topics. Not surprisingly, it was established very early that transitive closure cannot be expressed by simple means of relational algebra, even enhanced with aggregation and grouping.

We have already seen the power of the incremental evaluation idea in the database implementation world. Indexes and materialized views are the most familiar incremental evaluation structures. Dong et al proved that transitive closure can be efficiently maintained via an incremental evaluation system. In this section Dong's approach is explored, although with some deviation that would simplify the matter.

Let's revisit the transitive closure expression in terms of the adjacency matrix:

From a mathematical perspective this is quite a handsome series. It could be made even prettier if we consider the identity matrix - the matrix with ones on the main diagonal and zeros elsewhere. A common notation for the identity matrix is symbol 1. Perhaps the most important equality involving the identity matrix is:

There it shows that the identity matrix literally begs to be added in front of the series for the transitive closure:

Speaking in graph language, what happened is the loops were included at each node. Formally, T is now both a reflexive and transitive closure.

Are there any immediate benefits? Let's multiply T by 1 - A

All the nonzero powers of A at the right side cancel out:

Multiplying both sides by the inverse of 1 - A gives an explicit formula for the transitive closure:

Therefore, we might be able to calculate transitive closure (of directed acyclic graphs, at least), if we know how to invert matrices in SQL! Unfortunately, inverting matrices in SQL is difficult. The matrix approach, however, still shows its practical merit in the scope of an incremental evaluation system.

In the incremental evaluation method, the original graph is stored together with another graph -- its transitive closure. Every time the original graph is updated that is a new edge is inserted or deleted; the transitive closure graph has to be changed as well so that the information in both structures remains consistent. The problem of transitive closure graph maintenance essentially is finding an efficient SQL expression for the insertions and deletions in the transitive closure graph.

Let's continue pursuing the adjacency matrix approach. Inserting an edge (x,y) into the original graph produces a new graph with the adjacency matrix that is the original matrix A incremented by S, the latter being a matrix with a single nonzero entry in column x, row y. How would this increment of the adjacency matrix affect the transitive closure matrix? Well, let's calculate. The new transitive closure matrix is:

Expanding powers into polynomials the result is:

Let's rearrange terms more suggestively:

The first series is the familiar transitive closure matrix T of the original not incremented adjacency matrix A. The second series could be factored into

which reduces to T S T. The last series vanishes if the scope is limited to directed acyclic graphs. Indeed, each term in that series is multiplied by S at least twice. In other words, each term in the series corresponds to a path in the graph that goes through the edge (x,y) twice, which implies a cycle.

Summarizing, the new transitive closure matrix reduces to:

Incremental Maintenance of Graph Matrices

When the original adjacency matrix A is incremented by S, the transitive closure matrix T is incremented by T S T.

This is a very succinct result. The result is double checked by comparing it with the query computing the transitive closure increment in Dong's paper:

SELECT *
FROM (

SELECT VALUES (a, b)
UNION
SELECT Start = TC.Start, End = b
FROM TC
WHERE TC.End = a
UNION
SELECT Start = a, End = TC.End
FROM TC
WHERE b = TC.Start
UNION
SELECT Start = TC1.Start, End = TC2.End
FROM TC AS TC1, TC AS TC2
WHERE TC1.End = a AND TC2.Start = b

) AS T;

SELECT *
FROM TC?NEW AS T
WHERE NOT EXISTS (SELECT *
FROM TC
WHERE TC.Start=T.Start AND TC.End=T.End)
INTO TEMP DELTA;

INSERT INTO TC
SELECT *
FROM DELTA;

The first part of the query, which is a union of four blocks is supposed to correspond to the transitive closure matrix increment T S T. It appears that they are very dissimilar. How can this be?

Remember, however, that we conveniently decided to operate the reflexive transitive closure matrices. If we go back and represent T as 1+T?, where T? is not reflexive anymore, then the increment matrix has to be written as (1+T?) S (1+T?) which can be expanded into with four terms as in Dong's method.

Now that we are confident with the matrix solution, it can be finalized in terms of SQL. It has already been shown the operating matrices of integers that count paths in the directed acyclic graph offers an exceptional clarity. Therefore, let's represent the transitive closure relation directly after the transitive closure matrix that was investigated earlier:

table TRC (     -- transitive, reflexive closure
i   integer, -- tail

val integer
-- weight
)

Once again, it is the reflexive property and the additional information in the val column that distinguishes this method formally from Dong's.

However, there is an ambiguity. What about zero matrix entries? There is an option available to either store it as (i,j,0) or ignore such rows. The first option simplifies SQL that maintains the table TRC. The second option provides a natural means for compressing sparse matrices.

Sparse Matrices

Sparse matrices save space. Instead of storing full matrix

i j val

- - ---

1 1   0

1 2   0

1 3   1

2 1   0

2 2   0

2 3   0

3 1   0

3 2   5

3 3   0

it is more economical to omit zero entries

i j val

- - ---

1 3   1

3 2   5

Matrix addition query requires a little more care for sparse matrices. Matrix multiplication, however, is an aggregate. It produces correct result set either way.

Now all the ground work for transitive closure maintenance in SQL is complete. Inserting an edge (x,y) into the adjacency graph has to trigger a conforming change in the transitive closure table TRC. The values in the TRC.val column should be incremented accordingly by the entries of the product of three matrices T S T.

Knowing how to write a product of two arbitrary matrices in SQL, the product of three matrices -- A B C -- can be written as a composition of two binary product operations (A B) C. Alternatively, matrix elements are summed up at once

which is easy to translate into SQL:

select A.i AS i, C.j AS j, sum(A.val*B.val*C.val) AS val
from A, B, C
where A.j=B.i and B.j=C.i
group by A.i, C.j

Please note how naturally the matrix associativity property goes along with the relational join associativity.

Actually, the goal is a simpler matrix product -- T S T. There is no challenge adapting the general case to our needs:

select t1.i AS i, t2.j AS j, sum(t1.val*t2.val) AS val
from TRC t1, TRC t2
where t1.j = :x and t2.i = :y
group by t1.i, t2.j

If the option of storing zero matrix entries is chosen, then the above query fits naturally into an update statement to the TRC table:

update TRC
set val = (
select val + sum(t1.val*t2.val)
from TRC t1, TRC t2
where t1.j = :x and t2.i = :y

and t1.i = trc.i, t2.j = trc.j

group by t1.i, t2.j
)

It is fascinating that the transitive closure table maintenance is solvable with a single update, but this answer is unrealistic for two reasons. First, the TRC table has to grow at some moment, and this issue is left out of scope. Second, from a performance perspective, updating or trying to update all the rows in the TRC table smells a disaster. It would be helpful to know which rows require update, and formalize it within the where clause (which is entirely missing).

Therefore, let's materialize updates to the TRC table in a designated table TRCDelta:

insert into TRCDelta
select t1.i AS i, t2.j AS j, sum(t1.val*t2.val) AS val
from TRC t1, TRC t2
where t1.j = :x and t2.i = :y
group by t1.i, t2.j

To keep this table small, matrix entries should not be stored with zero values any longer. On the other hand, by just storing a sparse matrix stored in the TRC table, the TRCDelta is automatically calculated as a sparse matrix either.

It is the TRC table update step that has to be adjusted. There are two cases:

1        All the rows in TRCDelta that do not match TRC rows have to be inserted:

insert into TRC
select * from TRCDelta
where (i,j) not in (select i,j from TRC)

1        The rows that match have to increment their values:

update TRC
set val = val + (select val from TRCDelta td
where td.i = trc.i and td.j = trc.j)
where (i,j) in (select i,j from TRCDelta)

This completes the program in the case of an insertion of an edge. What about a deletion? From the matrix perspective the cases of inserting an edge and deleting it are symmetric. Following the earlier development, the same matrix S that corresponds to a single edge (x,y) could be used, except that it must be subtracted everywhere. The final result in the matrix form is that transitive closure matrix T is decremented by T S T.

In the first na?e update solution for the TRC table all that must be done is to reverse the sign:

update TRC
set val = (
select val
- sum(t1.val*t2.val)
from TRC t1, TRC t2
where t1.j = :x and t2.i = :y

and t1.i = trc.i, t2.j = trc.j
group by t1.i, t2.j
)

Please note, that this symmetry is made possible because of the extra information that was stored in the TRC table. Since the number of paths that are going from node i to node j is known, all the paths that are affected by the deletion of an edge (x,y) can simply be subtracted. In Dong's approach, maintenance under deletion is more complicated.

Carrying over the solution to sparse matrices requires little insight. The TRCDelta table that stores the T S T matrix is calculated the same way as in the edge insertion scenario. Thus, subtracting the T S T from T brings up two possibilities:

1        An entry in the transitive closure matrix T is the same as the corresponding entry in the T S T.

2        An entry in the transitive closure matrix T is bigger than the corresponding entry in the T S T.

In the first case the entry in the difference matrix T - T S T is 0. Therefore, all these entries have to be deleted:

delete from TRC
where (i,j, val) in (select i,j, val from TRCDelta)

All the other entries have to be adjusted:

update TRC
set val = val - (select val from TRCDelta td
where td.i = trc.i and td.j = trc.j)
where (i,j) in (select i,j from TRCDelta)

Now that several methods for transitive closure calculation/ maintenance has been shown, let's again return to applications. Perhaps the most significant problem that can be expressed in terms of transitive closure is aggregation on graphs.

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning