Search BC Oracle Sites

# Matrix Encoding

SQL Tips by Burleson Consulting

The very basic skill of multiplying 2´2 matrices of integer numbers is listed as a reminder in Figure 5.8.

Matrix multiplication obeys the same rules as string concatenation. It is associative

(AB)C = A(BC)

and distributive

AB + AC = A(B+C)

but not commutative

AB BA

Any materialized path is a concatenation of atomic materialized paths. For example, .1.3.2.5 can be viewed as .1 linked to .3, then joined with .2, and finally connected to .5. Can the same thing be done with matrices? The trick is to define atomic matrices, such that multiplying them would produce the matrix encoding for the full path.

Atomic matrices turned out to be quite simple. In fact every atomic matrix has three constant entries: 0 in the lower right corner, and 1 in the lower left, and -1 in the upper right. The upper left entry is the node’s sequence number in the chain of siblings incremented by 1. For example, .5 corresponds to the matrix:

And the result from multiplying the matrices corresponding to .1, .3, .2, and .5 is:

Although we didn’t seem to progress much so far, we can at least round up matrix tree encoding schema design

table MatrixTreeNodes (
a11 integer,
a12 integer,
a21 integer,
a22 integer
);

A lot of questions might emerge in the reader’s mind at this moment, and all of them will be addressed one by one. The most important of which is how this matrix encoding is related to nested intervals? Indeed, all these matrix manipulations should be evaluated from the querying perspective. In particular, how are the node’s descendants and ancestors queried? And an even more basic question, how are the node’s parent and immediate children found?

Parent and Children Queries

In the example there are a certain constraints that node encoding obeys. The entries in the left column are positive, and the entries in the right column are negative. And absolute values of the entries in the right column are component-wise smaller than those in the left column. Likewise, absolute values in the upper row are greater than the absolute values in the lower row. These properties are obvious for atomic matrices, but what about arbitrary nodes?

Consider an arbitrary node with encoding as:

Its n-th child encoding is calculated as a matrix product of:

Let us examine this expression closely. First we notice that the parent left row entries are moved into the child right row with the sign changed. Now, can we prove our intuition about the entries in the left row being positive and the entries in the right row being negative? Sure. Assume this property holds in the case of the parent encoding. Then, it must carry over to the child. By induction, it follows that any node will honor it. A similar line of reasoning proves our insight about absolute values.

The second very important constraint that matrix encoding satisfies is:

A reader with basic linear algebra background most likely recognized the matrix determinant there. Determinants obey multiplication law: when matrices multiply, their determinants multiply as well. Therefore, a determinant of the node encoding matrix is a product of the atomic matrix determinants! Since all atomic matrices have a determinant equal to 1, the determinant of any node encoding matrix must be 1.

A determinant constraint reduces the number of independent matrix entries to three. Given any three matrix elements, the forth entry is unambiguously calculated from the determinant constraint equation. And this could be done even better by reducing the number of independent elements to two.

Given the two elements a11 and a21, relabeled conventionally as a and -b, plus the unknowns: a12 and a22 relabeled as y and x, the determinant equation reads:

This is perhaps the most celebrated equation in the elementary number theory. Its integer solutions are calculated via the extended Euclidean algorithm.  Here is the algorithm illustrated on the familiar matrix encoding example:

The following equation is solved in a series of steps illustrated in Figure 5.9:

At the last algorithm iteration, it is concluded that the values x=11 and y=19 satisfy the equation.

Is this the only solution? Certainly not. Consider:

62×107 - 107×62 = 0

11×107 - 19×62 = 0

And the result

(11+62)×107 – (19+107)×62 = 0

which implies another solution of x=73, y=126! Fortunately, we know that x (i.e. a22) and y (i.e. a12) has to be smaller than 62 (i.e. a21) and 107 (i.e. a11), correspondingly. Therefore they can be dismissed.

The most important implication of the research in this section is that the combination of a11 with a21 is always unique. We can go as far as reducing the MatrixTreeNodes definition to these two attributes (and calculate the other two columns on the fly via extended Euclidean algorithm), or leave the redundant attributes in the table and just declare the unique key. The second alternative is chosen, which is justified by the next step. Knowing that a12 and a22 are always negative, we are going to store their absolute values. Then, as been shown already, the child values a12 and a22 have to refer to some parent identified by a11 and a21. In other words, a child always refers to its parent explicitly via the foreign key constraint.

Therefore, the tree schema design is enhanced as shown below:

table MatrixTreeNodes (
a11 integer,
a12 integer,
a21 integer,
a22 integer
);

alter table MatrixTreeNodes
REFERENCES MatrixTreeNodes (a11,a21);

The hierarchy design in which a node refers to the parent name explicitly is called the adjacency tree model, and its scope is actually bigger than trees. The schema for the adjacency model is the following:

id integer,
parent_id integer
);

Unlike matrix encoding, there is no theory regarding how to choose a set of node identifiers, except for the obvious restrictions that the id column is a unique identifier, and parent_id always refers to the parent node. The general adjacency model is the main topic of the next chapter.

There is one subtle distinction between matrix and adjacency encodings. What the root node refers to. In adjacency encoding the root node parent_id has to be NULL, as there is no parent. In the matrix encoding the extended Euclidean algorithm is simply applied and the four numbers obtained. The root node refers to some nonexistent parent! What if the root node matrix encoding:

is changed into:

Technically, NULLS cannot be forced into the matrix entries – they would destroy all the algorithms that were developed so far. It is more reasonable to admit that the formal referential constraint declaration for matrix encoding is invalid, and therefore, should be retracted from the schema design. This is not a big issue, however, given that matrix encoding enjoys more sophisticated constraints than referential integrity.

Parent is NULL?

In the adjacency model the root node refers to the NULL parent. Does it mean that the query “Find the root node’s parent” cannot be answered?  In the matrix model the root refers to the imaginary parent, and the query “Find the root node’s parent” returns the empty set as it supposed to.

Once again, it was possible to establish direct links between parent and children because the values of a12 and a22 were negated. From now on, the generic matrix node encoding will be referred to as:

Now that the informal referential integrity constraint is available, querying parent and children nodes becomes obvious.

Find all the employees who report directly to Jones.

select child.name
from MatrixTreeNodes parent, MatrixTreeNodes child
where parent.name = ‘Jones’
and child.a11 = parent.a12 and child.a21 = parent.a22

Who is Jones’ manager?

select parent.name
from MatrixTreeNodes parent, MatrixTreeNodes child
where child.name = ‘Jones’
and child.a11 = parent.a12 and child.a21 = parent.a22

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning