Search BC Oracle Sites

# Hierarchical Weighted Total

SQL Tips by Donald Burleson

Before exploring aggregation on graphs, let's have a quick look at aggregation on trees. Aggregation on trees is much simpler and has a designated name: hierarchical total. A typical query is:

Find combined salary of all the employees under direct and indirect supervision of King.

This query, however, has no inherent complexity, and it effectively reduces to the familiar task of finding a set of all the node's descendants.

In graph context, the hierarchical total query is commonly referred to as the bill of materials (BOM). Consider bicycle parts assembly:

Parts with the same name are assumed to be identical. They are modeled as nodes in an acyclic directed graph. The edges specify the construction flow:

1        Assemble each wheel from the required parts (including reflectors).

2        Assemble the frame from the required components (including reflectors).

3        Assemble a bicycle from the required parts (including frame and wheels).

Suppose the total list of parts for the bike assembly is to be ordered. How is the quantity for each part calculated? Unlike the hierarchical total for a tree, the quantities cannot simply be added as they multiply along each path.

Therefore, there are two levels of aggregation here, multiplication of the quantities along each path and summation along each alternative path.

Aggregation on Graphs

Two levels of aggregation fit naturally into in graphs queries. Consider finding a shortest path between two nodes. First, the distances are added along each path, and a path is chosen with minimal length.

Double aggregation is not something unique to graph queries, however. Consider the following query:

Find the sum of the salaries grouped by department. Select maximum of them.

When expressing such a query in SQL, the first sentence is accommodated as an inner subquery inside the outer query corresponding to the second sentence:

select max(salaryExpences) from (
select deptno, sum(sal) salaryExpenses
from Emp
group by dept
)

The hierarchical weighted total query has the same structure. The first level of aggregation where edges are joined into paths is analogous to the group by subquery from the salary expense query. Formally, it is a transitive closure, which is enhanced with additional aggregates or a generalized transitive closure.

The following is a hypothetical SQL syntax for a generalized transitive closure:

select distinct first(Part), last(SubPart), product (Quantity)
from AssemblyEdges
connect by prior Part = later SubPart

The unfamiliar syntax requires some clarification:

1        The product is a non-standard aggregate from Chapter 4.

2        The first and last refer to the first and last edges in the path, correspondingly.

These aggregates are unique to ordered structures such as directed graphs.

1      Although not used in the example, concatenating edge labels into a string is one more natural aggregation function, which is unique to graphs. The list aggregate function is a standard way to accommodate it.

2        The later keyword is just a syntactic sugar fixing apparent asymmetry caused by the prior keyword.

3       There is no start with clause, which is, in fact, redundant. It is an outer query where paths will be restricted to those originated in the ?Bicycle? node.

The generalized transitive closure query is enveloped with a second level of aggregation, which is accomplished by standard means:

select leaf, sum(factoredQuantity) from (
select product(Quantity) factoredQuantity,
first(Part) root, last(SubPart) leaf
from AssemblyEdges
connect by prior Part = later SubPart
) where root = ?Bicycle?
group by leaf

Enough theory, so what is done in the real world to implement an aggregated weighted total query? Let's start with Oracle, because the proposed syntax for a generalized transitive closure resembles the Oracle connect by.

First, we have to be able to refer to the first and the last edges in the path:

select connect_by_root(Part) root, SubPart leaf
from AssemblyEdges
connect by prior Part = SubPart

Unlike the fictional syntax, Oracle treats edges in the path asymmetrically. Any column from the AssemblyEdges table is assumed to implicitly refer to the last edge. This design dates back to version 7. The connect_by_root function referring to the first edge has been added in version 10. It is remarkable how this essentially ad-hock design proved to be successful in practice.

Next, Oracle syntax has several more path aggregate functions:

1        level - the length of the path.

2        sys_connect_by_path - essentially the list aggregate in the fictitious syntax.

3        connect_by_isleaf - an indicator if there is no paths which contain the current path as a prefix.

4        connect_by_iscycle - an indicator if the first edge is adjacent to the last.

Unfortunately, we need an aggregate which is a product of edge weights and not a string which is a concatenation of weights produced by sys_connect_by_path. You might be tempted to hack a function which accepts a string of the edge weights, parses it, and returns the required aggregate value.

Nah, too easy! Can the problem be solved without coding a function even a simple one? Yes it can, although this solution would hardly be more efficient. The critical idea is representing any path in the graph as a concatenation of three paths:

1        a prefix path from the start node to some intermediate node i

2        a path consisting of a single weighted edge (i,j)

3        a postfix path from the node i to the end node

Then, the path from x to y is frozen while interpreting the edge (i,j) as a variable. All that needs done is to aggregate the weights of all those edges.

Let's assemble the solution piece by piece. First, all the paths in the graphs are expressed as:

with TClosure as (
select distinct connect_by_root(Part) x, SubPart y,
sys_connect_by_path('['||Part||','||SubPart||'>',' ') path
from AssemblyEdges
connect by nocycle Part=prior SubPart
union
select Part, Part, '' from AssemblyEdges
union
select SubPart, SubPart, '' from AssemblyEdges
) -

This is essentially a reflexive transitive closure relation enhanced with the path column.  Paths are strings of concatenated edges; each edge is sugarcoated into '['||Part||','||SubPart||'>', which helps visual perception, but is inessential for the solution.

Next, the two paths and the intermediate edge are joined together and grouped by paths:

? , PathQuantities as (
select t1.x, t2.y,
t1.p||' ['||e.Part||','||e.SubPart||'>'||t2.p,
product(Quantity) Quantity
from TClosure t1, AssemblyEdges e, TClosure t2
where t1.y = e.Part and e.SubPart = t2.x
group by t1.x, t2.y, t1.p||' ['||e.Part||','||e.SubPart||'>'||t2.p) ?

Now all the paths with quantities aggregated along them are available.  Let's group the paths by the first and last node in the path while adding the quantities:

select x, y, sum(Quantity)
from PathQuantities
group by x, y

This query is almost final, it needs only a minor touch: restricting the node x to ?Bicycle? and interpreting the y column as a Part in the assembly:

select y Part, sum(Quantity)
from PathQuantities

where x = ?Bicycle?

group by x, y

The recursive SQL solution turns out to be quite satisfactory:

with TCAssembly as (
select Part, SubPart, Quantity AS factoredQuantity
from AssemblyEdges
where Part = ?Bicycle?
union all
select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
from TCAssembly te, AssemblyEdges e
where te.SubPart = e.Part
) select SubPart, sum(Quantity) from TCAssembly group by SubPart

Most importantly, it accommodates the inner aggregation with the non-standard aggregate effortlessly! The cycle detection issue that plagued the recursive SQL in the section on transitive closure is not a problem for the directed acyclic graphs.

The hierarchical total turns out to be surprisingly useful in the following two sections. Selecting all the subsets of items satisfying some aggregate criteria, Basket generation, appears to have little to do with the hierarchical total, at first. The other, rather unexpected application is Comparing hierarchies.

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.