 |
|
Ordering
SQL Tips by Donald Burleson |
So far emphasis has
been placed on how to query the tree structure in various encodings.
The presentation layer often demands an ordered output. The
difficulty here is that the end user is expected to specify some
ordering criteria on runtime, and the order might be nonconforming
to the order encoded in the tree structure. For example, consider a
familiar hierarchy or employees with nested sets encoding:
The tree is
effectively ordered by the LEFT column. A user might ask to
display it locally, ordered by the employee names at each hierarchy
level. It is easy to see that the global ordering criteria is
essentially the path from the root to a node, which is made of
concatenated names:
Unlike the
materialized path, the path of concatenated names must be generated
dynamically. The technical problem here is aggregating the names
into a path string. Fortunately, in Chapter 3 the LIST
aggregate function was explained. Therefore, the process continues
by taking the familiar nested sets query, which returns a list of
all node ancestors, and aggregating those lists into paths:
SELECT
ii.left
,CONCAT_LIST(CAST( COLLECT('.'||i.ename) AS
strings )) path
FROM Employees i, Employees ii
where ii.left between i.left and i.right
group by ii.left;
On a cautious note,
this particular implementation of the LIST aggregate is
agnostic of the order of the aggregation summands. The reader must
double check that his favorite string concatenation method
concatenates summands in the right order.
Ordering by dynamic
path seems to be a natural solution until we are asked to order by
numeric criteria, salary, for example. Numbers sorted as strings go
in the wrong order. They have to be padded. The other concern is
that of negative values. All these inconveniences make the dynamic
path solution unsatisfactory.
An alternative
solution is recreating the hierarchy dynamically with the structure,
which is conforming to the required ordering. At first thought,
recreating the hierarchy smells performance problems, but our
rationale hinges upon a typical usage scenario. It must be the GUI
that wants to display the ordered hierarchy, and GUI rendering
capabilities impose common sense limits on the tree size. Even if
the GUI were able to display the whole tree, an end user would be
overwhelmed by the volume of information that he were presented ?at
once?. It is fair to assume that a reasonably designed GUI would
display only a local portion of the hierarchy, no matter how big the
whole hierarchy might be.
Therefore, the
issue of hierarchy ordering should be considered within the scope of
application design. The application programmer designs a method of
hierarchy navigation and display, which guarantees that the GUI
displays only a small part of the hierarchy. A familiar example is
an organizational chart where no more than two levels of the
hierarchy are displayed at each tree node: employee subordinates,
and his supervisor. Ordering such a puny tree is a laughably
easy exercise for a reader who has advanced thus far in this book.
An alternative
design is a dynamic tree widget where each node exists in on of the
two states: expanded or collapsed. The Windows file manager utility
is a typical example. In principle, a directory tree could be
expanded fully, but it is unreasonable to expect that there is a
user who is up to the challenge of manually expanding a hierarchy of
any significant size to the full depth.
Exotic
Labeling Schemas
The tree encoding
area is flourishing with various methods. It is so easy to invent
yet another tree labeling schema! This contrasts to general graphs,
which appear to defy encoding ideas. This section reviews others,
arguably, less practical tree encoding methods.
Dietz Encoding
One natural way to
label a tree is the pre-order traversal as shown in Figure 5.10.
It is natural
because the tree nodes are indexed in the depth-first order, and
tree node records are in the depth-first order in a nearly
ubiquitous tree display with the levels laid out horizontally:
The post-order traversal is a less intuitive way of navigating a
tree. The nodes are visited in the same order, but the node's index
number assignment is postponed until all of the node's children are
indexed:
The Dietz tree
encoding assigns a pair of indexes: preorder#, postorder# to
each node.
It is immediate
that the Dietz tree encoding is volatile. Inserting a new node
disrupts existing encodings, both pre-order and post-order.
Querying the Dietz
encoded tree is based upon the following ancestor criteria. Node
x is an ancestor of y if and only if x.preorder# ≤
y.preorder# and y.postorder# ≤ x.postorder#. This
criteria appears to be identical to that of nested intervals,
although unlike nested intervals neither preorder# < postorder#,
nor preorder# < postorder# is universally true for all
the nodes.
Figure 5.12 is a
two-dimensional picture of Dietz encoding. It is assumed that
preorder# is the horizontal axis, and postorder# is the
vertical axis.
Each node x
has its descendant nodes y bounded within the two-dimensional
cone defined by the two inequalities x.preorder# ≤ y.preorder#
and y.postorder# ≤ x.postorder#. For nested intervals we
would additionally have preorder# < postorder# or in
geometric terms have all the nodes above the diagonal preorder# =
postorder#. If all the tree nodes are moved above the diagonal
somehow, then the Dietz encoding would be transformed into nested
intervals. Linear mapping
left = total#nodes - postorder# +1
right = 2 ?/span> total#nodes -
preorder#
achieves that goal
in Figure 5.13.
Pre-order - Depth
Encoding
Yet another way to
label the tree is storing the combination of the pre-order index
number and level. Let's glance over the basic queries, though.
1
To find the node's parent, select all the nodes on the upper
level, filter out all the nodes beneath, and choose the one with
maximum preorder# among the rest. In the familiar example
59) Smith's parent is searched for
among the upper level nodes: Scott, Ford, Allen,
Ward and Martin. Allen, Ward and
Martin are rejected as they all have a preorder# greater
than Smith. Among Scott and Ford the latter has
a greater preorder#.
1
To find all the descendants, locate the next node on the same level
and the next node on the parent level, choose the closest of the
two, and select all the nodes between the given node and the chosen
one.
60) For Scott, pick up Ford and Blake,
then select every node between Scott and Ford (exclusively).
1
Finding all the node's children is just filtering out of case #2 the
nodes with the proper level.
2
Finding a path to the root is simply selecting all the predecessor
nodes, grouping them by level, and extracting the maximum sequence
numbers per each group.
This is, again,
volatile encoding, which kills our further interest in detailed
exploration.