Search BC Oracle Sites

# Reversed Nesting

SQL Tips by Donald Burleson

Let's revise the nested sets example in Figure 5.2. This time instead of a parent set containing its children sets, we demand the parent set to be contained in its children as shown in Figure 5.14.

This variation of Nested Sets encoding is non-volatile. Once again, there is no direct support of the set datatype on most platforms, so we have to find a workaround. Armed with Boolean algebra each set is represented as a Boolean vector, the latter can be stored as plain strings. For example, {0,1,4,6} becomes ?1100101?.

In general, set containment does not correspond to any standard operation on strings. However, sets of tree nodes are special as shown in Figure 5.15.

Set elements correspond to tree nodes, and each set is associated with a path from the root to a node. Hence, the set containment for reverse nested sets is the same as the substring operation. The methods for querying materialized path encoded trees must work for reverse nested sets as well.

Roji Thomas suggested yet another tree encoding, which is closely related to the reverse nested sets. In his method each node is labeled in two steps. First, each node is designated a unique prime number. Then, each node is encoded with a number, a product of the primes on the path from the node to the root. Node A is ancestor of node B whenever the encoding of A divides B.

By the fundamental theorem of arithmetic, every integer has a unique prime factorization:

N = 2α1?/span>3α2?/span>5α3?/span>??/span>pkαk

It is immediate that the vector of prime orders (α1, α2, α3,?, αk) in Roji Thomas? case is Boolean, and his encoding is essentially reverse nested sets. Without establishing this connection the investigation would not be complete.

When studying any encoding schema, it is natural to start with the expressive power, verifying that any hierarchical query can be expressed in terms of new encoding. The second concern is efficiency, with emphasis on access path via index. It might not be obvious how to index tree nodes in Roji Thomas encoding, yet it is the connection to reverse nested sets that solves the problem.

Ordered Partitions

Given a positive integer N, in how many ways it can be expressed as a sum of smaller integers? Assume the order of the summands is important. Although this problem seems to be too distant from database practice, it nevertheless leads to yet another encoding method.

First of all, ordered integer partition is essentially a materialized path. Let's enumerate all possible partitions by arranging them into the following table:

This enumeration is generated recursively. All the partitions of N+1 are generated from the partitions of N in either way:

1    putting an extra component with 1 in front of all of the other components of N, or

2    incrementing the first component of each partition of N by 1.

Then, it follows that partitions encoded with the even numbers have 1 as the last component. Indeed, as long as partitions of N have been evenly encoded to end up with 1, this property is carried over to the partitions of N+1. Since partitions are associated with materialized paths, this means the oldest child encoding is twice that of the parent!

For odd encodings, let's decrement them by 1, first. This would produce an even encoding - the case that we have studied already. Compared to the former odd encoded partition, the latter even encoded partition has an extra component - the trailing 1. Both are partitions of the same number N, however. This can be achieved only if the last component of the odd encoding is the sum of the last component and the last but one component of the even encoding. For example, the partition encoded as 11, i.e. 1+3, has the last component 3 to be represented as 2+1 in the partition number 10, i.e. 1+2+1. Then, by halving the even encoding, the encoding of the younger sibling (relative to the original odd encoded node) will be obtained!

Although these discoveries allow defining an ancestor relation which could serve as a basis for hierarchical queries, performance is still important. It is unclear if the integer partition based encoding schema admits an efficient querying of the node's descendants. Once again, our benchmark is nested intervals encoding, where all the descendants can be accessed via index range scan. It turns out that integer partition encoding is very closely related to nested intervals encoding with dyadic rational numbers; in fact, there is a transparent mapping between them. I will not pursue this venue here any longer; however I will refer an interested reader to a series of my papers published by dbazine.com. Once more, from a practical perspective Farey fractions provide a more economical opportunity to organize a system of nested intervals.

Case Study: Homegrown C Function Call Profiler

Profiler is a common tool for performance analysis. It determines how long certain parts of the program takes to execute, how often those parts are executed, and then generates a tree (or graph) of function calls. Typically this information is used to identify the portions of the program that takes the longest to complete. Then, the time consuming parts are expected to be optimized to run faster.

If the reader begins to suspect that there is plenty of tools doing the job, then (s)he's absolutely right. Those tools, however, are just programs. They build canned reports. We?ll approach the problem from database centric angle. Let's place the profiling data into the database, and then we can query it any way we like!

First, the data must be gathered. Linux/Unix pstack is a primitive utility that does the job. The calls stack can be queried repeatedly with a shell script like this

integer i=0
while ((i <= 999));
do
pstack -F 25672 | tee -a pstack.trc;
(( i = i + 1));
done

where the magic number 25672 is the process number to attach to.

The output is streamed into a file with the content like this:

009d5930 appdrv   (ffbe82d8, 180000, ffbe82d8, feb5e794, 0, ffbe8351)
009fd85c kkofmx   (0, 0, 0, 0, fe7bebb4, 2a38a43e) + 624

009f834c kkonxc   (fe7bebb4, fe4b3124, ?) + 8c
009fc65c kkotap   (200000, 0, 0, 0, 1, 4000) + 1444
009f1154 kkojnp   (0, 0, 0, 0, 108, 0) + 14dc
009ef9e8 kkocnp   (fe7bebb4, 1, 0, 0, 1f, 7c) + f0
?
008f4b5c opidrv   (3c, 4, ffbef504, 2f6c6f67, 0, 2f) + 1ec

001d666c sou2o    (ffbef514, 3c, 4, ffbef504, 0, 0) + 10
001cf59c main     (2, ffbef5dc, ffbef5e8, 30d2000, 0, 0) + ec
001cf488 _start   (0, 0, 0, 0, 0, 0) + 108
00a1c6a4 kkorbp   (ffbe8e3c, 0, ffffffff, 0, 0, ffbe8e61) + 808
00a1f230 kkobrfak (ffbe8e3c, ffbe8e60, fe39a42c, 0, 0, 1) + 8c
00a1d92c kkofbp   (2a3bef38, 0, 2a3bef38, 0, 0, 1) + 584
00a22ddc kkobmp   (10, fe4b39e4, 23, fe4df610, 0, fe4b3ac4) + bc
?
008f4b5c opidrv   (3c, 4, ffbef504, 2f6c6f67, 0, 2f) + 1ec
001d666c sou2o    (ffbef514, 3c, 4, ffbef504, 0, 0) + 10
001cf59c main     (2, ffbef5dc, ffbef5e8, 30d2000, 0, 0) + ec
001cf488 _start   (0, 0, 0, 0, 0, 0) + 108
00a1f218 kkobrf   (ffbe8e3c, ffbe8e60, 0, 0, 0, 0) + 74
00a1d690 kkofbp   (2a3bef38, 0, 2a3bef38, 0, 0, 1) + 2e8
00a22ddc kkobmp   (10, fe4b39e4, 23, fe4ff610, 0, fe4b3ac4) + bc
009fc678 kkotap   (200000, 0, 0, 0, 1, 4000) + 1460
?

The detailed file structure is not important. What is essential for this analysis is the sequence of the function calls (emphasized in bold) within each section (demarcated with italic delimiters). Specifically, we would like to move the data to the database with the following schema:

table function_sequences (
stack_id integer,       -- section
id       integer,       -- sequence #
func     varchar2(100)  -- function name
);

The sample data snippet is now a part of the database table:

select * from function_sequences

The data mover implementation part is rather boring. The input file is read line by line. If the line starts with the magic number 25672, a section delimiter string is being parsed. Then, the stack_id counter is incremented, and the id counter initialized. Otherwise, the function name is read and the id counter incremented.

It was easy to implement the function ids increasing with each line scanned, but their order is inconsistent with stack slots numbering. It is convenient to re-label the functions so that the root function _start is always labeled with id = 1. The new view/table name - Stacks - reflects the fact that the data is now conventionally aligned with the stack data structure:

create table Stacks as
select s.stack_id id, height - s.id + 1 pos, func
from function_sequences s, (
select stack_id, max(id) height from function_sequences
group by stack_id
) ss
where s.stack_id = ss.stack_id
;

We finally arrived to an interesting side of the problem, the reporting. How are all these stacks combined into a meaningful call graph? In particular, what defines the function location in the call graph? Having learned so much about materialized path encoding already, the answer will be hardly surprising. It is a path assembled from of all the names on the call stack that is being sought after.

Technically, functions are concatenated with the list aggregate function:

select id, list('.'||func)
over (partition by id order by pos) path,
pos, func\
from stacks;

The materialized path can now be used to group by the stack tree nodes with the identical path together, and/or order by the nodes to get a nice indented tree layout:

select func, pos-1 depth, count(1) from (
select id, list('.'||func)
over (partition by id order by pos) path, pos, func
from
stacks
) group by path, pos, func
order by path;

The count aggregate is proportional to the time the execution has spent on this particular stack tree node. From there you would typically search for hotspot nodes, where a significant part of the execution time is spent.

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning