Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Trees in SQL

SQL Tips by Donald Burleson

Graphs and trees are ubiquitous data structures. They do not easily fit into a Relational model; therefore querying them requires a little bit more ingenuity than the routine select-project-join.

Compared to graphs, trees are relatively simple creatures. They are easy to draw. Almost any problem involving a tree structure is easy to solve. Algorithms on trees are generally fast. Edges, which are very important in graph definition, can be almost completely ignored for a tree. The tree structure could be encrypted in the nodes alone, and those tree encodings could be invented almost on a daily basis.

Most of this chapter will focus on tree encodings. The rest is dedicated to smaller problems such as node ordering by ad-hoc criteria. Yet, several problems are postponed until the next chapter, wherein hierarchical aggregate queries and tree comparisons will be studied. A reader who is primarily looking to develop some intuition with vendor specific hierarchical SQL extensions (be that the connect by, or the recursive with operator) is advised to proceed to the next chapter.

Materialized Path

Tree is a subclass of graph. However, since graphs are more complex entities with their own set of problems, they will not be explored in depth in this chapter.

For all practical purposes a tree can be defined as a set of nodes arranged into a hierarchical structure via tree encoding. The purpose of tree encoding is to assign a special label to each node and to manipulate tree nodes - i.e. query and update - by means of those labels. Informally, each node is equipped with a global positioning device that transmits the node's coordinates. Once each node's geographical position is known, typical queries such as the following can be answered:

Count all the employees who are located south of the ?King?, in other words, who report directly or indirectly to him.

Without a doubt you are already familiar with at least one such encoding: a UNIX directory structure. Each file location in the hierarchy is defined by an absolute pathname -- a chain of directories that a user has to navigate from root to the leaf of the hierarchy. For example, /usr/bin/ls is an absolute pathname. On the top of the directory structure there is a directory called usr, which contains a directory called bin, which contains a file called ls.

This seemingly straightforward idea can be applied to any tree structure. First, discover or cook up some unique key, which would distinguish the node's children. Then, list all the ancestor unique keys as the node's encoding. This list can be represented as a string (if so, there must be agreement upon a string delimiter), or as a collection datatype. This encoding will be referred to as materialized path. The adjective materialized emphasizes the fact that the path is stored. If the path is built dynamically, then the adjective is omitted and thus this dynamically generated encoding is referred to as simply path.

At this moment, there is enough expressive power for basic queries:

An employee JONES and all his (indirect) subordinates:

select e1.ename from emp e1, emp e2
where e1.path like e2.path || '%'
and e2.ename = 'JONES'


An employee FORD and the chain of his supervisors:

select e1.ename from emp e1, emp e2
where e2.path like e1.path || '%'
and e2.ename = 'FORD'

Usually, query performance is unrelated to the form in which the query is written in SQL. In principle, a query optimizer has powerful techniques for transforming any query into an equivalent, better performing expression. Not in this case!

The first query is fine. The matching a string prefix is roughly equivalent to a range check where chr(255) is the last ASCII code:

select e1.ename from emp e1, emp e2
where e1.path between e2.path and e2.path || chr(255)
and e2.ename = 'JONES'

A reasonable execution strategy would be finding the unique employee record e2 matching ename='JONES', first. Finding a unique record is typically done via an index lookup, in other words, extremely fast. The first query execution step establishes the range of paths, which the e1.path encoding has to fall into. If this range of paths does not contain too many paths, then the best way to find them is to iterate via the index range scan. The more subordinates JONES has, the longer it will take to output them. In other words, the speed of this query is determined by the size of the output - there is hardly a more efficient way to express this query.

The equivalent range check rewriting is valid for the second query as well:

select e1.ename from emp e1, emp e2
where e2.path between e1.path and e1.path || chr(255)
and e2.ename = 'FORD'

Unlike the previous case, however, now not only is the interval of paths known, but also the path e2.path itself, which will be matched against all the intervals of the e1 table. Certainly, there would not be that many paths that match with e2.path, because the chain of ancestors in a balanced hierarchy is never too long.

Yet, there is no obvious index that could leverage this idea. The condition of a point belonging to an interval consists of the two predicates e2.path >= e1.path and e2.path <= e1.path || chr(255). A normal B-Tree index on the e1.path column could be leveraged while processing the first predicate only, and it would have to scan half of the records on average.

Finding a Set of Intervals Covering a Point

Querying ranges is asymmetric from a performance perspective. It is easy to answer if a point falls inside some interval, but it is hard to index a set of intervals that contain a given point. Applied to nested sets, there will be difficulty in answering queries about the node's ancestors.

The critical observation is that a chain of ancestors is encoded in the node's materialized path encoding. The database does not have to be accessed in order to tell that the ancestors of nodes 1.5.3.2 are nodes 1.5.3, 1.5, and 1. A simple function could parse the materialized path. This function's ?natural habitat? is the client side. There a dynamic SQL query is built:

select ename from emp
where path in ('1.5.3', '1.5', '1')

On the server side the implementation could be little bit more sophisticated. The list of ancestors can be implemented as a temporary table built by a table function. This sketchy idea will be developed in greater detail in later sections where more elegant encodings than the materialized path will be studied.

This section concludes with a materialized path tree encoding schema design:

table TreeNodes (
   path varchar2(2000),  -
)

This schema leaves the structure of the TreeNodes.path column unspecified.  Ideally, some constraints could be added, but once again, a much nicer development that does not require string parsing techniques awaits us ahead.

 

This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim Tropashko

You can buy it direct from the publisher for 30%-off.


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.