Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Generating Baskets

SQL Tips by Donald Burleson

Given a set of items, how much, say, would $500 buy?

select * from items;

Even though the problem may sound unbelievably simple, the answer involves finding all sets of items, not just the item records that cost below $500. However, what do those sets have to do with the graph problems that are the focus of this chapter? Well, there is no method on how to generate sets of items, but even if there were, the second challenge would be aggregating a price on these sets.

The problem in terms of graphs is reformulated using the very suggestive Figure 6.14 as the basic idea.

What criterion connects the items in the graph? Anything that does not list an item twice would do. For purposes in this chapter, any two items are connected if the first item precedes lexicographically to the second one. Then, as in the hierarchical total section, the solution has to branch in order to accommodate the differences between various platforms.

In Oracle the idea is adopted from Figure 6.13. Unlike the previous section, however, where values were aggregated on edges, values are added together on nodes. The path is decomposed into the three components as shown in Figure 6.15.

The transitive closure relation together with an aggregation query formally is:

with Sets as (
   select distinct connect_by_root(name) x, name y,
      case when connect_by_root(name) = name then
         ''
      else
         substr(sys_connect_by_path(name,','),
          instr(sys_connect_by_path(name,','),',',2))
      end p
   from items
   connect by nocycle name > prior name
), SetTotals as (
   select t1.x||t1.p||t2.p,
   sum(price) price_sum
   from Sets t1, Items i, Sets t2
   where t1.y = i.name and i.name = t2.x
   group by t1.x||t1.p||t2.p
)

Most of the attention here has been focused on making the path string look right. Without this twisted case condition, joining the paths according to Figure 6.15 would produce strings with duplicate items, e.g. camera, camera, monitor.

The final query is now one step away:

select * from SetTotals
where price_sum < 500


Now that the connect by based solution has been examined, let's see what recursive SQL can offer. The solution is embarrassingly simple:

with Sets (maxName, itemSet, sumPrice) as (
select name, name, price
union
select name, itemSet || ',' name, sumPrice + price
from Sets, T
where name > maxName

and sumPrice + price < 500

) select itemSet, sumPrice from Sets

Not only it is clearer, but it is also more efficient. Imagine a store with 50K items. Calculating all the subsets of the items is not a proposition that is expected to be completed in the remaining lifetime of the universe. The number of baskets with the aggregated cost limited by some modest amount, however, is reasonably small. Once again, applying predicates early is a good idea in general.

Comparing Hierarchies

Being able to navigate graph structures is only a part of the story. Comparing hierarchies is more challenging. First of all, by what criteria are two hierarchical structures considered as equivalent? In particular, is the order of siblings important?

Depending on the context, the reader may lean to one or the other answer.

Likewise, can children become grandchildren?

Even though a reader may lean to the conclusion that dummy intermediate nodes are not allowed, it is easy to provide a counter example when ignoring certain types of intermediate nodes is imperative.

This ambiguity is discouraging. Yet, let's ignore it for a while and try to develop some basic understanding of how trees can be compared. Looking desperately for a bright idea, I pulled the following entry from thesaurus.com

Main Entry:  compare

Synonyms:         analyze, approach, balance, bracket, collate, confront, consider, contemplate, correlate, divide, equal, examine, hang, inspect, juxtapose, match, match up, measure, observe, oppose, parallel, ponder, rival, scan, scrutinize, segregate, separate, set against, size up, study, touch, weigh

The last synonym - weigh - sounds promising. When comparing two things in the physical world, they are measured on some scale or weighed. Perhaps tree weights can be compared somehow?

Let's revisit the idea of the hierarchical total. When calculating the total, a raw node weight is augmented by the augmented weights of its children. The process begins at the root node descending recursively down to the leaves. The trick is to guess the right kind of aggregation, which accommodates both the tree structure and unaugmented node weights. Then, for all comparison purposes a tree can be identified with the augmented weight of its root node.

The easiest aggregation to try is simply the sum of all tree node weights. This na?e method, however, fails to take into account the differences in the tree structure.

Joe Celko noticed that even though the weights at the root nodes are the same, the weights at node 2 are different. Thus a comparison is needed of the two sets of nodes element-by-element with their respective weights. This is easy enough to do; simply store the weights in Table1

and Table2

Then, check if the symmetric difference is empty.

select AggregatedWeight from Table1
minus
select AggregatedWeight from Table2
union
select AggregatedWeight from Table2
minus
select AggregatedWeight from Table1

Still, the idea of comparing scalar values rather than sets of values is very appealing.  Sets of integers are well known to be mapped bijectively to ordinary integers. This may not be a practical solution, but at least it supports the intuition that a scalar based tree comparison is possible.

Why is the method of mapping integer sets into integers unpractical? First, tree nodes can be labeled with values of dataypes other than integers.  This is easily fixable, since the only important property of the label for tree comparison purposes is its identity. Any value of any datatype can be mapped to an integer, and in fact such a map is commonly known as a hash function. Second, a much more difficult problem is that the mapping of sets integers to integers grows very fast. The range of computer integer numbers overflows pretty easily even for sets of a moderate size. A hash function, however, provides a satisfactory solution to the range overflowing problem as well.

The basic premise of any hash-based method is that the (unlikely) hash code collisions are tolerable. Given an object a, the chance that there exists another object x≠a such that hash(a)=hash(x) is considered as negligible. Likewise, for any objects a, b, x and y satisfying the equation hash(a)+hash(b)=hash(x)+hash(y) it must either follow that x=a, y=b or x=b, y=a. This could be contrasted to the ordinary addition wherein the equation a+b=x+y is too ambiguous to determine x and y.

The hierarchical total query can be defined recursively. At each node labeled p, which has children c1, c2, ?, cn, we aggregate the following value:

total(p) = hash(p+total(c1)+total(c2)+?+total(cn))

The recursive definition is the easiest to implement with recursive SQL. Consider the employee hierarchy:

table Employees ( -- tree nodes
   id   integer primary key,
   sal  integer
);

table Subordination ( -- tree edges
   mgr integer references Employees,
   emp integer references Employees
);

Then, the hierarchical total query starts with the leaf nodes:

select id, hash(id) AS total
from Employees
where id not in (select mgr from Subordination)

The recursive step mimics the recurrence definition:

with NodeWeights as (
   select id, hash(id) AS total
   from Employees
   where id not in (select mgr from Subordination)
    union all

   select e.id, e.id+sum(hash(total))
   from Subordination s, Employees e, NodeWeights nw
   where s.mgr = e.id and s.emp = nw.id
   group by e.id

) select weight from NodeWeights
where id not in (select emp from Subordination)

After all hierarchy nodes are weighted, the outermost query selects the weight at the root node.

It is difficult to fit a recursive idea into the connect by framework. Likewise, when nested sets and intervals were studied, the recursive ideas were ignored altogether. Could the hash based tree comparison method be adapted to these contexts as well?

Let's invoke our familiar path encoding - after all, it reflects a tree structure. Admittedly, I do not know how to aggregate paths in a subtree of descendants, but I can suggest adding their hash values instead! More specifically, a hash value of each node can be multiplied by a hash value of the node path and summed up the hierarchy. This definition is free of recursion, and therefore, is exactly what is required for the hierarchical total method to work.  

The method for basic tree comparison can be amended to meet the exact tree equality specification. In the beginning of the section there were two tree equality dilemmas that were left hanging. In the first dilemma, if the reordering of siblings is allowed, the node summation is performed exactly as described. Otherwise, the node weights should be multiplied by the order numbers.

For example, for the left tree in Figure 6.14 the augmented weight of the root node has to be recalculated to become 1*1 + 2*1 + 3*2 = 7. If leaves 2 and 3 are swapped, then the weight at the root node has to change to 1*1 + 3*1 + 2*2 = 8. A similar idea applies to the hash-based comparison method, where the hierarchical total formula becomes:

total(p) = hash(p+1?/span>total(c1)+ 2?/span>total(c2)+?+ n?/span>total(cn))

Likewise, in the second dilemma if intermediate empty nodes in the tree structure are allowed, then Celko's method works as it is. Otherwise, each unaugmented node weight is multiplied by the level. For example, for the left tree in Figure 6.15 the augmented weight of the root node has to be recalculated to become 1*1 + 2*2 + 2*3 = 11. For the right tree it is 1*1 + 2*2 + 2*0 + 3*3 = 14. By contrast, the hash-based tree comparison method considers the tree structures in Figure 6.15 as different.

  

This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim Tropashko

You can buy it direct from the publisher for 30%-off.


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.