Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Interval Coalesce

SQL Tips by Donald Burleson

Normally, the shorter the problem statement is, the simpler its formal expression in SQL. However, there are notable exceptions and interval coalesce is one of them.

Given a set of intervals, return the smallest set of intervals that cover them.

This deceptively simple formulation leaves a pile of questions.  First, what is the intervals table? This one is easy:

table Intervals (
   x integer, -- start of the interval
   y integer  -- end   of the interval
); 

ALTER TABLE Intervals
ADD CONSTRAINT ends_ordering CHECK( x < y );

Perhaps, the from and to, or maybe, start and end column names may sound more appropriate. Unfortunately, they are SQL keywords.

Next, what is the concept of one interval covering the other? Easy: an interval is a set of points between the interval endpoints x and y. What about the endpoints, are they included into the interval? For our purpose, let's agree that both x and y belong to interval set of points, in other words, the intervals are closed from both ends.  Open and half-open intervals require careful reexamination of all the inequality predicates in the query which we are going to write.  Therefore, formally: A = {p|x≤p≤y}. Finally, interval A covers interval B when B is subset of A: BA.

Armed with these definitions, we are ready to write the query. The Intervals table is the only candidate to be placed into the query from clause, but there is a catch. If we just select some intervals out of the set we have, we won't get the answer. Think about an easy case of two overlapping intervals. The answer to the query is an interval whose endpoints are combined from different records. Therefore, a selfjoin is needed:

select fst.x, lst.y
from Intervals fst, Intervals lst
where fst.x < lst.y
and ?

Next, the ?no gaps? condition is introduced. Consider any interval endpoint between fst.x and fst.y. It has to be covered by some interval!

Actually, if there is a gap in a chain of intervals, then both ends of the gaps would be exposed (that is not covered by other intervals). This observation allows a small optimization, so that only the x endpoints will need checked.

Unfortunately, the ?no gaps? condition cannot be naturally translated into SQL, since it does not support universal quantifiers. The standard approach is converting the clause into existential quantifiers, which will be explained in more detail in the section on Relational Division. Here let's take a little jump and write the whole condition at once:

select fst.x, lst.y
from Intervals fst, Intervals lst
where fst.x < lst.y
and not exists (
   select * from Intervals int
   where int.x > fst.x and int.x < lst.y
   and not exists (
      select * from Intervals cov
      where int.x >= cov.x and int.x =< cov.y
   )
)
and ?

That's not all. The chain of intervals has to be maximal, in other words, it cannot be extended at the either end.

This condition translates into one extra subquery:

select fst.x, lst.y
from Intervals fst, Intervals lst
where fst.x < lst.y
and not exists (
   select * from Intervals int
   where int.x > fst.x and int.x < lst.y
   and not exists (
      select * from Intervals cov
      where int.x >= cov.x and int.x =< cov.y
   )
) and not exists (
   select * from Intervals cov
   where cov.x < fst.x and fst.x <= cov.y
      or cov.x <= lst.y and lst.y < cov.y
)

The final query is surprisingly verbose. Terser expression would involve converting NOT EXISTS subqueries into counting.
 

Is (NOT) EXISTS or (NOT) IN faster?

This is the wrong question. The EXISTS and IN subqueries are equivalent; in fact, the optimizer uses the term semijoin for both. The NOT EXISTS and NOT IN subqueries are different as far as NULL semantics is concerned, but still they are very similar so that the SQL execution engine uses the term antijoin for both. It might be argued that SQL standard should have defined them identically, leaving explicit control of NULL semantics to the end user (by means of IS (NOT) NULL predicate). History proved any attempt to get UNKNOWN value semantics right as futile, therefore it is better just to forget about NULLs and aim for elementary consistency. 

Which NOT EXISTS clause should be used, since two are available? Let's revisit both conditions:

1        A chain of intervals must have no gaps (Figure 1.8)

2        A chain is maximal if it has uncovered ends (Figure 1.9)

These conditions are not entirely independent, since every maximal chain of intervals is delimited by gaps on both ends. A change of perspective is needed.

Consider any pair of intervals fst and lst. Suppose fst.x and lst.y are not covered by other intervals. Are fst and lst the beginning and the end of a chain? Not necessarily, because there might be gaps between them. Let's shift our focus onto a set of fst, lst pairs such that fst.x and lst.y are not covered by other intervals. If there is a gap between some pair of intervals fst1, lst1, then it follows that there has to be another pair fst2, lst2 with the same first element fst2 = fst1 and some other second element lst2 such that lst1.y < lst2.y

In other words, among all the fst, lst pairs with the same fst element, it is the pair with the minimal lst element that defines a chain without gaps.

Let's formalize these ideas. A NOT EXISTS subquery has already been written that defined a set of fst, lst pairs such that fst.x and lst.y are not covered by other intervals. Alternatively, the conditional summation pattern could have been applied.

select fst.x, lst.y, sum(
      case when cover.x < fst.x and fst.x <= cover.y
             or cover.x <= lst.y and lst.y < cover.y
           then 1 else 0 end
)
from intervals fst, intervals lst, intervals cover
where fst.y <= lst.y
group by fst.x, lst.y
 
counts if either fst.x or lst.y is covered by some interval. If the count is 0, then the fst, lst pair defines a maximal chain (possibly with gaps). Therefore, a formal query returning all the maximal chains is:

select fst.x, lst.y
from intervals fst, intervals lst, intervals cover
where fst.y <= lst.y
group by fst.x, lst.y
having sum(
      case when cover.x < fst.x and fst.x <= cover.y
             or cover.x <= lst.y and lst.y < cover.y
           then 1 else 0 end
) = 0

The final step is selecting all the pairs with the minimal lst element. All the pairs with the fixed fst element, which translates into group by are being considered:

select x, min(y) from (
   select fst.x, lst.y
   from intervals fst, intervals lst, intervals cover
   where fst.y <= lst.y
   group by fst.x, lst.y
   having sum(
         case when cover.x < fst.x and fst.x <= cover.y
                or cover.x <= lst.y and lst.y < cover.y
              then 1 else 0 end
   ) = 0
)
group by x

There is an important special case of the interval coalesce problem:

Given a set of integers, partition them into ranges of successive numbers.

This problem can also be called interval packing, and is the reverse of the discrete interval sampling, that will be studied in the next chapter.

If each integer x is represented as a (closed) interval [x,x+1], then the problem reduces to interval coalesce. Rod West suggested a much more elegant solution, however. His key insight was an expression that groups the numbers within the same range. Then, if we know how to group integers, the ranges are defined by taking minimum and maximum inside each group. What criterion identifies each group?

It was shown in the section on counting how to enumerate rows in the increasing order:

select t.x, count(*) seq#
from T t, T tt
where tt.x <= t.x
group by t.x 

It is the x - seq# expression that remains constant within each group!

The rest is straightforward. We group by this expression and calculate min and max aggregate values, which are demarcating the beginning and the end of each interval:

select min(x), max(x) from (
   select x, rank() over(order by x) seq# from T
) group by x-seq
 
Predictably, this problem might occur in a slightly more complicated context. Our input relation can have one more column, say name, so that integers are grouped by the name values. Rod's solution scales up naturally to the new requirements. The additional grouping column just emerges in the appropriate places:

select name, min(x), max(x) from (
   select x, rank() over(partition by name order by x) seq#
   from T
) group by name, x-seq

 

This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim Tropashko

You can buy it direct from the publisher for 30%-off.


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.