Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

String Decomposition

SQL Tips by Donald Burleson

Given a relation A with a single column LST the query should output a relation with tuples containing individual substrings that are 'separated by commas? in the original strings:

The clause 'separated by commas? is decorated with quotation marks on purpose. While the value 3 is indeed enclosed with commas on both sides, 1 and 7 are delimited from one side only.

To convert this informal requirement into a formal language, the first thing to do is fix this nuisance. It is quite easy to accomplish using:

select ','||LST||',' LST from A

Now, each value is enclosed from both sides. The answer is a substring between n-th and n+1-th delimiter position. Getting a little bit ahead of myself, I?ll mention that a very elegant recursive SQL solution awaits us later. The following is one of Oracle's integer number generators:

with B as (select ','||LST||',' LST from A)
select substr(LST,instr(LST,',',1,num)+1,
                 
instr(LST,',',1,num+1)-instr(LST,',',1,num)-1) PREF
from B,(
  select rownum num from dual connect by rownum < 10
)

 

With this solution, it is instructive to reflect back and notice that without the additional commas in the beginning and at the end of the string the solution would be quite messy. Separate cases would have to be made for the values that are delimited from one side only. In one word, rigorous means simple!  

Why does the query insist on producing exactly 9 integers? Well, 9 is a ?reasonably big? number. Any number greater or equal to LENGTH(LST) would do. If we have more than one row in the original relation A, then we could simply take a maximum of the string lengths.

What about those rows with NULLs? Easy: anything unwanted could be filtered out with a proper predicate.

Admittedly, the last two answers taste like kludges. The solution feels even less satisfying if it is compared with the recursive SQL solution, which I promised before:

with decomposed( pref, post ) as (
  select  '',  lst from A
  union
  select substr(lst, 1, instr(lst,',')),

         substr(lst, instr(lst,',')+1, length(lst)) from decomposed
) select pref from decomposed where pref <> ''

The execution steps can be easily visualized:

Interestingly, no longer is it required to decorate strings with a starting and trailing delimiter.

If your RDBMS of choice supports table functions, you might argue that the task of string parsing could be delegated to some procedural code.  Here is implementation in Oracle posted on the OTN SQL& PL/SQL forum by Scott Swank:

CREATE TYPE Strings AS TABLE OF varchar2(100);

FUNCTION parse (
    text        IN   VARCHAR2,
    delimiter   IN   VARCHAR2 DEFAULT ',')
    RETURN Strings
  IS
    delim_len   CONSTANT PLS_INTEGER      := LENGTH (delimiter);
    tokens               Strings   := Strings ();
    text_to_split        VARCHAR2 (32767);
    delim_pos            PLS_INTEGER;
  BEGIN
    tokens.DELETE;
    text_to_split := text; 

    WHILE text_to_split IS NOT NULL
    LOOP
      delim_pos := INSTR (text_to_split, delimiter); 

      IF delim_pos > 0
      THEN
        tokens.EXTEND;
        tokens (tokens.LAST) :=
                  SUBSTR (text_to_split, 1, delim_pos - 1);
        text_to_split :=
                   SUBSTR (text_to_split, delim_pos + delim_len);
      ELSE
        tokens.EXTEND;
        tokens (tokens.LAST) := text_to_split;
        text_to_split := NULL;
      END IF;
    END LOOP;
 
    RETURN tokens;
END parse;

 

Even though the implementation looks complicated, all that is visible from the SQL query is a function call:

select column_value
from table(parse('1,2,3,4')); 

It might not seem obvious how to call this table function in a context where the input data is a set of strings. A table expression cannot be placed inside some subquery within the select or where clause, since it returns multiple rows.  The only solution is keeping it inside the where clause, although the function parameter has to be correlated with the source table: 

create table sources (
   num integer,
   src varchar2(1000)
);

insert into sources values(1, '1.2.3,4.5,6,7');
insert into sources values(2, '8,9');

select   num,   t.*
from sources, table(parse(src)) t;

This syntax is reminiscent of ANSI SQL lateral views, where the second table expression is allowed to refer to the first one. From a theoretical perspective an expression with a lateral view is an asymmetric join with the join condition pushed inside the second table.   This solution scales up nicely. For example we can parse a comma separated list, first and then parse the result once more

select  num,
  t1.column_value a,
  t2.column_value b
from sources t, table(parse(src)) t1,
table(parse(column_value,'.')) t2;


Enumerating Pairs

Enumerating pairs seems easy. Just build a Cartesian product of the two integers relations:

select i1.num x, i2.num y
from Integers i1, Integers i2

With a finite integers relation bounded by an arbitrary upper limit, there should not be any surprises. What if it is infinite, is it possible to drop the limit predicate on the server side and leverage the pipelining idea?

Pipelined Operators

When executing an SQL statement, the RDBMS engine represents it internally as a tree of operators. Each operator performs a relatively simple task; the complexity lies in the way the operators are combined together.

Each operator works as a black box. It consumes a relation as an input, massages it, and outputs the result to the other operator. If the operator is able to start outputting rows as soon as it consumed one or several rows, it is called pipelined. Pipelined operators do not block the execution flow; whereas blocking operators have to consume the whole input relation before outputting even a single row.

At first, the answer seems to depend on a join method employed by the SQL execution engine for this particular query. Both hash join and sorted merge join materialize their argument relations, therefore the execution plan involving any of those methods is never pipelined.

Nested loops are typically a pipelined method. It iterates via the outer relation, and for each row finds matching rows from the inner relation. In case of the Cartesian product of Integer relations, however, all the rows from the inner relation match and there are an infinite number of them! Therefore, the execution would be stuck scanning the inner table, and would never be able to get past the first row in the outer table. In other words, the nested loop join fails to deliver pipelined set of integer pairs as well.

Let's approach the problem from another angle. When a mathematician says ?enumerating? she really means a one-to-one mapping some set of objects into the integers. Figure 2.2 shows a nearly ubiquitous mapping of integer pairs into integers.

Without further ado, here are the formulas for the (x,y) -> n mapping

and the reverse n -> (x,y) mapping

Proving these formulas is left as an exercise for curious reader. Translating the above formulas into SQL is straightforward:

select n,n-(xplusy+1)*xplusy/2 x,
  xplusy-n+(xplusy+1)*xplusy/2 y
from (
  select FLOOR((SQRT(8*(rownum-1)+1)-1)/2) xplusy,
  rownum-1 n from dual connect by 1=1
)

Even though the implementation side varies from vendor to vendor, it is instructive to see the execution statistics:

Ignoring fancy operator names in the execution statistics above will show that the very first operator produces one row. This row performs a basis for the connect by iteration that starts producing rows one-by-one. Each row is pipelined through two more levels of processing to the top. The client receives each row, and after getting the tenth row, loses any interest in continuing further. The execution statistics is a snapshot at that moment.

Admittedly, using the above pipelined integer pairs implementation inside other queries as an inner view or subquery would only raise eyebrows. Those square root expressions are better be hidden behind a named view:

create view IntegerPairs as
select n,n-(xplusy+1)*xplusy/2 x,
  xplusy-n+(xplusy+1)*xplusy/2 y
from (
  select FLOOR((SQRT(8*(rownum-1)+1)-1)/2) xplusy,
  rownum-1 n from dual connect by 1=1
)


Was anything achieved with this pipelining idea using integer pairs implementation? Consider the following query:

Find positive integers X and Y satisfying both x + y = 10 and x - y = 2  equations

With the IntegerPairs view the solution is immediate:

select x,y from IntegerPairs
where x+y=10 and x-y=2


Would it actually work? A pair x=6 and y=5 is the unique solution of the system of equations, but would the execution really stop after we have it in the result set? Even if the client is not interested in more than one row from the result set, it has no way to communicate this to the server. If the result set is buffered, then the server would continue producing more pairs, and never stop.

Ad-Hoc Constants

If pipelining doesn't really work for constraint problems with two integer variables, why mention it at all? In an alternative approach it would simply be written

select x,y from (

   select num x from Integers where num < 234

), (

   select num y from Integers where num < 567

) where x+y=10 and x-y=2

which produces the correct answer without any risk of getting stuck in the infinite loop. Much left is to be desired for the code clarity, however. Those 'safety? constants are eyesore, and they have to be estimated in advance. The values that work for one problem, might not work for next problem. A cartoon description of pipelining would define it as a programming style that avoids magical constants.

 

 

This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim Tropashko

You can buy it direct from the publisher for 30%-off.


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.