 |
|
Exotic Products in SQL
SQL Tips by Donald Burleson |
The core of SQL language
is fairly compact. Select-project-join, set operators, nested
subqueries, inner views and aggregation all make up a very short but
expressive list of operators. This is all that most users ever
need for everyday usage. Once in a while, however, there comes a
problem that cannot be solved by these means. Often, such a problem
is evidence of a missing language feature.
User-defined aggregates
and Pivot, which will be studied in the beginning of this
chapter, are undoubtedly at the top of the list of desired features.
Symmetric Difference and Logarithmic Histograms are
two more query patterns that every developer sooner or later will
encounter in their practice. And perhaps once in a lifetime this
developer may come across either Relational Division or
Skyline. Finally, the Outer Union
is so rare
that there is almost no chance an ordinary developer would ever need
it. Yet, we?ll get a chance to leverage it in the next chapter.
List
Aggregate
List aggregate is not
exactly a missing feature. It is implemented as a built-in operator
in Sybase SQL Anywhere and MySQL. Given the original Emp
relation
select
deptno, ename from emp;
the query
select
deptno, list(ename||?, ?)
from emp
group by deptno
is expected to return
The other vendors do not
have a built-in list aggregate, but offer overwhelming
functionality that allows implementing it easily. If your platform
allows programming user-defined aggregate functions, simply search
the code on the net, as it is most likely somebody has already
written the required code. For Oracle the string aggregate function
implementation named stragg may be easily found on the Ask
Tom forum.
User Defined Functions
Originally, SQL intended
to be a ?pure? declarative language. It had some built-in
functions, but soon it was discovered that introducing User
Defined Functions (UDF) makes the SQL engine extensible.
Today, UDF is arguably one of the most abused features. In the
industry, I have seen UDF with 200+ parameters wrapping a trivial
insert statement, UDF used for query purposes, etc. Compare it to
the integer generator UDF from Chapter 2, which was written only
once and was intended to be used in numerous applications.
A User-defined aggregate
is the standard way of implementing list aggregate. Let's
explore alternative solutions. A recursive SQL does not miss an
opportunity to demonstrate its power. The idea is to start with the
empty list for each department and add records with the list
incremented whenever the department has an employee name greater
than last list increment. Among all the list of names, select those
that have maximal length:
with
emp_lists (deptno, list, postfix, length) as
( select distinct deptno, '', '', 0
from emp
union all
select e.deptno, list || ', ' || ename, ename, length+1
from emp_lists el, emp e
where el.deptno = e.deptno
and e.ename > el.postfix
)
select deptno, list from emp_lists e
where length = (select max(length)
from emp_lists ee
where e.deptno = ee.deptno)
Watch out for pitfalls.
It is very tempting to start with the lexicographically minimal
employee name per each department instead of the empty set. But this
would not work for departments with zero employees.
The idea behind the
recursive SQL solution carries over to the connect by
solution:
with
concat_enames as (
select deptno, sys_connect_by_path(ename,',') aggr, level depth
from emp e
start with ename=(select min(ename) from emp ee
where e.deptno=ee.deptno)
connect by ename > prior ename and deptno = prior deptno
) select deptno, aggr from concat_enames e
where depth=(select max(depth) from concat_enames ee
where
ee.deptno = e.deptno);
Next, there go various
methods leveraging collections
CREATE or
replace TYPE strings AS TABLE OF VARCHAR2(100);
/
CREATE or
replace Function CONCAT_LIST ( lst IN strings )
RETURN VARCHAR2 AS
ret varchar2(1000);
BEGIN
FOR j IN 1..lst.LAST LOOP
ret := ret || lst(j);
END LOOP;
RETURN ret;
END;
/
SELECT
deptno,
CONCAT_LIST(
CAST(MULTISET(
SELECT
ename||',' FROM EMP ee WHERE e.deptno=ee.deptno )
AS strings)) empls
FROM emp e
group by deptno;
including the one with a
little bit cleaner syntax:
SELECT
deptno,
CONCAT_LIST(CAST( COLLECT(ename) AS strings
)) empls
FROM emp
group by deptno;
Another variation of the
previous method leverages a function that accepts a cursor as an
argument instead of a collection:
CREATE or replace FUNCTION CONCAT_LIST( cur
SYS_REFCURSOR )
RETURN VARCHAR2 IS
ret VARCHAR2(32000);
tmp VARCHAR2(4000);
BEGIN
loop
fetch cur into tmp;
exit when cur%NOTFOUND;
ret := ret ||
tmp;
end
loop;
RETURN ret;
END;
/
select distinct
deptno,
CONCAT_LIST(CURSOR(
select ename ||',' from emp ee where
e.deptno = ee.deptno
) employees
from emp e;
Admittedly, the function
concatenating values in a collection adds a little bit of a
procedural taste. Again, such a general purpose function should not
be exposed as a part of the solution at all. It is obvious that the
CONCAT_LIST function should already be in the RDBMS library.
So far, half a dozen
various aggregate string concatenation methods have been counted.
And yet, not all of them are created equal from a performance
perspective, of course. The most efficient solutions are the
user-defined aggregation and the leveraging collect function.
Do not hastily dismiss the rest, however. If nothing else, the
problem of writing a list aggregate is still a great job
interview question. A programmer's SQL skills (on the scale 0 to 6)
can be rated by the number of different approaches that she can
count.