Search BC Oracle Sites

# Product

SQL Tips by Donald Burleson

Product is another aggregate function, which is not on the list of built-in SQL functions. Randomly browsing any mathematical book however reveals that the product symbol occurs much less frequently than summation. Therefore, it is unlikely to expect the product to grow outside a very narrow niche. Yet there are at least two meaningful applications.

Factorial

If the product aggregate function were available, it would make the factorial calculation effortless:

select prod(num) from integers where num <=10  -- 10!

As always, a missing feature encourages people to become very inventive. From a math perspective, any multiplicative property can be translated into an additive one if the logarithm function is applied. Therefore, instead of multiplying numbers, their logarithms can be added, and the result exponentiated. The rewritten factorial query is still very succinct:

select exp(sum(ln(num))) from integers where num <=10

At this point most people realize that product via logarithms does not work in all the cases. The ln SQL function throws an exception for negative numbers and zero. After recognizing this fact, however, the problem could be fixed immediately with a case expression.

Numbers in SQL

If you feel uneasy about ?fixing? an elegant exp(sum(ln(?))) expression with case analysis, it's an indicator that you as a programmer reached a certain maturity level. Normally, there are no problems when generalizing clean and simple concepts in math. What exactly is broken?

The problem is that the logarithm is a multivalued function defined on complex numbers.  For example, ln(-1) = i π (taken on principal branch). Then, e i π = -1 as expected!

Writing a function calculating factorial is perhaps one of the most ubiquitous exercises in procedural programming. It is primarily used to demonstrate the recursion. It is very tempting, therefore, to try recursive SQL:

with factorial (n, f) as (
values(1, 1)
union all
select n+1, f*(n+1) from factorial
where n<10
)
select * from factorial

Joe Celko suggested one more method - calculating factorial via Gamma function. A polynomial approximation with an error of less than 3*10-7 for 0 ≤ x ≤ 1 is:

Γ(x+1) = 1 - 0.577191652 x + 0.988205891 x2 - 0.897056937 x3 + 0.918206857 * x4 - 0.756704078 x5 + 0.482199394 x6 -

0.193527818 x7 + 0.035868343 x8

Unfortunately, the error grows significantly outside of the interval 0 ≤ x ≤ 1. For example,

Γ(6) = 59.589686421

Factorial alone could hardly be considered a convincing case for the product aggregate function. While factorial is undoubtedly a cute mathematical object, in practice it is never interesting to such an extent as to warrant a dedicated query. Most commonly, factorial occurs as a part of more complex expression or query, and its value could be calculated on the spot with an unglamorous procedural method.

Interpolation

Interpolation is more pragmatic justification for the product aggregate. Consider the following data:

Can you guess the missing values?

When this SQL puzzle was posted on the Oracle OTN forum, somebody immediately reacted suggesting the lag and leading analytic functions as a basis for calculating intermediate values by averaging them. The problem is that the number of intermediate values is not known in advance, so my first impression was that the solution should require leveraging recursion or a hierarchical SQL extension, at least. A breakthrough came from Gabe Romanescu, who posted the following analytic SQL solution:

select X,Y
,case when Y is null
then yLeft+(rn-1)*(yRight-yLeft)/cnt
else Y end /*as*/ interp_Y
from (
select X, Y
,count(*) over (partition by grpa) cnt
,row_number() over (partition by grpa order by X) rn
,avg(Y) over (partition by grpa) yLeft
,avg(Y) over (partition by grpd) yRight
from (
select X, Y
,sum(case when Y is null then 0 else 1 end)
over (order by X)    grpa
,sum(case when Y is null then 0 else 1 end)
over (order by X desc) grpd
from data
)
);

As usual in case of complex queries with multiple levels of inner views, the best way to understand it is executing the query in small increments. The inner-most query introduces two new columns, with the sole purpose to group spans with missing values together.

select X, Y
,sum(case when Y is null then 0 else 1 end)
over (order by X)    grpa,
,sum(case when Y is null then 0 else 1 end)
over (order by X desc) grpd
from data;

The next step

select X, Y

,count(*) over (partition by grpa) cnt
,row_number() over (partition by grpa order by X) rn
,avg(Y) over (partition by grpa) yLeft
,avg(Y) over (partition by grpd) yRight
from (
select X, Y
,sum(case when Y is null then 0 else 1 end)
over (order by X)    grpa
,sum(case when Y is null then 0 else 1 end)
over (order by X desc) grpd
from data
)

calculates four more columns:

1        cnt - length of each span

2        rn - offset inside the span

3        yLeft - the Y value at the left boundary

4        yRight - the Y value at the right boundary

The aggregate function - avg - is pretty arbitrary. There is only one non NULL value in each partition.

Now, everything is ready for the final step in which the linear interpolation formula is applied:

Note that instead of the xLeft and xRight variables we have cnt = xRight - xLeft and rn = X - xLeft.

A query leveraging an analytic SQL extension can be expressed in standard SQL:

select X, Y,
(select min(bLeft.Y+( bRight.Y - bLeft.Y)*
(bb.X- bLeft.X)/( bRight.X - bLeft.X) )
from data bLeft, data bRight
where bLeft.X < bRight.X
and bLeft.Y is not null and bRight.Y is not null
and not exists(select 0 from data bi
where bLeft.X < X and X < bRight.X
and bi.Y is not null)
and bb.X between bLeft.X and bRight.X) /*as*/ interp_Y
from data bb

Arguably, this solution is more intuitive. The original relation can be scanned and an extra column added as a scalar subquery.

The original relation corresponds to point (X,Y) on the figure. In the subquery points (bLeft.X,bLeft.Y) and (bRight.X,bRight.Y) are the boundaries of the span that we are filling in with values. The min aggregate is bogus. It insures that the subquery in the select clause is scalar. Without aggregation the subquery returns either one value or two identical values.

What if the missing values are not bound with a known value at the end of the range? This can certainly be addressed as a special case and at the cost of making the query more complicated. Alternatively, this snag is nicely solved with non-linear interpolation, in general, and Lagrange Interpolating Polynomial, in particular

Here is the product symbol, at last!

Polynomial is a simpler concept than the piece-wise linear function. A cleaner concept translates into a simpler SQL, but the interpolation result is slightly different, of course.

select x, sum(y*mul) interp_Y from (
select x,j,y, product(a) mul
from (
select bb.X x, bj.X j, bj.Y y, (bb.X-bk.X)/(bj.X-bk.X) a
from data bj,data bk, data bb
where bj.Y is not null and bk.Y is not null
and bj.X!=bk.X
) group by x,j,y
) group by x

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning