 |
|
Indicator and Step
Functions
SQL Tips by Donald Burleson |
An indicator
function 1A(x) maps every element x of a set A into 1,
and any element which is not in A into 0. Formally:
1A(x)
:= if x
∈ A then 1 else 0 endif
Set operations can
be expressed in terms of indicator functions. Intersection is as
simple as multiplication:
1A∩B(x) = 1A(x)
1B(x)
Unlike set theory
where union is dual to intersection, there is no duality between
multiplication and addition of indicator functions. Therefore, union
has to be expressed via the inclusion-exclusion principle as:
1A∪B(x) = 1A(x)
+1B(x) - 1A(x) 1B(x)
In SQL context, the
indicator function's domain is a set of numbers. An indicator
function for a set x∈[0,∞) is a very important special case. It is called
(a primitive) step function, and has a designated notation:
1x
:= 1[0,∞)(x)
Let's also write
the primitive step function definition in pseudo programming
notation, which was used for the indicator function in the very
beginning of the session
1x
:= if x ≥ 0 then 1 else 0 endif
If the abstraction
level is raised and the hard coded constants 0, 1 and yet another 0,
is converted into variables,
if x
≥ x0 then α else β endif
then the pseudo
code starts looking similar to the case operator. This
expression is a generic step function (although it doesn't have any
abbreviated notation).
The generic step
function can be expressed via the primitive step function:
if x
≥ x0 then α else β endif = α 1x-x0 + β 1
x0-x
So far, so good
handling simple case operators, but what about nested case
expressions? Consider the following:
if x
≥ 0 then (if y ≥ 0 then 1 else 0 endif) else 0 endif
Easy: the above
formula for general step function should handle the case where α is
an expression rather than a constant. By substitution we have
if x
≥ 0 then (if y ≥ 0 then 1 else 0 endif) else 0 endif = 1y
1x
as if we had just
an intersection of the x ≥ 0 and y ≥ 0 sets!
However, still
missing is a way to express the step function in SQL. True, we can
trivially utilize the case operator, but then we can hardly
justify learning the step and indicator functions. The key formula
is expressing the step function via the standard numeric sign
function:
step(x) := sign(1+sign(x))
Now queries with
conditional expressions can be written in a peculiar way:
select ename, sal,
sign(1+sign(sal-2000)) SALgt2000 -- i.e. step(x-2000)
from emp
The step function
method was the only game in town a long time ago when the case
operator was not part of the SQL standard yet. Nowadays a seasoned
SQL programmer writes a case expression without giving it a
second thought. Yet, there are rare cases when clever application of
indicator function is still a contender. Consider the following
data:
select
* from Transactions
You are asked to
sum transaction quantities grouped by Date to produce the
output like this:
The first step
towards a solution is recognizing the indicator function hidden in
the above data. The character Type column is very
inconvenient to deal with; can it be transformed into numeric? The
two value column certainly can be coded with numbers, but that is
not what we are looking for. It is the instr(Type, ?debit?)
and instr(Type, ?credit?) expressions that convert the
character column values into indicator functions. We proceed by
simply summarizing the Amount weighted by indicator
functions:
select Date,
sum(Amount*instr(Type,?debit?)) debit,
sum(Amount*instr(Type,?credit?)) credit
from Transactions
group by Date
This nice solution is due to Laurent Schneider. Compare it to a
conditional summation with the case operator:
select Date,
sum(case when Type=?debit? then Amount else 0 end) debit,
sum(case when Type=?credit? then Amount else 0 end) credit
from Transactions
group by Date