Search BC Oracle Sites

# Conditional Summation with CASE Operator

SQL Tips by Donald Burleson

The genesis of the conditional summation idiom is an equivalence between count(*) and sum(1). Formally,

select count(*) from emp

is the same as:

select sum(1) from emp

There is must be something wrong here. How these two queries can be equivalent? They produce different result when the Emp relation is empty.

This is merely a misfortunate historical artifact. Had SQL ANSI standard committee been in a mathematical mood that day, they would certainly fixed the definition to make them equivalent. This would make SQL world a tiny bit simpler.

More important, however, is a common misconception that count should have any arguments at all. First, for most practical purposes count = sum(1), and there is no free variable parameter within the sum(1) expression. Second, think about how the count function may be implemented on a low-level. A reasonable code must look like this:

int count = 0;
for( int i = 0; i< 10; i++)
count = count + 1;

The count variable is updated during each row processing with the unary increment operator +1. Unlike count, any ?normal? aggregation has to use a binary operation during each aggregate incremental computation step

int sum = 0;
for( int i = 0; i< 10; i++)

sum = sum + element[i];

that is, + for sum, for max, for min, etc. Therefore, one argument is needed for normal aggregates, and no arguments for count.

Argument for COUNT

The formal difference between

select count(*) from emp

and

select count(1) from emp

has been the subject of lengthy investigations on some internet forums. If there were indeed any implementation and performance difference between the two, then one can argue that the query optimizer should transform the query accordingly to eliminate it. In other words, this counting syntax quirk is not worth 2 cents.

OK, as far as simple counting is concerned, there does not appear to be any need for an argument. But what about

select count(ename) from emp

where only non-null values of the ename column are counted? The description of count(ename) in the previous sentence translates directly into SQL:

select count(*) from emp where ename is not null

So, count(ename) is no more than a syntax shortcut.

select count(distinct ename) from emp

where the count aggregate function accepts a column expression with a keyword? This is yet another shortcut:

select count(*) from (
select distinct empno from emp
)

What if counting two different values at the same time like this is preferred:

select count(ename), count(*) from emp

Even though it looks like SQL has a dedicated syntax shortcut for every imaginable task, at this point it is easy to argue that these extensions are nifty at least in some practical cases.

Enter the conditional summation pattern. Whenever rows are counted satisfying a certain criteria such as:

select count(*) from emp
where sal < 1000

and the where clause seems to hinder the query's evolution to a more sophisticated form,  it can be rewritten without the where clause:

select sum(case when sal < 1000 then 1 else 0 end)
from emp

Conditional summation queries scale up nicely to accommodate more complex requirements. In the example with the familiar Emp table,

the previous query is transformed to count the small salaries per each department by amending it with group by

select deptno,
sum(case when sal < 1000 then 1 else 0 end) cnt
from emp

group by deptno

The subtle novelty here is that the conditional summation query is no longer equivalent to the former attempt restricting the condition in the where clause:

select deptno,count(*) from emp
where sal < 1000
group by deptno

Zero counts were perfectly legal in the aggregation without the grouping case. Disappearing zeros are a sign of (yet another) SQL inconsistency.

Aggregation without Grouping

An aggregation with no grouping is, in fact, an aggregation within a single group. If SQL syntax allowed grouping by the empty set of columns , then a simple aggregate

select count(*) from T

could also be represented as

select count(*) from T

group by

Without the empty set syntax, we still can write

select count(*) from T

group by 0

The 0 pseudo column is a constant expression, so that the table T is partitioned into a single group effectively the same way as with the empty set.

Perhaps the most important rationalization for the conditional summation idiom is counting by different criteria. Without conditional summation we would have to count by each individual condition in a dedicated query block, and combine those counts with a join. The pivot operator, which will be studied in Chapter 3, is a typical showcase of this idea.

Before the case operator became widely available in off-the-shelf RDBMS systems, a much more ingenious counting method with an indicator function was employed.

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning