Search BC Oracle Sites

# Counting in SQL

SQL Tips by Donald Burleson

Counting is one of the basic patterns that an SQL developer learns after learning the basics: selection, projection, join, and subquery. While counting might look deceptively easy in a context of a single table, it becomes intellectually challenging when two tables are joined together and grouping applied.

The first half of this chapter will present information to enable the reader to write counting queries in a complex context.

The second half of the chapter will present information regarding conditional summation. This pattern is a beautiful combination of two SQL constructs: the case operator, and aggregation. Conditional summation has numerous applications, which will also be presented throughout the remainder of the chapter.

Counting Ordered Rows

Let's start with a basic counting problem. Suppose we are given a list of integers, for example:

and want to enumerate all of them sequentially like this:

Enumerating rows in the increasing order is the same as counting the number of rows preceding a given row.

SQL enjoys success unparalleled by any rival query language. The  reason for such popularity might be credited to its proximity to the English language.  Examine the informal idea carefully:

Enumerating rows in increasing order is counting how many rows precede a given row.

Perhaps the most important thing to note is the rows in the source table are referred to twice: first, to a given row, and second, to a preceding row. Therefore, the number list must be joined with itself as shown in Figure 1.1.

Surprisingly, not many basic SQL tutorials, which are so abundant on the web today, mention a Cartesian product. A Cartesian product is a join operator with no join condition

select A.*, B.* from A, B

Carrying over this idea into a formal SQL query is straightforward. As it is the first query in this book, it will be performed step by step. The Cartesian product itself is:

select t.x x, tt.x y
from T t, T tt

The triangle area below the main diagonal is:

select t.x x, tt.x y
from T t, T tt

where tt.x <= t.x

And finally, only one column is needed - t.x - to group the previous result by and count:

select t.x, count(*) seqNum
from T t, T tt
where tt.x <= t.x

group by t.x

There is a certain harmony of group by operator in a joint effort with relational join. There are also drawbacks that we?ll discuss later.

Equivalence Relation and Group By

Almost any other SQL query uses the group by operator. Why is the group by operator so powerful? It is not among the fundamental relational algebra operators. A partial answer to this fascinating efficiency is that group by embodies an equivalence relation. Indeed, it partitions rows into equivalence classes of rows with identical values in a column or a group of columns, and calculates aggregate values per each equivalence class.

So what happens if the programmer modifies the problem slightly and asks for a list of pairs where each number is coupled with its predecessor?

Let me provide a typical mathematician's answer, which is remarkable in a certain way. Given that it has already been shown how to number list elements successively, it might be tempting to reduce the current problem to the previous one:

Enumerate all the numbers in the increasing order and match each sequence number seq# with predecessor seq#-1. Next!

This attitude is, undoubtedly, the most economical way of thinking, although not necessarily producing the most efficient SQL. Therefore, let's revisit our original approach, as illustrated on Figure 1.2.

This translates into the following SQL query:

select t.x, max(tt.x) predecessor
from T t, T tt
where tt.x < t.x
group by t.x

Both solutions are expressed in a standard SQL leveraging join and grouping with aggregation. Alternatively, instead of joining and grouping why not simply calculate the count or max in place as a correlated scalar subquery:

select t.x,
(select count(*) from T tt where tt.x <= t.x) seq#
from T t
group by t.x

The subquery always returns a single value; this is why it is called scalar. The tt.x <= t.x predicate connects it to the outer query; this is why it is called correlated. Arguably, leveraging correlated scalar subqueries is one the most intuitive techniques in writing SQL queries.

Is GROUP BY Redundant?

Chris Date asserts that the group by operator is redundant since

select deptno, avg(sal) from Emp

group by deptno

could be rewritten as:

select distinct deptno,

(select avg(sal) from Emp ee

where e.deptno = ee.deptno)

from Emp e

Unlike Date, who exploits this fact as evidence of SQL deficiencies, it is rather viewed as yet another demonstration of the power of scalar subqueries.

How about counting rows that are not necessarily distinct? This is where the method breaks. It is challenging to distinguish duplicate rows by purely logical means, so various less ?pure? counting methods were devised. They all, however, require extending the SQL syntactically, which was the beginning of slipping along the ever increasing language complexity slope.

Here is how an analytic SQL extension counts rows:

select x, rank() over(order by x) seq# from T; -- first problem
select x, lag() over(order by x) seq# from T; -- second problem

Many people suggest that not only is it more efficient, but more intuitive. The idea that ?analytics rock? can be challenged in many ways. The syntactic clarity has its cost: the SQL programmer has to remember (or, at least, lookup) the list of analytic functions. The performance argument is not evident, since non-analytical queries are a simpler construction from the optimizer perspective. A shorter list of physical execution operators implies fewer query transformation rules, and less dramatic combinatorial explosion of the optimizer search space.

It might even be argued that the syntax could be better. The partition by and order by clauses have similar functionality to the group by and order by clauses in the main query block. Yet one name was reused, and the other had been chosen to receive a new name. Unlike other scalar expressions, which can be placed anywhere in SQL query where scalar values are accepted, the analytics clause lives in the scope of the select clause only. I have never been able to suppress an impression that analytic extension could be designed in more natural way.

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.