 |
|
Advanced Oracle SQL: Using DISTINCT Keyword
Oracle Tips by Laurent Schneider
|
Laurent
Schneider is considered one of the top Oracle SQL experts, and
he is the author of the book "Advanced SQL Programming" by Rampant TechPress. The following
is an excerpt from the book.
Distinct
When counting rows, the DISTINCT keyword can be used to
count only distinct, i.e. unique rows.
SELECT
COUNT(
DISTINCT
JOB
)
FROM
EMP;
COUNT(DISTINCTJOB)
------------------
5
The number of different jobs in the employee table is
retrieved. Not all functions support this keyword.
COUNT DISTINCT does not support multiple columns. There is
often a way of achieving the effect of such a query, however. If the
concatenated size of all columns is small, it is possible to concatenate and
select the distinct strings:
SELECT
COUNT(
DISTINCT
CONCAT
(
DUMP(JOB,16),
DUMP(DEPTNO,16)
)
) DISTINCTJOBDEPTNO
FROM
EMP;
DISTINCTJOBDEPTNO
-----------------
9
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 3 (0)| 00:00:01 |
| 1 | SORT GROUP BY | | 1 | 11 | | |
| 2 | TABLE ACCESS FULL| EMP | 14 | 154 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Another approach is to count all rows from a subquery using
DISTINCT:
SELECT
COUNT(*)
FROM
(
SELECT
DISTINCT
JOB,
DEPTNO
FROM
EMP
);
DISTINCTJOBDEPTNO
-----------------
9
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (25)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 11 | | 4 (25)| 00:00:01 |
| 3 | HASH UNIQUE | | 11 | 121 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 14 | 154 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
The first query concatenates in an unambiguous string the
representation of the different columns. The second query selects distinct
values of jobs and departments and counts the rows retrieved in the inner query.
Note that COUNT(DISTINCT JOB) does not count the nulls.
Keep
Note: The KEEP syntax requires either DENSE_RANK FIRST ORDER BY
or DENSE_RANK LAST ORDER BY.
It is possible to aggregate only the top or bottom rows
using the KEEP clause.
SELECT
MIN(DEPTNO),
SUM(SAL)
KEEP
(
DENSE_RANK
FIRST
ORDER BY
DEPTNO
) SUMDEPTNO
FROM
EMP;
MIN(DEPTNO) SUMDEPTNO
----------- ----------
10 8750
Before the SUM function is applied, the rows are sorted by department
number and only the salaries of the first department are passed in the aggregate
function. The result is the total salary of department 10.
A common query is to select the employee with the highest
salary. The Oracle 8i approach is to use a nested subquery:
SELECT
ENAME,
DEPTNO,
SAL
FROM
EMP
WHERE
SAL=
(
SELECT
MAX(SAL)
FROM
EMP
);
ENAME DEPTNO
SAL
---------- ---------- ----------
KING 10 5000
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL | EMP | 1 | 13 | 3 (0)| 00:00:01 |
| 2 | SORT AGGREGATE | | 1 | 4 | | |
| 3 | TABLE ACCESS FULL| EMP | 14 | 56 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
1 - filter("SAL"= (SELECT MAX("SAL")
FROM "EMP" "EMP"))
To retrieve the name of the best paid employee, the table is
accessed twice, once to retrieve the maximum salary and once to retrieve the
matching name and department. If there is a tie for the top salary, all rows are
returned.
A more efficient approach is to use the KEEP keyword:
SELECT
MAX(ENAME)
KEEP
(
DENSE_RANK
FIRST
ORDER BY
SAL DESC,
EMPNO
) ENAME,
MAX(DEPTNO)
KEEP
(
DENSE_RANK
FIRST
ORDER BY
SAL DESC,
EMPNO
) DEPTNO,
MAX(SAL) SAL
FROM
EMP;
ENAME DEPTNO
SAL
---------- ---------- ----------
KING 10 5000
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
| 2 | TABLE ACCESS FULL| EMP | 14 | 238 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
The maximum salary is returned. The department and employee
name for the highest salary are also returned. This time, the table is accessed
only once. In case of a tie for the top salary, the one with the lowest employee
number is returned. This way, only one row is returned to the aggregate function
and this unique row is consistent because EMPNO is the primary key and uniquely
identifies the row.
Consider the following query:
INSERT INTO
EMP
(
EMPNO,
ENAME,
DEPTNO,
SAL
)
VALUES
(
1001,
'KATE',
40,
5000
);
SELECT
MAX(ENAME)
KEEP
(
DENSE_RANK
FIRST
ORDER BY
SAL DESC
) ENAME,
MAX(DEPTNO)
KEEP
(
DENSE_RANK
FIRST
ORDER BY
SAL DESC
) DEPTNO,
MAX(SAL) SAL
FROM
EMP;
ENAME DEPTNO
SAL
---------- ---------- ----------
KING 40 5000
ROLLBACK;
For the first rows in descending order of salaries, Kate and
King are passed to the aggregate functions. The MAX functions return the highest
name (KING) and the highest department (40); they do not belong to the same row.