Correlated queries are
often written so that the question in the inner query is one of
existence. We illustrate this by introducing a third table called
section, with the
attributes section_id,
course_number,
semester,
year,
instructor,
bldg, and
room. Now
let us study the query, "Find the names of students who have taken a
computer science class and made a grade of 'B.'" This query can be
done several ways, as discussed below.
As a non-correlated subquery:
SELECT s.sname
FROM
student s
WHERE s.stno
IN
(SELECT
gr.student_number FROM
grade_report, section
WHERE
section.section_id = gr.section_id
/* join condx grade_reportsection */
AND
section.course_num LIKE
'COSC____'
AND
gr.grade = 'B');
You can think of this query as forming the set of student
numbers of students who have made Bs in COSC courses — the inner query. In
the inner query, we need to have both the
grade_report and the
section tables because the grades
are in grade_report and the
course numbers are in section.
Once we form this set of student numbers (i.e., we complete the inner
query), the outer query looks through the student table and SELECTs only
those students who are members of the inner query set. Note that this query
could also be done by creating a double-nested subquery containing two INs
or could be found using a three table join.
Had we chosen to write the query with an unnecessary
correlation, it might look like this:
SELECT s.sname
FROM
student s
WHERE s.stno
IN
(SELECT
gr.student_number
FROM grade_report gr, section
WHERE
section.section_id = gr.section_id
/* join condx grade_report-section */
AND
section.course_num LIKE
'COSC____'
AND
gr.student_number = s.stno
AND
gr.grade = 'B');
In this case, the use of the student table in the
subquery is totally unnecessary. Although correlation is unnecessary, we
provide this example for several reasons:
-
To show when correlation
is necessary
-
To show how to untangle
unnecessarily correlated queries
-
To show how you might
migrate your thought process toward correlation — another query option
We can perform the query a different way using the
keyword EXISTS. There will be situations in which the correlation of a
subquery is necessary. Because this is true, we show another way to do the
correlated query with EXISTS; it looks like this:
SELECT s.sname
FROM
student s
WHERE EXISTS
(SELECT 1
FROM grade_report gr, section
WHERE
section.section_id = gr.section_id
/* join
condx grade_report-section */
AND
section.course_num like
'COSC____'
AND
gr.student_number = s.stno
AND
gr.grade = 'B');
The EXISTS predicate
says, "Choose the row from student in the outer query if the subquery is
TRUE (i.e., if it finds anything)." In the non-correlated case, we tied the
student number in the student table to the inner query by the IN predicate:
SELECT s.stno
FROM
student s
WHERE s.stno
IN
(SELECT
"student number ...)
With EXISTS, we do not use any attribute of the
student table, and we have
indicated using EXISTS with (SELECT 1 … ).
The use of SELECT * in the inner query is common among
SQL programmers. However, from an "internal" standpoint, SELECT * causes the
SQL engine to check the data dictionary unnecessarily. Because the actual
result of the inner query is not important, it is strongly suggested that
you use SELECT 'X' (or SELECT 1) ... instead of SELECT * ... so that a
constant is SELECTed instead of some "sensible" entry. The SELECT 'X' … or
SELECT 1 ... is simply more efficient.
In the EXISTS case, we do not
specify what attributes need to be SELECTed in the inner query's result set;
rather, we use a connection in the WHERE clause of the inner query. EXISTS
forces us to correlate the query. This query:
SELECT s.sname /* exists-uncorrelated */
FROM
student s
WHERE EXISTS
(SELECT 'X'
FROM grade_report gr,section t
WHERE
t.section_id = gr.section_id
/* join
condx grade_report-section */
AND
t.ccourse_num like 'COSC____'
AND
gr.grade = 'B');
(without the correlation, but with
EXISTS) means that for each student tuple, you test the joined
grade_report and
section to see whether there is a
course number like 'COSC' and a grade of 'B' (which, of course, there is) —
you unnecessarily ask the subquery question over and over again. The result
from this latter uncorrelated EXISTS query is the same as:
SELECT s.sname FROM student s
The point is that the correlation is necessary when we
use EXISTS.
Consider another example in which a correlation can be
used. Suppose we want to "Find the names of all students who have three Bs."
A first pass at a query might be something like this:
SELECT s.sname
FROM
student s WHERE "something" IN
(SELECT
"something"
FROM
grade_report
WHERE
"count of grade = 'B'" > 2);
This query can be done with an uncorrelated HAVING
clause, but we want to show how to do this with a correlated query. Suppose
we arrange the subquery to use the student number from
student table as a filter, and
count in the subquery only when a row in
grade_report correlates to that
student. The query (this time with an implied EXISTS) looks like this:
SELECT s.sname
FROM
student s
WHERE 2 <
(SELECT count(*)
FROM grade_report gr
WHERE gr.student_number = s.stno
AND gr.grade = 'B');
Although there is no EXISTS in the query, it is implied.
The syntax of the query does not allow an EXIST, but the sense of the query
is "WHERE EXISTS a COUNT OF 2 WHICH IS LESS THAN … ". In this correlated
query, we have to examine the
grade_report table for each member of the
student table to see whether or
not the student has two Bs. We test the entire
grade_report table for each
student tuple in the other query.
If it were possible, a subquery without the correlation
could be more desirable. The overall query might be:
SELECT s.sname
FROM
student s
WHERE s.stno
in
(subquery that defines a set of students who have made 3
B's)
And we might attempt to do this:
SELECT s.sname
FROM
student s
WHERE s.stno
IN
(SELECT
gr.student_number FROM grade_report gr WHERE gr.grade
=
'B');
However, that would give us only students who had made at
least one B. To get to the 3 B's we could try this:
SELECT s.sname
FROM
student s
WHERE s.stno
IN
(SELECT
gr.student_number, count(*)
FROM
grade_report gr
WHERE
gr.grade = 'B'
GROUP
BY gr.student_number
HAVING count(*) > 2);
This won't work because the subquery cannot have two
attributes unless the main query has two attributes — it has to have only
gr.student_number to match
s.stno. So, we might settle on
this cumbersome query:
SELECT s.sname
FROM
student s
WHERE s.stno
IN
(SELECT
student_number
FROM
(SELECT student_number,
count(*)
FROM
grade_report gr
WHERE
gr.grade = 'B'
GROUP
BY student_number
having count(*) > 2));
There are several ways to query the database with SQL. In
this case, the correlated query may be the easiest to see and perhaps the
most efficient.