Search BC Oracle Sites

# Relational Division

SQL Tips by Donald Burleson

Relational Division is truly an exotic operator. Search any popular database forum for problems involving Relational Division. There certainly would be a few, but chances are they might be not recognized as such. For example, Chapter 1 problem 6 is almost a literal reproduction of the message posted at the newsgroup microsoft.public.sqlserver.programming. If nothing else, this section would help you to quickly identify a problem as Relational Division. After all, using the proper names is very important for effective communication.

Why it is called Division? It is the inverse of the (Cartesian) Product. Given a set of JobApplicants

Note: Not all code is included due to formatting issues, See book for full equations.

and a set of JobRequirements,

then the Cartesian Product JobApplicants - JobRequirements is

Inversely, given the JobApplicants - JobRequirements relation (called the dividend), it could be divided by JobRequirements (the divisor) and JobApplicants (the quotient) would be obtained.

The analogy between relational algebra and arithmetic goes one step further. Relational division is similar to integer division. If the integer dividend x is not a multiple of integer divisor y, the quotient is defined as the maximal number q that satisfies the inequality:

x ≥ q y

or the equality where r is called the remainder.

x = q y + r

In Relational Algebra, given a dividend X and divisor Y, the quotient Q is defined as a maximum relation that satisfies the inequality:

X Q - Y

or the equality:

X = Q - Y R

In the following example, the JobApplicants - JobRequirements relation is reduced to

which is appropriately called  ApplicantSkills. Then,

ApplicantSkills = QualifiedApplicants - JobRequirements UnqualifiedSkills

where the quotient QualifiedApplicants is

and the remainder UnqualifiedSkills is

Informally, division identifies attribute values of the dividend that are associated with every member of the divisor, such as find job applicants who meet all job requirements.

Relational Division is not a fundamental operator. It can be expressed in terms of projection, Cartesian product, and set difference. The critical observation is that multiplying the projection πName(ApplicantSkills) and JobRequirements results in the familiar Cartesian Product:

Subtract ApplicantSkills, and project the result to get the Names of all the applicants who are not qualified. Then, find all the applicants who are qualified as a complement to the set of all applicants. Formally:

QualifiedApplicants = πName(ApplicantSkills)-
-
πName( πName(ApplicantSkills)?JobRequirements - ApplicantSkills )

Translating it into SQL is quite straightforward, although the resulting query gets quite unwieldy:

select distinct Name from ApplicantSkills
minus
select Name from (
select Name, Language from (
select Name from ApplicantSkills
), (
select Language from JobRequirements
)
minus
select Name, Language from ApplicantSkills
)

The minus operator is essentially an anti-join, so it is not surprising that it can be expressed in another form that is popular in the literature:

select distinct Name from ApplicantSkills i
where not exists (
select * from JobRequirements ii
where not exists (
select * from ApplicantSkills iii
where iii.Language = ii.Language
and   iii.Name = i.Name
)
)

Admittedly, this query is difficult to understand because of the double-negative construction - not exists clause inside another not exists. The reason for the double negation is SQL's inability to express universal quantification in the relational division query:

Name the applicants such that for all job requirements there exists a corresponding entry in the applicant skills

Mathematical logic is notorious for formal transformations of one logical expression into another. In this case

x(y f(x,y) )

is re-written into

x(y f(x,y) )

without too much thinking. The Relational division query becomes:

Name the applicants such that there is no job requirement such that there doesn't exists a corresponding entry in the applicant skills

which is a sloppy wording for the SQL query that was being analyzed.

As usual, the most elegant solution requires insight. The critical observation is expressing the informal Relational division query as:

Name the applicants for which the set of all job skills is a subset of their skills

It would be nice if SQL had a subset operator (also called set containment) so that we could write

select distinct Name from ApplicantSkills i
where (select Language from JobRequirements ii
where ii.Name = i.Name)
in (select Language from ApplicantSkills)

Here the SQL syntax has been abused and the already existing in operator leveraged to denote set containment. Without an explicit subset relation available it is expressed as the emptiness of the difference between the two sets. Formally,

A B

is equivalent to

A \ B =

Applied to our case it allows the transformation of the rough first attempt to a legitimate SQL query:

select distinct Name from ApplicantSkills i
where not exists (
select Language from ApplicantSkills
minus
select Language from JobRequirements ii
where ii.Name = i.Name
)

We are not done yet. Instead of checking the sets containment they could be simply counted instead!

select Name from ApplicantSkills s, JobRequirements r
where s.Language = r.Language
group by Name
having count(*) = (select count(*) from JobRequirements)

First, ?irrelevant skills? were excluded by joining ApplicantSkills and JobRequirements. Now that the result is subset of the required skills, they are both counted and compared.

Relational division can have really bizarre applications. Consider finding the Greatest Common Divisor (GCD) for a set of integers.

Given a set of integers find the maximal integer that divides all of them.

The GCD problem is remarkable in a sense that it is connected to many areas of SQL programming. It is very tempting, for example, to define GCD as yet another user defined aggregate function. In this section, however, the GCD problem seems to be out of context, because the only thing that relates GCD and relational division is the word ?division?. It does not mean the same thing in both cases, however. Or does it?

It is the universal quantifier in our informal GCD query which makes the connection to the relational division. For illustration purposes we?ll find the GCD of just three numbers 9, 15 and 42. Let's try making this query fit formally into the relational division framework. First it is a good idea to simplify a query in order to exclude the aggregation:

Given a set of integers, e.g. 9, 15, and 42 find another set of integers such that each of them divides all the numbers from the first set.

Admittedly, this query does not look like relational division at all. The relational division operator has two inputs: the dividend relation and the divisor relation. All there is so far is something that might measure up to the divisor role. Let's suppress the lingering doubt, and promote the relation formally to the divisor. In fact, why not even keep the JobRequirements relation name from the previous development?

Therefore, we have JobRequirements:

What could be the missing ApplicantSkills dividend? The ?Applicants? are really the integers; only those which have a 'skill? to divide all the numbers listed in the JobRequirements would qualify. There we have ApplicantSkills!

This relation is imaginary, of course. The chapter on integer generators presented a way to generate all the integer pairs, and this method could in fact be adopted for this case.   Next, simply plug in JobRequirements and ApplicantSkills into any of the relational division queries that were written in this chapter, say

select Name from ApplicantSkills s, JobRequirements r
where s.Language = r.Language
group by Name
having count(*) = (select count(*) from JobRequirements)

and we have the answer!   There is one little optimization that can simplify the answer. The ApplicantSkills.Language column can be eliminated. Projection of the ApplicantSkills relation to the Name column is the familiar Integers relation. Now that there is no longer any column for the equijoin predicate s.Language = r.Language, the join condition between the Integers and JobRequirements demands the remainder of integer division to be 0. With the names that were mindlessly borrowed from the sample relational division problem, the revised query may enter an obfuscated SQL coding contest.

select Name from (
select num# as Name from Integers
where num# <= (select min(language) from JobRequirements)
), JobRequirements
where mod(Language,Name)=0
group by Name
having count(*) = (select count(*) from JobRequirements)

With one more step, switching to more appropriate names, and we have the final GCD query:

select Divisor from (
select num# as Divisor from Integers
where num# <= (select min(Element) from NumberSet)
), NumberSet
where mod(Element, Divisor)=0
group by Divisor
having count(*) = (select count(*) from NumberSet)

The only remaining action is finding the maximum in the resulting set.

Outer Union

The union operator definition in Relational Algebra has a problem. It can be applied only to compatible relations that have identical attributes. The Outer Union operator, invented by E.F.Codd, can be applied to any pair of relations.  Each relation is extended to the schema that contains all the attributes from both relations. The newly introduced columns are padded with NULLs. The resulting relations have the same schema and their tuples can, therefore, be combined together by an ordinary union.

Example. Suppose we have the Personnel relation

and the Department

The outer union Personnel Department is

This operator seems to have an extremely limited practical scope. The only usage of the outer union operator I?m aware of is the definition of (the full) outer join. Formally, the outer join of the relations A and B is defined in four steps:

1.      Join both relations: A wv    B. In the example above we?ll have

2.      Join both relations, project to the first relation schema and subtract the result from the first relation: A \ psch(A) (A wv    B)

3.      Join both relations, project to the first relation schema and subtract the result from the first relation: B \ psch(B) (A wv    B)

4.      Apply outer union to the results of 1-3.

This is actually the relational algebra versions of the outer union and outer join operations. Perhaps they should be properly named natural outer union and natural outer join. In SQL the two relations are always considered to have disjoint sets of attributes.

 This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim TropashkoYou can buy it direct from the publisher for 30%-off.

��

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:

and include the URL for the page.

 Burleson Consulting The Oracle of Database Support Oracle Performance Tuning