 |
|
Disjoint Sets
SQL Tips by Donald Burleson |
Enforcing that the
two tables have no common records is a very common practical
problem. Consider a typical Object-Oriented design scenario where
the Employee type has two subtypes: FullTimeEmp and
PartTimeEmp. On the database side, suppose that there are only
two tables: FullTimeEmp and PartTimeEmp. What should
be done to enforce the uniqueness of an employee id?
Certainly, both FullTimeEmp.id and PartTimeEmp.id can
be declared to be unique, but how do we guarantee that there is no
employee, who is both full time and part time? These sets have to be
disjointed:
CREATE
MATERIALIZED VIEW fullTime_intersect_partTime
select id
from FullTimeEmp
intersect
select id
from PartTimeEmp
ALTER TABLE
fullTime_intersect_partTime
ADD
CONSTRAINT disjointClasses CHECK(id is null)
There is another,
more straightforward, but somewhat cumbersome way to approach this
problem. Like any other constraint, a unique key can be enforced via
a materialized view as well. Let's start with the smaller and easier
task of defining FullTimeEmp.id unique key. Suppose that
FullTimeEmp has two more columns: name and startedAt.
Again, we begin rephrasing the constraint as an impossible
condition:
The two employee records e1 and e2 are
contradictory whenever e1.id = e2.id, and yet either e1.name ≠
e2.name or e1.startedAt ≠ e2.startedAt. No contradictory employee
records are allowed.
It easy to express
this constraint formally:
CREATE
MATERIALIZED VIEW contradictoryEmployees
select
e1.id id from FullTimeEmp e1, FullTimeEmp e2
where e1.id
= e2.id and (
e1.name <> e2.name or e1.startedAt <> e2.startedAt
)
ALTER TABLE
fullTime_intersect_partTime
ADD
CONSTRAINT disjointClasses CHECK(id is null)
It is obvious that
the number of inequality comparisons would grow with the number of
columns in the table. This is why I referred to this approach as
cumbersome. Some RDBMS implementations introduce a ROWID
pseudo column, which is guaranteed to be unique. This allows
comparing ROWIDs instead of actual values:
CREATE
MATERIALIZED VIEW contradictoryEmployees
select
e1.id id from FullTimeEmp e1, FullTimeEmp e2
where e1.id
= e2.id and e1.rowid <> e2.rowid
ALTER TABLE
fullTime_intersect_partTime
ADD
CONSTRAINT disjointClasses CHECK(id is null)
Without ROWIDs
the constraint is actually weaker than a unique key, as it cannot
distinguish duplicate records. Duplicate records do not exist in set
semantics, but SQL operates with bags. One extra nested materialized
view is needed to exclude duplicates.
The final solution
is only one step away. Simply declare the unique key constraint on a
materialized view, which is the union of FullTimeEmp and
PartTimeEmp tables. However, the union is not defined on
tables which are incompatible,
and in fact, it is easy to imagine that tables corresponding to
different subclasses have different attributes. The concern about
union not being defined for incompatible relations has already been
addressed in the previous chapter, where the outer union was
introduced.
There is another
reason why this solution is less elegant than the one with disjoint
set constraint. The outer join view is not empty, which implies
storage overhead. Although in principle the RDBMS engine could
flatten nested materialized views, do not expect this to be
implemented anytime soon.
Disjoint Intervals
Consider a list of
intervals:
table
Intervals (
head
integer,
tail
integer
)
We would like to
declare these intervals as mutually disjoint.
In math, disjoint
intervals are defined as sets that do not intersect with each other.
This definition, however, is not very useful in this situation since
the intervals disjoint condition must be formulated in terms of
intervals boundaries.
Once more the
constraint is rephrased as an impossible condition:
The intervals [head1,tail1] and
[head2,tail2] overlap whenever head2 is between head1 and tail1. No
overlapping intervals are allowed.
Wait a minute,
something must be wrong here! The interval overlapping condition has
to be symmetric with respect to both intervals, but the formal
statement written does not look symmetric at all. Indeed, what if
interval [head2,tail2] covers [head1,tail1]? Then
head2 is not between head1 and tail1, and yet the
intervals do overlap. This non-symmetry is illusory, however.
Let's write the
constraint formally:
CREATE
MATERIALIZED VIEW overlapping_intervals
select
i1.head h
from
intervals i1, intervals i2
where
i2.head between i1.head and i1.tail
ALTER TABLE
overlapping_intervals
ADD
CONSTRAINT no_overlapping_intervals CHECK(h is null)
This query shows
that i1 and i2 iterate over the set of all intervals,
each pair of intervals [a,b] and [c,d] in the set
would be considered twice: first time when i1.head = a,
i1.tail = b, i2.head = c, i2.tail = d, and second
time when i1.head = c, i1.tail = d, i2.head = a,
i2.tail = b. This reasoning, however, exposes a bug in the
implementation. Could i1 and i2 be the same interval?
One more predicate must be added that excludes this possibility:
CREATE
MATERIALIZED VIEW overlapping_intervals
select
i1.head h
from
intervals i1, intervals i2
where
i2.head between i1.head and i1.tail
and
(i2.head <> i1.head or i2.tail <> i1.tail)
ALTER TABLE
overlapping_intervals
ADD
CONSTRAINT no_overlapping_intervals CHECK(h is null)
Alternatively, an
asymmetric join condition could have been used, and each pair of
intervals only considered once. A
total order
is defined among all the intervals so that for each pair of
intervals i1 and i2 either i1
precedes i2,
or i2 precedes i1. The lexicographical order
comparison predicate i2.head < i1.head or (i2.head = i1.head and
i2.tail < i1.tail) defines a total order. The constraint can now
be written as follows:
Consider all the
pairs of intervals such that [head1,tail1]
precedes [head2,tail2]. They overlap whenever
head2 is less than or equal to tail1. Again,
no overlapping intervals are allowed.
And formally as:
CREATE
MATERIALIZED VIEW overlapping_intervals
select
i1.head h
from
intervals i1, intervals i2
where
(i2.head < i1.head or
i2.head = i1.head and i2.tail < i1.tail)
and
i2.head <= i1.tail
ALTER TABLE
overlapping_intervals
ADD
CONSTRAINT no_overlapping_intervals CHECK(h is null)