 |
|
SQL Design Patterns
SQL Tips by Donald Burleson |
SQL is a very successful language. Yet, is there a place for an
advanced SQL book in the era of ?Learn Technology X in 10 Minutes??
Indeed, the tech industry today emphasizes a shallow knowledge of a
huge number of technical skills, rather than deep knowledge of a
more specialized skill. That's how workers have been getting
jobs, by knowing a tiny bit of many skills employers might need.
Someone who knows how to use SQL in an advanced way does not have a
proportionate advantage in getting a job over someone who knows only
the basics. If this second person, however, also knows J2EE,
XSLT, Ajax, Flash, or any other flash in the pan to a similarly
basic level, then he or she has a greater employment advantage.
The major flaw of this line of reason is equating the
sophistication of SQL to these rather unsound technologies. This
might be surprising to a newcomer who generally finds SQL a little
bit old fashioned compared to the ?modern? programming languages. It
is almost as old as C, which spawned at least 3 newer generation
languages already, and it looks like COBOL, so why isn't it obsolete
yet? Let me assure you that this appearance is misleading. Under the
cover of sloppy and archaic syntax, a high abstraction language is
found.
SQL programming is very unusual from the procedural perspective:
there is no explicit flow control, no loops and no variables either
to apply operations to or to store intermediate results into. SQL
heavily leverages predicates instead, which elevates it to Logic
Programming. Then, the grouping and aggregation syntax blends
naturally into this already formidable logic foundation. Anybody who
rediscovers that matrix or polynomial multiplication can be written
in just three lines of code has a long lasting impression.
These ingredients that make SQL unique partially explain why
advanced SQL programming does not revolve around syntax features,
but demands a SQL programmer to develop an ability to recognize
complex patterns. Yes, beyond a certain point a skill of piling up
subqueries does not give much of a return and one has to study some
rudimentary theory, which classifies known SQL solutions into
patterns.
Patterns in procedural programming became popular a decade ago,
originated with a landmark book by Gamma et. al. Each pattern has
its name so that developers could quickly refer to it by just a
name. ?Oh, that's a singleton!? instead of a lengthy description and
often accompanied with a code snippet.
Patterns received a sour reception in a high abstraction language
community. The prevailing perception was that patterns are a
signature of low level programming. When a programmer sees patterns
in her programs, it is a sign of trouble. The shape of a program
should reflect only the problem it needs to solve. Any other
regularity in the code indicates that abstractions are not powerful
enough.
In reality, however, any language is quite limited in its
abstraction power, declarative languages notwithstanding. Sooner or
later we have to find workarounds for those limitations. This is how
SQL patterns were born!
Patterns greatly improve our communication capabilities. On
internet SQL forums it is not uncommon for people to ask the same
question over and over again. Pattern names such as Interval
coalesce or Relational division, for example, rarely
surface in the discussion thread, giving way to numerous reply
messages pointlessly competing to see who can find a query that does
not look intimidating. Patterns allow succinct replies like this:
?Lookup the Interval coalesce method in the - textbook?.
Establishing common pattern names is the first goal of this book.
Most of the patterns have standard names: Skyline query,
Pivot, or Nested Intervals, for example. Few do not; we
have to work out a name, like the fancy sounding Discrete
interval sampling, for example.
When presenting SQL patterns in this book I decided to dismiss
the standard template form. Template is perfect for reference
material, but is a nuisance for a textbook. More important than this
stylistic comment, however, is the fact that fairly soon you might
stumble into patterns that require little familiarity with
undergraduate level math. Don't be discouraged, however: as John
Garmany suggested, many topics start making sense on second reading.
In the Indicator Functions section, for example, you may want
to skip the theory, first, rewind to the sample problem and SQL
solutions, then, rollback to the theory.
List of patterns
1)
Counting
2)
Conditional summation
3)
Integer generator
4)
String/Collection decomposition
5)
List Aggregate
6)
Enumerating pairs
7)
Enumerating sets
8)
Interval coalesce
9)
Discrete interval sampling
User-defined aggregate
1)
Pivot
2)
Symmetric difference
3)
Histogram
4)
Skyline query
5)
Relational division
6)
Outer union
7)
Complex constraint
8)
Nested intervals
9)
Transitive closure
10)
Hierarchical total