 |
|
Using Oracle Pivot
SQL Tips by Donald Burleson |
Pivot
and Unpivot are two fundamental operators that exchange rows
and columns. Pivot aggregates a set of rows into a single row with
additional columns. Informally, given the Sales relation, the pivot operator
transforms it into a relation with fewer rows, and some column
headings changed:
Unpivot is informally
defined as a reverse operator, which alters this relation back into
the original one.
A reader who is already
familiar with the Conditional Summation idiom from Chapter 1 would
have no difficulty writing a pivot query in standard SQL:
select
Product,
sum(case when Month=?Jan? then Amount else
0 end) Jan,
sum(case when Month=?Feb? then Amount else
0 end) Feb,
sum(case when Month=?Mar? then Amount else 0 end) Mar
from Sales
group by Product
Why aggregation and grouping? Aggregation is a natural way to handle
data collision; when several values map to the same location. For
example, a request made for the record of sales for the month of
January results in two records appearing. The reason for this
collision may be either an error in the input data or the result of
a projection applied in the inner view. The original Sales
relation might include the Month and Day columns, but
the projection simply discarded the day information. Hence,
summation is the right way to get the correct answer.
Unfortunately, the
approach with the straightforward query above quickly shows its
limitations. First, each column has a repetitive syntax which is
impossible to factor in. More important, however, is the inability
to accommodate a dynamic list of values. In this example, the (full)
list of months is static, but change months to years, and we have a
problem.
SQL Server 2005
introduced the pivot operator as syntax extension for a table
expression in the from clause:
select *
from
(Sales pivot (sum(Amount) for Month in (?Jan?, ?Feb?, ?Mar?))
As soon as a new feature is introduced people start wondering if it
can accommodate more complex cases. For example, can two
aggregations be performed at once? Given the Sales relation,
can the sales total amounts be outputted together with sales counts
like this?
The column names had to
be changed in order to accommodate extra columns, and if nothing
else, the changed column names should hint the solution. The other
idea, which should be immediately obvious from the way the table
columns are arranged in the display, is that the result is a join
between the two primitive pivot queries:
Well, what about those
fancy column names? There is nothing like JanCnt in the
original data. Indeed there is not, but transforming the month
column data into the new column with the Cnt postfix can be
done easily with string concatenation. Therefore, the answer to the
problem is:
select
scount.*, ssum.* from (
select * from (
(select product, month || ?Cnt?, amount from Sales)
pivot (count(*) for Month in (?JanCnt?, ?FebCnt?, ?MarCnt?)
) scount, (
select * from (
(select product, month || 'sum?, amount from Sales)
pivot (sum(Amount) for Month in (?JanSum?, ?FebSum?, ?MarSum?)
) ssum
where scount.product = ssum.product
When I posted this solution on the SQL server programming forum,
Adam Machanic objected noting that is very inefficient compared to
the conditional summation version:
select
Product,
sum(case when Month=?Jan? then 1 else 0
end) JanCnt,
sum(case when Month=?Feb? then 1 else 0
end) FebCnt,
sum(case when Month=?Mar? then 1 else 0
end) MarCnt,
sum(case when Month=?Jan? then Amount else
0 end) JanSum,
sum(case when Month=?Feb? then Amount else
0 end) FebSum,
sum(case when Month=?Mar? then Amount else 0 end) MarSum
from Sales
group by Product
The culprit is the pivot clause syntax. It would have been a much
more natural design if
1)
The pivot operator were designed as shorthand for the
conditional summation query, and
2) The
pivot clause were allowed in a select list, rather than being
a part of a table reference in the from clause.
The other popular request
is pivot on multiple columns. Given the Sales relation
extended with one extra column, pivot it over the combination of
Month and Day columns.
This is much easier,
simply concatenate the Month and the Day values and
consider it as a pivot column:
select *
from (
(select product, month || ?_? || day as Month_Day, amount
from Sales)
pivot (count(*) for Month_Day in (?Jan_1?, ?Jan_2?, ?Jan_3?, ?)
)
As this example demonstrates, the number of pivoted values easily
becomes unwieldy, which warrants more syntactic enhancements.
Unpivot in a standard SQL
syntax is:
select
product, ?Jan? as Month, Jan as Amount from PivotedSales
union all
select product, ?Feb? as Month, Feb as Amount from PivotedSales
union all
select product, ?Mar? as Month, Mar as Amount from PivotedSales
Again it is repetitive and not dynamically extensible. In SQL Server
2005 syntax, it becomes:
select *
from
(PivotedSales unpivot (Amount for Month in (?Jan?, ?Feb?, ?Mar?))
The introduction of
pivot and unpivot operators in SQL opens interesting
possibilities for data modeling. It may reincarnate the outcast
Entity-Attribute-Value (EAV) approach. EAV employs a single
?property table? containing just three columns: propertyId,
propertyName, and propertyValue. The EAV styled schema
design has a grave implication on how data in this form can (or
rather cannot) be queried.
Even the simplest
aggregate with group by queries cannot be expressed without
implicitly transforming the data into a more appropriate view.
Indexing is also problematic. Regular tables routinely have
composite indexes, while the EAV table has to be self-joined or
pivoted before even getting to a relation with more than one
property value column. Pivot makes an EAV table look like a regular
table.