|
|
Slowly changing dimensions modeling tips
Oracle Database Tips by Donald Burleson |
By Steve Karam, the world's youngest Oracle ACE
and Oracle certified Master.
Modeling slowly changing dimensions
Oracle is only a computer program, and
therefore does exactly what it is coded to do (most of the time). Quantum
mechanics really do not seem to have any relevance in the Oracle world because
it is coded with specific outcomes to every true-or-false question. Logic gates
are applied to bits, and the programmed outcome takes place. Non-programmed
outcomes result in errors (ORA-600, ORA-7445).
However, things get more complex when we are
modeling new environments because we end up dealing with conceptual outcomes
with multiple possibilities. This is especially true in dimensional modeling,
because dimensions are supposed to hold factual lookup data. How can we record a
fact when important data regarding that fact may change?
For example, imagine a tax company that keeps
track of deductions found in their various offices and wishes to form a star
schema that will allow past and present analysis.
Let CLIENT_DIM record
= E. Schr?inger, 3 dependents
Let TIME_DIM record
= 2006 tax year
Let LOCATION_DIM record = Virginia Beach, VA
The data will come together in our fact table
to show that he had $25,000 in deductions for this combination of dimensional
data. This ?fact? is currently safe. During the next year, E. Schr?inger and
his wife have a lovely baby boy (at least in this universe!).
Let CLIENT_DIM record
= E. Schr?inger, 4 dependents
Let TIME_DIM record
= 2015 tax year
Let LOCATION_DIM record = Virginia Beach, VA
The data will once again come together in our
fact table, this time reporting $28,000 in deductions (I?m not a CPA, don't get
me on tax code) due to the extra dependent. This ?fact? is currently safe, just
as the last one was.
However, we now have a paradox in our data. In 2015, we report $28,000 in deductions, which was based on the fact that E.
Schr?inger had four dependents. In 2006, we reported $25,000 in deductions due
to E. Schr?inger's 3 dependents. When we run our analytic reports for 2015, we
will get great results; we will be able to break down the deductions and the
number of dependents will play a proper role in these calculations.
But when we run our reports against 2006, the
deduction calculations will not compute properly. The CLIENT_DIM record will
show 4 dependents, but the deduction amount for the 2006 year will have been
based upon 3 dependents. Our dimensional data (CLIENT_DIM record) changed over
time.
This is known as a slowly changing dimension
(or slowly changing dimension). Though we may not realize it, almost every
dimension is in fact slowly changing; stores may move, clients may die
(especially if they hang out with sadistic quantum physicists with ready
supplies of hydrocyanic acid), and even our human definition of time can change
over time (consider the changes to daylight savings this year).
However, we don't have to worry about all of
these possible changes; we only have to worry about the ones pertaining to the
facts on which we are attempting to report. Our business needs, in the end,
determine which dimensions must be slowly changing.
There are three types of slowly changing
dimensions: Type 1, Type 2, and Type 3. Each of these types tries to help the
designer of the star schema eliminate paradox from their dimensional model (just
as the three interpretations of the Schr?inger's Cat thought experiment tries
to eliminate the paradox of the living dead).
- Type 1 slowly changing dimension:
Overwrite the old value with the new value and call it a day. This is very
useful when dealing with issues such as typos on the client's name. We don't
care about the history in this case because it was incorrect anyways.
- Type 2 slowly changing dimension:
Create a new record in the dimension with a new primary key. In the example
we?ve given, there would be two records in CLIENT_DIM for E. Schr?inger,
one in which he has 3 dependents and once in which he has 4. Though he is
one person from the business point of view, he is two people from a
dimensional point of view.
- Type 3 slowly changing dimension:
Overwrite the old value with the new value, and add additional data to the
table such as the effective date of the change. This type of slowly changing
dimension resolution would be beneficial if there is a change that can
happen once and only once (such as death).
These three types of slowly changing dimension
resolution usually help in resolving changes to ?factual? lookup data. However,
we can see clear correlations between these three types of resolutions and the
three interpretations of the Schr?inger's Cat thought experiment!
slowly changing dimension Management
Type 1
This clearly matches up with the Copenhagen Interpretation of the Schr?inger's
Cat thought experiment. In the Copenhagen Interpretation, the state of the cat
changes and all other states are discarded as the waveform collapses. Criticism
to this interpretation applies as well to Type 1 slowly changing dimension
resolution. The Copenhagen Interpretation ignores the possibility of
reconstruction; in quantum mechanics, it must be possible to return to any
original state before measurement was taken place. In our star schema, it also
ignores the possibility of reconstruction; we will not be able to return to the
original state or even acknowledge that a previous state existed for the purpose
of analytics.
slowly changing dimension Management
Type 2
This matches with the Many Worlds Interpretation of the Schr?inger's Cat
thought experiment. Instead of completely destroying the other possible
waveform, we simply maintain that the two possibilities decohere and form their
own universes that will no longer share any correlation. This
interpretation shares a similar problem with Type 2 slowly changing dimension
resolution. By spawning new records (universes) as the outcome of a changing
event, we create multiple possibilities that no longer share any correlation.
For instance, if E. Schr?inger has a new
dependent, there will be two resulting rows: one in which has 3 dependents and
one in which he has 4. If he then legally changes his last name to Schroedinger,
we will have to record that change since it is important for tax records.
Because of this, we will have three total records for this one client (E.
Schr?inger with 3 dependents, E. Schr?inger with 4 dependents, and E.
Schroedinger with 4 dependents). These three records will not have any
correlation unless we create some sort of superkey that properly identifies a
single person and their many instances. This will be important if we will be
doing mining that incorporates multiple times, clients, and locations in our
analysis.
slowly changing dimension Management
Type 3
This matches with the Many Histories
Interpretation of the Schr?inger's Cat thought experiment. When an observed
outcome occurs (like the birth of a new child), the old record is changed to
reflect the new ?real? outcome; in this case, the addition of a single
dependent. However, the change is noted by the addition of a column such as an
effective date, to show that this is not the only outcome that has ever
existed, but it is THE outcome that does exist at this time.
In effect, the old outcome (3 dependents) still
exists, but is discarded, ignored, and irretrievable now that the outcomes are
decoherent. In our star schema, this type of resolution will only provide us
with a confusing result of 'this is the case now, but it was not always so. It
changed on ?.? For some situations, as with Type 1 resolution, this will
suffice, such as the death of a client. We only need to record that death once.
However, if any new data enters our model after the death of the client
(post-mortem taxes?) or if the status changes again (miraculous recovery!), we
may have unreliable report output.
Conclusion
In quantum mechanics, we are calculating
multiple events that happen at the same exact point in time (or over a period of
unknown time); whereas in a data warehouse we are dealing with history and fully
acknowledge the changes time may have on our data. However, one cannot help but
notice the correlation between the two forms of paradox and their resolutions.
In fact, in data warehousing the so-called
Schr?inger's Cat paradox becomes even more problematic because we are forced to
not only predict future outcomes based on dimensional data, but to also report
on past/present information based on the same data. Physicists attempting to
provide interpretations of this thought experiment only have to worry about
future conditions; the past and present are unmeasured, and therefore have no
relevance on the problem except that they are in a quantum state.
They seek to explain the future of the cat once
observation/decoherence have taken place. If Schr?inger's Cat were in a data
warehouse, we would have to analyze the entire life of the cat, the cat inside
that horrible box of doom, and try to figure out whether the cat will want dry
food, wet food, or a good burial place after the experiment is finished.
|
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning tips and
scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |