A couple of weeks ago I posted an
article asking
whether
dimensional modeling was suitable for all types of reporting,
even operational, and I got the largest amount of feedback
that I'd ever received for an article. The article was in
response to a project I was working on where this question had
come up, and quite a few people asked me to report back after we
put the design together. The work was for a university, so what
I'll be talking about are students, registrations, courses,
admissions and so forth.In short, yes we did put together a
dimensional model, but the key to it was in some circumstances
to "snowflake" the key dimensions, so that we could store facts
at different levels of granularity. However, when we first
approached it, we kept an open mind and it was only after a day
or so that it was clear that a dimensional model would work,
which was nice as I'd tried not to be to ideological about it at
the start but as time went on it became clear that a dimensional
design would work, and would in fact be preferable to a 3NF
design.
Right from the start, it seemed obvious to me that there was
some benefit in separating data items in the warehouse into
those that were "facts" - student admissions, course grades,
vacancies advertised, people appointed to posts - and dimensions
- students, courses, departments, staff, cost centres and so on.
This would make sense whatever model we worked to, even if we
didn't formally state that it was a dimensional warehouse, as it
would tidy up what would otherwise be a difficult data model to
navigate. We carried out some interview and, just as when
putting a normal, OLTP system together, identified the key
entities and relationships within the data. At the same time, we
looked out for natural hierarchies in the data with an eye to
how data would later aggregate in the warehouse.
After a couple of days, the list of entities, and the way
they related together in hierarchies, looked something like
this:

Now, all so far so good, but there were a couple of
interesting bits to this, that initially made us feel that a
dimension model might not be flexible enough for this customer:
- firstly, the facts - registrations, admissions, grades,
assessments and so on - were often stored at differing levels
of granularity. For example, students registered on both
courses and modules, so a single courses dimension, with
module ID the primary key and course details denormalized into
the same dimension table, wouldn't work
- secondly, a lot of the operational reporting would be done
against what would be higher levels in the dimension, meaning
again that if we had a single denormalized dimension table,
users would have to work out that a single course would be in
the table multiple times, and they'd need to select distinct
to return details, and
Still, the essence of a dimensional model was there, but the
issue that was causing us a problem was the idea of trying to
get all the levels of say the courses dimension into one
denormalized dimension table. If we did this, we could only
store facts at the lowest level - module - and if users wished
to print out lists of courses, not modules, they'd have to be
aware that the courses information was denormalized into this
table and contained repeated rows.
The trick to it then was down to a comment that David
Aldridge made to the original article, where he made the point
that, if facts are stored at differing levels of granularity,
you may want to normalize your model more. Therefore, what we
eventually put together was a snowflake model, rather than a
straight star schema, and broke the components of for example
the courses dimension - module, course, school, faculty and so
forth - into separate tables. By doing this, we kept the various
entities in their own tables, making it easier for users to run
operational reports, but we kept the basic idea of splitting
items into facts and dimensions.
Now to do this, I ended up using Oracle Designer 9i, as OWB
10g doesn't support snowflaked dimensions. I was quite surprised
actually at how good Designer was, and certainly in terms of
diagramming and modeling it's a lot better than Oracle Warehouse
Builder. I ended up putting together models to describe the main
facts and dimensions like this:

and the main dimensions like this:

which although normalized, are still keeping
true to the dimensional modeling approach.
My next thought was to build up individual
dimensions by pulling together the individual entity tables into
the relevant dimensions. For example, to create my courses
dimension, I would use the CREATE DIMENSION statement to bring
together columns from the modules, courses, schools and
faculties tables, and to create the staff dimension, I'd use
columns from the staff table and the department table, and then
reuse columns from the schools and faculties tables. However, I
quickly came across the restriction where you can't use a table
column in more than one dimension, which scuppered that idea.
What this then means is that, if I'm going to use the CREATE
DIMENSION statement and create formal dimensions in the
database, I'll need to create a modules_dim table that
denormalizes the courses and faculties columns into it, and then
a courses_dim table that does the same for faculties, and so on.
I'll still be able to use the same table building blocks, and I
guess it'll only benefit performance as I'll be eliminating
table joins at runtime.
So, in summary - yes, we were able to use a
dimensional model, even when the organization generated facts at
differing levels of granularity, and where it reported on
different levels in the dimension hierarchy, but what we had to
do was compromise the dimensional model a little bit and instead
normalize the dimensions, which gave us the flexibility we
needed. Moreover, by keeping to the dimensional model, we kept
the data model simple and hopefully we'll still be able to take
advantage of the star schema optimizations in Oracle such as
star transformations.
Thanks again to everyone that helped out with
this.