Data Warehouse Analysis
There are many differences between
traditional systems analysis and Oracle warehouse systems analysis.
In a traditional systems analysis, the goal is to document all of
the logical processes, describing data transformations, data stores,
and external inputs and outputs from an existing system and a
proposed system. Figure 3.1 displays a traditional systems
development project.
Figure 3.1 A traditional systems
development project.
In contrast, a data warehouse systems
analysis focuses on determining the data requirements and data
sources of a system, and documents how to extract and package data
for end users. Figure 3.2 displays an Oracle data warehouse
development project.
Figure 3.2 A data warehouse development
project.
Another major difference between analysis
for OLTP systems and data warehouse systems lies in the descriptions
of the user interface. In a traditional analysis, careful attention
is given to the method the end user will implement when interacting
with the system. In a data warehouse analysis, developers expect
that most, if not all, queries against the warehouse will be ad hoc.
As such, data warehouse developers have more interest in specifying
the data for the warehouse than for specifying the data access
methods.
Do not confuse the analysis of data access
methods with the analysis of data query requirements. While you may
not care what tool is used to extract queries from your warehouse,
you should care very much about what types of queries will be
executed against your warehouse. We will discuss this issue later in
this chapter.
Another analysis issue comes into play
because Oracle warehouse developers know in advance that they are
using an Oracle database to implement the warehouse. Oracle, by
virtue of being relational, is inherently flexible, and it is
relatively easy to drop or add data columns to table structures. A
consequence of this reality is that developers can often begin
prototyping the data warehouse before a complete analysis of data
sources has been completed. To demonstrate this flexibility,
consider the following code where we quickly alter our fact table to
drop one column and add another:
SQL > alter table fact drop column
customer_type;
column dropped
SQL > alter table fact add column total_sale
int;
column added.
Of course, the flexibility of the relational
architecture is only useful when prototyping the warehouse. The
final data items should be carefully considered before the initial
rollout of the system because altering a very large table can cause
a huge amount of table fragmentation in a production warehouse. To
fully appreciate the differences between traditional systems
analysis and data warehouse analysis, let?s review the analysis
steps for both traditional data processing systems and data
warehouses.
Traditional Systems Analysis
There are three commonly accepted methods
for systems analysis: the Gane & Sarson method, the Ed Yourdon
method, and the Tom DeMarco systems analysis method. Recently, we
have seen a new interest in systems analysis methodologies using
object orientation, with new methodologies being introduced by Jim
Rumbaugh, Peter Coad, Grady Booch, and Schaller & Mellor. Regardless
of the individual theory, all systems analysis models share some
common goals and activities. For example, all the theories state
that before any physical construction of a system may begin, the new
system must be completely analyzed to determine all of the process
logic involved in the system. In addition, all the methodologies
require identification of the functional primitive processes and
documentation of all data stores and data flows among the processes.
The output of a system analysis is logical
because no physical constructs are introduced into the model.
Physical constructs are added in the design phase. For example, the
analysis phase may document a customer file, but developers are
concerned only with defining the data characteristics and not at all
concerned about how this file will be represented physically.
Whether the customer file is stored on Rolodex cards, an ISAM file,
a BDAM file, or an Oracle database is irrelevant in data analysis.
Remember, it is the logical specification that is used as the input
in systems design. But, how does the systems analysis change when a
data warehouse system is being developed?
Regardless of the type of system being
created, a logical analysis must always precede the start of systems
design, and the design must be completed before programming can
begin. In an effort to consolidate the systems development
methodologies, research papers have been published about the proper
way to incorporate data warehouse development into existing analysis
and design methodologies.
Fundamentally, the purpose of any systems
analysis is to logically identify the processes and the data moving
between the processes, and to describe the processing rules and data
items. Only after these items are defined can design begin,
regardless of the physical implementation of the system. To meet
these goals, a data warehouse analysis should begin with the
creation of a structured specification. A structured specification
is a document which describes all of the data, data storage,
external entities, and processes for a system. This document is then
used in the design phase for the creation of the behaviors,
entity/relation model, and class hierarchy.
The Structured Specification
Most of the system analysis methodologies
provide a method for documenting logical processes, data items, and
data stores. These components generally include:
* Data Flow Diagrams--A set of top-down
diagrams which depict all processes within a system, the data flow
among the processes, and the data stores. Figure 3.3 depicts a
sample data flow diagram. The data flow diagrams (DFDs) begin at a
very general level and become progressively more detailed. The
lowest level of processing is called the functional primitive level,
and this primitive level has been traditionally used as the starting
point for systems design.
* Data Dictionary--A description of all
of the logical data items, including all data flows and data stores
(files). The data dictionary describes how all of the data items are
stored and how they have been transformed by the processes. The data
dictionary file specifications also become the foundation for the
relational tables that will comprise the Oracle warehouse.
* Process Logic Specifications
(mini-specs)--A description of all functional primitive processes. A
process is defined as an operation that modifies a data flow. The
tools used to describe processes include pseudocode, procedure
flowcharts, decision trees, and decision tables.
Figure 3.3 A sample data flow diagram.
In a traditional systems analysis, the DFD
does not stand by itself. Rather, the DFD is augmented by a data
dictionary that describes all of the data flows and files, and a set
of process logic specifications that describes how each process
transforms data flows. A process logic specification (sometimes
called a mini-spec) can be expressed as structured English, decision
tress, or any of the many other techniques used to describe how data
flows are being changed. Listing 3.1 shows a sample process logic
specification using structured English.
Listing 3.1 A Level 1 DFD process logic
specification for the place_order behavior.
Minispec for PLACE_ORDER:
IF total_amt > 1000
Check CREDIT_RATING in CUSTOMER
IF CREDIT_RATING = 'BAD' then reject
order
ELSE Store ORDER RECORD
end if
FOR (each item on the order)
-- compare QTY_ORDERED in ORDER with
QTY_ON_HAND in ITEM
IF QTY_ON_HAND < QTY_ORDERED
Remove item from order
Prepare backorder slip
ELSE
-- Add the item to the order
Subtract QTY_ORDERED from QTY_ON_HAND.
Move QTY_ORDERED to QTY in LINE_ITEM.
Store LINE_ITEM record.
NEXT ITEM
In traditional systems analysis, data
dictionary definitions for all data items are normalized, or
grouped, into database entities, which become entity/relational
models in systems design. Eventually, the E/R models become
relational tables during system implementation. The identification
and grouping of data items constitutes the entities that will
establish the basic entity/relation model for the database engine.
For example, consider the interaction between the place_order process and the data files in Figure 3.3. The process place_order
uses the data flow total_amnt to check the credit rating in the
customer file. It also uses qty_ordered to check the inventory level
for the item, then moves qty_ordered to the line_item record, and
finally stores this record in the file.
|
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |