Home
E-mail Us
Oracle Articles
New Oracle Articles

Oracle Training
Oracle Tips
Oracle Forum
Class Catalog

Remote DBA
Oracle Tuning
Emergency 911
RAC Support
Apps Support
Analysis
Design
Implementation
Oracle Support

SQL Tuning
Security
Oracle UNIX
Oracle Linux
Monitoring
Remote support
Remote plans
Remote services
Application Server
Applications
Oracle Forms
Oracle Portal
App Upgrades
SQL Server
Oracle Concepts
Software Support
Remote Support
Development
Implementation

Consulting Staff
Consulting Prices
Help Wanted!

Oracle Posters
Oracle Books
Oracle Scripts
Ion
Excel-DB

Don Burleson Blog

Oracle Data Warehouse Analysis

Oracle Data Warehouse Tips by Burleson Consulting

Data Warehouse Analysis

There are many differences between traditional systems analysis and Oracle warehouse systems analysis. In a traditional systems analysis, the goal is to document all of the logical processes, describing data transformations, data stores, and external inputs and outputs from an existing system and a proposed system. Figure 3.1 displays a traditional systems development project.

Figure 3.1 A traditional systems development project.

In contrast, a data warehouse systems analysis focuses on determining the data requirements and data sources of a system, and documents how to extract and package data for end users. Figure 3.2 displays an Oracle data warehouse development project.

Figure 3.2 A data warehouse development project.

Another major difference between analysis for OLTP systems and data warehouse systems lies in the descriptions of the user interface. In a traditional analysis, careful attention is given to the method the end user will implement when interacting with the system. In a data warehouse analysis, developers expect that most, if not all, queries against the warehouse will be ad hoc. As such, data warehouse developers have more interest in specifying the data for the warehouse than for specifying the data access methods.

Do not confuse the analysis of data access methods with the analysis of data query requirements. While you may not care what tool is used to extract queries from your warehouse, you should care very much about what types of queries will be executed against your warehouse. We will discuss this issue later in this chapter.

Another analysis issue comes into play because Oracle warehouse developers know in advance that they are using an Oracle database to implement the warehouse. Oracle, by virtue of being relational, is inherently flexible, and it is relatively easy to drop or add data columns to table structures. A consequence of this reality is that developers can often begin prototyping the data warehouse before a complete analysis of data sources has been completed. To demonstrate this flexibility, consider the following code where we quickly alter our fact table to drop one column and add another:

SQL > alter table fact drop column customer_type;
column dropped
SQL > alter table fact add column total_sale int;
column added.

Of course, the flexibility of the relational architecture is only useful when prototyping the warehouse. The final data items should be carefully considered before the initial rollout of the system because altering a very large table can cause a huge amount of table fragmentation in a production warehouse. To fully appreciate the differences between traditional systems analysis and data warehouse analysis, let?s review the analysis steps for both traditional data processing systems and data warehouses.

Traditional Systems Analysis

There are three commonly accepted methods for systems analysis: the Gane & Sarson method, the Ed Yourdon method, and the Tom DeMarco systems analysis method. Recently, we have seen a new interest in systems analysis methodologies using object orientation, with new methodologies being introduced by Jim Rumbaugh, Peter Coad, Grady Booch, and Schaller & Mellor. Regardless of the individual theory, all systems analysis models share some common goals and activities. For example, all the theories state that before any physical construction of a system may begin, the new system must be completely analyzed to determine all of the process logic involved in the system. In addition, all the methodologies require identification of the functional primitive processes and documentation of all data stores and data flows among the processes.

The output of a system analysis is logical because no physical constructs are introduced into the model. Physical constructs are added in the design phase. For example, the analysis phase may document a customer file, but developers are concerned only with defining the data characteristics and not at all concerned about how this file will be represented physically. Whether the customer file is stored on Rolodex cards, an ISAM file, a BDAM file, or an Oracle database is irrelevant in data analysis. Remember, it is the logical specification that is used as the input in systems design. But, how does the systems analysis change when a data warehouse system is being developed?

Regardless of the type of system being created, a logical analysis must always precede the start of systems design, and the design must be completed before programming can begin. In an effort to consolidate the systems development methodologies, research papers have been published about the proper way to incorporate data warehouse development into existing analysis and design methodologies.

Fundamentally, the purpose of any systems analysis is to logically identify the processes and the data moving between the processes, and to describe the processing rules and data items. Only after these items are defined can design begin, regardless of the physical implementation of the system. To meet these goals, a data warehouse analysis should begin with the creation of a structured specification. A structured specification is a document which describes all of the data, data storage, external entities, and processes for a system. This document is then used in the design phase for the creation of the behaviors, entity/relation model, and class hierarchy.

The Structured Specification

Most of the system analysis methodologies provide a method for documenting logical processes, data items, and data stores. These components generally include:

* Data Flow Diagrams--A set of top-down diagrams which depict all processes within a system, the data flow among the processes, and the data stores. Figure 3.3 depicts a sample data flow diagram. The data flow diagrams (DFDs) begin at a very general level and become progressively more detailed. The lowest level of processing is called the functional primitive level, and this primitive level has been traditionally used as the starting point for systems design.

* Data Dictionary--A description of all of the logical data items, including all data flows and data stores (files). The data dictionary describes how all of the data items are stored and how they have been transformed by the processes. The data dictionary file specifications also become the foundation for the relational tables that will comprise the Oracle warehouse.

* Process Logic Specifications (mini-specs)--A description of all functional primitive processes. A process is defined as an operation that modifies a data flow. The tools used to describe processes include pseudocode, procedure flowcharts, decision trees, and decision tables.

Figure 3.3 A sample data flow diagram.

In a traditional systems analysis, the DFD does not stand by itself. Rather, the DFD is augmented by a data dictionary that describes all of the data flows and files, and a set of process logic specifications that describes how each process transforms data flows. A process logic specification (sometimes called a mini-spec) can be expressed as structured English, decision tress, or any of the many other techniques used to describe how data flows are being changed. Listing 3.1 shows a sample process logic specification using structured English.

Listing 3.1 A Level 1 DFD process logic specification for the place_order behavior.

Minispec for PLACE_ORDER:

IF total_amt > 1000
   Check CREDIT_RATING in CUSTOMER
   IF CREDIT_RATING = 'BAD' then reject order
   ELSE Store ORDER RECORD
end if

FOR (each item on the order)

   -- compare QTY_ORDERED in ORDER with QTY_ON_HAND in ITEM
   IF QTY_ON_HAND < QTY_ORDERED
      Remove item from order
      Prepare backorder slip
   ELSE

   -- Add the item to the order
   Subtract QTY_ORDERED from QTY_ON_HAND.
   Move QTY_ORDERED to QTY in LINE_ITEM.
   Store LINE_ITEM record.
NEXT ITEM

In traditional systems analysis, data dictionary definitions for all data items are normalized, or grouped, into database entities, which become entity/relational models in systems design. Eventually, the E/R models become relational tables during system implementation. The identification and grouping of data items constitutes the entities that will establish the basic entity/relation model for the database engine. For example, consider the interaction between the place_order process and the data files in Figure 3.3. The process place_order uses the data flow total_amnt to check the credit rating in the customer file. It also uses qty_ordered to check the inventory level for the item, then moves qty_ordered to the line_item record, and finally stores this record in the file.

If you like Oracle tuning, see the book "Oracle Tuning: The Definitive Reference", with 950 pages of tuning tips and scripts.

You can buy it direct from the publisher for 30%-off and get instant access to the code depot of Oracle tuning scripts.

��