Decision Support Systems And Data
Warehouses
Decision support systems (DSS) are generally
defined as the class of systems that deal with solving
semi-structured problems. In other words, the task has a structured
component as well as an unstructured component. The
unstructured component involves human intuition and requires human
interaction with the DSS. The well-structured components of a DSS
are the decision rules stored in the problem-processing system; the
intuitive, or creative, component is left to the user.
The following represent some examples of
semi-structured problems:
* Choosing a spouse. While there are
many structured rules (I want someone of my religion, who is shorter
than me), there is still the unstructured, unquantifiable component
to the process of choosing a spouse.
* Choosing a site for a factory. This is
a nonrecurring problem that has some structured components (cost of
land, availability of workers, and so on), but there are many other
unstructured components in this decision (e.g., quality of life).
* Choosing a stock portfolio. Here the
structured rules are the amount of risk and the performance of
stocks, but the choice of stocks for a portfolio requires human
intuition.
Decision support technology recognizes that
many tasks require human intuition. For example, the process of
choosing a stock portfolio is a task that has both structured and
intuitive components. Certainly, rules are associated with choosing
a stock portfolio, such as diversification of the stocks and
choosing an acceptable level of risk. These factors can be easily
quantified and stored in a database system, allowing the user of the
system to create what-if scenarios. However, just because a system
has well-structured components does not guarantee that the entire
decision process is well-structured.
One of the best ways to tell if a decision
process is semi-structured is to ask the question, "Do people with
the same level of knowledge demonstrate different levels of skill?"
For example, it's possible for many stock brokers to have the same
level of knowledge about the stock market. However, these brokers
will clearly demonstrate different levels of skill when assembling
stock portfolios.
Computer simulation is used heavily
within the modeling components of decision support systems. In fact,
one of the first object-oriented languages was SIMULA. SIMULA was
used as a driver for these what-if scenarios and was incorporated
into decision support systems so that users could model a particular
situation. The user would create a scenario with objects subjected
to a set of predefined behaviors.
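SIMULA itself is rarely seen today, but the idea is easy to sketch in a modern language. The following is a minimal illustration (in Python, not SIMULA) of a what-if simulation built from objects with predefined behaviors; the checkout-register scenario echoes the question raised later in this section, and the arrival and service rates are invented for illustration.

```python
import random

class Register:
    """A checkout register with one predefined behavior: serve one customer at a time."""
    def __init__(self):
        self.busy_until = 0.0

    def serve(self, arrival_time, service_time):
        start = max(arrival_time, self.busy_until)
        self.busy_until = start + service_time
        return start - arrival_time  # how long this customer waited

def average_wait(num_registers, num_customers=1000, seed=42):
    """Simulate customers arriving at random and picking the register that frees up first."""
    random.seed(seed)
    registers = [Register() for _ in range(num_registers)]
    clock = total_wait = 0.0
    for _ in range(num_customers):
        clock += random.expovariate(1.0)         # about one arrival per minute
        service = random.expovariate(1.0 / 2.5)  # about 2.5 minutes of service
        register = min(registers, key=lambda r: r.busy_until)
        total_wait += register.serve(clock, service)
    return total_wait / num_customers

# What-if: how much faster are customers served with two more registers?
print("3 registers:", round(average_wait(3), 2), "minutes average wait")
print("5 registers:", round(average_wait(5), 2), "minutes average wait")
```

Changing a single parameter and re-running the scenario is exactly the style of interaction a DSS user expects.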
In order to be a DSS, a system must have the
following characteristics:
* A nonrecurring problem needs to be
solved. DSS technology is used primarily for novel and unique
modeling situations that require the user to simulate the behavior
of some real-world problem.
* Human input is required. A DSS makes
decisions with users, unlike an expert system which makes decisions
for users.
* A method is available for testing
hypotheses. A true DSS allows the end user to develop models and
simulate changes to the model. For example, the end user could ask
questions like, "What will happen to my net return if I exchange my
IBM stock for Microsoft stock?" or "How much faster would I be able
to service my customers if I add two more checkout registers?"
* Users must have knowledge of the
problem being solved. Unlike an expert system that provides the user
with answers to well-structured questions, decision support systems
require the user to thoroughly understand the problem being solved.
For example, a financial decision support system, such as the DSSF
product, would require the user to understand the concept of a
stock's beta. Beta measures how an individual stock's returns move
relative to the behavior of the market as a whole (formally, the
covariance of the stock's returns with the market's returns, scaled
by the market's variance); a small calculation sketch follows this
list. Without an understanding of such concepts, a user would be
unable to use a decision support system effectively.
* Ad hoc data queries are allowed. As
users gather information for their decision, they make repeated
requests to the online database, with one query answer stimulating
another query. Because the purpose of ad hoc query is to allow
free-form queries to decision information, response time is
critical.
* More than one acceptable answer may be
produced. Unlike an expert system, which usually produces a single,
finite answer to a problem, a decision support system deals with
problems that have a domain or range of acceptable solutions. For
example, a user of DSSF may discover that many acceptable stock
portfolios match the selection criteria of the user. Another good
example is a manager who needs to place production machines onto an
empty warehouse floor. The goal would be to maximize the throughput
of work in process from raw materials to finished goods. Clearly,
she could choose from a number of acceptable ways of placing the
machines on the warehouse floor in order to achieve this goal. This
is called the state space approach to problem-solving--first a
solution domain is specified, then the user works to create models
to achieve the desired goal state.
* External data sources are used. For
example, a DSS may require classification of customers by Standard
Industry Code (SIC) or customer addresses by Standard Metropolitan
Statistical Area (SMSA). Many warehouse managers load this external
data into the central warehouse.
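To make the beta example concrete, here is a minimal sketch of how a beta value could be computed from historical returns. The return figures are invented, and a real system would use much longer return histories.

```python
def beta(stock_returns, market_returns):
    """Beta = covariance(stock, market) / variance(market), computed from return histories."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    covariance = sum((s - mean_s) * (m - mean_m)
                     for s, m in zip(stock_returns, market_returns)) / n
    variance = sum((m - mean_m) ** 2 for m in market_returns) / n
    return covariance / variance

# Hypothetical monthly returns; a beta above 1 means the stock amplifies market moves.
stock_history  = [0.04, -0.02, 0.06, 0.01, -0.03]
market_history = [0.03, -0.01, 0.04, 0.02, -0.02]
print(round(beta(stock_history, market_history), 2))
```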
Decision support systems also allow the user
to create what-if scenarios. These are essentially modeling tools
that allow the user to define an environment and simulate the
behavior of that environment under changing conditions. For example,
the user of a DSS for finance could create a hypothetical stock
portfolio and then direct the DSS to model the behavior of that
stock portfolio under different market conditions. Once these
behaviors are specified, the user may vary the contents of the
portfolio and view the results.
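As an illustration of the what-if idea, the following is a minimal Python sketch that models a hypothetical portfolio under different market conditions. The tickers, prices, and betas are invented, and the simple beta-times-market-move model is a deliberate simplification of what a real financial DSS would provide.

```python
# Hypothetical holdings and betas; market_move is the what-if variable.
portfolio = {
    "IBM":       {"shares": 100, "price": 105.0, "beta": 0.9},
    "Microsoft": {"shares":  80, "price": 130.0, "beta": 1.2},
}

def portfolio_value(holdings, market_move=0.0):
    """Estimate total value after the market moves by market_move (e.g., 0.10 = +10%)."""
    total = 0.0
    for stock in holdings.values():
        expected_move = stock["beta"] * market_move   # simple beta-based model
        total += stock["shares"] * stock["price"] * (1 + expected_move)
    return total

# Vary the market condition and view the results, as a DSS user would.
for scenario in (-0.10, 0.0, 0.10):
    print(f"market {scenario:+.0%}: portfolio worth {portfolio_value(portfolio, scenario):,.2f}")
```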
The types of output from decision support
systems include:
* Management information systems
(MIS)--Standard reports and forecasts of sales.
* Hypothesis testing--Did sales decrease
in the Eastern region last month because of changes in buying
habits? This involves iterative questioning, with one answer leading
to another question.
* Model building--Creating a sales
model and validating its behavior against the historical data in
the warehouse. Predictive modeling is often used to forecast
behaviors based on historical factors (a small sketch follows this list).
* Discovery of unknown trends--For
example, why are sales up in the Eastern region? Data mining tools
answer questions in those instances where you may not even know what
specific questions to ask.
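As a sketch of the model-building idea, the following fits a simple linear trend to hypothetical monthly sales drawn from the warehouse and forecasts the next period; real predictive models in a warehouse environment would be considerably richer.

```python
def linear_trend(history):
    """Fit sales = intercept + slope * period by ordinary least squares."""
    n = len(history)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical monthly sales pulled from the warehouse; forecast the next period.
sales = [120, 132, 128, 141, 150, 158]
intercept, slope = linear_trend(sales)
print("forecast for next month:", round(intercept + slope * len(sales), 1))
```

Validating such a model means comparing its fitted values against further historical periods held back from the fit.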
The role of human intuition in this type of
problem solving has stirred great debate. Decision support systems
allow the user to control the decision-making process, applying his
or her own decision-making rules and intuition to the process.
However, the debate over whether artificial intelligence should be
used to manage the intuitive component of these systems has strong
proponents on both sides.
Now that expert systems and decision support
systems have been described, let's take a look at how databases are
used to develop these systems.
Data Warehouses And Multidimensional
Databases
Multidimensional databases are approaching
the DSS market through two methods. The first approach is through
niche servers that use a proprietary architecture to model
multidimensional data; examples of such niche servers include Arbor
and IRI. The second approach is to provide multidimensional front
ends that manage the mapping between the RDBMS and the dimensional
representation of the data. Figure 1.15 offers an overview of the
various multidimensional databases.
Figure 1.15 The major types of
multidimensional databases.
In general, the following definitions apply
to data warehouses:
* Subject-oriented data--Unlike an
online transaction processing application that is focused on a
finite business transaction, a data warehouse attempts to collect
all that is known about a subject area (e.g., sales volume, interest
earned) from all data sources within the organization.
* Read-only during queries--Data
warehouses are loaded during off-hours and are used for read-only
requests during day hours.
* Highly denormalized data
structures--Unlike an OLTP system with many "narrow" tables, data
warehouses pre-join tables, creating fat tables with highly
redundant columns.
* Data is pre-aggregated--Unlike OLTP,
data warehouses pre-calculate totals to improve runtime performance.
Note that pre-aggregation is anti-relational: the relational model
advocates storing only atomic data components and building aggregate
objects at runtime. (A small load-time aggregation sketch follows
this list.)
* Features interactive, ad hoc
query--Data warehouses must be flexible enough to handle spontaneous
queries by users. Consequently, a flexible design is imperative.
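The load-time aggregation idea mentioned above can be sketched in a few lines of Python. The detail rows and rollup level are invented; in an Oracle warehouse the same pre-aggregation would normally be done in summary tables built during the load window.

```python
from collections import defaultdict

# Hypothetical detail rows as they arrive during the nightly load: (region, month, amount).
detail_rows = [
    ("East", "1997-01", 1200.0),
    ("East", "1997-01",  450.0),
    ("West", "1997-01",  980.0),
    ("East", "1997-02",  760.0),
]

# Pre-aggregate at load time: one total per (region, month) rollup level.
sales_by_region_month = defaultdict(float)
for region, month, amount in detail_rows:
    sales_by_region_month[(region, month)] += amount

# At query time the answer is a single lookup, not a scan of the detail rows.
print(sales_by_region_month[("East", "1997-01")])   # 1650.0
```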
When we contrast the data warehouse with a
transaction-oriented, online system, the differences become
apparent. These differences are shown in Table 1.1.
|                               | OLTP          | Data Warehouse |
|-------------------------------|---------------|----------------|
| Normalization                 | High (3NF)    | Low (1NF)      |
| Table sizes                   | Small         | Large          |
| Number of rows/table          | Small         | Large          |
| Size/duration of transactions | Small         | Large          |
| Number of online users        | High (1000s)  | Low (< 100)    |
| Updates                       | Frequent      | Nightly        |
| Full-table scans              | Rarely        | Frequently     |
| Historical data               | < 90 days     | Years          |

Table 1.1 Differences between OLTP and data
warehouses.
Aside from the different uses for data
warehouses, many developers are using relational databases to build
their data warehouses and to simulate multiple dimensions, relying
on design techniques such as the STAR schema. This push toward STAR
schema design has been somewhat successful, especially because
designers do not have to buy a multidimensional database or invest
in an expensive front-end tool. In general, using a relational
database for OLAP is achieved by any combination of the following
techniques:
* Pre-joining tables together--This is
another way of saying that a denormalized table is created from a
normalized online database. A large pre-join of several tables is
sometimes called a fact table in a STAR schema (a sketch follows
these lists).
* Pre-summarization--This prepares the
data for any drill-down requests that may come from an end user.
Essentially, the different levels of aggregation are identified, and
aggregate tables are computed and populated when the data is loaded.
* Massive denormalization--The side
effect of very inexpensive disks has been the rethinking of the
merits of third normal form. Today, redundancy is widely accepted,
as seen by the popularity of replication tools, snapshot utilities,
and non-first-normal-form databases. If you can pre-create every
possible result table at load time, your end user will enjoy
excellent response time when making queries. The STAR schema is an
example of massive denormalization.
* Controlled periodic batch
updating--New detail data is rolled into the aggregate table on a
periodic basis while the online system is down, with all
summarizations recalculated as the new data is introduced into the
database.
While data loading is important, it is only one part of populating a
warehouse. Several categories of tools can be used to populate
warehouses, including:
* Data extraction tools--Tools for
extracting data from different hardware platforms and databases.
* Metadata repository--Holds
common definitions.
* Data cleaning tools--Tools for
ensuring uniform data quality.
* Data sequencing tools--Tools for
enforcing referential integrity (RI) rules in the warehouse.
* Warehouse loading tools--Tools
for populating the data warehouse.
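To illustrate the pre-joining and denormalization techniques above, here is a minimal Python sketch that builds a fat fact table from hypothetical normalized source tables at load time. The table names and columns are invented, and a real STAR schema would, of course, live in the relational database rather than in program memory.

```python
# Hypothetical normalized source tables (keyed dictionaries standing in for OLTP tables).
customers = {1: {"name": "Acme", "region": "East"}}
products  = {10: {"name": "Widget", "category": "Hardware"}}
orders    = [{"customer_id": 1, "product_id": 10, "month": "1997-01", "amount": 500.0}]

# Load-time pre-join: each fact row carries redundant, denormalized dimension columns.
fact_sales = []
for order in orders:
    cust = customers[order["customer_id"]]
    prod = products[order["product_id"]]
    fact_sales.append({
        "month":    order["month"],
        "region":   cust["region"],     # redundant column copied from the customer table
        "category": prod["category"],   # redundant column copied from the product table
        "amount":   order["amount"],
    })

# Query time: no joins are needed, just a filter over the fat fact table.
east_total = sum(row["amount"] for row in fact_sales if row["region"] == "East")
print(east_total)
```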
Data Extraction For The Oracle Warehouse
As we know, most data warehouses are loaded
in batch mode after the online system has been shut down. In this
sense, a data warehouse is bimodal, with a highly intensive loading
window and an intensive read-only window during the day. Because
many data warehouses collect data from non-relational databases such
as IMS or CA-IDMS, no standard methods for extracting data are
available for loading into a warehouse. However, there are a few
common techniques for extracting and loading data, including:
* Log "sniffing"--Applying archived redo
logs from the OLTP system to a data warehouse.
* Using update, insert, and delete
triggers--Firing off a distributed update to a data warehouse.
* Using snapshot logs to populate the
data warehouse--Using log files to update replicated table changes.
* Running nightly extract/load
programs--Using extracts to retrieve operational data and load it
into a warehouse (a small sketch follows this list).
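As an example of the nightly extract/load approach, here is a minimal Python sketch that writes operational records to a flat load file during the batch window. The record layout and file name are invented; a real Oracle warehouse would typically bulk-load such a file with a utility such as SQL*Loader, as discussed in Chapter 11.

```python
import csv
from datetime import date

# Hypothetical operational extract: records pulled from the source system during the batch window.
operational_rows = [
    {"order_id": 1, "region": "East", "amount": "500.00"},
    {"order_id": 2, "region": "West", "amount": "275.50"},
]

# Write a flat load file; the warehouse load step would then bulk-load this file.
load_file = f"sales_extract_{date.today():%Y%m%d}.csv"
with open(load_file, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["order_id", "region", "amount"])
    writer.writeheader()
    for row in operational_rows:
        writer.writerow(row)

print("wrote", load_file)
```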
For details about data extraction and
loading of Oracle warehouses, see Chapter 11, Oracle Data
Warehouse Utilities.