| |
 |
|
Oracle data warehouse development best practices
Don Burleson
|
The design approach for any new Oracle data
warehouse will leverage on the Oracle-supplied productivity packages
such as Oracle 10g Discoverer and Oracle Warehouse Builder (OWB).
Rapid development and implementation of basic decision support
functionality is a major priority so that the end-users may begin to
enjoy the benefits of a new Oracle data warehouse system.
The success of all large Oracle warehouse
projects require small, incremental successes and a phased roll-out
with basic functionality delivered first. The main
pre-implementation phases an Oracle data warehouse project will
include:
- Initial design or Re-design of an existing data
warehouse using Oracle 10g with partitioning.
- Migration of existing data into the new
Oracle data warehouse structure.
- Modeling the ETL processes from the data
sources and developing the appropriate schema transformation rules
(star, hybrid or snowflake).
- Implementing a complete ETL using the
Oracle Warehouse Builder (OWB).
- Creation of primary materialized views and
data refresh mechanisms.
Oracle data warehouse projects often suffer
from chronic problems and Oracle data warehouse operational issues:
- Low staffing The existing DBA is
caught-up in operational details and is unable to find the time
to perform the advanced Business Intelligence (BI) and data
mining activities.
- Cumbersome Data Loading The
existing ETL process is non-uniform and requires significant
manual intervention.
- Non-usage of warehouse features
The back-end Oracle data warehouse database does not utilize
Oracle warehouse features such as bitmap indexes, partitioning
or materialized views.
- Difficult Interface Existing data
mining tools are often sub-optimal and the use of SAS, SPSS,
Clementine, Oracle data mining (ODM) and front-end OLAP tools
such as Oracle Discoverer can be greatly improved.
- Non scalable Adding additional
users and/or data with a loss in performance. The existing
database schema doesnt scale due to improper design, bad
indexing, or a combination of factors. By utilizing the latest
techniques and features the new data warehouse will be scalable
as well as performing at its peak efficiency.
- Data Consolidation Issues Sites
often need to incorporate data feeds from dozens of external
data centers. Methods must be created to transform all of the
data into a unified, consistent format and provide a method for
seamless growth.
- Data Transparency Issues For
example: As patients in a hospital system travel between
treatment centers with separate databases it can be difficult to
track their treatments and progress. In a centralized,
consolidated DWH environment this becomes a non-issue.
- Limited Reporting The existing
warehouse does not easily support ad-hoc or customized queries.
Oracle Data Warehouse Query evolution
An evolutionary approach is critical to the
success of this project. Once the database is back-filled, the
end-users will be introduced to basic access. As the end-users
become comfortable with the base functionality we will be able to
provide more advanced query support, as follows:
- Ad-hoc query
- Aggregation and multidimensional display
- Correlation analysis
- Hypothesis testing
- Data Mining
End-user query approach
Once the new data warehouse and ETL has been
created, Oracle 10g Discoverer (web version) can be implemented for
basic OLAP and DSS functionality. Some of the types of queries
required by the new system might include:
-
Ad-hoc query The Discoverer 10g
end-user layer will be configured to allow for the ad-hoc
display of summary and detailed level data.
-
Aggregation and multidimensional display
Develop Oracle warehouse builder structures to summarize,
aggregate and rollup salient information for display using the
Oracle 10g Discoverer interface.
-
Basic correlations The front-end
should allow the end-user to specify dimensions and request a
correlation matrix between the variables with each dimension.
The system will start with one-to-one correlations and evolve to
support multivariate chi-square methods.
-
Identify hidden correlations The
end-users need the ability to identify populations (e.g.
Eskimos with alcoholism) and then track this population across
various external factors (e.g. treatments and drugs). These
Oracle Decision Support System (DSS) interfaces require the
ability for the end-user to refine their decision rules and
change the salient parameters of their domain (i.e. the
confidence interval for the predictions).
The main tasks in any Oracle data warehouse
engagement include mentoring and technical assistance for these
major activities:
- Cost Justification Data warehouses often have a very fast pay-back period as they
can pay for themselves in more effective treatments and reducing
costs.
- Re-Design Architecture The
existing system requires a new structure, and an analysis of
existing (and new) data requirements will allow for the design
of a Star or Snowflake design.
- Implement Physical Features The
redesigned database will utilize important Oracle warehouse
features designed to provide high performance and maintenance.
These will include data partitioning, read-only tablespaces,
bitmap indexes and materialized views. Data warehouse queries
will also be designed to support Oracle Star Transformation
queries.
- Automate Extract, Transformation and
Loading A new ETL system will be implemented to allow for
a minimum of human intervention. A data feed architecture will
be designed and intelligent rules
will be applied to unify the data and allow for a central
repository. There will also be an exception reporting
mechanism to allow for minimal human intervention.
- Improved data mining and BI reporting
The existing Business Intelligence (Oracle Discoverer)
will be reviewed and new methods for identify salient
correlations will be incorporated into the data model.
While no two Oracle data warehouse project are
identical, they share common interfaces and activities. These
compose best-practices and form a foundation for the successful
evolution of the Oracle data warehouse.
|