Oracle has a long history of introducing new releases with huge
fanfare. When Oracle9i was introduced, Oracle Corporation
offered one million dollars to any customer who did not find that
Oracle was the fastest database on the market. As an Oracle
professional, it took me many hours to translate the marketing
hyperbole into a definition of the real features included in the
database. Hopefully, this overview and a follow-up article
detailing enhancements to the database will shed some light on
what these new features will mean for you.
Beginning with Oracle8, Oracle made a commitment to supporting
data warehouse types of applications. In fact, the major new
features in Oracle8 were strongly centered on database
scalability and Oracle's ability to store many terabytes for the
user community. Oracle is continuing this commitment to online
analytical processing (OLAP) and data warehouse tools in Oracle9i.
Let's take a look at some of the new data warehouse features.
Oracle Structured Query Language (SQL) has changed
significantly over the past four years. Oracle Corporation has
been struggling to find a balance between making SQL adhere to
the ANSI standard and providing robust built-in functions to
improve Oracle SQL performance. Oracle 9i has deliberately
chosen to vary from the ANSI standard for SQL and provide a
wealth of built-in functions that can improve the performance of
the Oracle database.
Oracle has also improved the responsiveness of SQL by providing
Java-based interfaces directly to their OLAP engine for data
warehousing. This makes SQL more accessible to external portals
within the Internet world.
In keeping with this commitment to extending Oracle SQL, Oracle
has provided some exciting new features for the analysis of large
volumes of data:
- LEAD and LAG functionsThese SQL extensions enable
complex time-series analysis of data warehouse
information and allows for the easy aggregation of
rolling time periods.
- Ranking functionsOracle9i SQL includes
ranking functions that provide support for common OLAP
rankings, such as the top 10, bottom 10, top 10 percent,
and bottom 10 percent.
- Advanced grouping functionsOracle9i now
provides categorization functions that group values into
buckets, such as age groups or income brackets.
- Statistical functionsOracle 9i offers
statistical functions, including support for correlation,
standard deviation, linear regressions, and other common
In addition to these SQL enhancements, Oracle9i has
improved its OLAP engine and offers tools that simplify the
complex task of loading Oracle data warehouses.
Extract Transform & Load (ETL) tool
ETL functionality has been a long-awaited feature of Oracle
data warehousing. One of the most confounding problems for DBAs
managing an Oracle data warehouse is providing clean and accurate
summarized data from the operational data store and getting it
into the data warehouse in a timely fashion.
Prior to the introduction of ETL software, the extract and
loading of Oracle information tended to be very time-consuming
and cumbersome. Customized programs needed to be written for the
extract. Transformation summarization and aggregation was done
manually. Loading information into the target data warehouse
would often require the dedicated efforts of a full-time DBA.
Many of the ETL functions Oracle9i uses were integrated
through the acquisition of Carleton Software data warehouse
tools. The purpose of the ETL software is to assist in the
loading of data from the operational data stored into the data
warehouse. The functionality includes:
- ExtractThe extract phase of ETL software consists
of a set of Oracle SQL queries that pull the detailed
information from the operational data store.
- TransformThe transformation phase generally
involves aggregating highly detailed data that was taken
from the operational database. Oracle9i claims to
have tools that will assist in the automatic aggregation
of information so that it can be summarized for easy
loading into the data warehouse engine. The
transformation phase sometimes involves collecting
external metadata and cleansing operational data.
- LoadThe load functions of the ETL relates to the
actual loading of the information after it is been
transformed into the Oracle data warehouse. Prior to ETL,
the Oracle DBA would have to write customized SQL*Loader
programs to handle this task.
E-commerce sites have been demanding a tool that will allow
the Web site to survey consumer behavior and make product
recommendations based upon the customers prior searching,
viewing, and purchasing behavior. Oracle9i introduces a
personalization feature that simplifies the tracking of consumer
behavior by collecting individual information for all consumers
who visit the Web site. This data takes the form of WebServer
statistics. The information on pages viewed and searches entered
can be used to target advertising to the consumer.
Obviously, the recommendation engine is aimed at retail systems
where consumers actually sign on and survey a broad range of
products and services. With the recommendation engine, Oracle is
attempting to add business intelligence to the commerce software.
The personalization features will allow the data warehouse engine
to automatically survey past usage trends and make
recommendations for purchasesbased upon the end users' prior
Oracle Parallel Server (OPS) has been around for four years.
Now, Oracle has taken the general idea of OPS and improved upon
it, offering a new product, which they call Cache Fusion.
That term describes the ability of OPS databases to allow
multiple instances to share data buffer storage between the
instances while still accessing a single Oracle database.
The architecture of Oracle Parallel Server was originally
designed for massive parallel processors (MPPs). The idea behind
OPS is that a single database can have many separate instances,
each with its own System Global Area (SGA) and memory regions,
accessing a single database.
An OPS configuration requires that the instances be able to share
the data buffer cache. To allow the instances to communicate,
Oracle created the Integrated Distributed Lock Manager (IDLM),
which transfers data buffer blocks between instances. This is the
Cache Fusion component. In OPS, data blocks are quickly
transferred between instances, thereby fusing the multiple data
buffers into a single cache. The concept of Cache Fusion is
essentially advertising the fact that Oracle's IDLM can rapidly
"ping" data blocks between the many instances, allowing
for these instances to share data buffers in an efficient way.
Oracle has always provided good security within the database,
and it's extending this functionality to incorporate distributed
environments. Oracle has enhanced its Virtual Private Database
(VPD) tool, which controls access to Oracle databases across
multiple Web servers. Oracle also enhanced its encryption
methodologies, including separate authentication servers to
prevent unauthorized access to a Web-centric database.
Java and XML support
The enhancement for Java and XML support services in Oracle9i
further demonstrates Oracle's focus on Internet databases.
Without going into a great level of detail about JavaBeans and
XML communications, these tools enable distributed Oracle
databases on the Web to communicate with each other using a
standard protocol. Oracle has made a commitment to extending its
Java class structures and to provide an easy-to-use XML
Whereas a traditional database requires you to call information
using Net8 database links, a database can communicate with any
other database on the Web using XML. All that's needed to
establish communication are the IP address, port number, and the
XML protocols for communicating with the remote database. This is
an exciting new feature of Oracle9i and promises to make
database communication of commerce systems a commonplace
The Oracle9i data warehouse suite offers data mining
tools. Data mining is a general term that refers to
examining large volumes of data for statistically significant
correlations. The data-mining product that Oracle9i offers
is an enhancement of the Darwin data-mining product, which Oracle
purchased in 1999.
Data mining is the most advanced of all of the Oracle facilities.
Even in the simplest form, data mining tends to be quite
analytical and time-consuming on the target database. Many data
mining tools also have a tremendous amount of overhead because of
their need to survey billions of bytes of information in an
attempt to tease out statistically significant correlations
between information in the data warehouse.
It remains to be seen how well the Oracle data warehouse
community will accept this data-mining tool. Oracle9i
users may instead choose one of the many third-party data
warehouse tools that can be easily attached to the Oracle