 |
|
Oracle 10g Data Warehouse tips
Oracle Tips by Burleson Consulting |
Oracle 10g for the Data Warehouse
The large data buffer caches in most OLTP Oracle systems make them
CPU-bound, but Oracle data warehouses are another story. With
terabytes of information to aggregate and summarize, most Oracle
data warehouses are I/O-bound and the DBA must choose a server that
optimizes disk I/O throughput.
Oracle has always made very large database (VLDB) technology a
priority as evidenced by their introduction of partitioned
structures, advanced bitmap indexing, and materialized views.
Oracle10g provides some features that are ideal for the data
warehouse application:
§
If the DBA has a time-series warehouse where
information eventually becomes static, using tablespace partitions
and marking the older tablespaces as read only can greatly improve
performance. When a tablespace is marked as read only, Oracle can
bypass this read consistency mechanism, reducing overhead and
resulting in faster throughput.
§
The
revolutionary new method for managing the disk I/O subsystem removes
the tedious and time consuming chore of I/O load balancing and disk
management. Oracle10g ASM allows all disks to be logically
clustered together into disk groups and data files spread across all
devices using the Oracle10g SAME (Stripe and Mirror Everywhere)
standard. By making the disk backend a JBOD (Just a Bunch of Disks),
Oracle10g manages this critical aspect of the data warehouse.
§
Oracle now has
multi-level intelligent partitioning methods that allow Oracle to
store data in a precise scheme. By controlling where data is stored
on disk, Oracle10g SQL can reduce the disk I/O required to service
any query.
§
: Using the Oracle10g multiple block sizes and KEEP
pool, the DBA can pre-assign warehouse objects to separate data
buffers and ensure that the working set of frequently referenced
data is always cached. Oracle10g also offers Automatic Memory
Management (AMM) whereby Oracle10g will automatically re-assign RAM
frames between the db_cache_size
and the pga_aggregate_target
region to maximize throughput of the data warehouse.
§
Oracle’s materialized views (MV) use Oracle replication to allow the
DBA to pre-summarize and pre-join tables. Best of all, Oracle MV’s
are integrated with the Oracle 10g query rewrite facility, so that
any queries that might benefit from the pre-summarization will be
automatically rewritten to reference the aggregate view. This will
avoid a very expensive and unnecessary large-table full-table scan.
§
The AWR provides a time-series
component to warehouse tuning that is critical for the
identification of materialized views and holistic warehouse tuning.
The most important data warehouse tracking with AWR includes
tracking large-table-full-table scans, hash joins which might be
replaced with STAR joins, and tracking of RAM usage within the
pga_aggregate_target
region.
It is easy to identify when an Oracle warehouse is disk I/O bound.
In the AWR report below, comparable to a STATSPACK report for
Oracle9i and earlier, it is clear that this typical data warehouse
system is clearly constrained by disk I/O, resulting from the high
percentage of full-table and full-index scans.
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
------------------------- ------------ ----------- --------
db file scattered read 2,598 7,146 58.54
db file sequential read 25,519 3,246 12.04
library cache load lock 673 1,363 9.26
CPU time 1,154 7.83
log file parallel write 19,157 837 5.68
This listing shows that scattered reads, as full-table scans,
constitute the majority of the total database time. This is very
typical of a data warehouse that performs aggregations via SQL. It
is also common during the refresh period for Oracle materialized
views. The problem is the I/O bottleneck that is introduced during
these periods.
Due to the fact that the typical data warehouse is so data
intensive, there is always a problem fully utilizing the CPU power.
UNISYS has addressed this issue by leveraging on Non-Uniform Memory
Access (NUMA), whereby Windows and Oracle10g are automatically
configured to exploit NUMA to keep the CPUs busy. The data buffer
hit ratio is not relevant for data warehouses, systems that commonly
perform full-table scans, or those that use
all_rows SQL optimization.
While a 30 GB db_cache_size
might be appropriate for an OLTP shop or a shop that uses a large
working set, a large SGA does not benefit data warehouse and
decision support systems (DSS) where most data access is performed
by a parallelized full-table scan. When Oracle performs a parallel
full-table scan, the database blocks are read directly into the
program global area (PGA), bypassing the data buffer RAM as
illustrated in Figure 18.4.

Figure 18.4:
Parallel
Full Scans Bypass SGA data buffers
The figure above shows that having a large
db_cache_size does not
benefit parallel large-table full-table scans, as this requires
memory in the
pga_aggregate_target region instead. With Oracle 10g, the
multiple data buffer features can be used to segregate and cache
dimension tables and indexes, all while providing sufficient RAM for
the full scans. When the processing mode changes during evening
Extract, Transform, and Load (ETL) and rollups, Oracle10g AMM will
automatically detect the change in data access and re-allocate the
RAM regions to accommodate the current processing.
All 64-bit servers have a larger word size (2 to the 64th power)
that allows for up to 18 billion GB of addressable RAM. DBAs may be
tempted to create a super large RAM data buffer. Data warehouse
systems tend to bypass the data buffers because of parallel
full-table scans, and maximizing disk I/O throughput is the single
most critical bottleneck.
Most SMP servers have a specialized high speed RAM called a L2 cache
that is localized near the CPUs as shown in Figure 18.5.

Figure 18.5:
The
non-uniform memory access architecture
Best of all, Oracle10g has been enhanced to recognize NUMA systems
and adjust memory and scheduling operations accordingly. NUMA
technology allows for faster communication between distributed
memory in a multi-processor server. Better than the archaic UNIX
implementations of the past decade, NUMA is fully supported by Linux
and Windows Advanced Server 2003 and Oracle can now better exploit
high end NUMA hardware in SMP servers.
Now that the use of Oracle10g has been justified for the warehouse,
it is important to examine why many shops are moving from Linux to
Windows for their mission critical warehouse applications.
Note: AMM and dynamic Oracle
memory management has measurable overhead. See
my important notes on
Oracle
dynamic memory management.
Now, while the automated features of Oracle10g AMM, ASM and
automatic query rewrite simplify the role of the Oracle DBA, savvy
Oracle10g DBAs leverage other advanced Oracle10g features to get
fast data warehouse performance:
§
The Oracle10g
dbms_advisor utility will
automatically detect and recommend, then a materialized view will
reduce disk I/O.
§
The AWR is a critical component for data
warehouse predictive tools such as the
dbms_advisor package. AWR
allows the DBA to run time-series reports of SQL access paths and
intelligently create the most efficient materialized views for the
warehouse.
§
All data warehouse indexes that are accessed via range scans
and Oracle objects that must be accessed via full-table or
full-index scans should be in a 32k blocksize.
§
Small, frequently referenced dimension tables should be cached using
the Oracle10g KEEP pool.
§
The Oracle10g STAR query features make it easy
to make complex DSS queries run at fast speeds.
§
Change data capture allows incremental
extraction, which allows only changed data to be extracted easily.
For example, if a data warehouse extracts data from an operational
system on a weekly basis, the data warehouse requires only the data
that has changed since the last extraction, which would be only the
data that has been modified in the past 7 days.
§
Streams based feed mechanisms can capture the necessary data changes
from the operational database and send it to the destination data
warehouse. The use of redo information by the Streams capture
process avoids unnecessary overhead on the production database.
Oracle Database 10g is even more optimized for Itanium2 architecture
than Oracle 9i was and many Oracle experts consider Intel-based
servers as the best choice for running Oracle data warehouses. The
following section explores this more closely.
SEE CODE DEPOT FOR FULL SCRIPTS
 |
This is an excerpt from my latest book "Oracle
Tuning: The Definitive Reference".
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts: |
http://www.rampant-books.com/book_2005_1_awr_proactive_tuning.htm
|