Oracle Real Application Clusters and Data Warehouse Applications
Don Burleson
(Last Updated 5 September 2005)
Oracle has marketed the Real Application Clusters
(RAC) option as appropriate for all types of applications, but there
is some debate about whether RAC server blades are better than the
same amount of computing resources in a single server for Oracle data warehouses.
Oracle RAC
is used primarily for continuous availability of mission-critical
systems (i.e. banking applications) and for transparent scalability
of massive online transactions systems (i.e. Amazon). However, it
may not always be the best architecture for data warehouses that require
large materialized view rollups because they
often require large, cohesive RAM regions (large data sorts) and
tightly-coupled CPU's (fast parallel query).
This
Oracle whitepaper claims that special tricks are required to address
above-the-line SGA RAM in a 32-bit Linux environment, and correctly
notes that parallelized full-table scans use PGA RAM:
Memory for hash joins and sort
operations is allocated out of the PGA (Program Global Area),
not the SGA. PGA memory is not bound by the 1.7GB SGA. A 4 CPU
system running a degree of parallelism of 8 uses typically less
than 3.2GB of PGA.
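On a dedicated warehouse server, the DBA can size the PGA independently of the SGA along these lines; a hedged sketch (the 4 GB target is illustrative, not a figure from the whitepaper):

```sql
-- Size the aggregate PGA used for hash joins and sorts
-- (4g is an illustrative value, not a recommendation)
alter system set pga_aggregate_target = 4g scope = both;

-- Check how much PGA memory is actually in use
select name, round(value / 1024 / 1024) as mb
from   v$pgastat
where  name in ('total PGA allocated', 'maximum PGA allocated');
```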
But the question remains whether the RAC
interconnect will be slower than a monolithic server for large-scale
parallel queries with large RAM demands, the type of queries that
are common for aggregation, rollups and materialized view
refreshing.
Some experts say that Oracle RAC is not the best solution to data
warehouse applications because they require large banks of CPU's to
perform parallel table-scan operations. Mike Ault, a noted Oracle
data warehouse consultant, observes: "I have never heard of any shop
using RAC for a data warehouse, and I could imagine large parallel
operations clobbering the interconnect so as to make RAC somewhat
sub-optimal for data warehouse processing". Ault says
that the Oracle
Server Tuning manual makes this very clear (emphasis added):
Parallel query can
dramatically improve performance for data-intensive data warehousing
operations. It helps systems scale in performance when adding
hardware resources.
The greatest
performance benefits are on symmetric multiprocessing (SMP),
clustered, or massively parallel systems where query processing can
be effectively spread out among many CPUs on a single system.
In the Oracle manual "Parallelism
and Partitioning in Data Warehouses" we see the caution that
Oracle parallel query requires SMP or MPP servers, not the 2-way or
4-way CPU's used in RAC databases:
Parallel execution benefits systems that have
all of the following
characteristics:
- Symmetric
multi-processors (SMP), clusters, or
massively parallel systems
- Sufficient
I/O
bandwidth
- Underutilized or
intermittently used CPUs (for example, systems
where CPU usage is typically less than 30%)
- Sufficient memory to support
additional memory-intensive processes such as
sorts, hashing, and I/O buffers
If your system lacks any of these
characteristics, parallel execution might not
significantly improve performance.
Let's take a close look at the problems of
parallel query on RAC databases and see why large monolithic servers
are often a better choice. This article offers advice on the basics of
configuration for Oracle data warehouses, and emphasizes the
importance of using
multiple block sizes for all VLDB Oracle warehouse systems.
The problem of parallel query on RAC
Oracle data warehouse applications require
a high degree of parallelism to ensure fast reads of multi-gigabyte tables using
Oracle parallel query (OPQ). In Oracle 10g, automatic parallelism
can be enabled and the degree of parallelism is controlled
internally by interrogating the number of CPU's on the server (the
cpu_count parameter) and setting the automatic degree of parallelism
based on the CPU count.
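As a hedged sketch, the default degree of parallelism can be tied to cpu_count like this (the table name is hypothetical):

```sql
-- Inspect the CPU count that Oracle detected on this server
show parameter cpu_count;

-- Let Oracle derive the degree of parallelism from cpu_count
-- (table name sales_fact is hypothetical)
alter table sales_fact parallel (degree default);
```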
At the risk of re-stating the obvious, research
and the Oracle documentation discuss the problems with query
parallelism when the CPU's are distributed across many servers.
According to the research paper "Modelling
Parallel Oracle for Performance Prediction" (Distributed and
Parallel Databases, 13, 251–269, 2003), inter-node parallelism
is more complex than in-the-box Oracle parallel query:
The flow of a
query through the PQO starts with the user process issuing a query
or transaction. The dedicated server process parses and executes it.
It assigns work to a number of query servers depending upon the
degree of parallelism.
The query servers
split the workload and return the result data back to the dedicated
server process. The dedicated server assembles the data and returns
the results to the user process.
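The workload split among the query servers can be observed after a parallel query completes; a hedged sketch using the v$pq_tqstat view:

```sql
-- Run in the same session immediately after a parallel query:
-- shows how rows were distributed among the parallel query servers
select dfo_number, tq_id, server_type, process, num_rows
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type, process;
```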
The Oracle manual
Database Data Warehousing Guide for Oracle 10g also explains the
performance issue of having distributed CPUs for parallel
operations:
Each server in the
producer execution process set has a connection to each server in
the consumer set. This means that the number of virtual connections
between parallel execution servers increases as the square of the
DOP.
Each communication
channel has at least one, and sometimes up to four memory buffers.
Multiple memory buffers facilitate asynchronous communication among
the parallel execution servers.
A single-instance
environment uses at most three buffers for each communication
channel. An Oracle Real Application Clusters environment uses at
most four buffers for each channel.
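Because the number of virtual connections grows as the square of the DOP, the interconnect message-buffer demand can be roughly estimated; a hedged back-of-the-envelope sketch, not an official Oracle formula:

```sql
-- Each of the DOP*DOP producer/consumer connections can use up to
-- four buffers of parallel_execution_message_size bytes on RAC
show parameter parallel_execution_message_size;

-- Rough upper bound on message-buffer memory for DOP = 8:
-- 8 * 8 connections * 4 buffers * message size
select 8 * 8 * 4 * value / 1024 as approx_kb
from   v$parameter
where  name = 'parallel_execution_message_size';
```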
However, the problem of parallel query on Real
Application Clusters is just the tip of the iceberg. We also
encounter massive performance issues when attempting to sort a
result set from a RAC-based parallel query.
The problem of sorting parallel results
We have a huge overhead in sorting the result
set because the RAC nodes must transfer the data back to the
parallel query coordinator, which will reassemble the data, perform
a sort if required, and return the results back to the end user.
While this operation is transparent within a single server, in a RAC
environment, billions of bytes of table data must be passed to the
coordinator, often with disastrous performance. A diagram in the
Oracle 10g Data Warehouse manual illustrates this process.
In a RAC cluster (where each node has only two
or four processors), Oracle parallel query cannot perform as fast as
on a single monolithic server. For example, an 8-node RAC configuration
with 2 processors each (16 CPUs) will be far slower for large-table
scans than a single server with 16 processors. However,
research by Richmond Shee
suggests that intelligent setting of sort_area_size (or
pga_aggregate_target) may allow one-pass sorts to happen nearly as
fast as optimal (in-RAM) sorts.
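To see whether sorts are completing in memory, in one pass, or in multiple passes, a hedged sketch against v$sysstat:

```sql
-- Compare in-memory (optimal), one-pass and multi-pass work areas;
-- a healthy warehouse shows few or no multipass executions
select name, value
from   v$sysstat
where  name like 'workarea executions%';
```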
Even more important, it has been suggested that
using high-speed
RAM-SAN
(solid-state disk) for the
RAC TEMP tablespace might alleviate this sorting issue on RAC
systems.
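A hedged sketch of moving TEMP onto a solid-state device (the file path and sizes are hypothetical):

```sql
-- Create a temporary tablespace on a solid-state disk mount
-- (the path /ssd/oradata/temp_ssd01.dbf is hypothetical)
create temporary tablespace temp_ssd
  tempfile '/ssd/oradata/temp_ssd01.dbf' size 8g
  extent management local uniform size 1m;

-- Make it the database default for sort spill (TEMP) usage
alter database default temporary tablespace temp_ssd;
```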
One hallmark of data warehouse applications is
the requirement to perform large-table scans quickly, especially
during the critical ETL (Extract, Transform, and Load)
processing and during online analytical processing, where Oracle
must scan large volumes of data very quickly. On RAC systems with
small numbers of processors on each node, these parameters are set
based upon the cpu_count of each node:
- fast_start_parallel_rollback
- parallel_max_servers
- log_buffer
- db_block_lru_latches
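A hedged sketch of inspecting the defaults that Oracle derives from each node's cpu_count:

```sql
-- Display the defaults derived from this node's cpu_count
-- (db_block_lru_latches became a hidden parameter in later releases)
select name, value
from   v$parameter
where  name in ('cpu_count',
                'fast_start_parallel_rollback',
                'parallel_max_servers',
                'log_buffer');
```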
Conversely, the same number of processors in a single monolithic
server would result in different settings for these parameters,
often yielding faster performance.
Parallel Query and the Data Warehouse
The "degree" of parallelism for Oracle parallel
query is dependent upon the number of processors and the disk
configuration, but the advent of the SAME (Stripe And Mirror
Everywhere) approach has made the number of CPU's the primary consideration
in large-table scan performance, as noted in the book "Oracle9i RAC":
Huge amounts of
data require large numbers of disk drives, large amounts of memory,
and a significant number of CPUs if answers are to be obtained in a
timely manner.
Oracle achieves high-speed table scans with a
"divide and conquer" approach, dividing the table into equal
pieces and dedicating a separate CPU process to handle each table
section. When all processors reside within the same server,
large-table scans proceed with minimal coordination overhead.
For example, the speed of large-table
full-table scans is far slower on RAC systems with small numbers of
processors on each node. The inter-node parallel query on a RAC
cluster system has far higher overhead because the parallel query
coordinator must communicate with the slave processes over the
network and because the table results will have to be transferred
across the "cache fusion" bus.
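A hedged sketch of invoking a parallel full-table scan (the table name and degree of parallelism are illustrative):

```sql
-- Force a full scan of a (hypothetical) fact table with 16 slaves
select /*+ full(s) parallel(s, 16) */
       count(*)
from   sales_fact s;
```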
Not for Everyone?
Some hardware vendors may claim
that RAC is appropriate for every type of application.
DM Review magazine noted several issues with Oracle RAC and
suggests that the primary motive for using RAC should be to achieve
continuous availability and transparent failover:
As the overall leader in the unsegmented
data warehousing market, the game is Oracle's to win or lose.
Oracle has strength, but potential vulnerabilities include:
The paradox of high availability. In its earlier incarnation,
Oracle's RAC showed what might be described as the paradox of
high availability (HA). The more moving parts that are added to
maintain system availability, the greater the likelihood that,
in the worst case, one of them will fail, resulting in the very
scenario against which the HA was supposed to be the antidote.
In
sum, Oracle Real Application Clusters is a wonderful tool for
mission-critical databases that must have continuous availability
and for
scalability of super-large OLTP systems, but the jury is still out
about whether server blades with Oracle RAC is an optimal
choice for data warehouse applications that require high-speed table
scan performance.
A Benchmark Plan
A definitive answer to this monolithic vs. RAC
question requires a benchmark test to examine optimal parallelism in a
solid-state RAC environment vs. monolithic servers. This test
could be done on a two-node RAC
cluster with two hyper-threaded 64-bit CPU's on each node.
The benchmark test should be a TPC-H (lots of large-table full-table
scans) with intra-node parallelism (2 CPU's on the same node),
inter-node parallelism (2 CPU's, one on each node), and full
parallelism (all CPU's on both nodes). The full parallelism test
would capture the elapsed time of the TPC-H run across both nodes,
using inter-node parallel query.
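One way to pin the intra-node test to a single instance is the parallel_instance_group mechanism; a hedged sketch (the group and instance names are hypothetical, and instance_groups requires an instance restart):

```sql
-- On node 1: tag the instance with a (hypothetical) group name
alter system set instance_groups = 'node1_only'
  scope = spfile sid = 'rac1';

-- Restrict this session's parallel slaves to that one instance,
-- so the TPC-H queries exercise intra-node parallelism only
alter session set parallel_instance_group = 'node1_only';
```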