This is an excerpt from the bestselling book
Oracle Grid & Real Application Clusters. To get immediate
access to the code depot of working RAC scripts, buy it
directly from the publisher and save more than 30%.
An Oracle relational database
system is designed to take advantage of the parallel architecture.
The database is a multi-process system as set up in UNIX systems and
is a multi-threaded application in the Windows architecture. In
general, the databases are accessed by a large number of concurrent
users or connections. Many of these users, with their own data and
instructions, take advantage of the multi-processor availability to
perform database processing. Also, a single user task, such as a SQL
query, can be paralleled to achieve higher speed and throughput by
using multiple processors.
The relational model consists of
structured tables with rows and columns. Usually, the SQL query aims
at extracting or updating target data, which is a set of rows and
columns based on a given condition. Typically, any SQL database
operation gets divided into multiple database sub-operations such as
SELECTION, JOIN, GROUP, SORT, PROJECTION, etc. Thus, the
sub-operations become excellent candidates for simultaneous or
parallel execution. This makes the RDBMS system ideal for the
implementation of parallel processing software.
Databases have a component
called the query optimizer that selects a sequence of inputs, joins,
and scans to produce the desired output table or data set. The query
optimizer is aware of the underlying hardware architecture and finds
a suitable parallel execution path. Hence, from the database
perspective, parallel execution is useful for many types of
operations that access significant amounts of data.
Generally, parallel execution
improves performance for:
* Queries.
* Creation of large indexes.
* Bulk INSERTs, UPDATEs, and
DELETEs.
* Aggregations and copying.
The Oracle database application
can take advantage of the underlying parallel computer architecture
to process the SQL statements that are created at the basic user or
client initiated interfaces. SQL statements are processed in
parallel whenever possible by Oracle. Oracle adapts very well to the
available number of multiple processors, whether they are SMP, NUMA,
or MPP architecture. The Oracle Database 10g RAC architecture takes
advantage of the existence of multiple nodes, and therefore multiple
SMP entities, in the cluster to provide high performance computing.
It has the added advantage of engaging processors from all nodes in
the cluster for a given query.
To achieve better parallelism in
Oracle, it is essential to have sufficient I/O bandwidth and
additional or sporadically used CPUs. Also, adequate memory to
support additional memory-intensive processes, such as sorts,
hashing, and I/O buffers is required.
Parallel Execution Mechanism
A SQL statement is executed in
parallel using multiple parallel processes. The user process acts as
the parallel execution coordinator (PEC), and it dispatches the
statement to several parallel execution servers and coordinates the
end results. The results from all of the server processes are sent
back to the user. The basic unit of work in parallelism is called a
granule. Oracle divides the operation being paralleled, such as a
table scan, table update, or index creation, into granules. Parallel
processes execute the operation one granule at a time.
Granules for Parallelism
There are two types of granules:
block ranges and partition ranges:
* Block Range Granules: These
are the ranges of physical blocks from a table. Block range granules
are the basic unit of most parallel operations. The size of the
object table and the degree of parallelism (DOP) determine the size
of the granule at runtime. Block range granules do not depend on
static pre-allocation of tables or indexes. During the computation
of the granules, Oracle takes the DOP into account and tries to
assign granules from different data files to each of the parallel
execution servers, avoiding contention whenever possible. Thus, the
tables involved in the query are divided dynamically into granules
and a single parallel execution server reads each granule. PEC
manages this process.
* Partition Granules: A query
server process works on an entire partition or sub-partition of a
table or index. Partition granules are the basic unit of parallel
index range scans and of parallel operations that modify multiple
partitions of a partitioned table or index. These operations include
parallel update, parallel delete, parallel creation of partitioned
indexes, and parallel creation of partitioned tables. This is
collectively known as parallel data manipulation language or PDML.