This is an excerpt from the bestselling book
Oracle Grid & Real Application Clusters. To get immediate
access to the code depot of working RAC scripts, buy it
directly from the publisher and save more than 30%.
Modern relational database
systems are typically architected with parallel capable software
that is well suited to take advantage of the parallel architecture
of SMP systems. The Oracle database system is a multi-process
application in UNIX systems, and is a multi-threaded application
under the Windows architecture.
In general, Databases are
accessed by a large number of simultaneous users or connections.
Many users with their own data and instructions take advantage of
the multi-processor server availability to perform database
processing. Moreover, a single user task, such as a SQL query, can
be parallelized to achieve higher speed and throughput by using
multiple processors.
The relational model involves
the utilization of structured tables with rows and columns. Usually
the SQL query aims at extracting or updating some target data, which
is again a set of rows and columns based on a condition. Typically,
any SQL database operation gets divided into multiple database
sub-operations such as selection, join, group, sort, projection,
etc. Thus, the sub-operations become excellent candidates for
simultaneous or parallel execution, making the RDBMS system ideal
parallel processing software.
Databases have a component
called the query optimizer which selects a sequence of inputs,
joins, and scans to produce the desired output table or data set.
The query optimizer is aware of the underlying hardware architecture
to utilize the suitable path for invoking parallel execution. Thus,
from the Database point-of-view, parallel execution is useful for
all types of operations that access significant amounts of data.
Generally, parallel execution
improves performance for:
* Queries
* Creation of large indexes
* Bulk inserts, updates, and
deletes
* Aggregations and copying
Parallel Processing involves the
use of multiple processors to reduce the time needed to complete a
given task. Instead of one processor executing an entire task,
several processors work on separate tasks that are subordinate to
the main task. There are several architectural approaches for
multiple processor systems, they are:
* Symmetric Multi-Processors (SMP)
* Clustered Systems
* NUMA (or DSM - Distributed
Share Model) servers
* MPP (Massively Parallel
Processing)
Types of Parallelism
There are two types of
parallelism that database users can utilize. They are inter-query
parallelism and intra-query parallelism. The differences between
these two types of parallelism will be covered next.
Inter-Query Parallelism
- Individual transactions are independent, and no transaction
requires the output of another transaction to complete. Many CPUs
can be kept busy by assigning each task or query to a separate CPU.
This type of parallelism, many separate independent queries active
at the same time, is called inter-query parallelism. In OLTP
environment, each query is fairly small, small enough to complete on
a single process utilizing a single CPU.
Intra-Query Parallelism
- To speed up execution of a large, complex query it must first be
decomposed into smaller problems and these smaller problems execute
concurrently (in parallel) by assigning each sub-problem
concurrently to its CPUs. This is called intra-query parallelism.
Decision support systems (DSS) need this kind of facility. Data
warehousing applications often deal with huge data sets, involving
data capture, analysis and summaries, so these operations also
require this capability.
More details about parallel
execution in oracle are covered in the chapter on Parallel
Execution.