Parallel Databases
Modern relational database systems are typically architected with
parallel capable software that is well suited to take advantage of the
parallel architecture
of SMP systems. The Oracle database system is a multi-process
application in UNIX systems and is a multi-threaded application under
the Windows architecture.
Databases have a component called the query optimizer, which selects a
sequence of inputs, joins, and scans to produce the desired output
table or data set. The query optimizer is aware of the underlying
hardware architecture to utilize the suitable path for invoking
parallel execution. Thus, from the database point-of-view, parallel
execution is useful for all types of operations that access
significant amounts of data.
Generally, parallel execution
improves performance for:
-
queries
-
creation of large indexes
-
bulk inserts, updates, and deletes
-
aggregations and copying
Parallel processing involves the use of multiple processors to reduce
the time needed to complete a given task. Instead of one processor
executing an entire task, several processors work on separate tasks
which are subordinate to the main task. There are several
architectural approaches for multiple processor systems. They are:
-
Symmetric Multi-Processors
(SMP)
-
Clustered Systems
-
NUMA (or DSM - Distributed Share Model)
servers
-
MPP (Massively Parallel Processing)
Types of Parallelism
There are two types of parallelism.
They are
inter-query parallelism
and
intra-query parallelism.
-
Inter-Query Parallelism
-
Individual transactions are independent, and no transaction
requires the output of another transaction to complete. Many CPUs
can be kept busy by assigning
each task or query to a separate CPU. This type of parallelism,
where many separate independent queries are active at the same
time, is called inter-query parallelism. In an OLTP environment,
each query is fairly small, small enough to complete on
a single process utilizing a single CPU.
-
Intra-Query Parallelism
- To
speed up execution of a large, complex query, it must first be
decomposed into smaller problems, and these smaller problems
execute concurrently (in parallel)
by assigning each sub-problem concurrently to its CPUs. This is
called intra-query parallelism.
Decision support systems (DSS) need this kind of facility.
Data warehousing applications often deal with huge data sets,
involving data capture, analysis and summaries, so
these operations also require this capability.