 |
|
Inside Oracle Sorting
Oracle Database Tips by Donald BurlesonOctober 15, 2015
|
Oracle sorting is a
very important component of Oracle tuning, yet sorting optimization is often
overlooked. The process of sequencing database output goes back to the 1960's
and in earlier databases the data was extracted in an unsorted fashion and
sorted externally:
//GOFORIT JOB (CLASS=A)
//STEP1
EXEC PGM=IDMSCOB
//SYSOUT DD DSN='ram_disk_dataset',DISP=(,DELETE,KEEP)
//STEP2 EXEC PGM=SORT
//SORTIN DD DSN='ram_disk_dataset',
// DISP=(OLD,DELETE, DELETE)
//SORTOUT DD DSN='new_sorted_dataset',
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(100,100),RLSE),UNIT=SYSDA
By the way, this same process can be used in
Linux/UNIX with Oracle. This allows the DBA to redirect the unsorted
result to super-fast RAM disk (SSD) where it can be sorted hundreds of times
faster than platter disk (you can also make your TEMP tablespace in RAM disk).
Do do an external sort with Oracle, just add a separate step for sorting after
the SQL*Plus extraction of the unsorted rows, something like this:
#!/bin/ksh
sqlplus /nolog @/u01/emp.sql > /u01/ssd/unsorted.txt 2>&1 |sort > /u01/ssd/sorted.txt
An Oracle database will automatically perform
sorting operations on row data under the following circumstances:
- When an index is created
- When using the ORDER BY clause in SQL
- When using the GROUP BY clause in SQL
There are several ways to sequence Oracle output
rows:
- Oracle internal sort (in
sort_area_size or TEMP tablespace)
- Retrieve the rows in pre-sorted order
(using an index)
- Using a third-party sort product -
For shops that have to sequence millions of rows of output on an hourly
basis, external sort products can be faster than using Oracle to sort the
data.
Some shops purchase 3rd party vendor eternal sorting
tools for Oracle, which employ specialized sorting algorithms for special
data characteristics. See this
great web site for
visualization of different sorting algorithms!
Sort at query time or sort after retrieval?
Oracle will always use the cheapest method for
sequencing a result set, and the optimizer will use index retrieval (extracting
the rows in sorted order) if it consumes fewer resources than a back-end sort.
Remember, a sort that cannot fit into RAM will have to be done in the TEMP
tablespace, very slow with lots of disk I/O.
For testing performance, you can manually override the optimizer and force index
access with an /*+ index */ hint.
The choice of pre-sorting or post-sorting depends on several factors:
0 - The optimizer_mode parameter - The all_rows optimizer mode favors non-index sorting while
first_rows optimization favors index access and retrieval in index order
1 - The cpu_cost parameter favors indexes, the io_cost setting
favors sorting
2 - The size of your sort_area_size (or pga_aggregate_target,
if used)
3 - The clustering_factor of the index (see dba_indexes view)
4 - The db_block_size and avg_row_len
5 - The estimated size of the sorted result set
6 - The quality of your optimizer statistics, as computed by the
dbms_stats package
7 - The quality of CBO histograms for estimating the size of the final
result set
Let's take a close look at the internals of Oracle sorting.
Allocating resources for sorting
Under the covers Oracle probably uses several
sorting algorithms and the Oracle documentation notes that all sorts begin in
the RAM defined by pga_aggregate_target or sort_area_size, and
then spill-off onto disk (the TEMP tablespace), if the RAM is exhausted.
The algorithms are a competitive advantage for Oracle and are a closely held
secret, but we can see that their sorting changes by release. Joze
Senegacnik notes a new method for sorting in Oracle 10g release 2:
?Oracle10gRw introduced a new sort algorithm which is
using less memory and CPU resources [9],[10].
A hidden parameter _newsort_enabled = {TRUE|FALSE} governs
whether the new sort algorithm will be used.?
Managing Oracle sorting
At the time a session is established with
Oracle, a private sort area is allocated in memory for use by the session for
sorting, based on the value of the sort_area_size initialization
parameter. Unfortunately, the amount of memory must be the same for all
sessions, and it's not possible to add additional sort areas for tasks that are
sort intensive. Therefore, the DBA must strike a balance between allocating
enough sort area to avoid disk sorts for the large sorting tasks, keeping in
mind that the extra sort area will be allocated and not used by tasks that do
not require intensive sorting.
Whenever a sort can't be completed within the assigned space, a disk sort is
invoked using the temporary tablespace for the Oracle instance. A sort in the
temporary tablespace is very I/O intensive and can slow down the entire
database.
As a general rule, the sort_area_size
should be large enough that only index creation and ORDER BY clauses using
functions should be allowed to use a disk sort. However, operations on large
tables will always perform disk sorts. For example, the following query will
sort the salaries for all 100,000 employees at Oracle Corporation:
select salary
from employee
order by salary;
Oracle always tries to sort in the RAM space
within sort_area_size and only goes to a disk sort when the RAM memory is
exhausted.
Disk sorts are expensive for several reasons. First, they consume resources in
the temporary tablespaces. Oracle must also allocate buffer pool blocks to hold
the blocks in the temporary tablespace. In-memory sorts are always preferable to
disk sorts, and disk sorts will surely slow down an individual task, as well as
affect concurrent tasks on the Oracle instance.
Also, excessive disk sorting will cause a high value for free buffer waits,
paging other tasks' data blocks out of the buffer. You can see the amount of
disk and in-memory sorts by issuing the query shown here:

Here, you can see that there were 49 sorts to
disk. Out of a total of 7,019 sorts, this is well below 1 percent and is
probably acceptable for the system.
The sort information can be captured in STATSPACK tables and plotted to
determine the times when disk sorts are experienced by the instance, as shown
here:

Super sizing sort areas
Many Oracle
professionals do not know the important limitations of pga_aggregate_target,
and they do not know that they can dedicate large amounts of RAM to avoid a
time-consuming disk sort in the TEMP tablespace:
-
The total
work area cannot exceed 200 megabytes of RAM because of the default setting
for _pga_max_size.
-
No RAM sort
may use more than 5% of pga_aggregate_target or _pga_max_size,
whichever is smaller. This means that no task may use more than 10 megabytes
for sorting.
The following settings would increase the default sizes for large sorts.
pga_aggregate_target = 1000m
_pga_max_size = 1000m
_smm_px_max_size = 333m
With these hidden parameters set we see a 5 times larger size increase for sorts and
a RAM sort may now have up to 50 megabytes (5% of pga_aggregate_target) a
5x increase.
Tip!
In some cases Oracle is able to bypass a sort by reading the data in
sorted order from the index. Oracle will even read data in reverse order
from an index to avoid an in-memory sort.
For complete details on tuning Oracle sorting, see my book "Oracle
Tuning: The Definitive Reference".
Also, see
my related notes on Oracle sorting here: