 |
|
Optimize Oracle SQL*Loader Performance
Don Burleson
|
1.
Use Direct Path
Loads - The conventional path loader essentially loads the data by
using standard insert statements. The direct path loader (direct=true)
loads directly into the Oracle data files and creates blocks in Oracle
database block format. The fact that SQL is not being issued makes
the entire process much less taxing on the database. There are
certain cases, however, in which direct path loads cannot be used
(clustered tables). To prepare the database for direct path loads,
the script $ORACLE_HOME/rdbms/admin/catldr.sql.sql must be
executed.
2.
Disable Indexes
and Constraints. For conventional data loads only, the disabling
of indexes and constraints can greatly enhance the performance of
SQL*Loader.
3.
Use a Larger
Bind Array. For conventional data loads only, larger bind arrays
limit the number of calls to the database and increase performance.
The size of the bind array is specified using the bindsize parameter. The
bind array's size is equivalent to the number of rows it contains (rows=)
times the maximum length of each row. Also see the columnarrayrows
and streamsize parameters.
4.
Use ROWS=n
. For conventional data loads only,
rows specifies the number of rows per commit and is related to
bindsize. Issuing fewer
commits will enhance performance, and the larger rows parameter
affects performance (see benchmark below).
5.
Use Parallel
Loads. Available with direct path data loads only, this option
allows multiple SQL*Loader jobs to execute concurrently.
$ sqlldr
control=first.ctl parallel=true direct=true
$ sqlldr
control=second.ctl parallel=true direct=true
6.
Use Fixed Width
Data. Fixed width data format saves Oracle some processing when
parsing the data. The savings can be tremendous, depending on the
type of data and number of rows.
7.
Disable
Archiving During Load. While this may not be feasible in certain
environments, disabling database archiving can increase performance
considerably.
8.
Use
unrecoverable. The
unrecoverable option (unrecoverable load data) disables the writing of
the data to the redo logs. This option is available for direct path
loads only.
Benchmark data
From the book "Advanced
Oracle
Utilities" we see a valid benchmark of SQL*Loader performance.
"Using the
table table_with_one_million_rows, the following benchmark tests
were performed with the various SQL*Loader options. The table
was truncated after each test.
SQL*Loader Option |
Elapsed Time (Seconds) |
Time Reduction |
direct=false
rows=64
|
135
|
-
|
direct=false
bindsize=512000
rows=10000
|
92
|
32%
|
direct=false
bindsize=512000
rows=10000
DB in
noarchivelog mode
|
85
|
37%
|
direct=true
|
47
|
65%
|
direct=true
unrecoverable
|
41
|
70%
|
direct=true
unrecoverable
fixed
width data
|
41
|
70%
|
SQL*Loader test results
indicate conventional path loads take longest.
The results above indicate that conventional path loads take the
longest. However, the bindsize and rows parameters
can aid the performance under these loads. The test involving
the conventional load didn't come close to the performance of
the direct path load with the unrecoverable option
specified.
It is also worth noting that the fastest import time achieved
for this table (earlier) was 67 seconds, compared to 41 for
SQL*Loader direct path - a 39% reduction in execution time. This
proves that SQL*Loader can load the same data faster than
import.
These tests did not compensate for indexes. All database load
operations will execute faster when indexes are disabled.
Another SQL*Loader benchmark test
This benchmark by
Warren Koch tests a SQL*Loader (sqlldr)
import of 2m rows, using direct path with index skip set for
the baseline. Here's results for differing values for column
array and stream size in SQL*Loader timings.
CONTROL FILE SNIPPET:
OPTIONS (DIRECT=TRUE, SKIP=TRUE, ERRORS=50, rows=500000,
COLUMNARRAYROWS=xx, STREAMSIZE=yy)
UNRECOVERABLE LOAD DATA
TRUNCATE
into table F15_ADPLS_NEXTASSY
fields terminated by X'9' optionally enclosed by X'1F'
TRAILING NULLCOLS {…}
This is for a load of 1,964,601
rows (about 2 million) from a delimited file. I know I could achieve
much higher speeds going to a fixed width format but my data source
precludes that.
Report headings:
columns =
COLUMNARRAYROWS Parameter Setting (xx)
Stream = STREAMSIZE parameter setting (yy)
CPU = CPU time in minutes (from sqlldr log)
Elapsed = Elapsed time in minutes (from sqlldr log)
Main = Total stream buffers loaded by SQL*Loader main thread (from
sqlldr log)
Load = Total stream buffers loaded by SQL*Loader load thread (from
sqlldr log)
Columns Stream Elap time CPU
Main Load
100
256,000 04:12.0 02:52.3 19908
0
1,000
256,000
04:14.4 02:57.3 2219 218
5,000
256,000 04:17.3 03:05.9 515 1,350
50,000 256,000
04:19.1 03:09.9 515 1,350
5,000
500,000
04:15.9 03:02.8 515 512
5,000
512,000
04:14.6 03:02.7 515 512
5,000
128,000
04:11.0 03:05.5 515 2,908
5,000
64,000
04:10.0 03:06.7 515 5876
|
|
|
Get the Complete
Oracle Utility Information
The landmark book
"Advanced Oracle
Utilities The Definitive Reference" contains over 600 pages of
filled with valuable information on Oracle's secret utilities.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
buy it
for 30% off directly from the publisher.
|
|
|
|