Maximizing Oracle Import (impdp) Performance
Regardless of which options were used when the data was
exported, it has no influence on how the data is imported. For example, it is
irrelevant to the import process whether it was a direct path export or not,
since it is a plain export file, be it generated from direct or conventional
means.
Unfortunately, there is no
direct option available for imports (only for export and SQL*loader). The
import process has more tuning limitations than other utilities. The DBA should
consider the following when trying to optimize import performance:
Analyze once after the load - Set
analyze=n and analyze with dbms_stats after the load has completed.
Use Solid-state disk - For a fully-tuned
import job, only faster devices can speed-up import rates. Many large
companies used partitioned tables, and keep the current partition on SSD for
fast imports.
Increase recordlength - Many set
recordlength to 64k, but it needs to be a multiple of your I/O chunk size
and db_block_size (or your multiple block size, e.g. db_32k_block_size).
Set commit=n
For tables that can afford not to commit until the end of the load, this
option provides a significant performance increase. Larger tables may not be
suitable for this option due to the required rollback/undo space.
Dedicate a single, large rollback segment - Many professionals
create a single large rollback segment and take all others offline during
the import.
Set indexes=n
Index creation can be postponed until after import completes, by specifying indexes=n. If indexes for the target table already exist at the time of
execution, import performs index maintenance when data is inserted into the
table. Setting indexes=n eliminates this maintenance overhead.
You can also Use the indexfile parm to rebuild all the indexes once,
after the data is loaded.
Defer CBO stats - Using impdp with the parameter
exclude=statistics will greatly improve the import speed, but statistics
will need to be re-analyzed or imported later.
Use the buffer parameter
By using a larger buffer setting, import can do more work before disk access
is performed.
Disable logging - You can also use the hidden parameter _disable_logging =
true to reduce redo, but beware that the resulting import will be
unrecoverable.
|
Import Option |
Elapsed Time (Seconds) |
Time
Reduction |
|
commit=y |
120 |
- |
|
commit=y
buffer=64000 |
100
|
17% |
|
commit=n
buffer=30720 |
72 |
40% |
|
commit=N
buffer = 64000 |
67 |
44% |
Also see Metalink Note
155477.1 on improving importing speed:
You may be able to improve performance by increasing the
value of the RECORDLENGTH parameter when you invoke a direct path
Export.
Your exact performance gain depends upon the following
factors:
- DB_BLOCK_SIZE
- the types of columns in your table
- your I/O layout (the drive receiving the export file
should be separate from the disk drive where the database files
reside)
For example, invoking a Direct path Export with a
maximum I/O buffer of 64kb can improve the performance of the Export with
almost 50%.
This can be achieved by specifying the additional Export
parameters DIRECT and RECORDLENGTH. e.g.:
exp userid=system/manager full=y direct=y recordlength=65535
file=exp_full.dmp log=exp_full.log
imp userid=system/manager full=y recordlength=65535 file=exp_full.dmp log=imp_full.log
Of all of the performance techniques for database loading,
the most important is the nologging operation, and the undocumented parameter
_disable_logging=TRUE.
Note
this discussion
where Oracle expert Mark Bobak notes that you cannot re-start a failed instance
when using _disable_logging=true, and Don Burleson notes that
_disable_logging should only be used after taking a full, recoverable
backup:
"The main things to consider about NOLOGGING:
- Supported feature of Oracle
- Works only with direct load insert
- Still maintains minimal logging, i.e., space transactions, extent
invalidation records for loaded extents, etc.
Contrast with _disable_logging=TRUE:
- Undocumented, unsupported parameter
- all redo writes to log buffer still happen
- writes of redo buffer down to disk are disabled.
- Not limited to direct load insert. All operations are no longer protected
by redo on disk.
The big difference is, if your instance dies, (crash or
shutdown abort) there's no data in the online redo logs to even do instance
recovery. This is very bad. Your database could suffer a corruption from
which you can't recover."
All experienced DBA's know that You MUST get approval from
Oracle Technical Support for all usage of undocumented parameters, but in the
hands of a well-trained DBA, the hidden parameters are often a Godsend,
especially for speeding-up rebuilds during tight maintenance windows with
limited downtime.
The backup sandwich:
-
Take a dump (full system backup)
-
Load everything I can in PARALLEL NOLOGGONG mode (ONLY
during a scheduled downtime window).
-
Take a dump. Take a full system backup right
before bringing the system online.
Most DBA's call this high-speed maintenance a "backup sandwich", it's
quite common in my world where you have super-tight maintenance windows and
every second counts.
Remember, unless you take a full backup after your
maintenance, using _disable_logging=TRUE, an instance crash or a shutdown
abort WILL trash your Oracle database.
Next, lets examine tips and techniques to improve the
performance of all types of data loading, including data pump import (impdp),
sql*Loader and SQL INSERT DML.
Monitoring import performance
Monitoring Oracle Data Pump import performance is simple
select
substr(sql_text,instr(sql_text,'into "'),30) table_name,
rows_processed, round((sysdate-to_date(first_load_time,'yyyy-mm-dd
hh24:mi:ss'))*24*60,1) minutes,
trunc(rows_processed/((sysdate-to_date(first_load_time,'yyyy-mm-dd
hh24:mi:ss'))*24*60)) rows_per_minute
from
sys.v_$sqlarea
where
sql_text like 'insert %into "%' and command_type = 2 and open_versions
> 0;
Maximizing SQL*Loader
Performance
Oracle SQL*Loader is flexible and offers many options that
should be considered to maximize the speed of data loads. These include:
1. Use Direct Path Loads - The conventional path
loader essentially loads the data by using standard insert statements. The
direct path loader (direct=true) loads directly into the Oracle data
files and creates blocks in Oracle database block format. The fact that SQL is
not being issued makes the entire process much less taxing on the database.
There are certain cases, however, in which direct path loads cannot be used
(clustered tables). To prepare the database for direct path loads, the script
$ORACLE_HOME/rdbms/admin/catldr.sql.sql must be executed.
2. Disable Indexes and Constraints.
For
conventional data loads only, the disabling of indexes and constraints can
greatly enhance the performance of SQL*Loader.
3. Use a Larger Bind Array. For conventional data
loads only, larger bind arrays limit the number of calls to the database and
increase performance. The size of the bind array is specified using the bindsize parameter. The bind array's size is equivalent to
the number of rows it contains (rows=) times the maximum length of each
row.
4. Use ROWS=n to Commit Less Frequently.
For conventional data loads only, rows specifies the number of rows per
commit. Issuing fewer commits will enhance performance.
5. Use Parallel Loads. Available with direct path
data loads only, this option allows multiple SQL*Loader jobs to execute
concurrently.
$ sqlldr control=first.ctl parallel=true direct=true
$ sqlldr control=second.ctl parallel=true direct=true
6. Use Fixed Width Data. Fixed width data format
saves Oracle some processing when parsing the data. The savings can be
tremendous, depending on the type of data and number of rows.
7. Disable Archiving During Load.
While this may
not be feasible in certain environments, disabling database archiving can
increase performance considerably.
8. Use unrecoverable. The unrecoverable
option (unrecoverable load data) disables the writing of the data to the redo
logs. This option is available for direct path loads only.
Optimizing Oracle INSERT performance
When using standard SQL statements to load Oracle data
tables, there are several tuning approaches:
a - Manage segment header contention for parallel inserts -
Make sure to define multiple freelist (or freelist groups) to remove contention
for the table header. Multiple freelists add additional segment header blocks,
removing the bottleneck. You can also use
Automatic Segment Space Management
(bitmap freelists) to support parallel DML, but ASSM has
some limitations.
b - Parallelize the load - You can invoke parallel DML (i.e. using the PARALLEL
hint) to have multiple inserts into the same table. For this INSERT
optimization, make sure to define multiple freelists or freelist groups.
Mark Bobak notes that if you submit
parallel jobs to insert against the table at the same time, using the APPEND
hint may cause serialization, removing the benefit of parallel jobstreams.
c - APPEND into tables - By using the APPEND hint, you
ensure that Oracle always grabs "fresh" data blocks by raising the
high-water-mark for the table. If you are doing parallel insert DML, the Append
mode is the default and you don't need to specify an APPEND hint.
Mark Bobak notes "Also, if you're going
w/ APPEND, consider putting the table into NOLOGGING mode, which will allow
Oracle to avoid almost all redo logging."
insert /*+
append */ into customer values ('hello',';there');
d - Use a large blocksize - By defining large (i.e. 32k)
blocksizes for the target table, you reduce I/O because more rows fit onto a
block before a "block full" condition (as set by PCTFREE) unlinks the block from
the freelist.
e - Disable/drop indexes - It's far faster to rebuild
indexes after the data load, all at-once. Also indexes will rebuild cleaner, and
with less I/O if they reside in a tablespace with a large block size.
Using SSD for
insert tablespaces
For
databases that require high-speed loads, some shops define the insert table
partition on
solid-state disk
(later moving it to platter disk). Mike Ault notes in his book
"Oracle
Solid-State Disk Tuning",
a respectable 30% improvement in load speed:
In the SSD verses ATA benchmark the gains for insert and update processing as shown
in the database loading and index build scenarios was a respectable 30%.
This 30% was due to the CPU overhead involved in the insert and update
activities.
If the Oracle level processing for insert and update activities could be
optimized for SSD, significant performance
gains might be realized during these activities."
Also see:
2008 Market Survey of SSD vendors for
Oracle:
There are many vendors who offer rack-mount solid-state disk that
work with Oracle databases, and the competitive market ensures that
product offerings will continuously improve while prices fall.
SearchStorage notes that SSD is will soon replace platter disks and that
hundreds of SSD vendors may enter the market:
"The number of vendors in this category could rise to several
hundred in the next 3 years as enterprise users become more familiar
with the benefits of this type of storage."
As of June 2008, many of the major hardware vendors (including Sun and
EMC) are replacing slow disks with RAM-based disks, and
Sun announced that all
of their large servers will offer SSD.
As of June 2008, here are the major SSD vendors for Oracle databases
(vendors are listed alphabetically):
2008 rack mount SSD Performance Statistics
SearchStorage has done a comprehensive survey of rack mount SSD
vendors, and lists these SSD rack mount vendors, with this showing the
fastest rack-mount SSD devices (as of May 15, 2008):
|
manufacturer |
model |
technology |
interface |
performance metrics and notes |
|
Texas Memory Systems |
RamSan-400 |
RAM SSD |
Fibre
Channel
InfiniBand |
3,000MB/s random
sustained external throughput, 400,000 random IOPS |
|
Violin Memory |
Violin 1010 |
RAM SSD
|
PCIe |
1,400MB/s read,
1,00MB/s write with Χ4 PCIe, 3 microseconds latency |
|
Solid Access Technologies |
USSD 200FC |
RAM SSD |
Fibre Channel
SAS
SCSI |
391MB/s random
sustained read or write per port (full duplex is 719MB/s), with
8 x 4Gbps FC ports aggregated throughput is approx 2,000MB/s,
320,000 IOPS |
|
Curtis |
HyperXCLR R1000 |
RAM SSD |
Fibre Channel
|
197MB/s sustained
R/W transfer rate, 35,000 IOPS |
Choosing the right SSD for Oracle
When evaluating SSD for Oracle databases you need
to consider performance (throughput and response time), reliability (Mean Time Between failures) and
TCO (total cost of ownership). Most SSD vendors will provide a
test RAM disk array for benchmark testing so that you can choose the
vendor who offers the best price/performance ratio.
Burleson Consulting does not partner with any SSD vendors and we
provide independent advice in this constantly-changing market. BC
was one of the earliest adopters of SSD for Oracle and we have been
deploying SSD on Oracle database since 2005 and we have experienced SSD
experts to help any Oracle shop evaluate whether SSD
is right for your application. BC experts can also help you choose
the SSD that is best for your database. Just
call 800-766-1884 or e-mail.:
for
SSD support details. |