Loading Techniques
In Oracle, the
standard way to load a data warehouse involves multiple steps. The
steps involved are usually:
1. Extract data from source database(s)
2. Load data into temporary work tables
performing any possible transformation/clean-up and aggregation.
3. Use internal/external scripts and stored
objects to transform, aggregate and load the data
The benefits of using temporary tables is
that the data is loaded fairly fast into the environment where it
can be cleaned, transformed and aggregated using internal procedures
which can be a highly automated process.
Alternatively the data can be transformed
and aggregated as it is extracted using scripts and procedures and
then can be loaded into the database using SQL*Loader. SQL*Loader
can be used with a direct load option which pre-builds data blocks
and inserts them into the database.
If data resides in non-Oracle data sources,
there are gateways which Oracle provides that should be used to make
these sources available to your processes:
* Procedural Gateway to APPC – applications
that use IBM's Advanced program-to program communication.
* Transparent Gateway for IBM's DB2.
* Transparent Gateway for IBM's DRDA using
the DRDA standard APPC/LU 6.2 protocol
* Transparent Gateway for EDA/SQL that
allows transparent access to 15 IBM MVS mainframe databases.
* Transparent Gateway to Informix
* Transparent Gateway to RMS on Digital
(DEC) (Now Compaq) VAX
* Transparent Gateway to Sybase
* Transparent Gateway to Teradata
Data can be sucked across these gateways
into a holding table in an Oracle database for either manipulation
or extraction into a flat file depending on how you wish to load the
results into the warehouse.
Alternatively flat files can be
extracted from the data source and transferred to the data warehouse
server. If you can have IBM files translated from EBCIDIC format
prior to your having to deal with it makes things much easier.
SQL*Loader direct load options
SQL*Loader is usually the work horse of the
data warehouse loading effort. SQL*Loader can do the following:
* Load data from disk or tape
* Support a large number of data types
* Load into one or more tables based on
specified criteria
* Load fixed or variable length records.
* Generate unique derived numeric keys as
the data warehouse is loaded
* Support several high performance options
* Produce detailed error reports and reload
tables to allow the isolation and correction of errors quickly and
easily
* Preprocess data as it is loading it into
the database.
If you specify the SQL*Loader command for
your platform at the command line without specifying any arguments a
usage guide will be printed. Here is an example:
SQL*Loader: Release 8.1.5.0.0 - Production
on Sun Aug 29 20:10:28 1999
(c) Copyright 1999 Oracle Corporation.
All rights reserved.
Usage: SQLLOAD keyword=value
[,keyword=value,...]
Valid Keywords:
userid -- ORACLE
username/password
control -- Control file
name
log -- Log file name
bad -- Bad file
name
data -- Data file
name
discard -- Discard file
name
discardmax -- Number of discards to
allow (Default all)
skip -- Number of logical records to
skip (Default 0)
load -- Number of logical records to
load (Default all)
errors -- Number of errors to
allow (Default 50)
rows -- Number of rows in conventional
path bind array or between
direct path data saves
(Default: Conventional path 64, Direct path all)
bindsize -- Size of conventional path bind
array in bytes (Default 65536)
silent -- Suppress messages during run (header,feedback,errors,discards,partitions)
direct -- use direct
path (Default FALSE)
parfile -- parameter file: name of file
that contains parameter specifications
parallel -- do parallel
load (Default FALSE)
file -- File to allocate extents
from
skip_unusable_indexes -- disallow/allow
unusable indexes or index partitions
(Default FALSE)
skip_index_maintenance -- do not maintain
indexes, mark affected indexes as
unusable (Default FALSE)
commit_discontinued -- commit loaded rows
when load is discontinued (Default FALSE)
readsize -- Size of Read
buffer (Default 65535)
PLEASE NOTE: Command-line parameters may be
specified either by position or by keywords. An example of the
former case is 'sqlload scott/tiger foo'; an example of the latter
is 'sqlload control=foouserid=scott/tiger'. One may specify
parameters by position before but not after parameters specified by
keywords. For example,
'sqlload scott/tiger control=foo logfile=log'
is allowed, but 'sqlload scott/tiger control=foo log' is
not, even though the position of the parameter 'log' is correct.
One thing to notice when discussing the
command line parameters for the SQL*Loader command is the caveat
under the ROWS parameter. A direct path load by default commits only
after the complete load is completed. What this means for a direct
load without a ROWS specification is that you better have enough
rollback area to handle the entire load transaction volume.
A direct
path load is the fastest way to load data into an Oracle database.
You specify a direct path load by use of the DIRECT=TRUE keyword
pair. By splitting input record sets into multiple files and then
using the PARALLEL=TRUE keyword pair you can use multiple SQL*Loader
sessions to load data into the same table (each will get a data
segment equal to the setting of the INITIAL parameter to load into,
plan for this) allowing almost a direct multiplication of throughput
with an equally linear decrease in loading time.
Another key to speeding the load process
using SQL*Loader is to eliminate logging as much as possible. You
can eliminate all but data dictionary action redo logging by using
the NOLOGGING option against the tables to be loaded before
beginning the load. To put is simply, issue the
ALTER TABLE table-name NOLOGGING;
command against the table before beginning
the load. Another method is to place the keyword UNRECOVERABLE in
the start of the SQL*Loader control file.
A SQL*Loader session will produce a log file
and a discard file. The discard and log file will be named the same
as the load file with the endings ".bad" and ".log" unless the tool
is told otherwise. The arguments can be specified by position or
through the use of the keyword with or without an equal sign.
To summarize, in order to minimize the time
required to perform a SQL*Load operation you:
1. Set all involved tables to NOLOGGING
using ALTER TABLE
2. Split the input data into multiple files
3. Place the UNRECOVERABLE keyword in the
control file(s) for the load
4. Use the DIRECT=TRUE and PARALLEL=TRUE
keywords
IMPORT-EXPORT
Import and export are two reflexive
products, one undoes what the other does. Export extracts a logical
copy of the objects and data while Import uses the data from Export
to rebuild and reload objects and data. An export file taken from
any recent version of Oracle can usually be loaded into any current
version at least back one or two main point releases or more.
This logical unload and reload of data
performed in EXPORT and IMPORT can be used to move both small and
large tables from source Oracle databases to the data warehouse to
load into temporary tables for further processing. Generally you
will not load the data into the same user as it was extracted from
so a FROMUSER-TOUSER IMPORT is generally used. Using the same method
as we did with SQL*Loader (except you must specify help=y) here are
quick synopsis of the IMP and EXP command sets on an NT version of
Oracle:
E:\ exp help=y
Export : Release 8.1.5.0 – Production on Sun
Aug 29 20:58:21 1999
(c) Copyright 1999 Oracle Corporation. All
rights reserved.
You can let Export prompt you for parameters
by entering the EXP
Command followed by your username/password:
Example: EXP SCOTT/TIGER
Or, you can control how Export runs by
entering the EXP command followed by various arguments. To specify
parameters, you use keywords:
Format: EXP KEYWORD=value or
KEYWORD=(value2, value2,…,valueN)
Example: EXP SCOTT/TIGER GRANTS=Y
TABLES=(EMP,DEPT,MGR)
Or TABLES=(T1:P1,TI:P2) if on
Oracle8 and T1 is partitioned.
USERID must be the first parameter on the
command line.
Keyword Description (Default)
Keyword Description(Default)
USERID userid/password
FULL export entire file (N)
BUFFER size of data buffer
OWNER list of owner usernames
FILE output files (EXPDAT.DMP)
TABLES list of table names
COMPRESS import into one extent
RECORDLENGTH length of IO record
GRANTS export grants (Y) INCTYPE
incremental export type
INDEXES export indexes (Y)
RECORD track incr. Export (Y)
ROWS export data rows (Y) PARFILE
parameter filename
CONSTRAINTS export constraints (Y)
CONSISTENT cross-table consistency
LOG log file of screen output
STATISTICS analyze objects
DIRECT direct path
(N) (ESTIMATE)
FEEDBACK display progress every X rows (0)
Oracle8 additions:
POINT_IN_TIME_RECOVER Tablespace point in
time recovery (N)
RECOVERY_TABLESPACES List of tablespaces to
recover
Oracle8i Additions:
FILESIZE maximum size of each dumpfile
QUERY select clause used to export a
subset of a table
The following keywords only apply to
transportable tablespaces (O8i)
TRANSPORT_TABLESPACE export transportable
tablespace metadata (N)
TABLESPACES list of tablespaces to transport
Export terminated successfully without
warnings
E:\ imp help=y
Import: Release 8.1.5.0.0 – Production on
Sun Aug 29 1999 20:58:21 1999
(c) Copyright 1999 Oracle Corporation. All
rights reserved.
You can let Export prompt you for parameters
by entering the EXP
Command followed by your username/password:
Example: IMP SCOTT/TIGER
Or, you can control how Import runs by
entering the IMP command followed by various arguments. To specify
parameters, you use keywords:
Format: IMP KEYWORD=value or
KEYWORD=(value2, value2,…,valueN)
Example: IMP SCOTT/TIGER IGNORE=Y
TABLES=(EMP,DEPT) FULL=N
Or TABLES=(T1:P1,TI:P2) if on
Oracle8 and T1 is partitioned.
USERID must be the first parameter on the
command line.
Keyword Description (Default)
Keyword Description(Default)
USERID username/password
FULL import entire file (N)
BUFFER size of data buffer FROMUSER
list of owner usernames
FILE input files (EXPDAT.DMP) TOUSER
list of usernames
SHOW just list file contents (N)
TABLES list of table names
IGNORE ignore create errors (N)
RECORDLENGTH length of IO record
GRANTS import grants (Y) INCTYPE
incremental import type
INDEXES import indexes (Y)
COMMIT commit array insert (N)
ROWS import data rows (Y) PARFILE
parameter filename
LOG log file of screen output
CONSTRAINTS import constraints (Y)
DESTROY overwrite tablespace data file (N)
INDEXFILE write table/index info to
specified file
SKIP_UNUSABLE_INDEXES skip maintenance of
unusable indexes (N)
Oracle8 additions:
CHARSET character set of export file (NLS_LANG)
POINT_IN_TIME_RECOVER tablespace
point-in-time recovery (N)
ANALYZE execute ANALYZE statments in dump
file (Y)
FEEDBACK display progress every x rows (0)
Oracle8i additions:
TOID_VALIDATE skip validation of specified
type ids
FILESIZE maximum size of each dump file
RECALCULATE_STATISTICS recalculate
statistics (N)
The following keywords only apply to
transportable tablespaces
TRANSPORT_TABLESPACE import transportable
tablespace metadata (N)
DATAFILES datafiles to be transported into
tablespace
ITS_OWNERS users that own data in the
transportable tablespace set
Import terminated successfully without
warnings.
Both import and export can be run as FULL,
OWNER or TABLE level activities. Notice how partitions are specified
with the colon separating the table name from the partition name.
This use of partition name in the export import utility and the
ability to specify it in selects and other command in Oracle8 argue
for not letting Oracle do default naming of partitions.
Notice the COMPRESS keyword. The COMPRESS
keyword forces all data for the tables in the export to be forced
into one extent, the next extent is sized the same as in the
original table. If you aren't careful using COMPRESS in export and
import you could exceed the size of your import tablespace data
files for large tables. Another important IMPORT keyword is IGNORE,
if the tables already exist in the target database use the IGNORE
keyword to ignore creation errors when objects already exist.
Another IMPORT keyword is COMMIT, use COMMIT to specify to commit
after the number of rows that will fit into a single buffer are
imported. By committing after a specified number of rows instead of
the default which is after a full table is imported you control the
size of rollback segment required. If you use commit use an initial
buffer specification of around 1,000,000 to start with and move up
from there. On some platforms such as SUN Solaris after a certain
buffer size is reached no further increases change the number of
rows committed.
|
|
|
|
Get the Complete
Oracle Utility Information
The landmark book
"Advanced Oracle
Utilities The Definitive Reference" contains over 600 pages of
filled with valuable information on Oracle's secret utilities.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
buy it
for 30% off directly from the publisher.
|
|
|