WHAT IS AN ETL PROCESS?
Oracle Tips by Burleson Consulting
May 21, 2004
WHAT IS AN ETL PROCESS?
ETL process - acronymic for
extraction, transformation and loading operations are a fundamental
phenomenon in a data warehouse. Whenever DML (data manipulation
language) operations such as INSERT, UPDATE OR DELETE are issued on the
source database, data extraction occurs.
After data extraction and transformation have taken place, data are
loaded into the data warehouse. Incremental loading is beneficial in the
sense that only that have changed after the last data extraction and
transformation are loaded.
DATA CAPTURE FRAMEWORK
The change data framework is
designed for capturing only insert, delete and update operations on the
oracle database, that is to say they are 'DML sensitive'. Below is
architecture of change data capture framework. Below is architecture
illustrating the flow of information in an oracle data capture
Figure 1.Change data capture framework
Implementing oracle change
data capture is very simple. Following the following steps, guides you
through the whole implementation process.
Source table identification: Firstly, the source
tables must be identified.
Choose a publisher: The publisher is responsible
for creating and managing the change tables. Note that the publisher
must be granted SELECT_CATALOG_ROLE, which
enables the publisher to select data from any SYS-owned dictionary
tables or views and EXECUTE_CATALOG_ROLE, which
enables the publisher to receive execute privileges on any SYS-owned
packages. He also needs select privilege on the source tables
Change tables creation: When data extraction
occurs, change data are stored in the change tables. Also stored in
the change tables are system metadata, imperative for the smooth
functioning of the change tables. In order to create the change
tables, the procedure
DBMS_LOGMNR_CDC_PUBLISH.CREATE_CHANGE_TABLE is executed. It is
important to note that each source table must have its own change
Choose the subscriber: The publisher must grant
select privilege on the change tables and source tables to the
subscriber. You might have more than one subscriber as the case may
Subscription handle creation: Creating the
subscription handle is very pertinent because it is used to
specifically identify a particular subscription. Irrespective of the
number of tables subscribed to, one and only one subscription handle
must be created. To create a subscription handle, first define a
variable, and then execute the
DBMS_LOGMNR_CDC_SUBSCRIBE.GET_SUBSCRIPTION HANDLE procedure.
Subscribe to the change tables: The data in the
change tables are usually enormous, thus only data of interest should
be subscribed to. To subscribe, the
DBMS_LOGMNR_CDC_SUBSCRIBE.SUBSCRIBE procedure is executed.
Subscription activation: Subscription is
activated only once and after activation, subscription cannot be
modified. Activate your subscription using the
Subscription window creation: Since subscription
to the change tables does not stop data extraction from the source
table, a window is set up using the
DBMS_LOGMNR_CDC_SUBSCRIBE.EXTEND_WINDOW procedure. However, it
is to be noted that changes effected on the source system after this
procedure is executed will not be available until the window is
flushed and re-extended.
Subscription views creation: In order to view
and query the change data, a subscriber view is prepared for
individual source tables that the subscriber subscribes to using
procedure. However, you need to define the variable in which the
subscriber view name would be returned. Also, you would be prompted
for the subscription handle, source schema name and source table name.
Query the change tables: Resident in the
subscriber view are not only the change data needed but also metadata,
fundamental to the efficient use of the change data such as
OPERATION$, CSCN$, USERNAME$
etc. Since you already know the view name, you can describe the view
and then query it using the conventional select statement.
Drop the subscriber view: The dropping of the
subscriber view is carried out only when you are sure you are done
with the data in the view and they are no longer needed (i.e. they've
been viewed and extracted). It is imperative to note that each
subscriber view must be dropped individually using the
Purge the subscription view: To facilitate the
extraction of change data again, the subscription window must be
purged using the DBMS_LOGMNR_CDC_SUBSCRIBE.PURGE_WINDOW
Get the Complete
Oracle SQL Tuning Information
The landmark book
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
for 30% off directly from the publisher.