WHAT IS AN ETL PROCESS?

May 21,  2004
John Garmany

 


WHAT IS AN ETL PROCESS?

ETL process - acronymic for extraction, transformation and loading operations are a fundamental phenomenon in a data warehouse. Whenever DML (data manipulation language) operations such as INSERT, UPDATE OR DELETE are issued on the source database, data extraction occurs.
After data extraction and transformation have taken place, data are loaded into the data warehouse. Incremental loading is beneficial in the sense that only that have changed after the last data extraction and transformation are loaded.

ORACLE CHANGE DATA CAPTURE FRAMEWORK

The change data framework is designed for capturing only insert, delete and update operations on the oracle database, that is to say they are 'DML sensitive'. Below is architecture of change data capture framework. Below is architecture illustrating the flow of information in an oracle data capture framework.





Figure 1.Change data capture framework architecture.

Implementing oracle change data capture is very simple. Following the following steps, guides you through the whole implementation process.

  1. Source table identification: Firstly, the source tables must be identified.
  2. Choose a publisher: The publisher is responsible for creating and managing the change tables. Note that the publisher must be granted SELECT_CATALOG_ROLE, which enables the publisher to select data from any SYS-owned dictionary tables or views and EXECUTE_CATALOG_ROLE, which enables the publisher to receive execute privileges on any SYS-owned packages. He also needs select privilege on the source tables
  3. Change tables creation: When data extraction occurs, change data are stored in the change tables. Also stored in the change tables are system metadata, imperative for the smooth functioning of the change tables. In order to create the change tables, the procedure DBMS_LOGMNR_CDC_PUBLISH.CREATE_CHANGE_TABLE is executed. It is important to note that each source table must have its own change table.
  4. Choose the subscriber: The publisher must grant select privilege on the change tables and source tables to the subscriber. You might have more than one subscriber as the case may be.
  5. Subscription handle creation: Creating the subscription handle is very pertinent because it is used to specifically identify a particular subscription. Irrespective of the number of tables subscribed to, one and only one subscription handle must be created. To create a subscription handle, first define a variable, and then execute the DBMS_LOGMNR_CDC_SUBSCRIBE.GET_SUBSCRIPTION HANDLE procedure.
  6. Subscribe to the change tables: The data in the change tables are usually enormous, thus only data of interest should be subscribed to. To subscribe, the DBMS_LOGMNR_CDC_SUBSCRIBE.SUBSCRIBE procedure is executed.
  7. Subscription activation: Subscription is activated only once and after activation, subscription cannot be modified. Activate your subscription using the DBMS_LOGMNR_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION procedure.
  8. Subscription window creation: Since subscription to the change tables does not stop data extraction from the source table, a window is set up using the DBMS_LOGMNR_CDC_SUBSCRIBE.EXTEND_WINDOW procedure. However, it is to be noted that changes effected on the source system after this procedure is executed will not be available until the window is flushed and re-extended.
  9. Subscription views creation: In order to view and query the change data, a subscriber view is prepared for individual source tables that the subscriber subscribes to using DBMS_LOGMNR_CDC_SUBSCRIBE.PREPARE_SUBSCRIBER_VIEW procedure. However, you need to define the variable in which the subscriber view name would be returned. Also, you would be prompted for the subscription handle, source schema name and source table name.
  10. Query the change tables: Resident in the subscriber view are not only the change data needed but also metadata, fundamental to the efficient use of the change data such as OPERATION$, CSCN$, USERNAME$ etc. Since you already know the view name, you can describe the view and then query it using the conventional select statement.
  11. Drop the subscriber view: The dropping of the subscriber view is carried out only when you are sure you are done with the data in the view and they are no longer needed (i.e. they've been viewed and extracted). It is imperative to note that each subscriber view must be dropped individually using the DBMS_LOGMNR_CDC_SUBSCRIBE.DROP_SUBSCRIBE_VIEW procedure.
  12. Purge the subscription view: To facilitate the extraction of change data again, the subscription window must be purged using the DBMS_LOGMNR_CDC_SUBSCRIBE.PURGE_WINDOW procedure.

The above text is an excerpt from Mark Rittman's Oracle Weblog

http://www.rittman.net/archives/000901.html

About the author

Kehinde O Eseyin is a computer scientist, an Oracle DBA and an Oracle certified professional working with the Nigerian Ports Authority, Delta ports, Warri. Nigeria.



    Need an Oracle Health Check?
  • Do you have bad performance after an upgrade?
     
  • Need to certify that your database follows best practices?

BC Oracle performance gurus can quickly certify every aspect of your Oracle database and provide a complete verification that your database is fully optimized.

 

Regards,

John Garmany

John Garmany
Burleson Oracle Consulting

Kittrell, NC, USA, 27544
www.dba-oracle.com
www.remote-dba.net 

 

 
 

 

 

Burleson Consulting
One Burleson Plaza - First Floor - RN3
 2729 Rocky Ford Road • Kittrell, NC, 27544

Email: info@remote-dba.net • Phone (252) 431-0049

Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002 by Burleson Enterprises, Inc. All rights reserved.