Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

Free Oracle Tips

HTML Text

 Home
 E-mail Us
 Oracle Articles



 Oracle Training
 Oracle News

 Oracle Forum
 Class Catalog


 Our Staff
 Our Prices
 Help Wanted!

 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 UNIX
 Oracle UNIX
 Linux
 Oracle Linux
 Monitoring
 Remote help

 Remote plans
 Remote
services
 Oracle C++
 Oracle Java
 Apache
 JDeveloper
 App Server

 Applications
 Oracle Forms
 Oracle Portal
 11i Upgrades
 SQL Server
 Oracle Concepts
 HTML-DB Tips
 Software Help

 Remote Help  
 Development  

 Implementation


 Financials Training
 Oracle 11i
 Oracle Apps 11i
 Oracle Workflow
 Oracle AR 11i Class
 Oracle AP 11i class
 Oracle GL 11i class
 Oracle HR 11i class
 Oracle FA 11i class
 11i Project Mgt
 11i procurement
 11i collections


 Oracle Posters
 Oracle Books

 Oracle Tuning Book
 Oracle RAC Book
 Oracle Security
 Easy Oracle Books
 Oracle Scripts
 SQL Server DBA
 SQL Design Patterns
 WISE
 Excel-DB   


 BC Oracle News


 Rednecks!
 Dress code
 Arabian Stallion

 Burleson Arabians
 Guide Horses
 Don Burleson Blog
 Golf & Travel


 Privacy Policy
 

 

 

 
 

Enterprise Solutions with Oracle ETL

Oracle Tips by V.J. Jain

 

Enterprise Solutions with Oracle ETL

By V.J. Jain, July 2007 (varun.jain@varunjaininc.com)
Varun Jain, Inc. (an Oracle Partner)

AbstractOracle 9i and 10g have introduced tools that allow for the composition of an enterprise ETL solution that is more preferable than off-the-shelf products.  The features include:

  • setup and use of an external table and analytic SQL which loads data from a complex XML and text file, allowing 1,674 records to be loaded from the file into a relational table in just 6.85 seconds

  • setup and use of a pipelined table function to vertically load more complex files with dynamic columns.  This allows as many as 31,233 records to be inserted from the file into a relational table in only 0.96 seconds

  • application of an itemized enterprise ETL solution after the functionality and efficiency of the previous features have been achieved

  • an exhibition of the dynamic properties of external tables, illustrate through altering file source by DDL. This includes ETL procedures which design load files dynamically to accept the file path as a parameter. 

  • use of NFS drives for space efficiency to achieve dynamic alterations

  • availability of a custom queue table, created by using an Oracle directory and a custom procedure, which populates based on the file listing by using a semaphore for external procedures

  • creation of DBMS_SCHEDULER job that monitors and processes the queue table. 

It is the combination of these components which result in a highly functional enterprise solution, offering benefits that surpass off-the-shelf products such as Informatica, Data Stage, or Ab Initio. 

Furthermore, Oracle 9i and 10g have added some very powerful tools to support high performance ETL.  While there may be familiarity with Oracle external tables and pipelined table functions, the full potential of Oracle can go unrealized.

The demand for high performance ETL is growing as organizations realize the importance of business intelligence and the need for a comprehensive data warehouse.  The majority of these organizations use expensive off-the-shelf products without realizing that they have already licensed the necessary tools to build a potentially better solution.  This article discusses the use of Oracle ETL features with Oracle Scheduler to provide a real-time, high scale, and high performance ETL solution.

 

Familiarity with external tables and pipelined table functions is necessary to fully understand the solution:

External Tables

External tables are used to provide relational access to the contents of a file from an Oracle directory.  The external table script defines the rules for reading the file, and once the table is created, the external table is accessed just like a regular table.  Here’s an example:


Create directory DAT_DIR as ‘/usr/tmp’;

 

File: sample.csv saved to /usr/tmp

ID, DATE, FIRST_NAME, LAST_NAME, STATUS

1, 17-JUL-07, John, Adams, Active

2, 17-JUL-07, Tyler, Howell, Active

3, 17-JUL-07, Jim, Lopez, Active

4, 17-JUL-07, Carlos, White, Inactive

5, 17-JUL-07, Scott, Tiger, Active

External table definition:

 

CREATE TABLE SAMPLE_EXT
(
ID NUMBER,
DATE VARCHAR2(10),  
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
STATUS VARCHAR2(10)
)                      
 ORGANIZATION EXTERNAL
 (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY DAT_DIR
   ACCESS PARAMETERS
   (
            records delimited by newline LOAD WHEN (ID != ‘ID’)
            badfile BAD_DIR:'SAMPLE_EXT%a_%p.bad'
            logfile LOG_DIR:'SAMPLE_EXT%a_%p.log'
fields terminated by ',' optionally enclosed by '"' LRTRIM
            MISSING FIELD VALUES ARE NULL
   )
   LOCATION (‘sample.csv')
 )
 PARALLEL 4
 REJECT LIMIT 1000;

 

Note:  Several of the properties in this definition are optional.  For example, if “skip 1” could have been used in the table definition, but the method above can be a better alternative if it cannot be guaranteed that all file sources will use a header line. 

Now that the external table has been created, and the file has been placed on the server, it is possible to  access the data using a SQL such as:

 
Select * from SAMPLE_EXT
/
 
ID  DATE  FIRST_NAME  LAST_NAME  STATUS
1  17-JUL-07  John  Adams  Active
2  17-JUL-07  Tyler  Howell  Active
3  17-JUL-07  Jim  Lopez  Active
4  17-JUL-07  Carlos  White  Inactive
5  17-JUL-07  Scott  Tiger  Active
 
Select * from SAMPLE_EXT where STATUS = ‘Inactive’
/
 
ID  DATE  FIRST_NAME  LAST_NAME  STATUS
4  17-JUL-07  Carlos  White  Inactive

 

This exhibits the relational access to this file in completion, even though the file is outside of the database.  Loading the contents of this file into a relational table is also easy to accomplish.

First, create a matching relational table as exemplified below: 

 

CREATE TABLE SAMPLE_TAB
(
ID NUMBER,
DATE VARCHAR2(10),  
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
STATUS VARCHAR2(10)
);

 

 

Next, insert the data from the external table to the relational table:

 

Insert into sample_tab
Select * from sample_ext
/

 

 

It is a common misconception that external tables are only useful for simple formats such as comma or tab delimited files.  In loading a CSV file easily however, one can see that this is far from the truth.

External tables can also be used for complex files.  For example, because external tables are relationally accessible, the full flexibility of the SQL language can be utilized to get the data in the desired format.  If the file format is too complex for SQL to structure appropriately, a pipelined table function can be used.

When using external tables for more complex file formats, placing each row from the file into single columned row in the external table may offer great flexibility for transforming the data. 

Consider the following file:

sample2.dat (actual file is several megabytes)

23:18:00 PST 18 Feb 2007
<Inventory name=" CORVETTE">
            <AppId id="1"/>
            <Color>Yellow</Color>
</Inventory>
 
23:18:00 PST 18 Feb 2007
<Inventory name="CAMARO">
            <AppId id="2"/>
            <Color>White</Color>
</Inventory>
 
23:18:00 PST 18 Feb 2007
<Inventory name="PROWLER">
            <AppId id="3"/>
            <Color>Blue</Color>
</Inventory>

This file is obviously a blend between text and XML.  Though this is not a preferred format, in ETL there is often no control over the data sources.  How could an external table to efficiently load a file structured like this be used?  The solution is probably easier than assumed.

First, create the external table:

 

CREATE TABLE SAMPLE2_EXT (
field1 varchar2(4000)
)                      
 ORGANIZATION EXTERNAL
 (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY DAT_DIR
   ACCESS PARAMETERS
   (
             records delimited by newline
             badfile DAT_DIR:'SAMPLE2_EXT%a_%p.bad'
             logfile DAT_DIR:'SAMPLE2_EXT%a_%p.log'
             fields LRTRIM REJECT ROWS WITH ALL NULL FIELDS (field1 char(4000))
   )
   LOCATION ('sample2.dat')
 )
REJECT LIMIT 10;

 

Next, create the relational table:

 

CREATE TABLE SAMPLE2
(
  BEGIN_DATE  DATE,
  INVENTORY   VARCHAR2(30),
  APPID_ID    VARCHAR2(30),
  COLOR       VARCHAR2(30)
);

 

Now extract, transform, and load data from external table into the relational table:

 
SQL> insert /* append */ into sample2
  2  select /*+ materialize */
  3    max(BEGIN_DATE) BEGIN_DATE,
  4    max(INVENTORY) INVENTORY,
  5    max(APPID_ID) APPID_ID
  6    ,max(COLOR) COLOR
  7    FROM
  8   (
  9   select r, grp,
 10     case when (length(translate(substr(field1,1,1),' 0123456789',' '))) is null then to_date(to_
char(to_timestamp_tz(field1, 'HH24:MI:SS TZD DD Mon YYYY'), 'DD-MON-YYYY HH:MI:SS AM'), 'DD-MON-YYYY
 HH:MI:SS AM') end BEGIN_DATE,
 11     case when (instr(field1,'<Inventory')>0) then extractValue(XMLType(replace(field1, '">', '"/
>')), '/Inventory/@name') end INVENTORY,
 12    case when (instr(field1,'<AppId')>0) then extractValue(XMLType(replace(field1, '">', '"/>')),
 '/AppId/@id') end APPID_ID,
 13    case when (instr(field1,'<Color')>0) then extractValue(XMLType(field1), '/Color') end COLOR
 14    from
 15    (   
 16   select r, field1, nvl(max(grp) over (order by r),0) grp FROM
 17     (
 18      select A.r, A.field1, case when (length(translate(substr(field1,1,1),' 0123456789',' '))) i
s null then A.r end grp from
 19      (
 20       select rownum r, field1 from SAMPLE2_EXT A
 21      ) A
 22     )      
 23   )
 24   )
 25  group by grp
 26  /
 
1674 rows created.
 
Elapsed: 00:00:06.85
SQL>
 

 

The query above can extract, transform, and load 1,674 records into a relational table in under 7 seconds.  This is good performance considering the format has XML and non-XML content. 

Here is a sample of what is in our relational table:

 

select * from sample2
where rownum <=3
/
 
BEGIN_DATE    INVENTORY      APPID_ID          COLOR
 
2/18/2007 11:18:00 PM  CORVETTE       1          Yellow
2/18/2007 11:18:00 PM  CAMARO          2          White
2/18/2007 11:18:00 PM  PROWLER        3          Blue

 

The example above demonstrates a relatively simple use of a single columned external table transforming raw data into a relational format, but what happens when the format is more complex?

Pipelined Table Functions

Though they can get quite complicated, a table function is simply a function that returns rows.  Inside the table function, the row is prepared and each row is piped back to the session when they are ready.  The output rows are based on a type definition. 

Here is a simple example of a table function that accepts a VARCHAR2 input and returns rows with 1 character per row. 

 
CREATE TYPE row_typ AS OBJECT
       (p_row_output varchar2(10));
/
 
CREATE TYPE row_typ_tab AS TABLE OF row_typ;
/
 
 
create or replace function example_tab_fnc (pi_value IN varchar2)
return row_typ_tab PIPELINED AS
l_length number;
l_index        PLS_INTEGER := 1;
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
LOOP
            l_length := length(pi_value);
            EXIT WHEN l_index > l_length;
            PIPE ROW ( row_typ(substr(pi_value, l_index, 1)));
            l_index := l_index + 1;
END LOOP;
RETURN;
 
END;
 
select * from table (example_tab_fnc('ABCDE'))
/
 
P_ROW_OUTPUT
 
A
B
C
D
E

 

It is obvious that this feature could certainly prove useful in an ETL environment.  While the demonstration above is simplified, pipelined table functions can be used to transform complex files into a usable format.  For example, as input, a table function can accept a REF cursor pointing at an external table.  Consider the following file:

Sample3.dat

[CLASS_XXX]
[COLUMN_LISTING]
COL1,COL2,COL3,COL4,COL5
[COLUMN_VALUES]
XXX 219/  1/  0/  0/ 1,95,0,0,0,0
XXX 219/  1/  0/  1/ 1,23,0,0,0,0
XXX 219/  1/  0/  2/ 1,4,0,0,0,4
XXX 219/  1/  0/  3/ 1,37,1,0,0,33
[CLASS_YYY]
[COLUMN_LISTING]
COL21,COL22,COL23,COL24
[COLUMN_VALUES]
YYY 219/ 1/ 1/ 1/ 1,11092,49.52,627,179827
YYY 219/ 1/ 2/ 1/ 1,11092,199.52,628,179828
YYY 219/ 1/ 3/ 1/ 1,11092,459.52,629,179829
[CLASS_ZZZ]
[COLUMN_LISTING]
COL31,COL32,COL33,COL34,COL35,COL36,COL37
[COLUMN_VALUES]
ZZZ 219/  0/  0/ 1/ 1,0,0.00,0,0,7201,0,0
ZZZ 219/  0/  1/ 2/ 1,0,1.00,0,1,7202,0,2
ZZZ 219/  0/  2/ 1/ 2,0,2.00,0,2,7203,0,3

This file details the inventory of three different warehouses included in the same file, with each in its own class.  Every class has a dynamic listing of columns and column values.  Preceding the values line is a string such as “XXX 219/  1/  0/  3/ 1”, “YYY 219/ 1/ 1/ 1/ 1”, and “ZZZ 219/  0/  0/ 1/ 1”  .  Each of the values in these strings map to source properties that form an inventory locator, and must be stored for each row.

For example, “ZZZ 219/  0/  0/ 1/ 1”  maps to

Class: ZZZ

Store Number: 219

Pallet: 0

Lot: 0

Case: 1

Item: 1 

It is not feasible to have a separate table column for each possibility because there could be thousands of different columns for each class.  The best solution is to store the data vertically, by using a standard label for column name and column value. A desired relational storage would include the following columns:

CLASS STORE_NUMBER PALLET LOT CASE_NUMBER ITEM COL_NAME COL_VALUE

In order to have table function pipe rows with these columns, we need to define an appropriate type:

 

CREATE TYPE sample3_typ AS OBJECT
(
            CLASS varchar2(10),
            STORE_NUMBER number(6),
            PALLET number(6),
            LOT number(6),
            CASE_NUMBER number(6),
            ITEM number(6),
            COL_NAME varchar2(100),
            COL_VALUE varchar2(100)
);
/
 
CREATE TYPE sample3_typ_tab AS TABLE OF sample3_typ;
/

 

If the number of columns were fixed, we might try defining an external table that leverages the comma delimiters in the file format.  However, since the number of columns is unknown, we will define our external table to have a single column. 

In the original file, the column listing was over 16,000 characters long; to allow for this condition, we must define the column to be of type CLOB:

 

CREATE TABLE SAMPLE3_EXT (
field1 clob
)                      
 ORGANIZATION EXTERNAL
 (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY DAT_DIR
   ACCESS PARAMETERS
   (
             records delimited by newline
             badfile DAT_DIR:'SAMPLE3_EXT%a_%p.bad'
             logfile DAT_DIR:'SAMPLE3_EXT%a_%p.log'
             fields LRTRIM REJECT ROWS WITH ALL NULL FIELDS (field1 char(32000))
   )
   LOCATION ('sample3.dat')
 )
 PARALLEL 10
 REJECT LIMIT 10;

 

And create the relational table:

 

CREATE TABLE SAMPLE3
(
  CLASS         VARCHAR2(10 BYTE),
  STORE_NUMBER  NUMBER(6),
  PALLET        NUMBER(6),
  LOT           NUMBER(6),
  CASE_NUMBER   NUMBER(6),
  ITEM          NUMBER(6),
  COL_NAME      VARCHAR2(100 BYTE),
  COL_VALUE     VARCHAR2(100 BYTE)
);
 

In order to transform raw data into the relational format, it may be beneficial to try to using SQL instead of a table function.  However, since the number of columns is unknown, and data is being stored vertically, the best solution is to use a pipelined table function to prepare the rows.  

Now that there are external table and type definitions,  table function that accepts a REF cursor (pointing to the external table) as an input parameter needs to be created.  This function will start at the beginning of the CURSOR (the first row in the external table), then loop through the rows of the external table, and simultaneously return rows to the session. 

The keywords CLASS, COLUMN_LISTING, and COLUMN_VALUES instruct the actions of the function on the next loop.  The CLASS keyword sets the local variable for class, the COLUMN_LISTING keyword builds a collection to store the column names, and the COLUMN_VALUES keyword causes the preparation and piping of each row as they are parsed.  

 

CREATE OR REPLACE function sample3_tab_fnc (pi_row IN sys_refcursor)
return sample3_typ_tab PIPELINED AS
l_string       LONG;
l_comma_index  PLS_INTEGER;
l_index        PLS_INTEGER := 1;
TYPE ListTyp IS TABLE OF VARCHAR2(100);
l_names ListTyp := ListTyp();
l_namesIter NUMBER :=1;
l_Iter NUMBER :=1;
l_cIter NUMBER :=1;
l_element varchar2(100);
l_field1 varchar2(32000);
pi_rownumber number := 1;
 
l_class varchar2(10);
l_store_number varchar2(6);
l_pallet varchar2(6);
l_lot varchar2(6);
l_case varchar2(6);
l_item varchar2(6);
l_action varchar2(10);
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
LOOP
     FETCH pi_row INTO l_field1;
     EXIT WHEN pi_row%NOTFOUND;
             if instr(l_field1, ',', -1) = length(l_field1) then
                         l_string  := l_field1;
             else
                         l_string  := l_field1 || ',';
             end if;
             l_index :=1;
             l_Iter :=1;
             l_cIter :=1;
           
             if l_field1 = '[CLASS_XXX]' then
                        l_class := 'XXX';
                        l_action := null;
                        l_names  := ListTyp();
                        l_namesIter :=1;
                        GOTO end_loop;           
             elsif l_field1 = '[CLASS_YYY]' then
                        l_class := 'YYY';
                        l_action := null;
                        l_names := ListTyp();
                        l_namesIter :=1;
                        GOTO end_loop;           
             elsif l_field1 = '[CLASS_ZZZ]' then
                        l_class := 'ZZZ';
                        l_action := null;
                        l_names := ListTyp();
                        l_namesIter :=1;
                        GOTO end_loop;
             elsif l_field1 = '[COLUMN_LISTING]' then
                        l_action := 'COLUMNS';
                        l_namesIter :=1;
                        GOTO end_loop;
             elsif l_field1 = '[COLUMN_VALUES]' then
                        l_action := 'VALUES';
                        GOTO end_loop;
             end if;
             
            --Get Value Names into l_names
            if l_action = 'COLUMNS' then
                        LOOP
                        l_comma_index := INSTR(l_string, ',', l_index);
                        EXIT WHEN l_comma_index = 0;
                                    l_names.EXTEND;
                                    l_names(l_namesIter) := SUBSTR(l_string, l_index, l_comma_index - l_index);
                                    l_index := l_comma_index + 1;
                                    l_namesIter := l_namesIter + 1;
                        END LOOP;
            end if;
           
            --Get Element and Pipe Rows
            if l_action = 'VALUES' then
                        LOOP
                                    l_comma_index := INSTR(l_string, ',', l_index);
                                    EXIT WHEN l_comma_index = 0;
                                                if l_Iter = 1 then
                                                            l_element := SUBSTR(l_string, l_index, l_comma_index - l_index);
                                                            if l_class IN ('XXX', 'YYY', 'ZZZ') then
                                                                        l_store_number := trim(substr(l_element,4,instr(l_element, '/',1,1)-4));
                                                                        l_pallet := trim(substr(l_element,instr(l_element, '/',1,1)+1,instr(l_element, '/',1,2)-instr(l_element, '/',1,1)-1 ));
                                                                        l_lot := trim(substr(l_element,instr(l_element, '/',1,2)+1,instr(l_element, '/',1,3)-instr(l_element, '/',1,2)-1 ));
                                                                        l_case := trim(substr(l_element,instr(l_element, '/',1,3)+1,instr(l_element, '/',1,4)-instr(l_element, '/',1,3)-1 ));
                                                                        l_item := trim(substr(l_element,instr(l_element, '/',1,4)+1,instr(l_element, '/',1,5)-instr(l_element, '/',1,4)-1 ));                                                                     
                                                            end if;                                                              
                                                else
                                                            PIPE ROW ( sample3_typ(trim(l_class), l_store_number, l_pallet, l_lot, l_case, l_item, trim(l_names(l_cIter)), trim(SUBSTR(l_string, l_index, l_comma_index - l_index))) );
                                                            l_cIter := l_cIter + 1;
                                                end if;
                                                l_Iter := 2; --move out of element row
                                                l_index := l_comma_index + 1;
                        END LOOP;
            end if;
<<end_loop>>
null;     
END LOOP;
RETURN;
END;
/

 

 

The data can be selected by using SQL

 

select * from table (sample3_tab_fnc(CURSOR(select * from SAMPLE3_EXT)))

/

 

CLASS STORE_NUMBER         PALLET            LOT      CASE_NUMBER           ITEM     COL_NAME      COL_VALUE

 

XXX       219       1          0          0                      COL1    95

XXX       219       1          0          0                      COL2    0

XXX       219       1          0          0                      COL3    0

XXX       219       1          0          0                      COL4    0

XXX       219       1          0          0                      COL5    0

XXX       219       1          0          1                      COL1    23

XXX       219       1          0          1                      COL2    0

XXX       219       1          0          1                      COL3    0

XXX       219       1          0          1                      COL4    0

XXX       219       1          0          1                      COL5    0

XXX       219       1          0          2                      COL1    4

XXX       219       1          0          2                      COL2    0

XXX       219       1          0          2                      COL3    0

XXX       219       1          0          2                      COL4    0

XXX       219       1          0          2                      COL5    4

XXX       219       1          0          3                      COL1    37

XXX       219       1          0          3                      COL2    1

XXX       219       1          0          3                      COL3    0

XXX       219       1          0          3                      COL4    0

XXX       219       1          0          3                      COL5    33

YYY     219       1          1          1                      COL21  11092

YYY     219       1          1          1                      COL22  49.52

YYY     219       1          1          1                      COL23  627

YYY     219       1          1          1                      COL24  179827

YYY     219       1          2          1                      COL21  11092

YYY     219       1          2          1                      COL22  199.52

YYY     219       1          2          1                      COL23  628

YYY     219       1          2          1                      COL24  179828

YYY     219       1          3          1                      COL21  11092

YYY     219       1          3          1                      COL22  459.52

YYY     219       1          3          1                      COL23  629

YYY     219       1          3          1                      COL24  179829

ZZZ       219       0          0          1                      COL31  0

ZZZ       219       0          0          1                      COL32  0.00

ZZZ       219       0          0          1                      COL33  0

ZZZ       219       0          0          1                      COL34  0

ZZZ       219       0          0          1                      COL35  7201

ZZZ       219       0          0          1                      COL36  0

ZZZ       219       0          0          1                      COL37  0

ZZZ       219       0          1          2                      COL31  0

ZZZ       219       0          1          2                      COL32  1.00

ZZZ       219       0          1          2                      COL33  0

ZZZ       219       0          1          2                      COL34  1

ZZZ       219       0          1          2                      COL35  7202

ZZZ       219       0          1          2                      COL36  0

ZZZ       219       0          1          2                      COL37  2

ZZZ       219       0          2          1                      COL31  0

ZZZ       219       0          2          1                      COL32  2.00

ZZZ       219       0          2          1                      COL33  0

ZZZ       219       0          2          1                      COL34  2

ZZZ       219       0          2          1                      COL35  7203

ZZZ       219       0          2          1                      COL36  0

ZZZ       219       0          2          1                      COL37  3

 

51 rows selected.

 

Elapsed: 00:00:00.76

SQL>

 

 

 

Using the same procedure on a larger file, the query above extracts, transforms, and loads 31,233 records into the relational table in under 1 second. 

 

SQL> insert /*+ append */ into sample3

  2  select * from table (sample3_tab_fnc(CURSOR(select * from SAMPLE3_EXT)))