Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 
 

Enterprise Solutions with Oracle ETL

Oracle Tips by V.J. Jain

 

Enterprise Solutions with Oracle ETL

By V.J. Jain, July 2015 (varun.jain@varunjaininc.com)
Varun Jain, Inc. (an Oracle Partner)

AbstractOracle 9i and 10g have introduced tools that allow for the composition of an enterprise ETL solution that is more preferable than off-the-shelf products.  The features include:

  • setup and use of an external table and analytic SQL which loads data from a complex XML and text file, allowing 1,674 records to be loaded from the file into a relational table in just 6.85 seconds

  • setup and use of a pipelined table function to vertically load more complex files with dynamic columns.  This allows as many as 31,233 records to be inserted from the file into a relational table in only 0.96 seconds

  • application of an itemized enterprise ETL solution after the functionality and efficiency of the previous features have been achieved

  • an exhibition of the dynamic properties of external tables, illustrate through altering file source by DDL. This includes ETL procedures which design load files dynamically to accept the file path as a parameter. 

  • use of NFS drives for space efficiency to achieve dynamic alterations

  • availability of a custom queue table, created by using an Oracle directory and a custom procedure, which populates based on the file listing by using a semaphore for external procedures

  • creation of DBMS_SCHEDULER job that monitors and processes the queue table. 

It is the combination of these components which result in a highly functional enterprise solution, offering benefits that surpass off-the-shelf products such as Informatica, Data Stage, or Ab Initio. 

Furthermore, Oracle 9i and 10g have added some very powerful tools to support high performance ETL.  While there may be familiarity with Oracle external tables and pipelined table functions, the full potential of Oracle can go unrealized.

The demand for high performance ETL is growing as organizations realize the importance of business intelligence and the need for a comprehensive data warehouse.  The majority of these organizations use expensive off-the-shelf products without realizing that they have already licensed the necessary tools to build a potentially better solution.  This article discusses the use of Oracle ETL features with Oracle Scheduler to provide a real-time, high scale, and high performance ETL solution.

 

Familiarity with external tables and pipelined table functions is necessary to fully understand the solution:

External Tables

External tables are used to provide relational access to the contents of a file from an Oracle directory.  The external table script defines the rules for reading the file, and once the table is created, the external table is accessed just like a regular table.  Here's an example:


Create directory DAT_DIR as ?/usr/tmp?;

 

File: sample.csv saved to /usr/tmp

ID, DATE, FIRST_NAME, LAST_NAME, STATUS

1, 17-JUL-07, John, Adams, Active

2, 17-JUL-07, Tyler, Howell, Active

3, 17-JUL-07, Jim, Lopez, Active

4, 17-JUL-07, Carlos, White, Inactive

5, 17-JUL-07, Scott, Tiger, Active

External table definition:

 

CREATE TABLE SAMPLE_EXT
(
ID NUMBER,
DATE VARCHAR2(10),  
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
STATUS VARCHAR2(10)
)                      
 ORGANIZATION EXTERNAL
 (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY DAT_DIR
   ACCESS PARAMETERS
   (
            records delimited by newline LOAD WHEN (ID != ?ID?)
            badfile BAD_DIR:'SAMPLE_EXT%a_%p.bad'
            logfile LOG_DIR:'SAMPLE_EXT%a_%p.log'
fields terminated by ',' optionally enclosed by '"' LRTRIM
            MISSING FIELD VALUES ARE NULL
   )
   LOCATION ('sample.csv')
 )
 PARALLEL 4
 REJECT LIMIT 1000;

 

Note:  Several of the properties in this definition are optional.  For example, if 'skip 1? could have been used in the table definition, but the method above can be a better alternative if it cannot be guaranteed that all file sources will use a header line. 

Now that the external table has been created, and the file has been placed on the server, it is possible to  access the data using a SQL such as:

 
Select * from SAMPLE_EXT
/
 
ID  DATE  FIRST_NAME  LAST_NAME  STATUS
1  17-JUL-07  John  Adams  Active
2  17-JUL-07  Tyler  Howell  Active
3  17-JUL-07  Jim  Lopez  Active
4  17-JUL-07  Carlos  White  Inactive
5  17-JUL-07  Scott  Tiger  Active
 
Select * from SAMPLE_EXT where STATUS = ?Inactive?
/
 
ID  DATE  FIRST_NAME  LAST_NAME  STATUS
4  17-JUL-07  Carlos  White  Inactive

 

This exhibits the relational access to this file in completion, even though the file is outside of the database.  Loading the contents of this file into a relational table is also easy to accomplish.

First, create a matching relational table as exemplified below: 

 

CREATE TABLE SAMPLE_TAB
(
ID NUMBER,
DATE VARCHAR2(10),  
FIRST_NAME VARCHAR2(30),
LAST_NAME VARCHAR2(30),
STATUS VARCHAR2(10)
);

 

 

Next, insert the data from the external table to the relational table:

 

Insert into sample_tab
Select * from sample_ext
/

 

 

It is a common misconception that external tables are only useful for simple formats such as comma or tab delimited files.  In loading a CSV file easily however, one can see that this is far from the truth.

External tables can also be used for complex files.  For example, because external tables are relationally accessible, the full flexibility of the SQL language can be utilized to get the data in the desired format.  If the file format is too complex for SQL to structure appropriately, a pipelined table function can be used.

When using external tables for more complex file formats, placing each row from the file into single columned row in the external table may offer great flexibility for transforming the data. 

Consider the following file:

sample2.dat (actual file is several megabytes)

23:18:00 PST 18 Feb 2015
<Inventory name=" CORVETTE">
            <AppId id="1"/>
            <Color>Yellow</Color>
</Inventory>
 
23:18:00 PST 18 Feb 2015
<Inventory name="CAMARO">
            <AppId id="2"/>
            <Color>White</Color>
</Inventory>
 
23:18:00 PST 18 Feb 2015
<Inventory name="PROWLER">
            <AppId id="3"/>
            <Color>Blue</Color>
</Inventory>

This file is obviously a blend between text and XML.  Though this is not a preferred format, in ETL there is often no control over the data sources.  How could an external table to efficiently load a file structured like this be used?  The solution is probably easier than assumed.

First, create the external table:

 

CREATE TABLE SAMPLE2_EXT (
field1 varchar2(4000)
)                      
 ORGANIZATION EXTERNAL
 (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY DAT_DIR
   ACCESS PARAMETERS
   (
             records delimited by newline
             badfile DAT_DIR:'SAMPLE2_EXT%a_%p.bad'
             logfile DAT_DIR:'SAMPLE2_EXT%a_%p.log'
             fields LRTRIM REJECT ROWS WITH ALL NULL FIELDS (field1 char(4000))
   )
   LOCATION ('sample2.dat')
 )
REJECT LIMIT 10;

 

Next, create the relational table:

 

CREATE TABLE SAMPLE2
(
  BEGIN_DATE  DATE,
  INVENTORY   VARCHAR2(30),
  APPID_ID    VARCHAR2(30),
  COLOR       VARCHAR2(30)
);

 

Now extract, transform, and load data from external table into the relational table:

 
SQL> insert /* append */ into sample2
  2  select /*+ materialize */
  3    max(BEGIN_DATE) BEGIN_DATE,
  4    max(INVENTORY) INVENTORY,
  5    max(APPID_ID) APPID_ID
  6    ,max(COLOR) COLOR
  7    FROM
  8   (
  9   select r, grp,
 10     case when (length(translate(substr(field1,1,1),' 0123456789',' '))) is null then to_date(to_
char(to_timestamp_tz(field1, 'HH24:MI:SS TZD DD Mon YYYY'), 'DD-MON-YYYY HH:MI:SS AM'), 'DD-MON-YYYY
 HH:MI:SS AM') end BEGIN_DATE,
 11     case when (instr(field1,'<Inventory')>0) then extractValue(XMLType(replace(field1, '">', '"/
>')), '/Inventory/@name') end INVENTORY,
 12    case when (instr(field1,'<AppId')>0) then extractValue(XMLType(replace(field1, '">', '"/>')),
 '/AppId/@id') end APPID_ID,
 13    case when (instr(field1,'<Color')>0) then extractValue(XMLType(field1), '/Color') end COLOR
 14    from
 15    (   
 16   select r, field1, nvl(max(grp) over (order by r),0) grp FROM
 17     (
 18      select A.r, A.field1, case when (length(translate(substr(field1,1,1),' 0123456789',' '))) i
s null then A.r end grp from
 19      (
 20       select rownum r, field1 from SAMPLE2_EXT A
 21      ) A
 22     )      
 23   )
 24   )
 25  group by grp
 26  /
 
1674 rows created.
 
Elapsed: 00:00:06.85
SQL>
 

 

The query above can extract, transform, and load 1,674 records into a relational table in under 7 seconds.  This is good performance considering the format has XML and non-XML content. 

Here is a sample of what is in our relational table:

 

select * from sample2
where rownum <=3
/
 
BEGIN_DATE    INVENTORY      APPID_ID          COLOR
 
2/18/2007 11:18:00 PM  CORVETTE       1          Yellow
2/18/2007 11:18:00 PM  CAMARO          2          White
2/18/2007 11:18:00 PM  PROWLER        3          Blue

 

The example above demonstrates a relatively simple use of a single columned external table transforming raw data into a relational format, but what happens when the format is more complex?

Pipelined Table Functions

Though they can get quite complicated, a table function is simply a function that returns rows.  Inside the table function, the row is prepared and each row is piped back to the session when they are ready.  The output rows are based on a type definition. 

Here is a simple example of a table function that accepts a VARCHAR2 input and returns rows with 1 character per row. 

 
CREATE TYPE row_typ AS OBJECT
       (p_row_output varchar2(10));
/
 
CREATE TYPE row_typ_tab AS TABLE OF row_typ;
/
 
 
create or replace function example_tab_fnc (pi_value IN varchar2)
return row_typ_tab PIPELINED AS
l_length number;
l_index        PLS_INTEGER := 1;
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
LOOP
            l_length := length(pi_value);
            EXIT WHEN l_index > l_length;
            PIPE ROW ( row_typ(substr(pi_value, l_index, 1)));
            l_index := l_index + 1;
END LOOP;
RETURN;
 
END;
 
select * from table (example_tab_fnc('ABCDE'))
/
 
P_ROW_OUTPUT
 
A
B
C
D
E

 

It is obvious that this feature could certainly prove useful in an ETL environment.  While the demonstration above is simplified, pipelined table functions can be used to transform complex files into a usable format.  For example, as input, a table function can accept a REF cursor pointing at an external table.  Consider the following file:

Sample3.dat

[CLASS_XXX]
[COLUMN_LISTING]
COL1,COL2,COL3,COL4,COL5
[COLUMN_VALUES]
XXX 219/  1/  0/  0/ 1,95,0,0,0,0
XXX 219/  1/  0/  1/ 1,23,0,0,0,0
XXX 219/  1/  0/  2/ 1,4,0,0,0,4
XXX 219/  1/  0/  3/ 1,37,1,0,0,33
[CLASS_YYY]
[COLUMN_LISTING]
COL21,COL22,COL23,COL24
[COLUMN_VALUES]
YYY 219/ 1/ 1/ 1/ 1,11092,49.52,627,179827
YYY 219/ 1/ 2/ 1/ 1,11092,199.52,628,179828
YYY 219/ 1/ 3/ 1/ 1,11092,459.52,629,179829
[CLASS_ZZZ]
[COLUMN_LISTING]
COL31,COL32,COL33,COL34,COL35,COL36,COL37
[COLUMN_VALUES]
ZZZ 219/  0/  0/ 1/ 1,0,0.00,0,0,7201,0,0
ZZZ 219/  0/  1/ 2/ 1,0,1.00,0,1,7202,0,2
ZZZ 219/  0/  2/ 1/ 2,0,2.00,0,2,7203,0,3

This file details the inventory of three different warehouses included in the same file, with each in its own class.  Every class has a dynamic listing of columns and column values.  Preceding the values line is a string such as ?XXX 219/  1/  0/  3/ 1?, ?YYY 219/ 1/ 1/ 1/ 1?, and ?ZZZ 219/  0/  0/ 1/ 1?  .  Each of the values in these strings map to source properties that form an inventory locator, and must be stored for each row.

For example, ?ZZZ 219/  0/  0/ 1/ 1?  maps to

Class: ZZZ

Store Number: 219

Pallet: 0

Lot: 0

Case: 1

Item: 1 

It is not feasible to have a separate table column for each possibility because there could be thousands of different columns for each class.  The best solution is to store the data vertically, by using a standard label for column name and column value. A desired relational storage would include the following columns:

CLASS STORE_NUMBER PALLET LOT CASE_NUMBER ITEM COL_NAME COL_VALUE

In order to have table function pipe rows with these columns, we need to define an appropriate type:

 

CREATE TYPE sample3_typ AS OBJECT
(
            CLASS varchar2(10),
            STORE_NUMBER number(6),
            PALLET number(6),
            LOT number(6),
            CASE_NUMBER number(6),
            ITEM number(6),
            COL_NAME varchar2(100),
            COL_VALUE varchar2(100)
);
/
 
CREATE TYPE sample3_typ_tab AS TABLE OF sample3_typ;
/

 

If the number of columns were fixed, we might try defining an external table that leverages the comma delimiters in the file format.  However, since the number of columns is unknown, we will define our external table to have a single column. 

In the original file, the column listing was over 16,000 characters long; to allow for this condition, we must define the column to be of type CLOB:

 

CREATE TABLE SAMPLE3_EXT (
field1 clob
)                      
 ORGANIZATION EXTERNAL
 (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY DAT_DIR
   ACCESS PARAMETERS
   (
             records delimited by newline
             badfile DAT_DIR:'SAMPLE3_EXT%a_%p.bad'
             logfile DAT_DIR:'SAMPLE3_EXT%a_%p.log'
             fields LRTRIM REJECT ROWS WITH ALL NULL FIELDS (field1 char(32000))
   )
   LOCATION ('sample3.dat')
 )
 PARALLEL 10
 REJECT LIMIT 10;

 

And create the relational table:

 

CREATE TABLE SAMPLE3
(
  CLASS         VARCHAR2(10 BYTE),
  STORE_NUMBER  NUMBER(6),
  PALLET        NUMBER(6),
  LOT           NUMBER(6),
  CASE_NUMBER   NUMBER(6),
  ITEM          NUMBER(6),
  COL_NAME      VARCHAR2(100 BYTE),
  COL_VALUE     VARCHAR2(100 BYTE)
);
 

In order to transform raw data into the relational format, it may be beneficial to try to using SQL instead of a table function.  However, since the number of columns is unknown, and data is being stored vertically, the best solution is to use a pipelined table function to prepare the rows.  

Now that there are external table and type definitions,  table function that accepts a REF cursor (pointing to the external table) as an input parameter needs to be created.  This function will start at the beginning of the CURSOR (the first row in the external table), then loop through the rows of the external table, and simultaneously return rows to the session. 

The keywords CLASS, COLUMN_LISTING, and COLUMN_VALUES instruct the actions of the function on the next loop.  The CLASS keyword sets the local variable for class, the COLUMN_LISTING keyword builds a collection to store the column names, and the COLUMN_VALUES keyword causes the preparation and piping of each row as they are parsed.  

 

CREATE OR REPLACE function sample3_tab_fnc (pi_row IN sys_refcursor)
return sample3_typ_tab PIPELINED AS
l_string       LONG;
l_comma_index  PLS_INTEGER;
l_index        PLS_INTEGER := 1;
TYPE ListTyp IS TABLE OF VARCHAR2(100);
l_names ListTyp := ListTyp();
l_namesIter NUMBER :=1;
l_Iter NUMBER :=1;
l_cIter NUMBER :=1;
l_element varchar2(100);
l_field1 varchar2(32000);
pi_rownumber number := 1;
 
l_class varchar2(10);
l_store_number varchar2(6);
l_pallet varchar2(6);
l_lot varchar2(6);
l_case varchar2(6);
l_item varchar2(6);
l_action varchar2(10);
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
LOOP
     FETCH pi_row INTO l_field1;
     EXIT WHEN pi_row%NOTFOUND;
             if instr(l_field1, ',', -1) = length(l_field1) then
                         l_string  := l_field1;
             else
                         l_string  := l_field1 || ',';
             end if;
             l_index :=1;
             l_Iter :=1;
             l_cIter :=1;
           
             if l_field1 = '[CLASS_XXX]' then
                        l_class := 'XXX';
                        l_action := null;
                        l_names  := ListTyp();
                        l_namesIter :=1;
                        GOTO end_loop;           
             elsif l_field1 = '[CLASS_YYY]' then
                        l_class := 'YYY';
                        l_action := null;
                        l_names := ListTyp();
                        l_namesIter :=1;
                        GOTO end_loop;           
             elsif l_field1 = '[CLASS_ZZZ]' then
                        l_class := 'ZZZ';
                        l_action := null;
                        l_names := ListTyp();
                        l_namesIter :=1;
                        GOTO end_loop;
             elsif l_field1 = '[COLUMN_LISTING]' then
                        l_action := 'COLUMNS';
                        l_namesIter :=1;
                        GOTO end_loop;
             elsif l_field1 = '[COLUMN_VALUES]' then
                        l_action := 'VALUES';
                        GOTO end_loop;
             end if;
             
            --Get Value Names into l_names
            if l_action = 'COLUMNS' then
                        LOOP
                        l_comma_index := INSTR(l_string, ',', l_index);
                        EXIT WHEN l_comma_index = 0;
                                    l_names.EXTEND;
                                    l_names(l_namesIter) := SUBSTR(l_string, l_index, l_comma_index - l_index);
                                    l_index := l_comma_index + 1;
                                    l_namesIter := l_namesIter + 1;
                        END LOOP;
            end if;
           
            --Get Element and Pipe Rows
            if l_action = 'VALUES' then
                        LOOP
                                    l_comma_index := INSTR(l_string, ',', l_index);
                                    EXIT WHEN l_comma_index = 0;
                                                if l_Iter = 1 then
                                                            l_element := SUBSTR(l_string, l_index, l_comma_index - l_index);
                                                            if l_class IN ('XXX', 'YYY', 'ZZZ') then
                                                                        l_store_number := trim(substr(l_element,4,instr(l_element, '/',1,1)-4));
                                                                        l_pallet := trim(substr(l_element,instr(l_element, '/',1,1)+1,instr(l_element, '/',1,2)-instr(l_element, '/',1,1)-1 ));
                                                                        l_lot := trim(substr(l_element,instr(l_element, '/',1,2)+1,instr(l_element, '/',1,3)-instr(l_element, '/',1,2)-1 ));
                                                                        l_case := trim(substr(l_element,instr(l_element, '/',1,3)+1,instr(l_element, '/',1,4)-instr(l_element, '/',1,3)-1 ));
                                                                        l_item := trim(substr(l_element,instr(l_element, '/',1,4)+1,instr(l_element, '/',1,5)-instr(l_element, '/',1,4)-1 ));                                                                     
                                                            end if;                                                              
                                                else
                                                            PIPE ROW ( sample3_typ(trim(l_class), l_store_number, l_pallet, l_lot, l_case, l_item, trim(l_names(l_cIter)), trim(SUBSTR(l_string, l_index, l_comma_index - l_index))) );
                                                            l_cIter := l_cIter + 1;
                                                end if;
                                                l_Iter := 2; --move out of element row
                                                l_index := l_comma_index + 1;
                        END LOOP;
            end if;
<<end_loop>>
null;     
END LOOP;
RETURN;
END;
/

 

 

The data can be selected by using SQL

 

select * from table (sample3_tab_fnc(CURSOR(select * from SAMPLE3_EXT)))

/

 

CLASS STORE_NUMBER         PALLET            LOT      CASE_NUMBER           ITEM     COL_NAME      COL_VALUE

 

XXX       219       1          0          0                      COL1    95

XXX       219       1          0          0                      COL2    0

XXX       219       1          0          0                      COL3    0

XXX       219       1          0          0                      COL4    0

XXX       219       1          0          0                      COL5    0

XXX       219       1          0          1                      COL1    23

XXX       219       1          0          1                      COL2    0

XXX       219       1          0          1                      COL3    0

XXX       219       1          0          1                      COL4    0

XXX       219       1          0          1                      COL5    0

XXX       219       1          0          2                      COL1    4

XXX       219       1          0          2                      COL2    0

XXX       219       1          0          2                      COL3    0

XXX       219       1          0          2                      COL4    0

XXX       219       1          0          2                      COL5    4

XXX       219       1          0          3                      COL1    37

XXX       219       1          0          3                      COL2    1

XXX       219       1          0          3                      COL3    0

XXX       219       1          0          3                      COL4    0

XXX       219       1          0          3                      COL5    33

YYY     219       1          1          1                      COL21  11092

YYY     219       1          1          1                      COL22  49.52

YYY     219       1          1          1                      COL23  627

YYY     219       1          1          1                      COL24  179827

YYY     219       1          2          1                      COL21  11092

YYY     219       1          2          1                      COL22  199.52

YYY     219       1          2          1                      COL23  628

YYY     219       1          2          1                      COL24  179828

YYY     219       1          3          1                      COL21  11092

YYY     219       1          3          1                      COL22  459.52

YYY     219       1          3          1                      COL23  629

YYY     219       1          3          1                      COL24  179829

ZZZ       219       0          0          1                      COL31  0

ZZZ       219       0          0          1                      COL32  0.00

ZZZ       219       0          0          1                      COL33  0

ZZZ       219       0          0          1                      COL34  0

ZZZ       219       0          0          1                      COL35  7201

ZZZ       219       0          0          1                      COL36  0

ZZZ       219       0          0          1                      COL37  0

ZZZ       219       0          1          2                      COL31  0

ZZZ       219       0          1          2                      COL32  1.00

ZZZ       219       0          1          2                      COL33  0

ZZZ       219       0          1          2                      COL34  1

ZZZ       219       0          1          2                      COL35  7202

ZZZ       219       0          1          2                      COL36  0

ZZZ       219       0          1          2                      COL37  2

ZZZ       219       0          2          1                      COL31  0

ZZZ       219       0          2          1                      COL32  2.00

ZZZ       219       0          2          1                      COL33  0

ZZZ       219       0          2          1                      COL34  2

ZZZ       219       0          2          1                      COL35  7203

ZZZ       219       0          2          1                      COL36  0

ZZZ       219       0          2          1                      COL37  3

 

51 rows selected.

 

Elapsed: 00:00:00.76

SQL>

 

 

 

Using the same procedure on a larger file, the query above extracts, transforms, and loads 31,233 records into the relational table in under 1 second. 

 

SQL> insert /*+ append */ into sample3

  2  select * from table (sample3_tab_fnc(CURSOR(select * from SAMPLE3_EXT)))

  3  /

 

31233 rows created.

 

Elapsed: 00:00:00.96

SQL>

 

This design allows for the content of the data file to change dynamically without requiring modifications to the code.  For example, if a new class (Warehouse) is added or new columns are added or removed, the data will continue to load without interruption.

Building an enterprise ETL solution

The previous examples provided convincing evidence that Oracle external tables and pipelined table functions allow an efficient, highly flexible method for performing ETL. However, the next solution consists of a few elements that are the building blocks of the enterprise solution which show these features in a dynamic, high scale, real-time loading environment. 

Dynamic External Table Definitions

The first element of the solution is the ability to dynamically control the data source location for an external table.  The following command can change SAMPLE3_EXT from using sample3.dat to using sample3a.dat, meaning the file source from PL/SQL can be dynamically changed:

 

SQL> ALTER TABLE SAMPLE3_EXT LOCATION('sample3a.dat');

Table altered.

 

Keep this in mind while other elements of the solution are explained.

NFS - Network File Systems

While having a network drive is not necessary, it can help in case the incoming directories can be on the network drive instead of the Oracle drive.  This can help minimize the risk of disk space problems affecting the database as well as improve the security model, because the NFS drive can be written to without access to the Oracle server. 

Create and Populate Custom Queue

The second element of the solution is a table to track all of the ETL processes.  This queue table should reflect all the normal information such as date, start, end, status, path, etc. 

There are a couple of options in populating a custom queue table, beginning with monitoring folders for new files using a process on the operating system.  Another option is to define the incoming directories as Oracle, by using a directory listing custom procedure.  This invokes a Java procedure to get relational access to a listing of files. The filenames in the directory are then copied to a temporary table that can be queried.  Tom Kyte gives a great explanation for how to accomplish this.  

For example, consider an incoming folder ?/incoming/vendor? and a corresponding Oracle directory ?VENDOR_DIR?.  New files ?file1.dat?, ?file2.dat?, and ?file3.dat? are placed in our folder, and the directory listing function is executed from a procedure scheduled in DBMS_SCHEDULER

 

 

Begin

 Get_directory_list(?/incoming/vendor?);

End;

 

Select filename from directory_list_tmp;

/

 

FILENAME

file1.dat

file2.dat

file3.dat

 

 

The file listing is selected from the scheduled process. Here, files are compared with those already queued, and the unrecognized files are inserted into the queue.  This simple process can add all incoming files to the custom queue table on nearly a real-time basis.

ETL Procedures

As previously explained, external tables and pipelined table functions can be used in PL/SQL procedures to load the data from the files to the relational tables.  There is a different procedure for each file format, all accepting the file path as a parameter.  Before initiating the load, the procedure should dynamically change the file source using the syntax mentioned above.  It may be helpful to change the log and bad files to be file specific.  At the end of the procedure, the corresponding queue record should be updated appropriately.    

Concurrency (Multi-threaded external tables)

Because the Oracle ETL procedures are so fast, a single thread (single concurrency per external table) should be sufficient for the majority of requirements.  However, multi-thread of your external table is necessary, and this is fairly simple to do by using a semaphore. 

Putting the pieces together

There should be a scheduled job that scans and processes the queue based on particular business rules.  If files in the queue have not been started, this process will submit a new job to DBMS_SCHEDULER. This runs the appropriate ETL procedure for that file and updates the record in queue appropriately.  By passing the file path to the ETL procedure, the procedure is able to dynamically redefine the data source for that external table to point to the file.  Rules for handling queued records in error or complete status can be added to this job, as well as rules for defining concurrency. 

At any time, the queue table can provide a real-full view of a system's entire ETL activity.  Having all elements of the ETL architecture within the database has many benefits including: reporting, cost management, error handling and maintenance.  All of the elements in the solutions above exhibit an efficient alternative for an off the shelf ETL tool. 


 

 

��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.