Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 
 

Using Pipelined Table Functions to Streamline Oracle ETL

Oracle Tips by Burleson Consulting
Don Burleson

By Mark Rittman

Streamlining Oracle ETL With Pipelined Table Functions

I've recently been reading Bert Scalzo's "Oracle DBAs Guide To Data Warehousing And Star Schemas", and he's got an interesting take on the ETL process that I thought it worth looking at in more detail.

Background

Bert was the DBA behind the 7-Eleven data warehouse, one of the first multi-terabyte dimensional data warehouses built on the Oracle database. The book is a bit different to other Oracle data warehousing books in that it concentrates on the implementation details of building a warehouse, and as it's not by an Oracle member of staff he tends to have forthright opinions on what elements of the technology stack work, and what don't. A particular example of this is where he looks at the various releases of the Oracle database (7, 8i and ) and explains in some detail why 7 and 8i don't really cut the mustard as large-scale data warehousing platforms. In another instance, he looks at the evolution of the star transformation, explains why the original implementations (which were first implemented using nested loops, and then after by cartesian joins) weren't up to scratch but then absolutely insists that you use the version that comes with Oracle . As the author's not afraid to offer an opinion it's an entertaining read, and a bit different to your run of the mill 'new features' book.

One chapter that particularly interested me was his approach to loading the data warehouse. Right from the outset, Bert eschews the use of ETL tools, preferring instead to code the load routines manually. In his opinion, the trouble with ETL tools is:

  • The code they produce is not optimally efficient, and often uses slow-running Java to host and run the processes

  • The cost of the tools is often more than the hours they replace

  • They allow the project to be staffed by people with little Oracle or data warehousing experience

This is a theme echoed in an Oracleworld paper by Arup Nanda that I looked at last year, which again rejected the use of ETL tools, but this time because the nature of the ETL (lots of small transformations to a large number of nearly identical tables) was better served by running a number of Unix shell scripts that in turn generated SQL scripts to a pattern. I'll come back to the ETL vs. hand coding debate later on in the article.

Anyway, the ETL process that the author wanted to carry out involved the following steps

  1. load data from a large flat file

  2. apply transformations to the data

  3. update/insert the data into a dimension table

What made this interesting was the fact that the file was very, very big, and the ETL window was small. Therefore, any ETL process had to

  • Minimise inter-process wait states

  • Maximise total concurrent CPU usage

To illustrate, take a situation where your ETL process starts with a file of 1000 records, and has two stages, each of which take an hour to run. The least efficient way to process this file is to let the first process handle all 1000 records (taking an hour to do so), then pass the data on to the second process, which will also take an hour to run. Your total run time is two hours. What the author is suggesting with 'minimise inter-process wait states' is that it would be good if your first process could handle for example 10 records, then pass these on to the second process so that it could start it's work. The first process can then get on with the next 10 records, and the pass them on when it's done. Your total run time would then be just a bit over one hour, cutting down dramatically the time to perform the load.

The 'maximise total concurrent CPU usage' requirement aims to take advantage of the ability for multi-processor servers to carry out more than one task at once. Oracle is very good at this, and has had this feature (known as parallelism) for many years. However, it only really comes in to play if you use DML statements (insert, update etc) to do your load; if you write your load in PL/SQL using cursors and loops to process your data row-by-row, you're not going to be able to take advantage of parallel DML. What these two requirements mean, therefore, is that your ideal ETL process is going to use parallel processing, and somehow reduce to the minimum the wait between different ETL stages.

 



 

 

��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.