New Developments In Oracle Data Warehousing


April 22, 2004

I've been giving a lot of thought recently to how I might go about designing and building future Oracle data warehousing projects. Over the last few months I've come across some interesting design approaches, and, coupled with some new technology that's now available, it's probably a good time to have a think about how future projects might come together.

One of the key messages that keeps coming through, from both clients and from Oracle, is around near or real-time data warehousing. Traditionally, data warehouses have been refreshed daily, weekly or monthly, and it's been generally accepted that there's always going to be a lag between the data in the source systems and the data in the data warehouse. This isn't necessarily a bad thing - data warehouses are for analyzing trends, and you wouldn't want to change the direction of a business based on a few minutes' data. However, the role that data warehouses play is changing, and it's common now to use the data warehouse as a way of getting a holistic view of the whole business, in real time, so that split-second decisions can be taken as soon as an event happens - to prevent a customer defection, or to take advantage of market movements, for example. If you're going to be relevant in the data warehousing market, you've got to be capable of being real-time.

Warehouses that are built using Oracle Warehouse Builder are generally loaded on a batch basis, using PL/SQL routines built with OWB and scheduled using OEM and/or Oracle Workflow. If you want these routines to be 'near' real-time, you run them on a regular, constant basis (say every minute or so) and have them process just the changed data from the source system, as opposed to loading the entire source system each time. The usual way to do this is to have some sort of 'new' or 'changed' date field in the source system, so that you can pick up only those records that are new or amended - see the sketch below. The smarter way, from Oracle 9i onwards, is to use Change Data Capture, a mechanism for automatically feeding changed database records through to interface tables, or indeed through to the warehouse fact and dimension tables themselves.
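
Here's what that incremental extract might look like in practice. This is a minimal sketch only; the table and column names (stg_orders, orders, last_modified, etl_load_control) and the src_db database link are hypothetical:

DECLARE
  v_cutoff DATE;
BEGIN
  -- Fix the window end before reading, so rows amended mid-extract
  -- are picked up by the next run rather than silently skipped.
  v_cutoff := SYSDATE;

  INSERT INTO stg_orders
  SELECT o.*
    FROM orders@src_db o
   WHERE o.last_modified >= (SELECT last_extract_time
                               FROM etl_load_control
                              WHERE table_name = 'ORDERS')
     AND o.last_modified < v_cutoff;

  UPDATE etl_load_control
     SET last_extract_time = v_cutoff
   WHERE table_name = 'ORDERS';

  COMMIT;
END;
/

Scheduled every minute or so from OEM or Workflow, a routine like this gives you 'near' real-time loading without touching the source data model.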

In fact, I've never actually used Oracle 9i Change Data Capture, because of the way it is implemented: it relies on triggers being placed on the source tables, which can (a) slow down the source system and (b) worry the source system DBA, as you're changing his data model. However, I recently had a look at a presentation and paper from the last OracleWorld by Michael Brey and Michael Schmitz, entitled "Experiences with Real-Time Data Warehousing Using Oracle Database 10g", which explains how the new 10g version, 'Asynchronous' Change Data Capture (as opposed to the 'Synchronous' 9i version), works.

In the 10g version, the redo logs are used instead of triggers, making the process far less invasive and putting no load on the source system. The drawback, of course, is that the source system (as well as the target) has to be 10g, but I can see how you could put a 'real-time' system together that mines the redo logs, transforms the data using a PL/SQL package and places the data in the correct fact or dimension table. More details on Asynchronous Change Data Capture can be found in the Oracle white paper "On-Time Data Warehousing with Oracle10g - Information at the Speed of your Business", available on OTN.
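
On the warehouse side, consuming the captured changes comes down to the DBMS_CDC_SUBSCRIBE package. Here's a minimal sketch of the subscriber half, assuming a change set called ORDERS_SET has already been published against the source table (the schema, table and subscription names are all hypothetical):

BEGIN
  -- One-off setup: create a subscription against the published change
  -- set, pick the columns of interest, and activate it.
  DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
    change_set_name   => 'ORDERS_SET',
    description       => 'Warehouse feed for orders',
    subscription_name => 'ORDERS_SUB');

  DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
    subscription_name => 'ORDERS_SUB',
    source_schema     => 'SRC',
    source_table      => 'ORDERS',
    column_list       => 'ORDER_ID, CUSTOMER_ID, ORDER_TOTAL',
    subscriber_view   => 'ORDERS_CHG_VW');

  DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(
    subscription_name => 'ORDERS_SUB');
END;
/

-- Each polling cycle: extend the window to take in newly-captured
-- changes, consume them from the subscriber view, then purge.
BEGIN
  DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'ORDERS_SUB');

  -- In reality you'd branch on the OPERATION$ control column to apply
  -- inserts, updates and deletes appropriately; a plain insert of new
  -- rows is shown here for brevity.
  INSERT INTO orders_fact (order_id, customer_id, order_total)
  SELECT order_id, customer_id, order_total
    FROM orders_chg_vw
   WHERE operation$ = 'I';

  DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'ORDERS_SUB');
  COMMIT;
END;
/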

Another interesting approach the paper talks about deals with the question of real-time vs. fairly static data in the data mart. One criticism of having a real-time data mart or warehouse is that if you run a query twice in succession, you might get a different answer, as more data has been loaded since you first ran the query. To get around this, the system designers had a main fact table, which is only loaded once a day, together with a 'today's data' fact table, which is updated constantly during the day. If you want a static view of the data, you just query the main fact table; if you want a completely up-to-date view, you SELECT against both tables and UNION the results. At the end of the day, the 'today's data' fact table is instantaneously 'partition exchanged' with the main fact table, which is partitioned by day and has an empty partition set aside for each new day's data.
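
In SQL terms, the design boils down to something like this sketch (the table, view and partition names are hypothetical):

-- Readers who need an up-to-the-minute view query a view that combines
-- the static history with the intraday table.
CREATE OR REPLACE VIEW sales_all_vw AS
SELECT * FROM sales_fact          -- partitioned by day, loaded nightly
UNION ALL
SELECT * FROM sales_fact_today;   -- trickle-loaded during the day

-- At end of day, swap the intraday table into the empty partition set
-- aside for it. The exchange is a data dictionary operation, so it
-- completes near-instantaneously regardless of data volume.
ALTER TABLE sales_fact
  EXCHANGE PARTITION p20040422 WITH TABLE sales_fact_today
  INCLUDING INDEXES WITHOUT VALIDATION;

(UNION ALL rather than UNION, since the two tables never hold the same rows and the sort needed for duplicate elimination would be wasted effort.)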

Following on from this, I attended a presentation by Jon Page at the recent UKOUG Reporting Tools & BIW Special Event in March, on Oracle's 'Information Architecture'. Oracle are currently pushing a vision of data warehousing where the warehouse is integrated into the ERP system; in Oracle's case, using a product called 'Daily Business Intelligence' (DBI), an add-on to the e-Business Suite. Interestingly, what Oracle are saying with DBI is that you don't need a separate warehouse, and you don't need an ETL process that loads data in batches and applies changes to the warehouse on a daily, weekly or monthly basis. It looks like DBI is being populated and refreshed using fast refresh materialized views - materialized views that are updated whenever data in the base tables is committed, a process that has little impact on database performance but has the advantage of keeping the summary tables (or in fact any derived tables) always up to date.
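
For anyone who hasn't used them, the pattern is straightforward. A minimal sketch, with hypothetical table and column names:

-- The materialized view log records the changed rows that make fast
-- (incremental) refresh possible.
CREATE MATERIALIZED VIEW LOG ON order_lines
  WITH SEQUENCE, ROWID (order_date, product_id, amount)
  INCLUDING NEW VALUES;

-- The summary is then incrementally maintained on every COMMIT against
-- the base table.
CREATE MATERIALIZED VIEW daily_sales_mv
  REFRESH FAST ON COMMIT
AS
SELECT order_date,
       product_id,
       SUM(amount)   AS total_amount,
       COUNT(amount) AS amount_count,  -- COUNT columns are required for
       COUNT(*)      AS row_count      -- fast refresh of aggregate views
  FROM order_lines
 GROUP BY order_date, product_id;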

What this means in effect is that you don't have a long, drawn-out ETL process with multiple staging tables and mass copying of data from source to target - you just create some fast refresh materialized views and code the ETL into SELECT statements. A good approach - although the question I keep coming back to is 'what about complex ETL processes?' Surely at some point you're going to need intermediate tables, and surely at some point you're going to have to break out of standard SQL and use PL/SQL for a more complex process. However, it's an interesting approach and one I'll bear in mind for whatever I work on next.
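
That said, the set-based route gets you further than you might expect before PL/SQL becomes necessary. A simple type 1 dimension load, for instance, can stay in pure SQL as a single MERGE (a sketch, with hypothetical table and column names):

MERGE INTO customer_dim d
USING (SELECT customer_id, cust_name, region
         FROM stg_customers) s
   ON (d.customer_id = s.customer_id)
 WHEN MATCHED THEN
   UPDATE SET d.cust_name = s.cust_name,
              d.region    = s.region
 WHEN NOT MATCHED THEN
   INSERT (customer_id, cust_name, region)
   VALUES (s.customer_id, s.cust_name, s.region);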

Separate from all this, I also recently took a look at a paper by Arup Nanda entitled "A Multi-Source, Time Variant Data Warehouse Case Study" - you might have seen the write-up I gave this paper at the time. The case study describes a project where a medical claims data warehouse was built for a client, with a rule that no ETL tool - not even OWB - could be used. The paper goes on to describe how they set up the warehouse and the ETL routines using nothing more than SQL*Plus and Unix shell scripts. What was interesting about the paper was the completely different approach they took to putting a warehouse together; most projects I work on involve one or more OWB developers building a series of ETL mappings that translate into SQL scripts and PL/SQL packages, whilst the authors of the paper did most of their work using parameterized CREATE TABLE AS SELECT ('CTAS') scripts called from Unix shell scripts.

What was different about Arup Nanda's approach was that he eschewed the often long, drawn-out graphical ETL process used by OWB and crafted a number of reusable SQL scripts that bulk-copied data from source to target using CTAS statements. To deal with changes to data types, he crafted further scripts that issue ALTER TABLE commands to change column types. Lastly, he made extensive use of (again) daily partitioned fact tables, dropping old partitions and creating new ones using generic SQL scripts, alongside locally partitioned indexes that reduce index maintenance to just those partitions directly affected by table loads.
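
The building blocks are simple enough that a short sketch shows the whole idea. Here the script takes its table names and load date as SQL*Plus substitution variables (all object names are hypothetical, and the staging table is assumed to match the fact table's structure):

-- ctas_load.sql: generic, reusable daily load step.
-- Usage: @ctas_load.sql <staging_table> <source_table> <YYYYMMDD>
CREATE TABLE &1 NOLOGGING PARALLEL AS
SELECT *
  FROM &2
 WHERE trans_date = TO_DATE('&3', 'YYYYMMDD');

-- Roll the daily window forward and swap the staging table in.
ALTER TABLE sales_fact ADD PARTITION p&3
  VALUES LESS THAN (TO_DATE('&3', 'YYYYMMDD') + 1);
ALTER TABLE sales_fact
  EXCHANGE PARTITION p&3 WITH TABLE &1;

EXIT

A Unix shell script then just loops over the day's sources, calling something like:

sqlplus -s etl@dwh @ctas_load.sql stg_sales_20040422 src_sales 20040422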

What this said to me was that Oracle 9i warehousing projects don't necessarily need to use OWB, if the process can be better served by putting together well-crafted, reusable SQL scripts called from either MS-DOS batch files or Unix shell scripts. It also highlighted the central role that partitioning plays in a warehouse, not only for performance reasons but also because of the potential for partition exchange loading - the process of swapping a staging table containing a period's data with the corresponding empty partition in the fact table. In fact, OWB 9.2 now supports partition exchange loading, so this can also be carried out from the OWB GUI tool without having to learn the relevant SQL statements. However, in some cases - where the ETL process is simple but repeated many times with source data of varying structure - coding SQL statements by hand and controlling them from shell scripts is sometimes more efficient than coding them with a graphical ETL tool.

The last development I've been looking at is 'Agile' methodologies, which take a fundamentally different approach to the project build than traditional 'waterfall' methodologies.

Traditional data warehouse projects that I've worked on (and managed) usually work to a project plan where requirements are first defined, then the system is designed, coded, tested and rolled out. Whilst this works well when the project requirements can be completely defined at the start, it sometimes falls down when:

  • the requirements aren't fully defined at the start, and in fact only come out as the project gets going
  • changes to requirements, or problems during the build mean that the project diverges from the plan early on
  • the client wants to see some results early, but the project only delivers something right at the end
  • time is tight but the project methodology demands lots of documentation, and lots of process
  • team members have different skills but get treated for planning purposes as interchangeable

Agile methodologies, such as Extreme Programming or Scrum, address these issues by taking a fundamentally different approach:

  • Requirements are driven out over development cycles, constantly feeding back into the project
  • Iterations of the project are delivered regularly and quickly, so that feedback can be obtained
  • The methodology is 'lightweight', only determining objectives and leaving the developers to decide how to do the work
  • There is very little project 'ceremony', with only a minimal amount of project documentation required
  • What project process there is focuses mainly on setting objectives and measuring progress
  • Team members are recognized as having unique skills which sometimes cannot be interchanged

This agile approach looks excellent for business intelligence and data warehousing projects, which more often than not don't have their requirements fully defined at the start ("I don't know what I want, so show me what can be done and I'll tell you then") and benefit most from regular releases of data and reports to the warehouse user base. It's also the case that, in most projects I've heard of, whilst a lot of time and attention is paid to the plan at the start, as time goes on less and less project documentation actually gets filled in, as everyone's backs are against the wall getting the project delivered. The agile approach fits this reality better, allowing us to focus on the documentation that is actually essential - setting objectives and measuring progress - whilst dispensing with documentation that is only filled in 'because it should be'.

The agile approach should therefore help us deliver projects that are more responsive to customer needs, deliver results earlier and more consistently, and allow developers to concentrate their efforts on the parts of the project that deliver real value. Of course, the potential problem here is that most projects you are asked to quote for are on a fixed-price, fixed-requirements basis; the client believes they have fully defined the requirements as part of the RFP, and they therefore want you to quote a fixed price for doing this work. Any changes to requirements (which more often than not eventually come in) then become change requests which have to be separately negotiated. If agile methodologies are going to work, there will need to be a partnership between the client and the development team, balancing the benefits of a flexible approach to design and build against the fact that it's not possible to completely lock down the cost and elapsed time at the start of the project.

So what does this mean, then, for future data warehousing projects? Well, whilst in the short term things will in all likelihood carry on as normal, some approaches I'd like to bring into future warehousing projects include:

  • Increasing use of real-time data, through a combination of synchronous and asynchronous Change Data Capture and fast refresh materialized views
  • More use in general of the built-in ETL features in Oracle 9i and 10g, even when they're not yet supported by OWB (though check the New Features documents for later OWB versions, as more and more Oracle 9i ETL functionality is being exposed through OWB)
  • Consideration of generic, parameterized SQL scripts when this would be a more efficient way to build the ETL process
  • The use of agile methodologies where the customer is agreeable to an iterative development process, drawing out requirements through a series of releases and keeping project documentation focused on setting objectives and measuring progress

Any thoughts on this approach? Is anyone using the Oracle 9i features for real-time data warehousing, or using an agile methodology such as eXtreme Programming or Scrum to run their warehousing project? Drop me a line if you've got any feedback.
