OWB - Oracle Data Warehouse Builder articles

Don Burleson

 

Here is a great series of articles on using Oracle Warehouse Builder by Mark Rittman.

The Oracle warehouse builder OWB "Paris" Data Profiler

OWB warehouse builder Performance Tips Whitepaper

OWB warehouse builder Experts Feature

Implementing Real-Time Data Warehousing in Oracle10g



OWB : Leader In Gartner ETL 2004 Magic Quadrant

Yahoo Finance: "REDWOOD SHORES, Calif., Sept. 16 /PRNewswire-FirstCall/ -- Oracle (Nasdaq: ORCL - News) today announced its position in the leader quadrant in Gartner's ETL 2004 Magic Quadrant. Oracle is the only database vendor listed in the leader quadrant.

One of 18 vendors in the extract, transform and load (ETL) market evaluated by Gartner for this year's report, Oracle was recognized based on its vision and ability to execute. The affordability of Oracle® Warehouse Builder, and its alignment with the feature-rich Oracle Database 10g, has helped Oracle grow its ETL customer base to include more than 3,000 companies. According to Gartner, "leaders" are performing well today, have a clear vision of market direction and can execute on that opportunity.

"Many of the world's largest companies are using Oracle Warehouse Builder for their most pressing ETL needs," said Ray Roccaforte, vice president of Server Development, Business Intelligence, Oracle Corp. "Industry adoption of ETL will continue to be fueled by the widespread deployment of business intelligence solutions. Our ETL tool is an excellent option for companies implementing such solutions."

Oracle Delivers Powerful Data Warehousing and ETL Offerings

In May 2004, Gartner listed Oracle in the 2004 Data Warehouse DBMS Leaders' Quadrant for completeness of vision and ability to execute. Oracle's combined strengths in data warehousing and ETL provide customers with a comprehensive offering to efficiently and cost effectively develop, deploy and manage highly scalable decision support systems.

Business Intelligence market growth has fueled ETL tool adoption such as Oracle Warehouse Builder that reads data from a database, extracts the desired data, transforms it into the required form according to predefined rules, and then loads into a data warehouse for reporting and analytic requirements.

About Oracle Warehouse Builder 10g

Oracle Warehouse Builder enables the extraction, transformation, and loading of heterogeneous data to produce quality information in Oracle Database 10g. It is the only enterprise business intelligence design tool that manages the full life-cycle of data and metadata integration for Oracle Database 10g."

Read more here, including a diagram of the Gartner ETL Magic Quadrant. This is obviously based on the new Paris release of OWB, which is a step change in functionality and performance over the 3i and 9i generations of OWB. What has probably particularly helped in this release are features such as the ability to load non-Oracle databases (revealed in Paul Narth's presentation), the data profiler, and close integration with Oracle OLAP. Good to see our choice of ETL tool vindicated.

 

The Register Reviews OWB 'Paris'

Oracle rebuilds Warehouse: "Unsurprisingly, Oracle has announced a whole bunch of new products at Oracle OpenWorld this week. Some of these, like the new version of 11i, will get lots of media attention. One that will not is the new release of Oracle Warehouse Builder (OWB), code-named Paris. However, in its own right it is pretty significant and it merits some serious consideration.

Perhaps the first and most obvious new feature that will be available in the Paris release (which is scheduled to be available by the end of the year) is that you will be able to send data to third party databases as targets. You will, however, have to use Oracle Transparent Gateways to do this, which presumably means that you have to go via an Oracle database. However, as OWB uses the database as its engine this shouldn't be a hardship.

The second major new facility is data profiling. Just how clever this is I do not yet know (I am arranging a detailed briefing) but my guess would be that it is on a par with, or better than, the other ETL vendors that have introduced comparable capabilities over the last year, but not as advanced as the specialised products from companies like Trillium Software.

There is some other nice stuff in OWB. For example, it understands OLAP and has support for such things as slowly changing dimensions; the cleansing and matching capabilities provided can be integrated with data mining algorithms for enrichment purposes; there is a mapping debugger for transformations; there are extensive new metadata features and expert capabilities; and so on and so forth."

A very positive review on OWB 'Paris' by The Register. Read the rest of the article here.

 

Introduction To OWB 10gR2 "Paris" Presentation Now Available

Oracle's Paul Narth gave a presentation earlier today at OpenWorld in London, on the new features in OWB 'Paris', otherwise known as OWB 10g Release 2. The presentation can be downloaded from the OpenWorld website, and comes with a number of screenshots showing the new interface in this next release of OWB.

One of the big new features in OWB 'Paris' is the data profiler, which allows you to profile your data within the OWB interface, discover the structural content, identify anomalies, and then report on data quality once your warehouse is populated.

The mapping editor sports a redesigned interface, has a mapping operators toolbar, properties bar, and allows you to pan and zoom around a multi-screen mapping canvas.

One of the big improvements in 'Paris' is the data object editor, which allows you to define cubes and dimensions in a logical form, and then deploy them either to relational tables with ROLAP metadata, or to analytic workspaces. Then, you can either deploy and populate your cube in a relational star schema, or alternatively, through the new AWXML feature, directly create and populate analytic workspace objects without having first created equivalent ROLAP objects.

The impact analysis report has been much improved in Paris, as has change management in general, which now allows you to perform 'what-if' analysis, and helps you more easily determine the cost of a project change.

Another interesting feature is support for 'Experts', a scripting feature (that uses OMB*Plus) that allows you to define a workflow and guided series of steps for departmental 'power users' to work with OWB mappings. If you have a database business 'power user' who commonly extracts his own data from the database, manipulates it, and then reports against it, the idea is that you can set up an 'expert' in Paris that allows him to do all of this, guided by your script, using mappings in OWB.

You can find out more details, and see some more screenshots, on Paul's powerpoint presentation.

 

OWB Performance Tips

There was an interesting thread on the OTN OWB Forum the other week, where a poster asked if anyone had any tips on improving the performance of Oracle Warehouse Builder. Nikolai Rochnik from the OWB product team came back with some useful guidance:

"There can be three factors affecting the performance of OWB Client working with Design Repository. Here is what you can do about them now as well as what OWB development has done (a lot!) to improve this in the next release (November 2004). They are in the order of greatest impact.

1. Design Repository management.

Problem: CHANGELOG table in the OWB design repository is typically several orders of magnitude smaller in row count than pctree, intralink, or interlink, so the nested loop plan is best. The optimizer's row estimate in the absence of gathered statistics does not serve us very well here.

Solution in the next release:

* Automated gathering of schema statistics (and refresh statistics on a regular basis)
* Increased use of SQL bind variables
* Improved SQL hints
* Ensured definition of all necessary indexes
* Used COMPRESS for indexes

Workaround for now: Try analyzing OWB design repository with the following statements. This is only worthwhile if the size of the repository is large.

exec dbms_stats.gather_table_stats(null, 'PCTREE', null, DBMS_STATS.AUTO_SAMPLE_SIZE, false, 'FOR ALL COLUMNS SIZE AUTO');

exec dbms_stats.gather_table_stats(null, 'INTERLINK', null, DBMS_STATS.AUTO_SAMPLE_SIZE, false, 'FOR ALL COLUMNS SIZE AUTO');

exec dbms_stats.gather_table_stats(null, 'INTRALINK', null, DBMS_STATS.AUTO_SAMPLE_SIZE, false, 'FOR ALL COLUMNS SIZE AUTO');

exec dbms_stats.gather_table_stats(null, 'CHANGELOG', null, DBMS_STATS.AUTO_SAMPLE_SIZE, false, 'FOR ALL COLUMNS SIZE AUTO');

exec dbms_stats.gather_index_stats(null, 'IDX_PCTREE_PARENTID', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

exec dbms_stats.gather_index_stats(null, 'IDX_PCTREE_CHILDID', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

exec dbms_stats.gather_index_stats(null, 'IDX_PCTREE_PARENTID_CHILDID', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

exec dbms_stats.gather_index_stats(null, 'IDX_PCTREE_CHILDID_PARENTID', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

exec dbms_stats.gather_index_stats(null, 'IDX_INTERLINK_LINKFROM', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

exec dbms_stats.gather_index_stats(null, 'IDX_INTERLINK_LINKTO', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

exec dbms_stats.gather_index_stats(null, 'IDX_INTRALINK_LINKFROM', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

exec dbms_stats.gather_index_stats(null, 'CHANGELOG_PK', null, DBMS_STATS.AUTO_SAMPLE_SIZE);

2. Memory management.

Solution in the next release: A lot of improvements to prevent memory leaks and unnecessary flushing of needed objects.

Workaround for now: Try a higher Java Virtual Machine (JVM) limit for your OWB Client: edit the -Xmx value in [OWB Home]\owb\bin\win32\owbclient.bat, which defaults to 384M. This is only worthwhile if the client machine has enough real memory to support the VM sizes of all actively running applications. If Windows Task Manager shows the owbclient Java session's Memory Usage and VM Size values close to the -Xmx value, OWB starts spending a larger portion of its effort managing memory.

3. Network latency.

Problem: Network latency is why having the OWB Client installed local to the Oracle database containing the design repository will always be faster, as you observe: "I found OWB much faster when Oracle and OWB running on same machine having 400 MB only." Obviously this does not solve the problem in a multiple-developer environment, but it is not as big a factor as the ones above. We are confident that the OWB Client's 'chatting' over the network is already optimized."

Adding to this, it's been my experience that the two issues that can have the biggest impact on OWB performance are, first, a client machine with less than 512MB of memory and, secondly, contention on the disk unit that houses the OWB design repository. There really is no point using the OWB client on a machine with 256MB or less of memory (it'll run extremely slowly, and often crash or hang and lose your uncommitted work), and more than once we've found that sluggish OWB performance was down to the disks holding our design repository database also being used, without us knowing, to hold the database for a large and active OLTP application.


Analytic Functions In OWB

Analytic Functions In Oracle Warehouse Builder: "Analytic Functions are an important feature of the Oracle database that allows the users to enhance SQL's analytical processing capabilities. These functions enable the user to calculate rankings and percentiles, moving window calculations, lag/lead analysis, top-bottom analysis, linear regression analytics and other similar calculation-intense data processing. This whitepaper describes how to use some of these analytical functions in the ETL (extraction, transformation and loading) processing implementation using Oracle Warehouse Builder."


Nicholas Goodman On Enhancing OWB Process Flows

 

Nicholas Goodman: "There are significant limitations to the implementation of Workflow processing in OWB 10g r1 and prior. There are methods available to developers that can add at least some amount of logic and conditionals such that it might be able to behave like an application. This section will enumerate how to use the facilities in OWB currently to build an IF/ELSE workflow pattern and a CASE workflow pattern."


Tom Burzinski On Data Warehousing Test Plans

"Companies invest countless hours and hundreds of thousands, sometimes millions, of dollars in data warehouse technology in pursuit of acing the big BI exam. Yet, few make a proportionate investment in testing their systems. The fact is no data warehouse project can succeed without complete and comprehensive testing." writes Tom Burzinski for DMReview. "Testing a data warehouse is more than reviewing the system's reports or other outputs ... The nature, scale and scope of a data warehouse project may vary depending on budget, available time, data quality, the data warehouse type and complexity of data transformation; however, there are four critical phases of testing that should always be included."


Comparing Informatica And OWB

A couple of projects that I've worked on recently had chosen Oracle Warehouse Builder over Informatica's Powercenter. It was interesting therefore to see a new article by Rajan Chandras that looked at the latest version of Powercenter, and to see how Informatica's offering compares to OWB10g.

According to "(Re)Enter The Leader", Powercenter has a similar architecture to OWB:

"PowerCenter 7.0 (I'll call it PC7) is an ETL tool in the classic mold: data extract, transform, and load logic is constructed in a (mostly) sequential arrangement of graphical objects that flow from source to target. The objective is conceptually simple: Read data from source, transform it as needed, and write it to target. Reality is a little more complex, of course, and the construction of logic happens at three levels.

At the lowest level, individual graphical objects can be sources, targets, or transformations (sources and targets can be themselves considered as special types of transformations). A source transformation is used to read from a data source, and supply that data in sequential row-wise fashion for subsequent processing. At the other end of the logic stream, the target transformation receives data (again, in row-wise order) and writes it out to recipient data structures. The remaining intermediate transformations do just that — transform data values as required.

Sources, targets, and transformations are assembled in a daisy chain to form the next level of processing, which in PC7 is called the "mapping." A mapping is the end-to-end flow of logic, from one or more source transformation to one or more target transformations.

The execution of the mapping, called the "workflow" in PC7, provides the third level of the overall logic. The workflow provides for the execution of multiple mappings and dependencies among mappings. In standard programming terms, the transforms are the syntax and components of the program, the mapping is the overall program itself, and the workflow is the execution and production of one or more programs.

There are PC7 components that correspond to these levels. The PowerCenter Designer is the programming integrated development environment (IDE), where you "assemble" all the sources, targets, and transformations to create a mapping. The PowerCenter Workflow Manager is used to build a workflow around the mapping. The Workflow Monitor provides production support capabilities for the workflow. In addition, there are the PowerCenter Repository Manager and the Repository Server Manager, which provide administration capabilities for the PC7 Repository (more on this a little later)."

All sounds familiar, with Powercenter's graphic objects being the source and target objects you drop on to a mapping canvas, together with the PL/SQL transformations, mappings being the same in the two tools, and Powercenter's workflow being the same as the workflow interface in OWB10g.

Reading the rest of the article, some interesting similarities and differences were:

  • Informatica Powerexchange seems similar to Oracle gateways, but with connectors to Peoplesoft and Siebel in addition to the SAP connectivity that both tools offer. Support is good in both tools for non-Oracle databases (DB2, SQL Server, Teradata, Sybase and so on)
  • One major difference is that OWB will only populate Oracle 8i, 9i or 10g data warehouses, whilst Informatica works against any major vendor (thanks Duncan for pointing that one out, one of those 'so obvious if you're used to OWB, I forgot to mention it' moments...)
  • Both tools allow you to build reusable components for transforming data, with Powercenter's being specific to the tool whilst Oracle's are regular PL/SQL functions and procedures.
  • Informatica, like Oracle, are making a big noise about grid computing. "PC7 offers server grid capabilities, too, by which PowerCenter can distribute loads across heterogeneous Unix, Windows, or Linux-based computing platforms. Although grid capabilities may seem exciting, I don't believe they match real-world need for grid computing yet, and I wouldn't recommend using them in place of other industry grid solutions."
  • The main architectural difference between Powercenter and OWB is that Powercenter has its own ETL engine, which sits on top of the source and target databases and does its own data movement and transformation, whilst OWB uses SQL and the built-in ETL functions in 9i and 10g to move and transform data. Interestingly, the article observes that the Informatica approach can be slower than the approach used with OWB. "Also, be aware that ETL tools are in general a slower (if more elegant) alternative to native SQL processing (such as Oracle PL/SQL or Microsoft Transact SQL)."
  • Powercenter's use of web services and distributed computing looks more developed than OWB's. "PowerCenter Web services are managed through the Web Services Hub, another component of the architecture, which supports standards such as Simple Object Access Protocol (SOAP), Web Services Description Language (WSDL), and Universal Description, Discovery, and Integration (UDDI). The architectural components can be collocated on a single server or spread across diverse servers, which allows solution parallelism, flexibility, and scalability."
  • Powercenter starts at around $200,000 (yikes!) although there is a "flexible pricing model". OWB is licensed as part of 10gDS which is around $5000 per named user, although you'll need the Enterprise Edition of the 8i, 9i or 10g database to provide the ETL functionality.

Historically, customers chose Informatica when they had lots of different database sources and targets, and the transformations between them were complex and interwoven. At one point, if you wanted a graphical interface for building transformations, tools such as Informatica, Constellar, Genio and so on were the only game in town, and you were looking at a minimum investment of between $50,000 and $100,000 to deploy these tools. The introduction of DTS by Microsoft and OWB by Oracle suddenly changed the market by providing much of the functionality of these tools as either a free database component or as a low-cost add-on. Vendors like Informatica have responded by introducing additional new features (such as web services integration, distributed loading and transformation, and so on) but it's now the case that, if you have a fairly straightforward need to graphically extract, transform and load data, you'll probably find the vast majority of your needs are now met by tools like OWB, at a far lower cost.

Interestingly, Informatica also have their own BI query tool called PowerAnalyzer. Sold separately from PowerCenter but designed to be used in tandem with their ETL tool, PowerAnalyzer is a web-based query tool that creates ROLAP queries against Oracle, IBM, Microsoft and Sybase datasources. Designed to be deployed using J2EE application servers, it also comes with an Excel interface and, as Seth Grimes reports for Intelligent Enterprise, a range of prebuilt analytic applications:

"The analytics dimension features conventional query and reporting and online analytic processing-style slice-and-dice analysis, and also optional packaged modules for CRM, financial, HR, supply chain, and cross-functional analysis. Its ability to visually model analytic workflows is one that's not yet common. It's intended to facilitate root-cause analysis, although this capability appears to be limited by the packaged analytic framework and other architectural strictures. For example, the highly promoted Excel support doesn't include database write-back. It also lets analysts embed business logic in private spreadsheets rather than in a repository, which can prove limiting when logic locked in a spreadsheet isn't visible to the workflow modeler and other non-Excel interface elements. Last, PowerAnalyzer delivers only the mainstream data-analysis functions that are found in competing BI tools, underscoring Informatica's view that integration and usability rather than analytic depth are the keys to market share.

PowerAnalyzer 4.0 is a credible entry in the larger BI market and will prove compelling for organizations that require an easy, nondisruptive path for integrating mainstream analytics into an existing computing environment."

That being said though, PowerAnalyzer is rarely (if ever) deployed by non-Informatica PowerCenter customers, limiting it to a fairly small audience who have a particular need for high-end ETL and specific industry analytic templates. Looks interesting, though, albeit with a hefty pricetag.


Version And Environment Management Using OWB10g

One of our customers recently asked me about the best way to manage multiple environments when using OWB. The customer had a development environment, together with test, QA and of course production, and wanted to know how to set up their design and runtime repositories. Do we create separate design and runtime repositories for dev, test, QA and prod, do we have just one for all environments, or is the answer somewhere in between?

The situation is complicated a bit by the fact that OWB's architecture has changed slightly over the previous few versions. Up until OWB 9.0.3, you had a design repository to which your OWB client connected, and one or more OWB runtimes, which you installed into every schema you loaded data in to. With OWB 9.0.4 this changed in that you now have one runtime repository per target database, and this runtime repository contains all the packages, and stores all the load results, for use by all the target schemas in a database. More details on the new runtime architecture can be found in Architecture Whitepaper for Warehouse Builder 10g and Server-side component (runtime) in Oracle Warehouse Builder 10g on OTN.

In practice, we've found that different clients use different approaches to solve this problem. Some of them have a single design repository, with its contents effectively being the most current view of the OWB project definition. This definition is then used to initially populate the dev warehouse environment, and after that it's used to populate the test and QA environments. Once everyone's happy that this is OK, it's then used to load up the production warehouse environment.

The problem with this is that the model within the design repository doesn't usually reflect what's actually deployed in test, QA and prod. One way around this is to make backup copies of the design repository (using the .MDL file export routine) after each deployment, and then load these up whenever required to see how the design repository looked when you last deployed to an environment. Another way to deal with this is to have separate design repositories for dev, test, QA and prod, and keep these in sync by exporting and importing modules between the environments using the .MDL file export facility. Yet another way is to use the snapshot facility in recent versions to store copies of the repository for later reference.

You'll be interested therefore to know that Oracle have recently posted a series of OWB casestudies, one of which is specifically about version and environment management using OWB. Entitled "Case Study 9: How Do I Manage Multiple Versions of my BI Implementation?" it discusses two scenarios where OWB is used to manage multiple target environments, such as dev, test, QA and prod. According to the article introduction:

"After a period of development and testing, one company implements its BI system in production. The Production version of the system typically changes as new features are incrementally implemented from Development, and as Production bugs are discovered and fixed. At the same time, the Development version of the system continues to evolve with new functionality. This company now has several individually changing versions of the system and faces a challenge familiar to all companies, regardless of how many BI environments they maintain: how to best manage changes in different versions of the system.

One version of this common scenario is depicted in Figure 9–1, where the Development environment is consistently more advanced than the functionality in Production, and QA is somewhere between the two extremes. Development changes are incrementally propagated to QA and subsequently to Production. At the same time, Production has its own cycle of changes, denoted in Figure 9–1 as the shadow environment labeled ’Production1’, and used for controlled problem solving. ’Production’ and ’Production1’ are at the same stage of development, and serve to illustrate the errors that occur in Production, which are fixed and implemented directly in Production, but that must somehow be merged with Development. Other companies may have fewer or more differing environments for their BI systems, but the same maintenance challenges still apply.

Companies may need multiple environments for their BI systems, as illustrated in Figure 9–1, because they typically implement incremental changes to the system. However, some companies implement only whole projects in Production. Figure 9–1 does not apply to these companies. In this case study, a company finds a problem with a mapping in Production. The first step is to compare the Production version of the mapping with the Development version of the mapping, as illustrated in Figure 9–2. If the mapping is identical in both environments, the solution is simple: make the changes in either environment and copy the mapping to override the older version. If the mapping in Production differs from its Development version, then the approach depends on whether the BI system is in its initial or mature phase."

Sounds just like what we are looking for. So what does it recommend?

Firstly, it splits OWB projects into two types: those that are in development or which have just been deployed (referred to as 'initial phase') or those that have been up and running for some time (referred to as 'mature phase').

For initial phase projects, the recommendation is to have a separate design repository, and runtime repository, for each environment. This is because, during this early phase, changes are likely to be made to all of the environments (prod, test, dev and so on) all at the same time, and it's more convenient to track these changes using separate repositories. To eventually keep them all in sync, the recommendation is to use the .MDL export facility to export and import individual mappings, objects, modules and so on.

For mature phase projects, the recommendation is to still have separate runtime repositories for each environment, but in this case just one design repository is used, with the OWB snapshot facility used to store snapshots of what the dev, prod, test and QA environments look like. For example, at the start of a system change, your design repository should reflect what's in production. You'd then take a snapshot of this, and save it as a copy of production. Then, you'd make whatever changes you need to this version, referring to it now as development. When you then go to deploy it to test, you take a snapshot at that point and save it as a copy of test. In this manner, you keep copies of the design of each environment, but avoid the overhead of managing several design repository installations, using the saved snapshots instead.

The document details the steps you need to go through to migrate changes from one environment to another using the two approaches, and has some useful diagrams you might want to 'appropriate' for your project documentation. Take a look if you get a chance, and while you're there, you might want to download the complete set of ten case studies, which covers areas such as transferring remote files, handling flat files with varying names, and rolling out changes to the target schema.

 

In-Depth Introduction To Oracle Warehouse Builder 10g

"Whilst many Oracle DBAs build and maintain their data warehouses using nothing more than UNIX, SQL, and PL/SQL scripts, many are now starting to use graphical tools such as Oracle Warehouse Builder 10g (OWB) to help with development. What do tools such as OWB10g offer the DBA, and how can they help with the warehouse design and build?" Find out in my latest article for DBAZine entitled "An Introduction To Oracle Warehouse Builder 10g"


Householding Warehouse Data Using OWB10g

"Householding is the process of identifying related information from multiple sources and storing it in the data warehouse such that all related information about a given party is accessible when looking at any information about the identified party. A key point is that householding does not simply refer to the matching up of data about a family. The need for householding began with a requirement for clear and concise marketing data. As such, the goal was to identify all individuals that reside in the same physical residence, hence the name householding. A marketer could now send one piece of mail to the household, which saved the marketer considerable dollars in postage and mail preparation costs. Today, householding refers to any grouping of information about a given person, family, household or company. The need today for accurate identification of the same person, family, household or company across all systems is significantly greater than when householding was first identified as a required business rule for data migration. Today's marketing programs target people at many levels - as an individual, as a head of household or as a member of a given group of people (i.e., club, residence and/or business)." writes Greg Mancuso and Al Moreno in "The Common Problem – Working with Merge/Purge and Household" for DMReview.

As an aside, if you're using OWB, merge/purge and householding are fairly painless processes due to the inclusion of the name and address cleansing, and match/merge transformations as of OWB 9.2.

The name and address cleansing functions were originally derived from the Pure*Integrate technology Oracle licensed from Carleton, and work with name and address files (which you have to purchase separately) from companies such as Trillium.

The Match/Merge transformation allows you to first match, and then merge, warehouse data: you can match records using business rules, and consolidate multiple records into a single record.

Together with the name and address cleansing transformation, match/merge can be used to combine multiple customer records into a household record, as described in the preceding article. What particularly makes the match/merge transformation useful is the ability to do 'fuzzy matching' where exact matches between records can't be found but where similar records need to be combined (where, for example, an address matches but one or more postcode records can't be found). For more details, take a look at:

  • Match/Merge Operator

  • Name and Address Operator

from the OWB 10g Documentation.


Loading XML Documents Using OWB 10g

Recent versions of Oracle Warehouse Builder come with the ability to load XML documents into the database. This article explains, in simple steps, how to load a simple XML document into your data warehouse using OWB, and refers to OWB 10.1 although it should work with any version from 9.0.4 upwards. It's based on the examples in the Using The XML Toolkit chapter in the OWB 10g Users Guide available on OTN.

The examples are based around a scenario where an 'orders' XML document is loaded into a 'purchase orders' table. To start off, and assuming OWB 10.1 is installed together with appropriate runtime, repository and target schemas, the first thing we want to do is create a table within our target schema to receive our XML data. Create the table using this script:

create table Purchase_Orders (
    id varchar2(10) not null,
    order_date date not null,
    shipto_name varchar2(60) not null,
    shipto_street varchar2(80) not null,
    shipto_city varchar2(30) not null,
    shipto_state varchar2(2) not null,
    shipto_zip varchar2(9));

Next, create a directory on the Oracle server to hold our XML documents. In my case, I created a directory called c:\xml_test on my Windows 2000 server, and then made sure that the directory was listed in the UTL_FILE_DIR section of the init.ora file. You'll need to bounce the database for this setting to become active.
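
For reference, the corresponding init.ora entry would look something like this (using the c:\xml_test path from this example; adjust it to your own directory):

    UTL_FILE_DIR = c:\xml_test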

Now, using notepad or any other text editing program, create a file called orders.xml and type in the following XML:

<ROWSET>
    <ROW>
        <ID>100</ID>
        <ORDER_DATE>2000.12.20</ORDER_DATE>
        <SHIPTO_NAME>Adrian Howard</SHIPTO_NAME>
        <SHIPTO_STREET>500 Marine World Parkway</SHIPTO_STREET>
        <SHIPTO_CITY>Redwood City</SHIPTO_CITY>
        <SHIPTO_STATE>CA</SHIPTO_STATE>
        <SHIPTO_ZIP>94065</SHIPTO_ZIP>
      </ROW> 
</ROWSET>

Save this file to your directory on the server. This is the document you'll load in using OWB. Now, start up OWB and either create a new project, or open up an existing one. 

Now, an important thing to bear in mind is that OWB loads XML documents into the database using some new PL/SQL transformations, rather than XML being a new source module type or another type of flat file you can bring into the project. These new transformations are WB_XML_LOAD, a procedure that just loads your documents, and WB_XML_LOAD_F, a function that loads documents and returns the number of documents loaded.
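
As a point of reference, the pre-mapping trigger we'll build below simply passes an XML 'control file' string to WB_XML_LOAD. Here's a rough sketch of an equivalent direct call from SQL*Plus; this assumes the OWB runtime's public transformations are accessible from the schema you're connected to, and that the procedure takes the control document as its single argument (as the CONTROL_INFO parameter in the mapping suggests):

    -- Sketch only: assumes WB_XML_LOAD is visible to the current schema and
    -- accepts the XML control document as one parameter.
    begin
        wb_xml_load('<OWBXMLRuntime>'||
                      '<XMLSource>'||
                        '<file>c:\xml_test\orders.xml</file>'||
                      '</XMLSource>'||
                      '<targets>'||
                        '<target dateFormat="yyyy.MM.dd">Purchase_Orders</target>'||
                      '</targets>'||
                    '</OWBXMLRuntime>');
    end;
    /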

Therefore, to import an XML document into your warehouse, you have to go by a kind of 'roundabout' route where you create either a pre- or post-mapping trigger in a mapping, and use the trigger to call either WB_XML_LOAD or WB_XML_LOAD_F. To illustrate this:

  1. Create a new mapping and call it XML_LOAD. Drop onto the mapping a pre-mapping trigger, and select WB_XML_LOAD from the list of transformations.
  2. Drop a constant on to the mapping, add an attribute to the constant, call it CONTROL_FILE and make it VARCHAR2(250)
  3. Go into the properties for this attribute, click on the '...' button in the expression field, and enter the following into the text box:

    '<OWBXMLRuntime>'|| 
        '<XMLSource>'|| '<file>c:\xml_test\orders.xml</file>'|| '</XMLSource>'|| 
        '<targets>'|| 
            '<target dateFormat="yyyy.MM.dd">Purchase_Orders</target>'||
         '</targets>'||
     '</OWBXMLRuntime>'
  4. Click on the 'validate' button and check that there are no errors. This is your XML control file that tells OWB where the XML file source is (c:\xml_test\orders.xml) and what target to load the XML elements in to (purchase_orders). The dateFormat bit tells Oracle how to parse the date within the XML file and convert it to Oracle format.
  5. Drag the constant attribute across to the control_info attribute in your pre-mapping trigger. Your mapping should now look something like:

If you try and validate the mapping now, you'll get the error message 'VLD-1110: Code cannot be generated for mappings that contain only constants or map input/output parameters as source and targets'. This is because we can't run a mapping that consists of just a pre- or post-mapping trigger; to get round this, either deploy this trigger to an existing mapping, or add a dummy table update to the mapping that satisfies this requirement. For example, I created a table called DUMMY within my target schema:

CREATE TABLE DUMMY (DUMMY_ROW VARCHAR2 (1) ) ;

dropped it on to the canvas twice (once as DUMMY, once as DUMMY_1), and then connected the DUMMY_ROW column from one to the other to create a table INSERT. 

  6. Validate the transformation now, and it should come back with no errors. Now, start up the Deployment Manager, connect to your runtime repository, and then navigate down to your XML_LOAD transformation. Right-click on the transformation, choose SET ACTION > CREATE, and then choose FILE > GENERATE/DEPLOY. Your screen should look something like:

  7. Click on 'Deploy', and the results should read 'Deploy Status : Success'. Then, after clicking 'OK', go back to the Deployment Manager screen, right-click on the XML_LOAD mapping, and select 'Execute...'. Accept the defaults, let the mapping run, and your screen should look like:

  8. If you then start up SQL*Plus and select from the PURCHASE_ORDERS table, you should find your XML document loaded in.

What we've gone through here is a fairly simple example where the XML document's elements match exactly the table columns that we're going to load in to, and we're only loading one document into one table. Often, you might need to transform the XML document before loading it into the database, and to do this you would apply an XSLT stylesheet to match up the elements in the XML document with the target table's columns. On other occasions, you might want to load the XML document into multiple tables, putting part of the data in one table and the rest in others. All of these situations are discussed in the Using The XML Toolkit chapter in the OWB 10g Users Guide which can be accessed from the OTN website.


Scripting And Refactoring Using OWB OMB*Plus

I've been doing some research for a paper on OWB10g, and because of this I've had a chance to look at the scripting support that was originally introduced with OWB9.0.4. If, like me, you didn't really pay much attention to this new feature when it first came out, I think you'll be pleasantly surprised about what is actually a very powerful new capability.

One of the criticisms leveled at graphical ETL tools such as OWB is that, whilst they might make a novice Oracle developer more productive, they actually slow down more experienced developers who are more used to working with the command line. Whilst tools such as OWB are great for novice users who need visual prompts when putting together objects and mappings in a warehouse, DBAs and Unix sysadmins are more used to defining objects using scripts and often write programs to generate batches of objects to make the process more efficient. A good example of this is Arup Nanda's "A Multi-Source, Time Variant Data Warehouse Case Study" which describes a data warehouse that had 7,200 tablespaces, 10 tables, 360 partitions and 100 indexes, and loaded data from source to target using a number of very similar CREATE TABLE AS SELECT statements that were programmatically generated using a Unix shell script. Clearly, in this example, generating all these objects and mappings by hand using the OWB GUI would be far less efficient than writing a series of reusable scripts.

Another problem that faces developers when working with a GUI application like OWB is there's no way to record and replay a series of development steps so that you can apply them in one go to your project. For example, you might want to add a number of columns (modified date, modified by etc.) to all your tables, and also make these modifications to your test and production environments. Using the OWB GUI tool you'd have to go into each project and make the changes manually, which takes time and opens up the possibility of introducing user error. Also, not having an automation process precludes the ability to carry out agile or test-driven development, where changes can be made to requirements at any stage of the project as long as these changes are automatically refactored into the development model.

To address these issues, starting with OWB 9.0.4 a scripting capability was introduced which allows us to create, alter, generate or drop both warehouse objects and mappings using the command line. OWB scripting consists of two components:

  • OMB Scripting, a programming language based on Tcl that allows us to manipulate objects in the OWB repository, and
  • OMB*Plus, a 'SQL*Plus' lookalike used for interactively running OMB Scripting commands

OMB stands for 'Oracle MetaBase', which is the product name for the repository that stores the business logic and object definitions for our OWB project. OMB Scripting acts as an interface to the OWB Java API, translating scripting commands into API calls, allowing us to work with the OWB repository without using the GUI interface. Like SQL, OMB commands are divided into MDL (metadata definition language) commands which allow us to create metadata objects, and MML (metadata manipulation language) commands which allow us to change the contents of metadata objects.

For the purposes of the rest of the article, I'll be using OWB10g on Windows 2000, which can be downloaded from OTN if you've not got this current version. To start up the OMB*Plus environment, using the Start Menu navigate to your OWB program group, go to the Warehouse Builder subgroup and select 'OWB OMB Plus'. Then, to connect to your repository, use the OMBCONNECT command (changing the connection details as required):

OMBCONNECT username/password@hostname:port:service_name

Then, to change context to a project, use the OMBCC command:

OMBCC 'MY_PROJECT'

Note the use of CAPITALS with the OMB commands - this is mandatory and you'll get errors if you use lower case. Now, you can create a new module using the OMBCREATE MODULE command:

OMBCREATE ORACLE_MODULE 'STAGING_TABLES' \
    SET PROPERTIES (DESCRIPTION) VALUES ('Warehouse staging tables')

Note the '\' which indicates that a command carries on to the next line, and the lack of ';' compared to SQL*Plus. Now, change context to the new module and create a table in the repository using the OMBCREATE TABLE command

OMBCC 'STAGING_TABLES'

OMBCREATE TABLE 'STG_CUSTOMERS' \
  SET PROPERTIES (DESCRIPTION, BUSINESS_NAME) VALUES ('Customer staging table','Staging Customers') \
  ADD COLUMN 'CUST_ID' \
  SET PROPERTIES (DATATYPE, PRECISION, SCALE, NOT_NULL) VALUES ('NUMBER',7,0,1) \
  ADD COLUMN 'CUST_NAME' \
  SET PROPERTIES (DATATYPE, LENGTH) VALUES ('VARCHAR2',50)

You can then add further columns to the table using the OMBALTER command:

OMBALTER TABLE 'STG_CUSTOMERS' \
  ADD COLUMN 'CUST_FORENAME' \
  SET PROPERTIES (DATATYPE, LENGTH) VALUES ('VARCHAR2',30)

Finally, commit your work with the OMBCOMMIT command. 

If you open up the OWB GUI, your project should now show the STAGING_TABLES module you've just created using OMB*Plus.

The OMB scripting feature is based on Tcl with extensions for dealing with the OWB repository. Therefore, you can use all of the Tcl programming constructs to programmatically create or maintain objects in the OWB repository. For example, you might wish to loop through all of the tables within a module and add a MODIFIED_DATE column, which you could do as follows:

foreach tableName [OMBLIST TABLES] {
    OMBALTER TABLE '$tableName' \
        ADD COLUMN 'MODIFIED_DATE' \
        SET PROPERTIES (DATATYPE) VALUES ('DATE')
    puts "Table $tableName altered."
}

You can also build and maintain transformations using OMB scripting. For example, we might have COUNTRIES and REGIONS tables which we want to join and load into a GEOGRAPHIES table.

OMBCREATE MAPPING 'LOAD_COUNTRIES_MAP' \
SET PROPERTIES (BUSINESS_NAME, DESCRIPTION) \
VALUES ('Load Countries from HR Mapping', 'Mapping to load the countries table from HR schema') \
ADD TABLE OPERATOR 'COUNTRIES' BOUND TO TABLE '../HR_SRC/COUNTRIES'\
ADD TABLE OPERATOR 'REGIONS' BOUND TO TABLE '../HR_SRC/REGIONS'\
ADD TABLE OPERATOR 'GEOGRAPHIES' BOUND TO TABLE 'GEOGRAPHIES' \
ADD JOINER OPERATOR 'JOIN' SET PROPERTIES (JOIN_CONDITION) \
VALUES ('INGRP1.REGION_ID = INGRP2.REGION_ID') \
ADD CONNECTION FROM ATTRIBUTE 'COUNTRY_ID' OF GROUP 'INOUTGRP1' OF OPERATOR 'COUNTRIES' TO \
GROUP 'INGRP1' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'COUNTRY_NAME' OF GROUP 'INOUTGRP1' OF OPERATOR 'COUNTRIES' TO \
GROUP 'INGRP1' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'REGION_ID' OF GROUP 'INOUTGRP1' OF OPERATOR 'COUNTRIES' TO \
GROUP 'INGRP1' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'REGION_ID' OF GROUP 'INOUTGRP1' OF OPERATOR 'REGIONS' TO \
GROUP 'INGRP2' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'REGION_NAME' OF GROUP 'INOUTGRP1' OF OPERATOR 'REGIONS' TO \
GROUP 'INGRP2' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM GROUP 'OUTGRP1' OF OPERATOR 'JOIN' TO \
GROUP 'INOUTGRP1' OF OPERATOR 'GEOGRAPHIES' BY NAME

Then, we can generate and deploy objects and mappings using OMB:

OMBCREATE TRANSIENT DEPLOYMENT_ACTION_PLAN 'DEPLOY_PLAN' \
ADD ACTION 'TABLE_DEPLOY' \
SET PROPERTIES (OPERATION) VALUES ('CREATE') \
SET REFERENCE TABLE '/MY_PROJECT/STAGING_TABLES/STG_CUSTOMERS'

OMBDEPLOY DEPLOYMENT_ACTION_PLAN 'DEPLOY_PLAN'

OMB scripting can also carry out maintenance tasks such as installing runtime schemas, creating deployment locations, creating runtime repository connections, and so on:

OMBCREATE RUNTIME_REPOSITORY_CONNECTION 'NEW_RR_CONNECTION' \
SET PROPERTIES (DESCRIPTION, BUSINESS_NAME, \
HOST, PORT, SERVICE_NAME, \
CONNECT_AS_USER, RUNTIME_REPOSITORY_OWNER) \
VALUES ('Runtime Repository', \
'MY_RUNTIME', 'acgdev03', \
1521, 'plusdev1', 'OWBRU', 'OWBRR')

Finally, OMB scripts can be saved to the filesystem, loaded up whilst in the OMB*Plus environment or executed directly from DOS or Unix by supplying the script name after the OMB*Plus executable.

ombplus.bat c:\owb\refactoring_script.tcl

Scripting within Oracle Warehouse Builder has three main advantages. Firstly, it allows us to bulk create objects and mappings, and make mass changes to them without having to go into the OWB GUI and make each change manually. This addresses the main reason that Arup Nanda used shell scripts in his technical paper, and gives power back to the experienced developer who wants to generate scripts according to a template.

Secondly, by allowing us to script changes to objects, and more importantly, be able to repeat these changes without fear of operator error, it allows us to take a more agile approach to development and we can refactor development changes reliably and repeatedly.

Lastly, and this is more of a theoretical rather than practical benefit at the moment, it allows us the possibility of using the command line to build objects and mappings, but in an abstract form that isn't directly tied to the particular implementation details for our particular Oracle platform. For instance, our developer might put together an OMB script to merge a staging table into a dimension table, which in Oracle 8i would have been implemented by OWB as two separate INSERT and UPDATE commands. Once OWB was pointed at an Oracle 9i database however, it would in this case implement the merge using the MERGE command. The advantage of using OWB is that the logical transformation (merging into a dimension table) is kept constant, whilst the actual implementation varies according to the platform. What OMB brings to this is that you can write scripts that are now at a more abstract level, building objects and mappings in a logical way whilst OWB deals with the implementation. Before OMB was around, the only way you could build things in this abstract way was to use the OWB GUI, but now you can actually write scripts that have a degree of abstraction from the actual implementation details for your particular platform.

More details on OMB scripting can be found at:


 

Publishing OWB Transformations As Web Services

There's an interesting technical note available on OTN on invoking OWB processes or mappings as web services.

Traditionally, OWB mappings and processes are run using the OWB deployment manager, SQL*Plus (using the Execution Template) or through Oracle Enterprise Manager. However, from OWB 9.0.4 onwards, mappings and process flows can be called by any application that can communicate via a web service interface.

The way this is accomplished is to create a PL/SQL procedure that accepts a set of input parameters, calls the OWB mapping or process, and outputs the results as output parameters. This procedure is then published as a PL/SQL web service using Oracle JDeveloper, and the web service is then deployed to a J2EE application server such as Oracle OC4J or Apache Tomcat.

The article describes a scenario where an OWB mapping is put together that takes as its input a customer address, cleanses the address using the name and address transformation, then outputs the cleaned-up address as output parameters. This mapping is then published as a web service and can be called by any J2EE or .NET application, allowing you to deploy and use OWB's transformation capabilities outside of the normal OWB environment. The article comes with a sample .mdl export and PL/SQL package files and full instructions on how to set up your own OWB web service. Well worth a look.


Streamlining Oracle 9i ETL With Pipelined Table Functions

I've recently been reading Bert Scalzo's "Oracle DBAs Guide To Data Warehousing And Star Schemas", and he's got an interesting take on the ETL process that I thought it worth looking at in more detail.

Background

Bert was the DBA behind the 7-Eleven data warehouse, one of the first multi-terabyte dimensional data warehouses built on the Oracle database. The book is a bit different to other Oracle data warehousing books in that it concentrates on the implementation details of building a warehouse, and as it's not by an Oracle member of staff he tends to have forthright opinions on what elements of the technology stack work, and what don't. A particular example of this is where he looks at the various releases of the Oracle database (7, 8i and 9i) and explains in some detail why 7 and 8i don't really cut the mustard as large-scale data warehousing platforms. In another instance, he looks at the evolution of the star transformation, explains why the original implementations (which were first implemented using nested loops, and later by cartesian joins) weren't up to scratch, but then absolutely insists that you use the version that comes with Oracle 9i. As the author's not afraid to offer an opinion it's an entertaining read, and a bit different to your run-of-the-mill 'new features' book.

One chapter that particularly interested me was his approach to loading the data warehouse. Right from the outset, Bert eschews the use of ETL tools, preferring instead to code the load routines manually. In his opinion, the trouble with ETL tools is:

  • The code they produce is not optimally efficient, and often uses slow-running Java to host and run the processes
  • The cost of the tools is often more than the hours they replace
  • They allow the project to be staffed by people with little Oracle or data warehousing experience

This is a theme echoed in an Oracleworld paper by Arup Nanda that I looked at last year, which again rejected the use of ETL tools, but this time because the nature of the ETL (lots of small transformations to a large number of nearly identical tables) was better served by running a number of Unix shell scripts that in turn generated SQL scripts to a pattern. I'll come back to the ETL vs. hand coding debate later on in the article.

Anyway, the ETL process that the author wanted to carry out involved the following steps

  1. load data from a large flat file
  2. apply transformations to the data
  3. update/insert the data into a dimension table

What made this interesting was the fact that the file was very, very big, and the ETL window was small. Therefore, any ETL process had to

  • Minimise inter-process wait states
  • Maximise total concurrent CPU usage

To illustrate, take a situation where your ETL process starts with a file of 1000 records, and has two stages, each of which takes an hour to run. The least efficient way to process this file is to let the first process handle all 1000 records (taking an hour to do so), then pass the data on to the second process, which will also take an hour to run. Your total run time is two hours. What the author is suggesting with 'minimise inter-process wait states' is that it would be good if your first process could handle, for example, 10 records, then pass these on to the second process so that it could start its work. The first process can then get on with the next 10 records, and then pass them on when it's done. Your total run time would then be just a bit over one hour, cutting down dramatically the time to perform the load.

The 'maximise total concurrent CPU usage' requirement aims to take advantage of the ability of multi-processor servers to carry out more than one task at once. Oracle is very good at this, and has had this feature (known as parallelism) for many years. However, it only really comes into play if you use DML statements (insert, update etc.) to do your load; if you write your load in PL/SQL using cursors and loops to process your data row-by-row, you're not going to be able to take advantage of parallel DML. What these two requirements mean, therefore, is that your ideal ETL process is going to use parallel processing, and somehow reduce to the minimum the wait between different ETL stages.
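
As a quick illustration of the second requirement, parallel DML has to be switched on for the session before a set-based load will actually run in parallel. The table names below are placeholders, not objects from the book:

    -- Illustrative only: my_staging and my_dim are placeholder tables.
    alter session enable parallel dml;

    insert /*+ append parallel(my_dim, 4) */ into my_dim
    select /*+ parallel(s, 4) */ *
    from   my_staging s;

    commit;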

Introducing External Tables

One of the key new features in Oracle 9i for business intelligence and data warehousing was the inclusion of a number of ETL features within the database, the point of which was to remove the requirement to purchase a separate ETL engine such as Informatica, Genio or Datastage. The approach proposed by the author uses these new features to shortcut the warehouse load process.

If you remember from earlier in the article, our load process had to carry out three steps:

  1. load data from a large flat file
  2. apply transformations to the data
  3. update/insert the data into a dimension table

Traditionally, you would accomplish this first step by using SQL*Loader. SQL*Loader allows you to import a flat file into the database, carry out basic transformations on it (change case, reformat, reject invalid rows), and if it helps speed up the process, load it in parallel using direct path. As you can only insert data using SQL*Loader, and the degree to which transformations can take place is limited, you'd generally load the data into a staging table and then process it in a second step.

Oracle 9i however introduced a new feature called External Tables, which allows you to define a database table over a flat file. Taking as our example a comma-separated contracts file that is used to load data into a contracts dimension, the code to create an external table would be:

create directory inp_dir as '/home/oracle/input_files';

create table contracts_file (contract_id number, descrip varchar2(50), 
    init_val_loc_curr number, init_val_adj_amt number)
organization external (
type oracle_loader
default directory inp_dir
access parameters (
fields terminated by ','
)
location ('contracts_file.csv')
)
parallel 10;

The External Table feature allows you to embed the SQL*Loader control file into the table DDL script, and then allows you to run SELECT statements against the flat file. You can include the external table in joins, subqueries and so on, but you can't use the external table to delete or update data in the flat file. External tables came in with Oracle 9i, and they're available as data sources in recent versions of Oracle Warehouse Builder.
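
Once the external table is in place, the flat file can be queried like any other table; for example, using the columns defined above:

    select contract_id, descrip, init_val_loc_curr, init_val_adj_amt
    from   contracts_file
    where  init_val_loc_curr > 0;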

(Pipelined) Table Functions

Traditionally, once we'd loaded the flat file data into a staging table via a SQL*Loader script, the next stage would be to transform the data, either through a series of DML commands or by using a PL/SQL package. The transformation would have one or more stages, and the stages would run sequentially, one after the other.

Oracle 9i now gives us the opportunity to improve on this situation, by allowing us to create our transformations as functions - functions that take our external table as an input, and output the transformed data as rows and columns that can be used to update a table. These functions, known as Table Functions, can also be 'pipelined', which makes Oracle start returning rows from the function before processing has finished. By pipelining table functions, you can string a number of them together and have them 'pass off' rows to the next process as soon as a batch of rows has been transformed, meeting our requirement to minimise inter-process wait states. So how does this work in practice?

    create or replace type rec_contract_type is object  
    (
    contract_id number,
    descrip varchar2(50),
    init_val_loc_curr number,
    init_val_adj_amt number
    );
    /

    create table contracts_dim
    (
    contract_id number,
    descrip varchar2(50),
    init_val_loc_curr number,
    init_val_adj_amt number
    );

    create or replace type table_contract_type is table of rec_contract_type;
    /

    CREATE OR REPLACE PACKAGE transform_pkg
    AS
    TYPE contract_cur IS REF CURSOR RETURN contracts_file%ROWTYPE;
    FUNCTION go (contract_arg IN contract_cur)
    RETURN table_contract_type
    PARALLEL_ENABLE (PARTITION contract_arg BY ANY)
    PIPELINED;
    END;
    /

    CREATE OR REPLACE PACKAGE BODY transform_pkg AS
    FUNCTION go (contract_arg IN contract_cur)
    RETURN table_contract_type
    PARALLEL_ENABLE (PARTITION contract_arg BY ANY)
    PIPELINED
    IS
        contract_rec_in  rec_contract_type := rec_contract_type(null,null,null,null);
        contract_rec_out rec_contract_type := rec_contract_type(null,null,null,null);
    BEGIN
        LOOP
            FETCH contract_arg INTO contract_rec_in.contract_id, contract_rec_in.descrip,
                  contract_rec_in.init_val_loc_curr, contract_rec_in.init_val_adj_amt;
            EXIT WHEN contract_arg%NOTFOUND;
            -- apply the transformation and pipe each row straight back to the caller
            contract_rec_out.contract_id       := contract_rec_in.contract_id;
            contract_rec_out.descrip           := contract_rec_in.descrip;
            contract_rec_out.init_val_loc_curr := contract_rec_in.init_val_loc_curr;
            contract_rec_out.init_val_adj_amt  := contract_rec_in.init_val_loc_curr * 1.6;
            PIPE ROW (contract_rec_out);
        END LOOP;
        RETURN;
    END;
    END;
    /

As the example shows, the function takes as its argument a ref cursor that points to the data being fed into the transformation, and returns a collection of transformed rows which, because the function is pipelined, are streamed back to the caller as they are produced. The PIPELINED clause tells the function to return rows as soon as a PIPE ROW() command is used, which happens every time a row in the cursor is processed by the function. The function therefore takes as its input a pointer to the results of a SELECT statement, and spits out at the other end the transformed version of those rows and columns, ready to be queried with the TABLE() operator.
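
To try the transformation on its own before wiring it into a load, you can query the pipelined function directly using the TABLE() and CURSOR() operators, along these lines:

    select *
    from   table(transform_pkg.go(
               cursor(select * from contracts_file)));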

The Merge Command

What we want to do now is take the transformed data and use it to update our dimension table. In the past we'd have accomplished this with a combination of INSERT and UPDATE statements, but Oracle 9i gives us the MERGE command, which does both in one step. What's more, we can wrap the whole transformation process up into this one MERGE command, referencing the external table and the table function in the same statement as the source for the MERGEd data.

        alter session enable parallel dml;

    MERGE /*+ parallel(d,10) append */
        INTO contracts_dim d
    USING (select *
           from   table(transform_pkg.go(
                    CURSOR(select /*+ parallel(contracts_file,10) full(contracts_file) */ *
                           from contracts_file)))) f
        ON (d.contract_id = f.contract_id)
    WHEN MATCHED THEN
        update set descrip = f.descrip,
        init_val_loc_curr = f.init_val_loc_curr,
        init_val_adj_amt = f.init_val_adj_amt
    WHEN NOT MATCHED THEN
        insert values ( f.contract_id,
        f.descrip,
        f.init_val_loc_curr,
        f.init_val_adj_amt)
    /

So there we have it - our complex ETL function all contained within a single MERGE statement. No separate SQL*Loader phase, no staging tables, and all piped through and loaded in parallel.

Oracle Warehouse Builder

The good news for OWB users is that recent releases of OWB (9.0.4 upwards) support table functions as transformations in a mapping process.

OWB Table Function

The current version of OWB (10g Release 1) supports table functions that you've written yourself and then imported into the OWB repository, whilst the next release, codenamed 'Paris', will allow you to build the table function graphically within OWB. This support for table functions, together with the ability to fine-tune the load parameters and take advantage of 9i ETL features such as MERGE and pipelining, actually undermines Bert's argument that ETL tools don't produce efficient, optimal code, although to be honest all of this is fairly new functionality and the book is a couple of years old now.

Conclusion

With the new ETL functionality in Oracle 9i and 10g, plus the support for this functionality in OWB 9i and 10g, there's really no need now to split every transformation into a series of staging tables and separate transformations. If you start to use features such as table functions, external tables, parallelism and pipelining, you can start to take advantage of the features Oracle have built into the database engine specifically for loading data warehouses.

If you want to read more about this subject, you can check out Bert Scalzo's excellent book "Oracle DBAs Guide To Data Warehousing And Star Schemas" or check out the following articles;

  • OTN article on Table Functions
  • OTN OWB Forum thread on Table Functions
  • "Loading and transform external data into Oracle 9i"
  • "CASTing About For a Solution: Using CAST and Table Functions in PL/SQL"
  • "Using Pipelined Table Functions (Oracle 9i)"


Web Services, BPEL And Oracle Data Warehousing

One of the news items I picked up on earlier this week was the release of Oracle's Business Activity Monitoring ('BAM') feature for Application Server 10g. From the number of articles written about BAM, this is obviously a key product for Oracle, so I decided to take a further look.

According to Ephraim Schwartz and James Niccolai for Infoworld,

"In response to market demand for more real-time business information, Oracle announced today it will offer a BAM (business activity monitoring) tool, Business Activity Manager, as part of an upgrade to its application server software due out in the middle of the year.

The BAM component will run on the existing Oracle10g Application server, released at the start of the year, and on Application Server 10.1.2, due out this summer. It will not run on earlier releases such as Application Server 9i, according to Thomas Kurian, senior vice president of Oracle Server Technologies.

Included in Business Activity Manager are the ability to detect, correlate, and analyze separate events such as RFID scans, bar code readings, or events out of an enterprise application or a supply-chain process event.

Kurian said the BAM application is unique in that it can correlate both traditional historical events out of a data warehouse with real-time events."

So Business Activity Monitoring will be an add-on feature for Application Server 10g, and it looks like it uses the B2B connectivity that comes with AS10g. This seems logical, as a big part of AS10g's appeal as a middleware platform is its ability to get different ERP and CRM systems talking to each other, using technologies such as XML, SOAP and web services. Note the mention of RFID scans (the current flavour of the month with Oracle) and the nod towards integrating current activity with historical data warehouses, which fits with Oracle's current vision of integrated, real-time data warehousing.

Martin Lamonica for CNET Asia notes that the introduction of Business Activity Monitoring is part of a general Oracle strategy to make money out of integrating software:

"The introduction of the new data-gathering software is part of Oracle's strategy to bring in more revenue from the business of stitching together incompatible software. Oracle in January introduced Customer Data Hub, integration software sold as a stand-alone product that uses XML (Extensible Markup Language)-based protocols called Web services to centralize information about customers. The company earlier this year introduced an update to its suite of server-based transactional software, called Oracle 10g Application Server.

Making disparate systems share information cost-effectively is a perennial problem for companies and represents billions of dollars in technology spending. Kurian said many companies have already automated several individual business processes, such as manufacturing or human resources, by investing in packaged applications or writing custom applications. Now, companies are looking to automate "cross-functional business processes" that span a number of computer systems, he said."

Customer Data Hub was originally mentioned by Oracle back in February this year and is a way of bringing together data on all trading entities into a single data store - which is of course an Oracle database. Customer Data Hub raised a few eyebrows at the time, as it was seen as a bit of an about-face by Oracle, who up until then had advocated 'Oracle-only' solutions as the only way to deliver a successful ERP project. When you looked at Customer Data Hub in more detail, though, what you actually saw was in fact quite a consistent approach - keep your third-party ERP applications (Siebel, PeopleSoft and so on) but use the Oracle database (via the Customer Data Hub) as your single store of trading entity data. Business Activity Monitoring looks to be building on this approach and directly taking on integration specialists such as webMethods, Tibco and SeeBeyond, in the process positioning Application Server 10g as the centre of your ERP framework. In addition, if you read the CNET Asia article you'll also see references to something called Business Process Execution Language, a web services protocol being championed by Oracle, IBM and BEA as a standards-based way to integrate ERP data.

Duncan Lamb mentions BAM and BPEL in an article in his (always excellent) blog:

"CRN : Daily Archives : Oracle To Update App Server With Business Activity Monitoring : 2:53 PM EST Thurs., May 13, 2004: "The BAM data feeds into a central management console, where it can be correlated, filtered and subjected to analytics, according to Oracle. The results can be viewed on a portal dashboard. "

According to the article, the BAM capabilities include support for BPEL, which is coming out the winner in the standards wars for the "Process Execution" slice of the alphabet soup of the SOA. Oracle has not settled on a single standard itself, however, announcing it will support both BPEL (initially proposed by MS and IBM) and WS-Choreography. Thank goodness for both of them, for as the article says, "Oracle's move brings clarity to the respective focus of each group."

So what's all this about BPEL? Well, BPEL is a bit of a hot topic in the application server and web services world at the moment, and in this SearchWebServices.com article it is defined as

"BPEL (Business Process Execution Language) for Web services is an XML-based language designed to enable task-sharing for a distributed computing or grid computing environment - even across multiple organizations - using a combination of Web services. Written by developers from BEA Systems, IBM, and Microsoft, BPEL combines and replaces IBM's WebServices Flow Language (WSFL) and Microsoft's XLANG specification. (BPEL is also sometimes identified as BPELWS or BPEL4WS.)

Using BPEL, a programmer formally describes a business process that will take place across the Web in such a way that any cooperating entity can perform one or more steps in the process the same way. In a supply chain process, for example, a BPEL program might describe a business protocol that formalizes what pieces of information a product order consists of, and what exceptions may have to be handled. The BPEL program would not, however, specify how a given Web service should process a given order internally."

From an Oracle point of view, one of the best places to start learning about BPEL is Mike Lehmann's Web services, BPEL4WS, J2EE blog. Mike is the Product Manager for Oracle's BPEL engine and has put together a number of articles on web services and BPEL that give a good heads-up on the technology. As a starter, take a look at "Who Needs Web Services Transactions?" and "Oracle Backs BPEL4WS and WS-Choreography", together with the PowerPoint presentation "BPEL: Business Processes with Web Services".

So, how does this affect us? In terms of data warehousing and BI, probably not much at the moment, and it's probably only going to be relevant to organisations using Application Server 10g as a kind of 'central customer data hub' in conjunction with the latest release of Oracle Applications. In the long term, though, it's another pointer towards Oracle's vision of pervasive, real-time business intelligence that merges current activity with historical data, with Oracle Application Server becoming a more central part of this process, using all the interconnect functionality within the application server platform to bring the disparate data together. What this means for someone like me (more comfortable using the RDBMS as the integration engine) is that to stay ahead of the game, you're going to need to gain a familiarity with web services, BPEL, XML, SOAP and so on, as increasingly data warehouse sources are going to arrive in this form rather than just as flat files and database links.


OWB 10g Now Available For Windows

Nicholas Goodman contacted me with the good news that Oracle Warehouse Builder 10g has now been released for Windows.

Although this first release of OWB 10g is mainly concerned with Oracle Database 10g compatibility as a target database, the accompanying FAQ mentions the following new features:

  • Simplified installation - there are no longer Design-Time and Server-Side installation options.
  • Warehouse Builder now provides a script that configures the Warehouse Builder Runtime Service for remote execution of Warehouse Builder's mappings on Oracle Enterprise Manager nodes.
  • Warehouse Builder 10g supports the deployment of Process Flows into Computer Associates Unicenter AutoSys Job Management.

More details on these changes are in the Release Notes.

The good news therefore is that we can now use Oracle 10g as a deployment target for OWB. However, the OWB 10g release we're probably all waiting for is the 'Paris' release, due later in 2004, which looks likely to rework the way dimensional data models are put together, supporting more complex star schemas along with more seamless handling of CWM metadata and multidimensional OLAP.
 

Posted by mark at 10:13 PM | Comments (0)

Simon Terr On Data Warehousing And GIS

"Recently I've been involved in a very interesting project involving data warehousing and geographical information system (GIS) technology. Although the data warehouse was not real-time enabled, the GIS technology added a very interesting dimension to this data warehousing project." writes Simon Terr for DMReview.


Degenerate Dimensions Defined

Much of my work involves designing and reviewing dimensional data models, and an interesting issue that often comes up is how to deal with data items such as invoice number, order number and so on, that are not strictly facts - you're not going to want to add them up, or average them, or perform any other maths on them - but they don't seem to fit into existing dimensions.

Ralph Kimball coined the term 'Degenerate Dimensions' for these data items, as they perform much the same function as dimensions: they sit in the fact table and allow you to limit down or 'slice and dice' your fact table measures, but they aren't foreign key links through to dimension tables, as all the information you want - the invoice number, or the order number - is contained in the degenerate dimension column itself. Degenerate dimensions are useful as they tie the transactions, or events, in the fact table back to real-life items - invoices, orders and so on - and they can be a quick way to group together similar transactions for further analysis.
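
As a minimal sketch (the table and column names here are hypothetical, not from any particular project), a fact table carrying a degenerate dimension might look like this:

    create table sales_fact (
       day_key         number not null,      -- foreign key to the time dimension
       customer_key    number not null,      -- foreign key to the customer dimension
       product_key     number not null,      -- foreign key to the product dimension
       invoice_number  varchar2(20),         -- degenerate dimension: no lookup table
       sale_amount     number,
       quantity_sold   number
    );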

The key here is not to go overboard and turn these degenerate dimensions into full dimension tables - for example, an Invoice dimension - as in all likelihood such a dimension table will grow at the same rate as your fact table. If there is other interesting information to go with the invoice - for example, who the customer was, or what products were ordered - this is better placed in dedicated customer and product dimensions where it can be stored as a kind of 'master copy', rather than being stored alongside each order in a ballooning Invoice dimension.

The other advantage with degenerate dimensions is that they're a lot easier to build and maintain when using ETL tools such as Oracle Warehouse Builder, as you don't have to create dimension lookup tables, create synthetic keys, sequences and so on. Indeed, if you're loading your dimensional model into a multidimensional database such as Oracle OLAP, your database will be much smaller in size and easier to handle if you can keep the number of formal dimensions to a minimum, as they tend to 'explode' in size the more dimensions you add to the database.

Judicious use of degenerate dimensions keeps your dimensional model rational and your database size reasonable, whilst still allowing you to keep useful items in the fact table that help tie the data warehouse back to the original source systems.


New Metadata Exchange Options For OWB 9.2

From http://www.oracle.com/corporate/press/index.html?3003709.html

"Oracle(r) Provides Business Intelligence Users Powerful Metadata Management

Oracle Corporation and Meta Integration Technology, Inc. (MITI), the industry leader in metadata integration, today announced a partnership that will enable Oracle(r) Warehouse Builder customers to more efficiently manage metadata integration with modeling, extraction and reporting tools commonly used in business intelligence (BI) architectures.

The incorporation of MITI's Meta Integration(r) Model Bridge (MIMB) technology with Oracle Warehouse Builder provides users with greater control of their business data integration processes. MIMB enables the movement of metadata across numerous environments such as databases and data warehouses as well as BI, design, and extract, transfer and load (ETL) tools."

"This partnership will help customers to utilize their existing investments in modeling, reporting and ETL tools. Oracle Warehouse Builder can utilize MIMB for metadata integration with popular BI products including:

* Data modeling tools such as CA AllFusion ERwin, Sybase PowerDesigner(r)
* ETL tools such as Informatica PowerCenter
* BI query and reporting tools from Business Objects and Cognos
* Standard metadata repository formats such as OMG CWM(TM)

"Every modeling, ETL, OLAP, BI and reporting tool provides its own repository that produces and consumes metadata," said Dan Vesset, research manager, Analytics and Data Warehousing, IDC. "As a result, it is critical for any new tool to integrate well in the enterprise metadata flow. Oracle's partnership with MITI enables Oracle Warehouse Builder to integrate with a wide range of tools, providing users greater control over metadata integration."

You can download MIMB from the Meta Integration Technology website and Oracle have also got a FAQ and viewlet demonstration available on OTN. Although the press release only came out recently, the technology actually works with OWB 9.2 and can be used now. It's a separate download that gets installed alongside OWB, and once installed gives you a number of additional options when using the metadata import and export bridges.

From my own point of view the Informatica integration certainly looks useful, as does that for Business Objects and Cognos. One to take a look at when I get a spare moment.


New Developments In Oracle Data Warehousing

I've been giving a lot of thought recently to how I might go about designing and building future Oracle data warehousing projects. Over the last few months, I've come across some interesting design approaches, and coupled with some new technology that's now available it's probably a good time to have a think about how future projects might come together.

One of the key messages that keeps coming through, from both clients and from Oracle, is around near or real-time data warehousing. Traditionally, data warehouses have been refreshed daily, or weekly or monthly, and it's been generally accepted that there's always going to be a lag between the data in the source systems, and the data in the data warehouse. This isn't necessarily a bad thing - data warehouses are for analysing trends, and you wouldn't want to change the direction of a business based on a few minutes data; however, the role that data warehouses play is changing, and it's common now to use the data warehouse as a way of getting a holistic view of the whole business, in real time, so that split-second decisions can be taken as soon as an event happens, to prevent a customer defection, or to take advantage of market movements, for example. If you're going to be relevant in the data warehousing market, you've got to be capable of being real-time.

Warehouses that are built using Oracle Warehouse Builder are generally loaded on a batch basis, using PL/SQL routines built with OWB and scheduled using OEM and/or Oracle Workflow. If you want these routines to be 'near' real-time, what you then do is run them on a regular, constant basis (say every minute or so) and have them process changed data from the source system, rather than loading the entire source system each time. The usual way to do this is to have some sort of 'new' or 'changed' date field in the source system, so that you can pick up only those records that are new or amended, as sketched below. The smarter way to do this, as of Oracle 9i, is to use 'Change Data Capture', a mechanism for automatically feeding changed database records through to interface tables, or indeed through to the warehouse fact and dimension tables themselves.
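
The 'changed date' approach amounts to something like the following sketch, where the source table, the database link and the bind variable holding the last extract time are all hypothetical:

    -- pull across only the rows touched since the last load
    insert into stg_orders_delta
    select *
    from   orders@source_db o
    where  o.last_updated_date > :last_extract_time;

    commit;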

In actual fact, I've never used Oracle 9i Change Data Capture, because of the way it is implemented - it relies on triggers being placed on the source tables, which can (a) slow down the source system and (b) worry the source system DBA, as you're changing their data model. However, I recently had a look at a presentation and paper from the last OracleWorld by Michael Brey and Michael Schmitz entitled "Experiences with Real-Time Data Warehousing Using Oracle Database 10g", which explains how the new 10g version of change data capture, called 'Asynchronous' (as opposed to 'Synchronous') change data capture, works.

In the 10g version, the redo logs are used instead of triggers, making the process far less invasive and putting no load on the source system. The drawback, of course, is that the source system (as well as the target) has to be running 10g, but I can see how you could put a 'real-time' system together that mines the redo logs, transforms the data using a PL/SQL package and places the data in the correct fact or dimension table. More details on Asynchronous Change Data Capture can be found in the Oracle white paper "On-Time Data Warehousing with Oracle10g - Information at the Speed of your Business", available on OTN.

Another interesting approach that the paper talks about deals with the question of real-time versus fairly static data in the data mart. One criticism of having a real-time data mart or warehouse is that if you run a query twice in succession, you might get a different answer, as more data has been loaded since you first ran the query. To get around this, the system designers had a main fact table, which is only loaded once a day, together with a 'today's data' fact table, which is updated constantly during the day. If you want a static view of the data, you just query the main fact table; if you want a completely up-to-date view, you SELECT against both tables and UNION the results. At the end of the day, the 'today's data' fact table is instantaneously 'partition exchanged' with the main fact table, which is partitioned by day and has an empty partition set aside for each new day's data; a sketch of this follows.
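
A rough sketch of the two halves of that design, with hypothetical table and partition names: an intra-day query unions the static and 'today' fact tables, and the end-of-day step swaps the 'today' table into a pre-created empty partition.

    -- intra-day: completely up-to-date view across both tables
    select * from sales_fact
    union all
    select * from sales_fact_today;

    -- end of day: instantaneous swap of today's data into the daily partition
    alter table sales_fact
       exchange partition p20040514 with table sales_fact_today
       including indexes without validation;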

Following on from this, I attended a presentation by Jon Page at the recent UKOUG Reporting Tools & BIW Special Event in March on Oracle's 'Information Architecture'. Oracle are currently pushing a vision of data warehousing where the warehouse is integrated into the ERP system; in Oracle's case, using a product called 'Daily Business Intelligence' (DBI), an add-on to the e-Business Suite. Interestingly, what Oracle are saying with DBI is that you don't need a separate warehouse, and you don't need an ETL process that loads data in batches and applies changes to the warehouse on a daily, weekly or monthly basis. It looks like DBI is being populated and refreshed using Fast Refresh Materialized Views - materialized views that are updated whenever the base tables have data COMMITed, a process that has little impact on database performance but has the advantage of keeping the summary tables (or in fact any derived tables) always up to date.
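
I haven't seen the DBI internals, but the general technique looks something like the sketch below (the base table and summary names are hypothetical): a materialized view log on the source table, plus an aggregate materialized view declared REFRESH FAST ON COMMIT.

    create materialized view log on order_lines
       with rowid (product_key, sale_amount)
       including new values;

    create materialized view product_sales_mv
       refresh fast on commit
    as
    select product_key,
           sum(sale_amount)   as total_sales,
           count(sale_amount) as sales_count,   -- needed for fast refresh of the SUM
           count(*)           as row_count
    from   order_lines
    group  by product_key;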

What this means in effect is that you don't have a long drawn-out ETL process, with multiple staging tables and mass-copying of data from source to target - you just create some fast-refresh materialized views and code the ETL into SELECT statements. A good approach - although the point I keep coming up with is 'what about complex ETL processes?' - surely at some point you're going to need intermediate tables, and surely at some point you're going to have to break out of standard SQL and use PL/SQL for a more complex process. However, it's an interesting approach and one I'll bear in mind for whatever I work on next.

Separate from all this, I also recently took a look at a paper by Arup Nanda entitled "A Multi-Source, Time Variant Data Warehouse Case Study" - you might have seen a write-up I gave this paper at the time. The case study describes a project where a medical claims data warehouse was built for a client, with a rule that no ETL tool - not even OWB - could be used. The paper then goes on to describe how they set up the warehouse and the ETL routines, using nothing more than SQL*Plus and Unix shell scripts. What was interesting about the paper was the completely different approach they took to putting a warehouse together; most projects I work on involve one or more OWB developers, building a series of ETL mappings that translate into SQL scripts and PL/SQL packages, whilst the authors of the paper did most of their work using parameterised CREATE TABLE AS SELECT ('CTAS') scripts that were called from Unix shell scripts.

What was different about Arup Nanda's approach was that he eschewed the often long, drawn-out graphical ETL process used by OWB and crafted a number of reusable SQL scripts that bulk-copied data from source to target using CTAS statements. In addition, to deal with changes to data types he crafted a number of scripts that issue ALTER TABLE commands to change column types. Lastly, he made extensive use of (again) daily partitioned fact tables, dropping old partitions and creating new ones using generic SQL scripts, alongside local partitioned indexes that reduced index maintenance to just those partitions directly affected by table loads.
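
A flavour of the kind of parameterised CTAS script the paper describes (the script name, substitution variable and table names are illustrative, not taken from the paper), called from a shell script via SQL*Plus:

    -- load_claims.sql : run as  sqlplus user/pass @load_claims.sql claims_20040514
    create table &1
    nologging parallel 8
    as
    select /*+ parallel(c,8) */
           claim_id, member_id, claim_date, claim_amount
    from   claims_staging c;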

What this said to me was that Oracle 9i warehousing projects don't necessarily need to use OWB, if the process can be better served by putting together well-crafted, reusable SQL scripts called from either MS-DOS batch files or Unix shell scripts. It also highlighted the central role that partitioning plays in a warehouse, not only for performance reasons but also because of the potential for partition exchange loading - the process of swapping out a staging table that contains data for a period with the corresponding empty partition in the fact table. In fact, OWB 9.2 now supports partition exchange loading, so this can also be carried out from the OWB GUI tool without having to learn the relevant SQL statements. However, in some cases - where the ETL process is simple but repeated many times with source data of varying structure - coding SQL statements by hand and controlling them from shell scripts is sometimes more efficient than building them in a graphical ETL tool.

The last development I've been looking at, is 'Agile' methodologies that take a fundamentally different approach to the project build than traditional 'waterfall' methodologies.

Traditional data warehouse projects that I've worked on (and managed) usually work to a project plan where requirements are first defined, then the system is designed, then coded, tested and rolled out. Whilst this works well when the project requirements can be completely defined at the start, it sometimes falls down when

  • the requirements aren't fully defined at the start, and in fact only come out as the project gets going
  • changes to requirements, or problems during the build mean that the project diverges from the plan early on
  • the client wants to see some results early, but the project only delivers something right at the end
  • time is tight but the project methodology demands lots of documentation, and lots of process
  • team members have different skills but get treated for planning purposes as interchangeable

Agile methodologies, such as Extreme Programming or Scrum, address these issues by taking a fundamentally different approach.

  • Requirements are driven out over development cycles, constantly feeding back into the project
  • Iterations of the project are delivered regularly and quickly, so that feedback can be obtained
  • The methodology is 'lightweight', only setting objectives and leaving the developers to decide how to do the work
  • There is very little project 'ceremony', with only a minimal amount of project documentation required
  • What project process there is focuses mainly on setting objectives and measuring progress
  • Team members are recognised as having unique skills which sometimes cannot be interchanged.

This agile approach looks excellent for business intelligence & data warehousing projects, which more often than not don't have their requirements fully defined at the start ("I don't know what I want, so show me what can be done and I'll tell you then") and benefit most from regular releases of data and reports to the warehouse user base.  It's also the case that, in most projects I've heard of, whilst there is a lot of time and attention paid to the plan at the start of the project, as time goes on less and less project documentation actually gets filled in as everyone's backs are to the wall getting the project delivered - the agile approach would therefore fit this better, allowing us to focus on that documentation which is actually essential - setting objectives and measuring progress - whilst allowing us to dispense with documentation that is only filled in 'because it should be'.

The agile approach should therefore help us deliver projects that are more responsive to customer needs, deliver results earlier and more consistently, and allow developers to concentrate their efforts on the parts of the project that deliver real value. Of course, the potential problem here is that most projects you are asked to quote for are on a fixed-price, fixed-requirements basis; the client believes they have fully defined the requirements as part of the RFP, and they therefore want you to quote a fixed price for doing this work. Any changes to requirements (which more often than not eventually come in) then become change requests which have to be separately negotiated. If agile methodologies are going to work, there needs to be a partnership between the client and the development team, balancing the benefits of a flexible approach to design and build against the fact that it's not possible to completely lock down the cost and elapsed time at the start of the project.

So what does this mean then for future data warehousing projects? Well, whilst in the short term in all likelihood things will carry on as normal, some approaches I'd like to bring into future warehousing projects would include;

  • Increasing use of real-time data, through a combination of synchronous and asynchronous change data capture, and fast refresh materialized views
  • More use in general of the built-in ETL features in Oracle 9i and 10g, even when they're not yet supported by OWB (though check the New Features documents for later versions, as more and more Oracle 9i ETL functionality is being exposed through OWB)
  • Consideration of generic, parameterised SQL scripts when this would be a more efficient way to build the ETL process
  • The use of agile methodologies where the customer is agreeable to an iterative development process, drawing out requirements through a series of releases and keeping project documentation focussed on setting objectives and measuring progress.

Any thoughts from anyone on this approach? Is anyone using the Oracle 9i features for real time data warehousing, or is anyone using an agile methodology, such as eXtreme Programming or Scrum to run their warehousing project? Drop me a line if you've got any feedback.



Guide To Integrating OWB9.2 And Oracle Workflow

A recent posting on Oracle Warehouse Builder prompted some questions from readers on the process of integrating Oracle Workflow with Oracle Warehouse Builder; this integration has also been a 'hot topic' on the OTN OWB Forum. The OWB documentation stops short of explaining how workflow jobs, created using OWB9.2 and deployed using the Deployment Manager, are actually submitted and run using Oracle Workflow, and therefore Sonia Brown put together some notes on how this integration is achieved. 

The document covers the following steps, with a number of screenshots to illustrate the process;

  • Installing Oracle Workflow Server
  • Edit init.ora file 
  • Start Oracle Universal Installer for Oracle Database 9.2 
  • Install and configure Web Server 
  • Workflow web interface virtual directory mappings 
  • Installing Workflow help documents 
  • Set up a directory service for Oracle Workflow 
  • Verify your base URL 
  • Implement Oracle Internet Directory integration (optional) 
  • Implement Single Sign-On integration (optional)
  • Installing Additional Languages
  • Installing Workflow Builder Client (Optional) 
  • Installation 
  • OWB Configuration Settings 
  • Create Workflow Location 
  • Create a Process Flow using Workflow Location 
  • Set Properties of a Process Flow 
  • Deploy the Process Flow 
  • Process Flows using Workflow Server 
  • Launch process 
  • View Process Information 
  • Oracle Workflow Builder Import (Optional) 
  • Process Flow Import

There's also a viewlet available on OTN, that shows how Oracle Enterprise Manager can be used to run the resultant Oracle Workflow processes; one thing to be aware of is that the preferred database user credentials must be those of the Runtime Access User, whilst the Runtime Owner's details are used in the properties page.


Comparing OWB To Hand-Coding PL/SQL

"Hello Mark,

Stumbled across your web page and blog. Allow me to introduce myself. I am an Oracle Apps developer and have been investigating the use of OWB and ETL functionality for use in a conversion project. Have done much ETL without formal tools before the term existed, but want to know your opinion of the tool for data conversion from a mainframe legacy source to Oracle. The main issue is the benefits (standardization, use of metadata, ???) versus the cost in terms of ramp-up time. It's a no-brainer to do data conversion with sql*loader and pl/sql but to implement and learn a tool such as OWB, what in your opinion is the learning curve? What is the biggest benefit? We have maybe 30 objects or so that we will be converting. Oracle will be using the tool for creating interfaces as well. Thanks in advance for your input."

Good question about OWB. If I were to put my 'marketing' hat on, it'd be straightforward to list the benefits of using OWB:

  • metadata about objects and mappings is captured during the build process, stored in the OMG CWM format, and flows through to other Oracle tools such as Discoverer, 9i OLAP, the database, workflow and so on.
  • building graphical mappings, rather than hundreds of lines of code, makes it easier for other developers to work out how a load process works
  • easy creation of complex 9i metadata around your database objects; both the dimensions and hierarchies you get with 'normal' Oracle 8i and 9i, and the CWMLITE / CWM2 metadata required as part of the OLAP Option.
  • with OWB 9.2, you now get a mapping debugger, plus Oracle workflow integration
  • all of the extra auditing / error trapping that you get by default with an OWB mapping
  • easy access to Oracle fine-tuning tricks (INSERT /*+ APPEND */ hints on loads, etc.)
  • access to additional, pay-extra functionality such as name and address cleansing

However, there is an overhead in using OWB, both in getting up and running, and in building the project. In theory, these overheads are balanced out by gains in productivity, but depending on what you're looking to achieve, and the experience of your PL/SQL programmers, it's worth weighing these up beforehand. Typical overheads include

  • setting up OWB, plus the repository, runtime schema and so on, and ensuring good performance of the tool
  • learning the basics of OWB. Basically, your mappings are set based rather than procedural, so you have to rethink how you put mappings together, with each one translating to a single INSERT, UPDATE or MERGE statement.
  • With OWB 9.0.4 and 9.2, the tool itself has got more complex, with separate mechanisms for deploying mappings, and the mappings themselves (which were straight PL/SQL packages, executable from a DOS/Shell script) now wrapped up in an XML wrapper, together with the concept of connectors, collections and so forth
  • Being a Java application, OWB can be a little bit 'sluggish' to work with, and can also slow down if the underlying database you're storing definitions and mappings in is under a heavy load.

One of the benefits of working with OWB is that the mappings it creates are all PL/SQL; in addition, if you want to extend the functionality available in mappings by creating your own custom transformations, these are all coded in PL/SQL. Therefore, you don't need to learn a new scripting language to get the most out of OWB. However, there are two caveats here

  • although the mappings that OWB generates are PL/SQL, and you can then go and hand-edit the mappings, there's no way of folding these changes back into the stored mapping model. Therefore, you'd need to reapply these customisations each time you deployed the mapping.
  • if you write custom functions to drop into a mapping, be careful that these don't kill the performance of the mapping. For example, if you include functions in your mapping that transform the data, or insert values into control tables as part of the mapping, these can screw with the query optimiser, leading to full table scans or the mapping reverting to row-based rather than set-based.

Weighing these all up, on balance I would still go with OWB. The benefits of code portability amongst the development team (it's easier to pass on a graphical mapping, rather than someone's PL/SQL code), the additional error control and auditing put into the code with OWB, plus the hooks into all the other Oracle functionality (workflow, OEM, OLAP, Discoverer, CWM metadata) outweigh the steepish learning curve, plus slightly slower development compared to hand-coding, at least initially. Once you've got the hang of it, a good OWB developer can be as fast as a good PL/SQL developer, but you've now got all the quality control, and metadata in a standard form, which is going to pay off on a large project.


MOSC Note On Using The OWB Design Browser With The Standalone OC4J Instance

If you're looking to use the web-based Design Browser that comes with OWB 9.0.4 and 9.2, and want to deploy it using the new built-in OC4J instance, there's a new MOSC note No. 237367.1 that goes through all the steps required to set it up.

In the past, to use the Design Browser you had to install it into an Oracle Portal installation; with the bundled OC4J server that comes with 9.0.4 and 9.2 you can now run it using the Apache webserver that comes with the 9i database. The note goes through the creation of the Gateway Database Access Descriptors Settings, and shows you what details to enter in to the Design Browser login screen when you get it up and running.


Information on Oracle Embedded Data Warehouse

"I am trying to find information about Oracle EDW, which is supposed to be a datawarehouse out of the box for Oracle applications. Only thing that pops up at Oracle is the warehouse builder FAQ and that is no help. Please send me link to any documentation."

If you've got access to MOSC, there's a section on EDW in the Top Tech Docs area http://MOSC.oracle.com/MOSC/plsql/showdoc?db=NOT&id=149404.1&blackframe=0

Basically, EDW (Embedded Data Warehouse) is a series of patches applied to Apps 11i that provides, as you say, an 'out of the box' data warehouse maintained and populated by a customised version of OWB. As of EDW 4.1, the version of OWB that's used is 9.0.3, and it comes with prebuilt collectors or mappings to get data out of Apps and into the EDW warehouse.

The nice thing about the version of OWB provided with EDW is that it's got an 'Apps integrator' (a bit like the SAP integrator, so I hear) that makes it easier to get data out of Apps, allowing you to reference data by its business definition rather than by the underlying Apps table. Apart from licensing EDW, AFAIK there's no other way to get this integrator.


Tips On Using OWB and AWM To Create Analytic Workspaces

Now that the 9.2.0.4.1 Oracle Database patchset is available, together with the Analytic Workspace Manager and OWB9.2, I thought I'd take a look at the different options available for populating an Analytic Workspace and making the data available for querying using SQL.

Rather than manually building and populating analytic workspaces using the OLAP Worksheet and OLAP DML, you can now build and populate them using extensions to Oracle Warehouse Builder, or using the Analytic Workspace Manager add-in to Enterprise Manager. They work in broadly similar ways, using the same underlying AWM_ PL/SQL packages supplied as part of the 9.2.0.4.1 patchset, but there are a few differences in the approach the tools take, the functionality they provide, and the way they handle OLAP Catalog metadata, which I've written up in this technical article.


Migrating OWB From Windows 2000 To RedHat Linux

"We are looking at migrating our OWB on win2k to linux redhat. Does anyone have any experience with this? What vrs of redhat? And what if any advice you would like to offer?"

Back in the days of OWB 9.0.3 I looked at moving my laptop to Redhat 8.0 / SuSE 8, and 9iDS / OWB were my primary Oracle tools.

It's worth double-checking what the situation is with 9.0.4 and 9.2 (as it may have changed), but a couple of issues I came across were:

  1. OWB 9.0.3 on Linux didn't come with the Discoverer Metadata Bridge add-in, and therefore we couldn't export the star schema to a Disco Business Area. This may still be the case with 9.0.4 and 9.2, and it may also be the case with the other metadata bridge feature - the one to 9i OLAP.
  2. In practice we found that a developer, as well as needing OWB on their laptop, also needed Discoverer Admin / Desktop, and Reports Builder, and these aren't included in the Linux 9iDS bundle. Because of this, it's difficult to completely 'throw out' the windows installations, as there's no way of building the Discoverer side of things.

In terms of the product itself - it's fine, and works more or less exactly the same way as the Windows version, as it's all written in Java. Watch out for the above 'gotchas' though.


Oracle Warehouse Builder 10g News

Details are emerging about the next releases of Oracle Warehouse Builder, the first of which is due to be launched at Openworld next week.

The next release of OWB will be known as OWB 10g and will be a bundled install with the Oracle 10g database, much like the existing OLAP and Data Mining options with Oracle 9i. I assume that this will only be the 'server-side' install of OWB (the runtime and repository schemas), as it wouldn't make much sense to install the client element on the server. Apart from the bundled install, the functionality of this first OWB 10g will be the same as OWB 9.0.2.3.

The version after that, codenamed 'Paris', is expected to have a raft of new functionality which Oracle are currently keeping close to their chests. This 'Paris' release has been talked about for quite a while now; I'd heard this codename being used for the version of OWB that was to include 9i OLAP functionality, but this is now present to a limited extent in existing versions.  Maybe this forthcoming release will add to this functionality, hopefully allowing a 9i OLAP Analytic Workspace to be populated directly rather than having to load a relational star-schema as an intermediate step.


 

 
