One of the customers I work with is looking to us for advice on how to set their data warehousing database up. They're looking to use the Enterprise Edition of the latest Oracle version (10.1.0.4) and are planning on using Oracle Warehouse Builder 10g as the ETL tool. Each environment (dev, test, prod) will have two database instances, one to hold the OWB Design repository (with an 8k block size) and one to hold the target data warehouse and the OWB Runtime Repository (with a 16K block size). Setting up the design repository instance isn't an issue, it's just a regular database set up using the default database template, but the target data warehouse instance is a bit more interesting as we're looking to minimize the amount of redo log generated during the warehouse initial build and then refresh.

Most people are aware that you can specify a NOLOGGING clause when creating tables, indexes, partitions and tablespaces which under certain circumstances causes redo generation to be switched off when the object is loaded. You're also probably aware that the database itself can be placed in NOARCHIVELOG mode, which disables the process of redo log archiving which can otherwise slow down a bulk load into a warehouse. However, what's often not clear is which of these options should be used and why you would use one rather than the other, and I therefore thought it worthwhile going back over the documentation and trying to set out what the optimal set of choices are. As usual, if you're reading this and you spot something that I've got wrong, or can add anything that makes the process clearer, add a comment at the end and I'll update the posting.

To put this in context, the client runs a number of other Oracle databases, almost exclusively as the backend for their transactional systems, and the general rule is that these databases are run in ARCHIVELOG mode, incremental hot backups are taken during the week using RMAN followed by full hot backups at the weekend, and the archived redo logs are stored off site and also used to refresh their DR databases. Given a preference, they would rather run the new data warehouse database in ARCHIVELOG mode, so that it can fit in with their existing arrangements, but they are willing to perform the warehouse load and refresh in NOLOGGING mode as long as we give them some downtime after the load to do a full, cold backup of the warehouse.

What we would like to do is to somehow disable redo log generation for initial loads into our dimension tables, and for initial and then subsequent loads into the fact tables. The rationale for this is that the initial dimension load, and all loads into the fact table, load a very large amount of data and we can speed up this process if we don't have to write redo information into the redo log files - in other words, we dramatically reduce the amount of disk I/O we need to perform to load a given object. If our database is running in ARCHIVELOG mode, we get an additional benefit in that we don't then need to archive off these redo log files, again reducing the time to perform the load as we don't have to wait around for the ARCn process to archive the filled-up redo log files. The downside of doing this though is that these data loads are unrecoverable - that is, if the instance crashes or we have a media (disk) failure, we wouldn't be able to recoverer these objects. Whilst that isn't a bit deal for the objects we're working with - we can just reload them from the source files - it means that we're going to have problems with any other objects that are also in that datafile. What this means in practice is that we have to do a full, cold backup of the datafiles that contain these objects before we can consider the database "recoverable".

I mentioned earlier on that you can specify a NOLOGGING clause when creating tables, indexes, partitions and tablespaces. Ignoring tablespaces for the moment, what this means in practice is that you can create the table (for example) using the NOLOGGING clause:

CREATE TABLE "PRODUCT"
(
"CLASS_DSC" VARCHAR2(50),
"CLASS_ID" NUMBER,
"FAMILY_DSC" VARCHAR2(50),
"FAMILY_ID" NUMBER,
"ITEM_BUYER" VARCHAR2(50),
"ITEM_DSC" VARCHAR2(50),
"ITEM_ID" NUMBER,
"ITEM_MARKETING_MANAGER" VARCHAR2(50),
"TOTAL_PRODUCT_DSC" VARCHAR2(50),
"TOTAL_PRODUCT_ID" NUMBER)

NOLOGGING

;

Then, you can carry out bulk loads into this table in NOLOGGING mode. For example:

INSERT

/*+ APPEND */

INTO
"PRODUCT"
("CLASS_DSC",
"CLASS_ID",
"FAMILY_DSC",
"FAMILY_ID",
"ITEM_BUYER",
"ITEM_DSC",
"ITEM_ID",
"ITEM_MARKETING_MANAGER",
"TOTAL_PRODUCT_DSC",
"TOTAL_PRODUCT_ID")
(SELECT
"PRODUCT_DIM"."CLASS_DSC" "CLASS_DSC",
"PRODUCT_DIM"."CLASS_ID" "CLASS_ID",
"PRODUCT_DIM"."FAMILY_DSC" "FAMILY_DSC",
"PRODUCT_DIM"."FAMILY_ID" "FAMILY_ID",
"PRODUCT_DIM"."ITEM_BUYER" "ITEM_BUYER",
"PRODUCT_DIM"."ITEM_DSC" "ITEM_DSC",
"PRODUCT_DIM"."ITEM_ID" "ITEM_ID",
"PRODUCT_DIM"."ITEM_MARKETING_MANAGER" "ITEM_MARKETING_MANAGER",
"PRODUCT_DIM"."TOTAL_PRODUCT_DSC" "TOTAL_PRODUCT_DSC",
"PRODUCT_DIM"."TOTAL_PRODUCT_ID" "TOTAL_PRODUCT_ID"
FROM "PRODUCT_DIM" "PRODUCT_DIM"
);

One point to note here is that the NOLOGGING clause doesn't mean that

every

DML operations on the table works in NOLOGGING mode - it's only bulk operations like INSERT /*+ APPEND */ and Direct Loader (SQL*Loader) that take advantage of it. Regular INSERTs, DELETEs and UPDATEs will generate normal redo, even when you specify NOLOGGING. This is a popular misconception as logic would suggest that the NOLOGGING table creation clause would suppress all redo log generation, not just direct path loads, but all the NOLOGGING clause does is specify NOLOGGING for the table creation process itself (in case you're using CREATE TABLE ... AS SELECT) and for any subsequent direct path insertions.

Another misconception is that NOLOGGING suppresses all redo, even when using direct path loads. What actually happens is that NOLOGGING operations generate minimal redo, not no redo whatsoever, with this minimal redo generated so that the data dictionary itself is preserved. It's only a relatively small amount, but it's worth noting that some will be produced, whatever steps you take.

With indexes and partition creation, it's a similar story - you can specify NOLOGGING in the object creation clause, and then certain operations (in the case of indexes, creation and ALTERs (rebuilds)) can take advantage of NOLOGGING.

NOLOGGING, NOARCHIVELOG and Standby Databases - Part I
June 10, 2005
Mark Rittman

NOLOGGING

/+ APPEND /

every

NOLOGGING, NOARCHIVELOG and Standby Databases - Part I June 10, 2005 Mark Rittman

NOLOGGING

/*+ APPEND */

every

NOLOGGING, NOARCHIVELOG and Standby Databases - Part I
June 10, 2005
Mark Rittman

/+ APPEND /