Question: I've read the documentation of configuring Oracle Streams
Data Capture and I would like to a more detailed understanding of Oracle Streams
change data capture.
Answer: Configuring Oracle Streams change
data capture is quite straightforward.
The ?Oracle
Streams? book has a complete step-by-step guide
for Streams CDC propagation.
Oracle Change Data Capture was introduced with Oracle 9i, and provided the
ability to track changes to tables and store them in a change table, for further
consumption by an ETL process. Oracle Change Data Capture worked by creating
triggers on the source tables, transferring data synchronously but creating a
processing overhead and requiring access to the structure of the source tables.
Because of the effect that the triggers had on the underlying tables, many
warehouse projects did without change data capture and used other methods to
capture changes.
Oracle 10g introduced Asynchronous Change Data Capture, which instead of using
triggers uses the database log files to capture changes and apply them to
collection tables. Asynchronous Change Data Capture therefore doesn't require
changes to the table structure and doesn't impact on database performance.
Oracle9i Database introduced the Change Data Capture (CDC) feature. Oracle
Streams Change Data Capture captures all the inserts, updates, and deletes made
to user tables. These changes are stored in a database object called a change
table, and the change data is made available to applications in a controlled way
through what is called a subscriber view.
Prior to Oracle 10g, the change data was synchronously fed from the source. In
Oracle Database 10g, it becomes possible to asynchronously feed the change data,
and there are multiple options for how this feed is obtained and managed.
Asynchronous data capture reads the changed data from Oracle redo log files. As
a result of this, dependence on the transaction activity is avoided and
performance overhead is reduced. Oracle Streams Methodology is used to capture
change data from redo log files.
Asynchronous change data capture requires a streams configuration for each CDC
change set. These configurations include the streams capture and apply processes
that read change data from redo log files and inserts it into change data
capture change tables.
Asynchronous change data capture requires both the source and staging databases
to be Oracle Database 10g.
The Purpose of Streams Change Data Capture
From the inherent data
structure perspective, change data capture can either occur at the data level or
at the application level. At the data level, a table in the target database is
regarded as a remote snapshot of a table in the origin database. At whichever
level capturing and propagation is taking place, it is imperative to note that
there is always an increase in the workload on the source database. However,
with oracle 10g, additional overhead is a forgotten issue.
Asynchronous change data capture is now adopted, in which change data is
extracted from the redo logs without any negative performance implication on the
source database. Furthermore, asynchronous change data capture can be described
as a lightweight technology targeted towards change extraction and propagation
in a data warehousing system and in which changes to the source tables are
viewed as relational data for onward consumption by subscribers.
There is therefore no gain saying that asynchronous change data capture has
greatly enhanced parallel log file processing and data transformation.
The purpose of Oracle Database 10g change data capture is to efficiently
identify and capture data that has been added to, updated, or removed from
Oracle relational tables, and make the change data available for use by
applications.
In order to facilitate the movement of data from source databases to data
warehouses, the extraction and transportation of relational data from one or
more source databases must be optimized. The Oracle Database 10g change data
capture feature is used for optimizing this type of data transfer activity.
Using a change table, the change data capture feature eliminates the need for an
intermediate flat file to stage data outside of the relational database.
Instead, the change data resulting from INSERT, UPDATE, and DELETE operations is
captured into the change data table inside the database. The change data is then
made available to applications in a controlled way.
Asynchronous Oracle Streams Change Data Capture
When
a SQL statement that performs a DML operation is committed, change data is
captured by taking it from the redo log files. Using this technique, the change
data is not captured as part of the modifying transaction, and therefore has no
dilatory effect on that transaction. Asynchronous Change Data Capture is
available with Oracle Database 10g Enterprise Edition only.
Asynchronous Change Data Capture is based on, and provides
a relational interface to, Oracle Streams. The following are the two
methods of capturing change data asynchronously:
HotLog Mode
In the HotLog mode, change data is captured from the
current active redo log file on the source database. The change tables are
populated automatically as new committed transactions arrive. No data is
recorded until the transaction commits. There will be a minor delay between the
commit of the source table transactions and the arrival of change data to the
change table.
In the HotLog mode, there is a single, predefined HotLog
change source, identified as the hotlog_source, that represents the location of
the current redo log files of the source database. There is only one source for
the HotLog change. Once defined, this cannot be altered or dropped without a
database restart.
AutoLog Mode
In the AutoLog mode, change data is captured from a set of
redo log files managed by the log transport services of Data Guard. The
publisher of the change data (source database) must configure the Data Guard log
transport services to copy the filled redo log files from the source database
system to the staging database system, and then to automatically register the
redo log files. Change sets are then populated automatically as new redo log
files arrive. The amount of delay depends on the frequency of redo log switches
in the source database.
There is no requirement for a predefined AutoLog change
source. The publisher form the source database provides information about the
source database to create the AutoLog change source.
More details on Oracle Streams Asynchronous Change Data Capture can be found
in the book "Oracle
Streams", which also details improvements to the transportable tablespaces
feature in Oracle Database 10g.
Streams Data Change Capture Example
BEGIN
iscn := DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER();
dbms_output.put_line ('Instantiation SCN : ' || iscn ) ;
DBMS_APPLY_ADM.SET_SCHEMA_INSTANTIATION_SCN@DNYOIP20.world (
source_schema_name => 'ny2',
source_database_name => 'DNYTST10.world',
instantiation_scn => iscn ,
recursive => true
);
END;
/
Start both the Streams Change Apply and Capture processes
at the destination and source databases, respectively:
connect STRMADM/STRMADM@dnyoip20
PROMPT Connected to Destination (DNYOIP20)
PROMPT Starting the Apply Process at Destination
begin
dbms_apply_adm.start_apply (apply_name => 'LN2_APPLY ' ) ;
end ;
11g new Streams CDC Features
Oracle 11g introduced synchronous change data capture, a real-time way of
capturing CDC information for databases with low-DML rates (Under 20 updates
per second).
Oracle 11g also introduced 11g the "apply only" option and advanced
queuing enhancements to improve re-synchronization after network loss.
More on Oracle Streams Change Data Capture: