Question:
I have a really big partitioned table, that I
used MERGE to populate/update it. I understand the MERGE will UPDATE and INSERT
upon matching the PK. If insert and update table rows become scattered, the
MERGE will mitigate which statements need INSERT or UPDATE in the staging area.
Assume MERGE statement is processing a whole bunch of rows -millions of
them, where some of the rows are UPDATEing and some of the rows are INSERTing on
the target table. Our actual mix appears to be 60% update vs. 40% insert in
about 25 million rows a day.
>>
Does MERGE actually issues a mixed set of UPDATE and INSERT commands -as needed-
against the target table?
Answer: The
Oracle docs reveal that MERGE statement makes it easy to
update OR insert rows in one pass through the data, without the overhead of multiple insert/update statements. Here are the reasons Oracle suggests
using MERGE instead of older methods:
- The entire operation can be expressed
much more simply as a single SQL statement.
- You can parallelize statements
transparently.
- You can use bulk DML.
- Performance will improve because your
statements will require fewer scans of the source table.
As far as the question about the
MERGE statement issuing UPDATE/INSERT commands, the answer is much less
conclusive. Oracle has documentation supporting both sides.
Here, the docs imply that MERGE will mix the INSERT and UPDATE into an
atomic statement.
The MERGE-statement combines the sequence of a conditional
INSERT and UPDATE in one single atomic statement, Oracle9i overcomes these
deficiencies and makes the implementation of warehousing applications more
simple and intuitive. The ?IF ROW EXISTS -THEN-UPDATE ELSE-INSERT? -
conditional sequence is also referred to as UPSERT
Yet
here, the docs suggest that MERGE consists of a solitary optimized internal
scan. In this, INSERT and UPDATE become conditional and processes as individual
statements. In this, there is a single scan with the DML command appropriate
for serial or parallel issues.
The new MERGE command overcomes these deficiencies,
processing the conditional INSERT-or-UPDATE within a single statement. The
data is scanned only once, and the appropriate DML command is issued, either
serially or in parallel.
The new MERGE command brings major performance benefits by providing an
optimized internal feature for the common Upsert task within ETL processing.
>> Since my developers coded a APPEND hint, each time a new
INSERT is issued by MERGE statement HWM would go a little higher. Following this
line of thinking if millions of INSERT commands are issued by MERGE, HWM would
go a little higher millions of times leaving a huge quantity of under used data
block behind and consequently "wasting" a lot of space. Is this
the correct way that an upsert (merge) works with the append hint?
Answer:
The "append" hint simply means "DO NOT re-use data block from the freelist
(as released by the PCTFREE threshold)". Whenever the merge needs a new block,
it raises the HWM and grabs file brand new empty blocks.
Note: INSERT APPEND supports only the subquery
syntax of the INSERT statement, not the VALUES clause.
Despite the new ease of the MERGE (upsert)
command, there are still some complexities which need explanation.
Here is an example of a forum user who has some questions regarding MERGE
and APPEND hints-
Basically, the APPEND hint will keep the data
blocks that are on the freelists from being reused. This being the case, if
there is a MERGE with a new block, the HWM takes fresh empty blocks and is
raised. Try viewing this
article on INSERT performance by Don Burleson on the APPEND hints.
This article by
Mark Rittman shows a sample MERGE command with an example of an APPEND hint:
alter session enable
parallel dml;
merge /*+ parallel(contract_dim,10)
append */
into contract_dim d
using TABLE(trx.go(
CURSOR(select /*+ parallel(contracts_\
file,10) full (contracts_file) */ *
from contracts_file ))) f
on d.contract_id = f.contract_id
when matched then
update set desc = f.desc,
init_val_loc_curr = f.init_val_loc_curr,
init_val_adj_amt = f.init_val_adj_amt
when not matched then
insert values ( f.contract_id,
f.desc,
f.init_val_loc_curr,
f.init_val_adj_amt);
You may want to take a look at
this forum
thread to get more information from the experts and fellow users about this.
Regarding raising the HWM with
MERGE statements, this will only occur when the block becomes logically full
(pctfree), otherwise before the blocks change, there could be multiple rows in
the append mode. This is because the APPEND hint only implies that a a fresh
block will be grabbed once it is necessary to get a new data block.
Performance wise, the merge/upsert
would bring improved performance because the ETL processing contains an Upsert,
which has the optimized internal feature. If it is MERGE performance that you
are interested in, you may be interested in
this article by David Aldridge, which discourages MERGE for refreshing MV's.