|
|
|
The mysteries of Oracle buffer block management
by Donald Burleson |
The internal machinations of Oracle buffer cache
management are not published and many Oracle professionals infer
their behavior from experimentation and tidbits from the Oracle
documentation. For a complete description of the v$ event
views, get the "free
11g poster" by UNISYS.
The v$bh view of x$bh has the
all-important "status" column that indicates the lock mode for each
data block in the data buffer. You can use v$bh to display the data
buffer contents, and
here is a great script for this. For non-OPS and non-RAC
databases, you will see these common lock modes:
- xcur - This is a RAM block that has
been acquired in exclusive current mode. According the
Oracle documentation, if a buffer state is exclusive current
(XCUR), an instance owns the resource in exclusive mode.
- cr - This mode indicates a
"cloned" RAM block (a "stale" block), that was once in xcur mode. The
instance has shared access to the block and can only perform
reads. The cr state means the owning instance can
perform a consistent read of the block, if the instance holds an
older version of the data.
- free - This is an "available" RAM
block. It might contain data, but it is not currently in-use by
Oracle.
- read - The buffer is reserved for a
block that is currently being read from disk.
During database recovery we might also see
these values for v$bh status column:
- mrec - Indicates a block in media
recovery mode
- irec - This is a block in instance
(crash) recovery mode
For RAC database we also see this
v$bh.status values:
- scur - a current mode block, shared
with other instances
A normal database will have filled all the
buffer blocks in just a few minutes, and the database buffer will
normally remain full until shutdown time. These "free" RAM blocks
still contain data. They are just marked as free because they are
eligible to be overlaid by a new incoming data block.
Multi-versioning of blocks in
the buffer
The
Oracle row-level locking mechanism is well understood by Oracle
experts, but here is some question about how row-level locking
translated into data buffer block status changes.
In a MOSC forum discussion titled "Multiple
buffer versions flooding the buffer cache", we see a DBA on Oracle
8.1.6.2 who claims to have 50 versions of a single data block in his
data buffer. This query was used to identify multiple versions
of a data block within the buffer cache:
SELECT
dbarfil,
dbablk,
class,
count(*)
FROM
x$bh
GROUP BY
dbarfil,
dbablk,
class
HAVING
count(*) > 40;
And also a query
to identify the specific segments that are experiencing multiple
data buffer versions:
SELECT
segment_name,
segment_type,
owner,
tablespace_name
FROM
sys.dba_extents
WHERE
file_id = <DBAFIL>
AND
<DBABLK> BETWEEN block_id and block_id + blocks -1;
"If this is a table or index block it is
advisable to check the application to see why the block is being
frequently changed as this may represent a point of contention."
"In general, you would see a number of
CR blocks in the buffer cache proportional to the number of
updates on that block."
In this case we see an argument for re-setting
the pctfree object parameter when we have a table (or index)
with many "hot update" rows. Re-setting pctfree to a
high-value will spread the data rows across more data blocks,
thereby relieving the possibility of having dozens of versions of
the data block in the data buffer.
For example, if you have avg_row_len = 80 and 2k data blocks,
setting pctfree=90 would remove the data block from the
freelist (marking it as logically full) after only three rows are
inserted. This approach wastes disk space but reduced the
multiple versions of "cr" RAM blocks. This high pctfree
technique is also used to reduce row fragmentation in tables with
VARCHAR data types where a row is initially stored small with huge
row expansion from subsequent UPDATE statements.
Mark Bobak, a member of the respected Oaktable
group, professes some knowledge of buffer management internals, and
published this
excellent
speculation on the behavior of buffer block status:
I'm fairly certain, but not 100% sure,
that Oracle creates a CR block from a XCUR (or SCUR? I'm not a
RAC expert), by cloning the XCUR block, and then referring to
rollback, rolls back the block till it's sufficiently old, to be
read consistent to the point in time required.
So, if a query's snapshot SCN is N, but
a particular DBA (data block address) is only available in
current mode, and the SCN is N+100, then Oracle will clone it,
and mark it as CR, and start rolling back, using the information
it the block's ITL slot(s) to backtrack where the various before
images are.
Eventually, one of two things will
happen, either the block will be sufficiently old (with an SCN
of N or less) or, Oracle will not be able to find the required
information in the rollback segment, which would result in
ORA-1555.
And yes, I believe the XCUR block is kept in the buffer cache
after a transaction completes. (Again, there may be some
gotchas in the RAC case, I'm not sure about that case.)
Consider also, that a block in XCUR mode can be flushed to disk,
even if there is a transaction open on the block. Oracle doesn't
care. If you think about how the locking model works, and how
read consistency works, it really doesn't matter. If a block
with one or more row level locks is flushed to disk, and then
the transaction commits, what happens?
Well, Oracle certainly won't re-read the
block just to update the row lock and ITL slot. So, what does it
do? It marks the transaction as committed in the rollback
segment slot header, and moves on. If another process
comes along and reads or updates that block, delayed block
cleanout will kick in to clean up the mess that was left behind.
There is also a question about buffer
multi-versioning when using Oracle 10g
Automatic Storage Memory Management (ASMM). Here is an
actual example from an ASMM 10g database showing only one free block
in the data buffer:
STATUS NUMBER_BUFFERS
------- --------------
cr 616
free 1
xcur 14790
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ %
Total
Event Waits Time (s) DB
Time Wait Class
------------------------------
------------ ----------- --------- --------------
log file parallel
write 9,670 291 55.67 System I/O
log file
sync 9,293 278 53.12
Commit
CPU
time 225 43.12
db file parallel
write 4,922 201 38.53 System I/O
control file parallel
write 1,282 65 12.42 System I/O
Here
is a sample AWR report
from the system as BC first found it.
Note that all xcur and cr blocks
in the RAM data buffers can be de-allocated (with the alter
system flush buffer_pool command) to make their status "free"
and they can then be reassigned to other SGA regions. Here is a simple script to display counts of
each v$bh status:
column c1 heading 'Status' format a10
column c2 heading 'Number|of Data|Buffers' format 999,999,999
select
status c1,
count(1) c2
from
v$bh
group by
status
order by
count(1) desc;
Here is the output from a high-DML database.
Number
of Data
Status Buffers
---------- ------------
xcur 311,967
free 270,731
cr 17,302
The number of blocks in v$bh where
status="free" is critical to high-update databases because there
must be enough buffers to hold all data blocks that are being
updated (in xcur status). The Oracle 10g Automatic
Storage Memory Management (ASMM) feature (the default on Oracle10g)
should anticipate high updates and allocate additional data buffers
during high update periods. For example, here is an actual
output from an Oracle10g database where it appears that ASMM is not
allocating enough free blocks to accommodate concurrent updates:
STATUS NUMBER_BUFFERS
------- --------------
cr 616
free 1
xcur 14790
For more on ASMM,
click here.
An Online
Demonstration
This example below shows how Oracle marks
un-locked block buffers as "free" and how non-free buffer blocks
(where status <> 0) and we see the cur and xcur blocks
being flushed from the buffer cache when we issue the alter
system flush buffer_cache command. In this example there are
50,000 rows in the POLICYREC table.
SQL> update POLICYREC set sum_assured = sum_assured + 15;
50000 rows updated.
SQL> commit;
Commit complete.
SQL> select o.OBJECT_TYPE, substr(o.OBJECT_NAME,1,10)
objname , b.objd , b.status, count(b.objd) from v$bh b,
dba_objects o where b.objd = o.data_object_id and
o.owner = 'NYUSER' group by o.object_type,
o.object_name,b.objd, b.status ;
OBJECT_TYPE OBJNAME OBJD STATUS COUNT(B.OBJD)
----------- ------------- ------ ------ -------
TABLE TEST1 43058 free 6
TABLE POLICYREC 43061 cr 47
TABLE POLICYREC 43061 free 238
TABLE POLICYREC 43061 xcur 376
SQL> alter system flush buffer_cache;
System altered.
SQL> select o.OBJECT_TYPE, substr(o.OBJECT_NAME,1,10)
objname , b.objd , b.status, count(b.objd) from v$bh b,
dba_objects o where b.objd = o.data_object_id and
o.owner = 'NYUSER' group by o.object_type,
o.object_name,b.objd, b.status ;
OBJECT_TYPE OBJNAME OBJD STATUS COUNT(B.OBJD)
---------- -------------- ----- ----- -------------
TABLE TEST1 43058 free 6
TABLE POLICYREC 43061 free 660
|