Oracle Database Tips by Donald Burleson
By Mike Ault
I ran into another undocumented feature today in 10gR2 Standard
Edition using RAC. On a Red Hat 4.0, 4-CPU Opteron (2-chip, 4-core)
system with 6 gigabytes of memory in a 2-node RAC, the client kept
getting ORA-07445 errors when their user load exceeded 60 users per
node. At 100 users per node they were getting these errors, along with
a core dump and a trace file for each one, about twice per minute on
each node. There didn't seem to be any operational errors associated
with them, but they seriously affected IO rates to the SAN and quickly
filled up the UDUMP and BDUMP areas. Of course, when the BDUMP area
fills up, the database tends to choke.
The client is using AMM with SGA_TARGET and SGA_MAX_SIZE set and no
hard settings for the cache or shared pool sizes. Initially we filed
an iTAR (or SR, or whatever they are calling them these days) but
didn't get much response on it, so the client suffered until I could
get on site and do some looking.
I looked at the memory footprint, CPU footprint and user logins and
compared them to the incidence of the ORA-07445 errors. There was a
clear correlation with the number of users and memory usage.
Remembering that resize operations are recorded, I then looked in the
GV$SGA_RESIZE_OPS dynamic performance view (DPV) and correlated the
various memory operations with the occurrences of the ORA-07445. The
errors only seemed to occur when a shrink occurred in the shared pool:
we saw the error on node 1, where a shrink had occurred, and none on
node 2, where no shrink had happened yet.
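The correlation step described above could be sketched roughly as follows. This is only an illustration, not the script used on site: the query text uses the documented GV$SGA_RESIZE_OPS columns, but the function name `errors_near_shrinks`, the five-minute window and the sample rows are all assumptions made up for the example. In practice the error timestamps would come from the alert log or trace file names.

```python
from datetime import datetime, timedelta

# Query one might run to pull the shared pool resize history from every
# RAC instance (columns as documented for GV$SGA_RESIZE_OPS).
RESIZE_OPS_SQL = """
SELECT inst_id, oper_type, initial_size, final_size, start_time, end_time
  FROM gv$sga_resize_ops
 WHERE component = 'shared pool'
 ORDER BY inst_id, start_time
"""

def errors_near_shrinks(resize_ops, error_times, window=timedelta(minutes=5)):
    """Count, per instance, the errors logged within `window` after a SHRINK ended.

    resize_ops:  list of (inst_id, oper_type, end_time) tuples
    error_times: list of (inst_id, timestamp) tuples
    The window length is an assumption; tune it to your own incident pattern.
    """
    counts = {}
    for inst_id, ts in error_times:
        counts.setdefault(inst_id, 0)
        for op_inst, oper_type, end_time in resize_ops:
            same_node = op_inst == inst_id
            if same_node and oper_type == "SHRINK" and end_time <= ts <= end_time + window:
                counts[inst_id] += 1
                break  # count each error at most once
    return counts

# Hypothetical sample data, not taken from the client system.
ops = [
    (1, "GROW",   datetime(2020, 1, 10, 9, 0)),
    (1, "SHRINK", datetime(2020, 1, 10, 10, 0)),
    (2, "GROW",   datetime(2020, 1, 10, 9, 30)),
]
errors = [
    (1, datetime(2020, 1, 10, 10, 1)),
    (1, datetime(2020, 1, 10, 10, 3)),
    (2, datetime(2020, 1, 10, 10, 2)),
]
print(errors_near_shrinks(ops, errors))
```

With this sample data, both errors on instance 1 fall just after its shrink, while the error on instance 2 has no shrink to attach to. That is the same pattern as the node 1 versus node 2 observation.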
Sure enough, hard setting SHARED_POOL_SIZE to a minimum value delayed
the error: it didn't start occurring until the pool had grown above
the minimum and then shrunk back to it, and even then not every time.
By hard setting the shared pool to 250 megabytes we were able to boost
the number of users to 80 before the error started occurring. A
further boost of the shared pool to 300 megabytes seems to have
corrected the issue so far, but we will have to monitor this as the
number of user processes increases. Note that you need to look at the
GV$SGA_RESIZE_OPS DPV to see what resize operations are occurring, and
the peak size reached, to find the correct setting for your system.
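As a rough sketch of how the peak size might be pulled out of the resize history, the snippet below takes rows shaped like (INST_ID, COMPONENT, FINAL_SIZE) from GV$SGA_RESIZE_OPS and reports the largest shared pool size each instance reached. The helper names `peak_shared_pool_mb` and `floor_statement` and the sample rows are hypothetical. The generated ALTER SYSTEM text assumes an spfile is in use (SCOPE=BOTH) and that the same floor is wanted on every instance (SID='*').

```python
def peak_shared_pool_mb(resize_ops):
    """Return {inst_id: largest shared pool FINAL_SIZE seen, in megabytes}.

    resize_ops: iterable of (inst_id, component, final_size_bytes) tuples
    """
    peaks = {}
    for inst_id, component, final_size in resize_ops:
        if component != "shared pool":
            continue  # ignore buffer cache, large pool, etc.
        mb = final_size // (1024 * 1024)
        peaks[inst_id] = max(peaks.get(inst_id, 0), mb)
    return peaks

def floor_statement(size_mb):
    """Build the ALTER SYSTEM text that hard sets the shared pool minimum."""
    return f"ALTER SYSTEM SET shared_pool_size = {size_mb}M SCOPE=BOTH SID='*'"

# Hypothetical sample rows, not measurements from the client system.
rows = [
    (1, "shared pool", 250 * 1024 * 1024),
    (1, "shared pool", 288 * 1024 * 1024),
    (1, "DEFAULT buffer cache", 900 * 1024 * 1024),
    (2, "shared pool", 264 * 1024 * 1024),
]
peaks = peak_shared_pool_mb(rows)
print(peaks)
print(floor_statement(max(peaks.values())))
```

Taking the highest peak across nodes is only one possible starting point. Since the error showed up when the pool shrank back toward its minimum, a floor at or above the observed peak is what keeps those shrinks from happening.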
It appears that there must be some internal list of hash values that
is not being cleaned up when the shared pool is shrunk. This results
in the kernel expecting to find a piece of code at a particular
address, looking for it and not finding it, which generates the
ORA-07445. Of course, this is just speculation on my part.
So for you folks using 10gR2 Standard Edition with RAC (I'm not sure
whether it happens with non-RAC or non-Standard), look at either not
using AMM or hard setting SHARED_POOL_SIZE to a value that can service
your expected number of users and their code and dictionary needs.
Note: When using AMM (by setting memory_target and/or sga_target), the
values for the "traditional" pool parameters (db_cache_size,
shared_pool_size, etc.) are not ignored. Rather, they specify the
minimum size that Oracle will always maintain for each sub-area in
the SGA.