This feature uses the disk group attribute disk_repair_time
to determine how long ASM waits before it permanently drops
a disk from its disk group after the disk has been taken
offline. The default value of disk_repair_time is 3.6 hours.
The time can be specified in minutes (M) or hours (H). If
no unit is specified, hours are assumed. This means that if
a disk of an ASM disk group becomes unavailable for only a
short period of time, ASM waits for the disk to become
available again instead of forcibly removing it from the
disk group at once, as was the case with Oracle 10g. This is
very handy if a disk is temporarily inaccessible, for
example because of a temporary cable disconnect.
Note: Fast mirror resynchronization of a disk group can be
switched off by setting its attribute disk_repair_time to 0.
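The following is a minimal sketch of how the attribute can be changed and inspected from the ASM instance. The disk group name DATA and the values 8h and 480m are assumptions chosen for illustration, not recommendations.

```sql
-- Assumption: the disk group is called DATA; substitute your own name.
-- Extend the repair window to 8 hours.
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8h';

-- The same window expressed in minutes.
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '480m';

-- Switch fast mirror resynchronization off for this disk group.
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '0';

-- Show the current setting for each disk group.
SELECT g.name  AS disk_group,
       a.value AS disk_repair_time
FROM   v$asm_attribute a
JOIN   v$asm_diskgroup g ON g.group_number = a.group_number
WHERE  a.name = 'disk_repair_time';
```

Note that v$asm_attribute lists attributes only for disk groups whose ASM compatibility is set to 11.1 or higher, so a disk group created with 10g compatibility will not appear in the result.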
In 10g, a temporarily inaccessible disk would be dropped
from the disk group right away. This could cause unnecessary
rebalancing operations with a heavy I/O load, and the
rebalance had to be done twice. The first time, ASM would
rebalance the disk group after dropping the disk: everything
that had been striped across n disks was re-striped across
the remaining n-1 disks. The second rebalance took place
after the disk was re-added to the disk group later on, in
order to re-stripe everything across (n-1)+1 = n disks
again. All this expensive I/O can be avoided with the fast
mirror resynchronization feature in 11g.
In the 11g scenario with fast mirror resync, the time
needed to recover from a failure is reduced dramatically if
the failure is only transient and can be fixed within
disk_repair_time. Only those extents that were marked as
modified while the disk was offline need to be
resynchronized after the failed disk has rejoined the disk
group.
Figure 5: Modified Extents being Resynced
The view v$asm_disk has a column REPAIR_TIMER, which shows
the number of seconds remaining until the disk is
automatically dropped, or 0 if the disk has not failed.
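As a sketch of how the timer can be monitored, the query below lists only the disks whose repair timer is currently running. The column aliases are illustrative, and the disk group name DATA and disk name DATA_0001 in the final statement are placeholders, not names taken from this system.

```sql
-- Disks that are offline and counting down towards being dropped.
SELECT group_number,
       name,
       mode_status,
       repair_timer                AS seconds_left,
       ROUND(repair_timer / 60, 1) AS minutes_left
FROM   v$asm_disk
WHERE  repair_timer > 0
ORDER  BY repair_timer;

-- Assumption: DATA and DATA_0001 are placeholder names.
-- Once the fault is fixed, bring the disk back online before the
-- timer expires so that only the modified extents are resynced.
ALTER DISKGROUP data ONLINE DISK data_0001;
```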
Fast mirror resynchronization can dramatically reduce the
time needed to resynchronize a disk group when a disk
becomes available again in time after a temporary
disconnect.
The drawback of fast mirror resync is that, between the
failure and the return of the disk, only one mirror copy
remains of the extents stored on the failed disk if the
disk group uses normal redundancy. This is a dangerous
situation. Keep this in mind when creating ASM disk groups
with normal redundancy.
Using ASM disk groups with normal redundancy in combination
with fast mirror resynchronization is risky. If a second
disk in the disk group is disconnected, all mirrors of some
extents might be lost at the same time. This could
effectively mean permanent loss of data.
By default, this window of vulnerability lasts for 3.6
hours (216 minutes).
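To see which disk groups are exposed to this window, a query along the following lines can help. It is a sketch only: the disk group name DATA and the value 1h in the final statement are illustrative assumptions, and the right window depends on how quickly a fault can realistically be repaired.

```sql
-- Normal-redundancy disk groups, their repair window and the number
-- of disks that are currently offline.
SELECT g.name          AS disk_group,
       g.type          AS redundancy,
       g.offline_disks,
       a.value         AS disk_repair_time
FROM   v$asm_diskgroup g
JOIN   v$asm_attribute a ON a.group_number = g.group_number
WHERE  g.type = 'NORMAL'
AND    a.name = 'disk_repair_time';

-- Assumption: DATA is a placeholder name and 1h an example value.
-- A shorter window reduces the time spent with a single mirror copy,
-- at the price of a full rebalance if the repair takes longer.
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '1h';
```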
As of 10gR2, the view v$asm_file has the column
redundancy_lowered, which shows whether an ASM file has
extents with fewer mirrors than its redundancy level
requires:
SYS AS SYSDBA @ +ASM SQL> SELECT group_number, file_number, REDUNDANCY_LOWERED FROM v$asm_file;

GROUP_NUMBER FILE_NUMBER REDUNDANCY_LOWERED
------------ ----------- --------------------
           1         256 Y
           1         257 N
           1         258 N
           1         259 N
           1         260 N
           1         261 N
           1         262 N
           1         263 N
           1         264 N
           1         265 N
           1         266 Y
One possible reason for lowered redundancy is an
inaccessible disk in the disk group. Other reasons are a
lack of sufficient disk space in the disk group or fewer
fail groups than the redundancy level requires. ASM files
showing redundancy_lowered = Y in v$asm_file are an
indicator of vulnerability. Check whether all disks are
accessible, whether there are enough fail groups for the
redundancy level, and whether the ASM disk group is running
out of disk space.
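The checks described above can be sketched as three queries. They use only columns of the standard ASM views; the column aliases are illustrative.

```sql
-- 1. Files with lowered redundancy, with their system-generated names.
SELECT f.group_number,
       f.file_number,
       a.name,
       f.type,
       f.redundancy
FROM   v$asm_file f
LEFT   JOIN v$asm_alias a
       ON  a.group_number     = f.group_number
       AND a.file_number      = f.file_number
       AND a.file_incarnation = f.incarnation
       AND a.system_created   = 'Y'
WHERE  f.redundancy_lowered = 'Y'
ORDER  BY f.group_number, f.file_number;

-- 2. Free space and offline disks per disk group.
SELECT name, type, total_mb, free_mb, usable_file_mb, offline_disks
FROM   v$asm_diskgroup;

-- 3. Number of fail groups per disk group and disks that are not online.
SELECT group_number,
       COUNT(DISTINCT failgroup) AS fail_groups,
       SUM(CASE WHEN mode_status <> 'ONLINE' THEN 1 ELSE 0 END) AS disks_not_online
FROM   v$asm_disk
WHERE  group_number > 0
GROUP  BY group_number;
```

A negative usable_file_mb in the second query indicates that the disk group does not have enough free space to restore full redundancy after a disk failure.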