Using RMAN to repair corruption
This
article deals with RMAN in 11g and the Data Recovery Adviser. I
will demonstrate the new 11g repair interfaces with a block media recover as an
example.
Oracle
has introduced a new feature in 11g which can be used to get assistance in case
of loss or corruption of datafiles, redo log files or controlfiles. This tool is
called data recovery advisor and is accessible from the command
line as well as from the GUI.
The
command line interface is implemented as new commands for RMAN, the graphical
interface is integrated into the Enterprise Manager. In the following
little demo I want to show how we can deal with corrupted blocks in11g.
I
proactively check for block corruption in the database on a regular basis and
strongly recommend this to my customers.
By
default RMAN checks for physically corrupted blocks automatically
with every backup it creates, no matter if it is a backup set or an image copy.
Logical
corruption
is only checked with the additional syntax CHECK LOGICAL. The VALIDATE keyword
causes that no physical backup is created, but RMAN only reads all blocks
through the input puffers but does not write them to the output buffer into a
backup set.
RMAN> backup validate check logical
database;
Starting backup at 03-FEB-08
using target
database control file instead of recovery catalog
allocated channel:
ORA_DISK_1
channel
ORA_DISK_1: SID=131 device type=DISK
channel
ORA_DISK_1: starting full datafile backup set
channel
ORA_DISK_1: specifying datafile(s) in backup set
input datafile
file number=00001 name=/u01/app/oracle/oradata/orcl/system01.dbf
input datafile
file number=00002 name=/u01/app/oracle/oradata/orcl/sysaux01.dbf
input datafile
file number=00003 name=/u01/app/oracle/oradata/orcl/undotbs01.dbf
input datafile
file number=00005 name=/u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf
input datafile
file number=00004 name=/u01/app/oracle/oradata/orcl/users01.dbf
channel
ORA_DISK_1: backup set complete, elapsed time: 00:01:26
List of Datafiles
. . .
=================
In 11g
RMAN directly informs us about corrupted blocks it has found in the output of
backup command and creates a trace file in the Automatic Diagnostic
Repository (ADR) :
. . .
File Status Marked
Corrupt Empty Blocks Blocks Examined High SCN
---- ------
-------------- ------------ --------------- ----------
1 OK
0 11128 76800 622989
File Name:
/u01/app/oracle/oradata/orcl/system01.dbf
Block Type
Blocks Failing Blocks Processed
----------
-------------- ----------------
Data
0 54455
Index
0 9080
Other
0 2137
File Status Marked
Corrupt Empty Blocks Blocks Examined High SCN
---- ------
-------------- ------------ --------------- ----------
2 OK
0 20756 59896 622994
File Name:
/u01/app/oracle/oradata/orcl/sysaux01.dbf
Block Type
Blocks Failing Blocks Processed
----------
-------------- ----------------
Data
0 10864
Index
0 9779
Other
0 18497
File Status Marked
Corrupt Empty Blocks Blocks Examined High SCN
---- ------
-------------- ------------ --------------- ----------
3 OK
0 0 40960 622994
File Name:
/u01/app/oracle/oradata/orcl/undotbs01.dbf
Block Type
Blocks Failing Blocks Processed
----------
-------------- ----------------
Data
0 0
Index
0 0
Other
0 40960
File Status Marked
Corrupt Empty Blocks Blocks Examined High SCN
---- ------
-------------- ------------ --------------- ----------
4 OK
0 632 640 11608
File Name:
/u01/app/oracle/oradata/orcl/users01.dbf
Block Type
Blocks Failing Blocks Processed
----------
-------------- ----------------
Data
0 0
Index
0 0
Other
0 8
File Status Marked
Corrupt Empty Blocks Blocks Examined High SCN
---- ------
-------------- ------------ --------------- ----------
5 FAILED
0 523 1280 602803
File Name:
/u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf
Block Type
Blocks Failing Blocks Processed
----------
-------------- ----------------
Data
0 708
Index
0 1
Other 4 48
validate found one or more corrupt blocks
See
trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10321.trc for details
channel
ORA_DISK_1: starting full datafile backup set
channel
ORA_DISK_1: specifying datafile(s) in backup set
including current
control file in backup set
including current
SPFILE in backup set
channel
ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control
File and SPFILE
===============================
File Type
Status Blocks Failing Blocks Examined
------------
------ -------------- -----
SPFILE
OK 0 2
Control File
OK 0 594
Finished backup at
03-FEB-08
This already is a great improvement because in pre 11g Releases
we would have had to find out yourself if there were corrupted blocks found by
selecting the information in the dynamic view v$database_block_corruption.
A
check like this which has returned corrupted blocks in 11g immediately
creates a failure in the database. A failure could also be
produces by an inaccessible datafile or a lost control- or redo logfile.
There
are multiple checkers available in 11g which are automatically run
by the server as soon as a critical error is raised. These include health checks
like database structure checks, integrity checks for the data dictionary as well
as block corruption check.
These
checks can also be started manually by the DBA by using the new package
DBMS_HM .
Also the
DBA can run these checks from the ADVISER CENTRAL in OEM. There is a new
checkers-pane next to the advisers.
Once a
failure has been detected it is possible to ask RMAN for possible repair
options.
RMAN has
the following set of new commands to deal with failures:
Let's
first have a look at the LIST FAILURE command:
RMAN> list failure;
List of Database
Failures
=========================
Failure ID
Priority Status Time Detected Summary
----------
-------- --------- ------------- -------
21821
HIGH OPEN 03-FEB-08 Datafile 5:
'/u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf'
contains one or more corrupt
blocks
Now the
question for the DBA is:
?What is
the fastest way
to get the problem solved??
And it
might not always be easy to give the correct answer:
It could
be that it is faster to restore the file(s) and recover it (them). This would be
the case if very many blocks need media recovery.
Or it
could be faster to get the list of corrupted blocks and use block media recovery
with RMAN to resolve the situation.
As of
11g RMAN can also choose to use block copies from the flashback logs instead of
a backup for block media recovery.
The
ADVISE FAILURE command comes in very handy here:
RMAN> advise failure;
List of Database
Failures
=========================
Failure ID
Priority Status Time Detected Summary
----------
-------- --------- ------------- -------
21821
HIGH OPEN 03-FEB-08 Datafile 5: '/u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf'
contains one or more corrupt blocks
analyzing automatic repair options;
this may take some time
using channel
ORA_DISK_1
analyzing
automatic repair options complete
Mandatory Manual
Actions
========================
no manual actions
available
Optional Manual
Actions
=======================
no manual actions
available
Automated Repair Options
========================
Option Repair
Description
------
------------------
1 Recover
multiple corrupt blocks in datafile 5
Strategy: The repair includes complete media
recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm
This
command gives us comprehensive information about the best possible option for a
repair and it also automatically generates a repair script which
contains the commands that should be run for the repair action.
This script can either be run manually or with a repair command.
RMAN> host;
[oracle@rhas4 ~]$ cat
/u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm
# block media recovery
for multiple blocks
recover datafile 5
block 43 to 46;
RMAN> repair failure;
Strategy: The
repair includes complete media recovery with no data loss
Repair script:
/u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm
contents of repair
script:
# block media
recovery for multiple blocks
recover
datafile 5 block 43 to 46;
At this point RMAN asks us to confirm the repair action.
It is also possible to run the REPAIR FAILURE command with
NOPROMPT.
Do you really want
to execute the above repair (enter YES or NO)? yes
executing repair script
Starting recover
at 03-FEB-08
using channel
ORA_DISK_1
channel
ORA_DISK_1: restoring block(s) from datafile copy /u01/app/oracle/flash_recover/ORCL/datafile/o1_mf_encrypt__3t1w5bkp_.dbf
starting media
recovery
media recovery
complete, elapsed time: 00:00:07
Finished recover
at 03-FEB-08
repair failure
complete
The
REPAIR FAILURE command can be run with PREVIEW option which does not execute the
repair script but shows it.
There
are a number of dynamic views for monitoring failures and the repair actions:
SYSTEM @ orcl SQL> SELECT table_name FROM dict WHERE
lower(table_name) like 'v$ir%';
TABLE_NAME
------------------------------
V$IR_FAILURE
V$IR_FAILURE_SET
V$IR_MANUAL_CHECKLIST
V$IR_REPAIR
SYSTEM @ orcl SQL>
desc V$IR_FAILURE
Name
Null? Type
--------------
-------- -------
FAILURE_ID
NUMBER
PARENT_ID
NUMBER
CHILD_COUNT
NUMBER
CLASS_NAME
VARCHAR2(32)
TIME_DETECTED
DATE
MODIFIED DATE
DESCRIPTION VARCHAR2(1024)
IMPACTS VARCHAR2(1024)
PRIORITY VARCHAR2(8)
STATUS VARCHAR2(12)
SYSTEM @ orcl SQL>
desc V$IR_FAILURE_SET
Name
Null? Type
-----------------
-------- -------------
ADVISE_ID
NUMBER
FAILURE_ID
NUMBER
MANUAL_REPAIRS_ONLY
VARCHAR2(3)
SYSTEM @ orcl SQL>
desc V$IR_MANUAL_CHECKLIST
Name
Null? Type
----------------
-------- -------------
ADVISE_ID
NUMBER
RANK NUMBER
REQUIRED VARCHAR2(3)
MESSAGE
VARCHAR2(1024)
SYSTEM @ orcl SQL>
desc V$IR_REPAIR
Name
Null? Type
---------------
-------- ---------------
REPAIR_ID
NUMBER
ADVISE_ID
NUMBER
SUMMARY VARCHAR2(32)
RANK
NUMBER
TIME_DETECTED
DATE
EXECUTED DATE
ESTIMATED_DATA_LOSS
VARCHAR2(20)
DETAILED_DESCRIPTION
VARCHAR2(1024)
REPAIR_SCRIPT
VARCHAR2(512)
ESTIMATED_REPAIR_TIME
NUMBER
ACTUAL_REPAIR_TIME
NUMBER
STATUS VARCHAR2(7)
Let's
take a look at a failure from here:
SYSTEM @ orcl SQL> SELECT repair_id, detailed_description,
repair_script FROM V$IR_REPAIR;
REPAIR_ID
DETAILED_DESCRIPTION REPAIR_SCRIPT
----------
----------------------------------- ---------------------------------
21882 The repair includes complete media
/u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm
Recovery
with no data loss
With the
CHANGE FAILURE command it is possible to adjust the failure priority from
CRITICAL to HIGH or LOW and vice versa which can be used to filter out failures
with the LIST command. By default the LIST FAILURE command only returns the
failures with CRITICAL and HIGH priority
It is
also possible to CLOSE a failure. A closed failure does not show up with the
LIST FAILURE command.
We can
do the same from inside Database Control.
On the
ADVISER CENRAL page we can find the CHECKERS pane from where we can view
historical health check runs as well launch a new manual health check. This uses
the health monitor which in fact is the built in package DBMS_HM
and the GUI is integrated with RMAN.

From
here we can reach the View And Manage Failures page where we get
an overview over current failures and their status and priority as well as
historical runs of health checks.

With the
Advise button we can generate a repair script and then run it from inside the
OEM console.
Summary
ADR is a
great help for every DBA. A more or less inexperienced admin can get around
making big mistakes in a disaster situation where in most cases time plays an
important role and the chance to make a mistake is rather high. He/she is not
standing alone any more with the problem but can seek direct advice.
And even
a very experienced DBA can get very valuable help in complicated situations from
ADR.
Oracle
has put all its recovery experience into this tool
So don't
get afraid of it! Use it!
Lutz Hartmann
sysdba database consulting GmbH (Switzerland)
NOTE: Rampant author Laurent Schneider has some additional insight into
creating an Oracle Automatic Diagnostic Repository (ADR).