Oracle OS watcher (OSWatcher) tips
Oracle Database Tips by Donald Burleson
The new Oracle OS watcher (OSWatcher) reports
CPU, RAM and Network stress, and is a new alternative for monitoring
Oracle servers. OS Watcher complements the
Ion tool for proactive
Oracle monitoring.
Also see my Oracle
file watcher tips and
OSWatcher Analyzer Tips
Oracle does not run in a vacuum, and it's
important to monitor stress on your server, disk, RAM and network. Oracle
provides several tools for monitoring the external environment.
For those who do not have a
license to
access the AWR dba_hist tables (Oracle performance pack), Oracle OS
watcher is a free solution for UNIX/Linux RAC shops:
MOSC note 301137.1 has the user's guide for
Oracle OS Watcher (OSW), a collection of UNIX C shell scripts that
help diagnose server and network bottlenecks. The Oracle OS watcher
is nicknamed OSWatcher, and it is user-configurable, collecting one
hour worth of OS data at one minute intervals, and then writing the
hour's data to an archive flat file...
Actually, the sample interval is configurable down to 1 second, and
the default value is 30 seconds, not one minute.
Oracle OS watcher is especially useful for
Linux/UNIX-based RAC systems where monitoring the OS is important
for identifying CPU, RAM or network stress. Oracle OS Watcher may
invoke these popular UNIX/Linux utilities, depending on the platform
(Solaris, HP-UX, Linux and Tru64):
- vmstat
- iostat
- top
- netstat
- traceroute
Starting Oracle OS Watcher (OSWatcher)
You can start Oracle OS Watcher with this
command, specifying the data collection interval (in seconds) and
the max number of hours to keep archive files. In this example we
submit the collector as a background job to collect every 5 minutes
and keep 24 hours of archive files, writing all messages to
oswatcher.log:
nohup /u01/app/oracle/scripts/startOSW.sh 300 24 > /u01/app/oracle/scripts/oswatcher.log 2>&1 &
Downloading OSWatcher
The download link for Oracle OSWatcher is in
My Oracle Support note 301137.1.
Using Oracle OS Watcher requires knowledge of
UNIX and Linux C shell command syntax, but it removes much of the
tedium from OS monitoring for those who are not licensed to use AWR
automatic OS statistics collection.
You can write your own vmstat collection
scripts very easily:
# run vmstat and direct the output into the Oracle table . . .
cat /tmp/msg$$ | sed 1,3d | awk '{ printf("%s %s %s %s %s %s\n", $1, $8, $9, $14, $15, $16) }' | while read RUNQUE PAGE_IN PAGE_OUT USER_CPU SYSTEM_CPU IDLE_CPU
do
$ORACLE_HOME/bin/sqlplus -s perfstat/perfstat@testsys1<<EOF
insert into perfstat.stats\$vmstat
values (
sysdate,
$SAMPLE_TIME,
'$SERVER_NAME',
$RUNQUE,
$PAGE_IN,
$PAGE_OUT,
$USER_CPU,
$SYSTEM_CPU,
$IDLE_CPU,
0
);
EXIT
EOF
done
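Before wiring the loop to SQL*Plus, it can help to run the awk stage on its own and confirm it picks the columns you expect. The following is a minimal sketch, not part of the original script: the function name extract_vmstat_fields and the sample input are hypothetical. It reuses the same column positions ($1, $8, $9, $14, $15, $16), which are platform-dependent and should be checked against the header your own vmstat prints.

```shell
#!/bin/sh
# extract_vmstat_fields: read vmstat-style text on stdin, drop the first
# three lines (as "sed 1,3d" does in the script), and print six columns.
# Assumption: column positions 1, 8, 9, 14, 15, 16 match your platform.
extract_vmstat_fields() {
    sed 1,3d | awk '{ printf("%s %s %s %s %s %s\n", $1, $8, $9, $14, $15, $16) }'
}

# Hypothetical sample: three lines to discard, then one row whose
# columns are labelled c1..c16 so the selection is easy to see.
printf '%s\n' \
    'header line 1' \
    'header line 2' \
    'first sample (discarded)' \
    'c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16' |
    extract_vmstat_fields
```

With the labelled sample row, the function prints c1 c8 c9 c14 c15 c16, which makes it easy to compare against the column headings of a real vmstat run.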
The OSWatcher utility captures performance metrics
of the database host, very similar to the Cluster Health Monitor.
While both tools are similar, they have their differences. The list
below highlights some of the variances between the two.
· OSWatcher may not be able to collect metrics when the system is under a very heavy CPU load, while CHM will still be able to gather the data.
· CHM gathers data every second or every 5 seconds depending on the version. By default, OSWatcher gathers data every 30 seconds. CHM provides more detail but OSWatcher requires less storage space.
· OSWatcher has an analyzer that can create performance graphs covering a much longer timeframe than CHM's GUI tool.
· CHM lets the database administrator see performance metrics for all nodes in the cluster, while OSWatcher analyzes one node at a time.
· CHM runs on Windows but OSWatcher does not. For Windows platforms, use CHM only.
· OSWatcher contains data from top, netstat, and traceroute that is missing from CHM.
· Since CHM is a managed cluster resource, it starts automatically when Grid Infrastructure is started. OSWatcher needs to be started manually, although one can certainly create a script to be used on server startup.
The CHM utility is preferred if one must be chosen
over the other, primarily due to the first bullet point above. That
being said, Oracle does recommend running both tools, if possible, to
take advantage of their individual strengths.
OSWatcher does not gather OS performance metrics
on its own. Instead, it relies on the Unix or Linux utilities
top, ps, mpstat, ifconfig, vmstat, netstat, iostat, and traceroute. On
Linux, /proc/meminfo and /proc/slabinfo are also captured. When
OSWatcher starts, it spawns data collector processes. One
data collector process runs the vmstat utility, gathers the
output, stores the results in a file, and then sleeps until
collection is needed again. Similar data collector processes work for
the other OS utilities.
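The run, store, sleep cycle of a data collector can be sketched in a few lines of shell. This is only an illustration of the idea, not OSWatcher's own code: the function name collect and its arguments are hypothetical. It prefixes each snapshot with a "zzz ***" timestamp line, mirroring the layout of the OSWatcher archive files.

```shell
#!/bin/sh
# collect CMD OUTFILE INTERVAL COUNT
# Hypothetical helper (not part of OSWatcher): run CMD COUNT times,
# appending a "zzz ***" timestamp line plus the command output to
# OUTFILE, and sleeping INTERVAL seconds between snapshots.
collect() {
    cmd=$1 outfile=$2 interval=$3 count=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "zzz ***$(date)" >> "$outfile"
        sh -c "$cmd" >> "$outfile" 2>&1
        i=$((i + 1))
        # Sleep only if another snapshot is still to come.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (not run here): four vmstat snapshots, 30 seconds apart.
# collect "vmstat 1 2" /tmp/demo_vmstat.dat 30 4
```

OSWatcher itself runs one such collector per utility and rolls the output into hourly files. The sketch writes to a single file to keep the loop easy to follow.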
Unlike CHM, OSWatcher is not integrated with any
Oracle software, so you must download and install the utility. The
download link can be found in My Oracle Support Note 301137.1 for
those that have a paid My Oracle Support Community (MOSC) contract.
The download consists of a single tar file. OSWatcher was originally
called OSWatcher Black Box, so references to the acronym
oswbb will be found when working with this tool. Even the download
file's name is of the form oswbbxxx.tar, where xxx is a version number.
Extracting the file's contents produces a directory named oswbb
containing the OSWatcher utility.
[oracle@host01 ~]$ tar xvf oswbb730.tar
One of the benefits of OSWatcher is that it will
also examine the Cluster Interconnect for the Oracle RAC Cluster.
Looking at the private network is not set up by default. Before
OSWatcher is started for the first time, the database administrator
needs to create a file named private.net in the OSWatcher directory
with the platform appropriate
traceroute command. OSWatcher includes a file named
exampleprivate.net that
shows sample commands for each supported platform. The following shows
the private.net file for a
2-node Linux cluster. This file exists on host01 and the
traceroute command is to
host02. The last line below changes the file's permissions to allow
the file to be executable.
[oracle@host01 oswbb]$ cat private.net
echo "zzz ***"`date`
traceroute -r -F host02-priv
[oracle@host01 oswbb]$ chmod 755 private.net
If there were three nodes in the cluster, the file
on host01 would contain a second
traceroute command to
host03-priv. Similarly, OSWatcher configuration on the other nodes in
the cluster would contain traceroute commands to the other nodes in
the cluster.
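For clusters with more than two nodes, the private.net file can be generated rather than typed by hand. The sketch below is a hypothetical helper (make_private_net is not part of OSWatcher) that prints the same two-part layout shown above: the timestamp echo line followed by one traceroute line per private hostname. It assumes the Linux traceroute flags used in this example. Check exampleprivate.net for the command appropriate to your platform.

```shell
#!/bin/sh
# make_private_net HOST...
# Hypothetical helper: print private.net contents with one traceroute
# line for each private interconnect hostname given as an argument.
make_private_net() {
    # Single quotes keep the backticks literal, so date runs each time
    # OSWatcher executes private.net, not when the file is generated.
    echo 'echo "zzz ***"`date`'
    for host in "$@"; do
        echo "traceroute -r -F $host"
    done
}

# Three-node cluster, as seen from host01:
make_private_net host02-priv host03-priv
```

To install the result, redirect the output into private.net in the oswbb directory and make it executable with chmod 755, as in the two-node example.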
With OSWatcher configured for Oracle RAC, it is time
to start the utility so that data can be collected. The
startOSWbb.sh script is
used to start data collection.
[oracle@host01 oswbb]$ ./startOSWbb.sh
Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48
Setting the archive log directory to/home/oracle/oswbb/archive
Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
IFCONFIG found on your system.
NETSTAT found on your system.
TOP found on your system.
Warning... /proc/slabinfo not found on your system.
Testing for discovery of OS CPU COUNT
oswbb is looking for the CPU COUNT on your system
CPU COUNT will be used by oswbba to automatically look for cpu problems
CPU COUNT found on your system.
CPU COUNT = 1
Discovery completed.
Starting OSWatcher Black Box v7.3.0 on Sat Sep 6 03:58:51 CDT 2015
With SnapshotInterval = 30
With ArchiveInterval = 48
OSWatcher Black Box - Written by Carl Davis, Center of Expertise,
Oracle Corporation
For questions on install/usage please go to MOS (Note:301137.1)
If you need further assistance or have comments or enhancement
requests you can email me Carl.Davis@Oracle.com
Data is stored in directory: /home/oracle/oswbb/archive
Starting Data Collection...
oswbb heartbeat:Sat Sep 6 03:58:56 CDT 2015
In the example above, OSWatcher was started with
default values. Metrics will be obtained every 30 seconds and
OSWatcher will retain 48 hours worth of data. When starting OSWatcher
as done above, the utility maintains control of the session. Should
the session terminate, so will data collection. Also, the screen will
be filled with heartbeat information.
oswbb heartbeat:Sat Sep 6 03:59:56 CDT 2015
oswbb heartbeat:Sat Sep 6 04:00:26 CDT 2015
oswbb heartbeat:Sat Sep 6 04:00:56 CDT 2015
To remedy this situation, OSWatcher can be stopped.
In another session, the stop script is executed as follows.
[oracle@host01 oswbb]$ ./stopOSWbb.sh
Next, OSWatcher will be started in the background:
[oracle@host01 oswbb]$ nohup ./startOSWbb.sh &
When OSWatcher is started for the first time, the
archive directory is created. By default, this directory is a
subdirectory of the main oswbb
directory. The output below shows the contents of the archive
directory.
[oracle@host01 oswbb]$ cd /home/oracle/oswbb/archive
[oracle@host01 archive]$ ls -l
total 40
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswifconfig
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswiostat
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswmeminfo
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswmpstat
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswnetstat
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 03:58 oswprvtnet
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswps
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 03:58 oswslabinfo
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswtop
drwxr-xr-x 2 oracle oinstall 4096 Sep 6 04:00 oswvmstat
The archive directory contains one subdirectory for
each process being captured. It should be easy to tell what each
subdirectory contains simply by inspecting the directory name. Looking
inside one of the directories, we can see the files that contain the
metric data.
[oracle@host01 archive]$ cd oswtop
[oracle@host01 oswtop]$ ls -l
total 108
-rw-r--r-- 1 oracle oinstall 14307 Sep 6 03:59 host01.localdomain_top_14.09.06.0300.dat
-rw-r--r-- 1 oracle oinstall 86398 Sep 6 04:32 host01.localdomain_top_14.09.06.0400.dat
OSWatcher will create a new data file each hour.
Each file contains a line with the string "zzz ***" followed by a
timestamp. The lines that follow the timestamp are the output of the
OS command. As an example, the following output shows the contents of
the private network traceroute
commands.
[oracle@host01 oswprvtnet]$ cat host01.localdomain_prvtnet_14.09.06.0300.dat
zzz ***Sat Sep 6 03:58:56 CDT 2015
traceroute to host02-priv (192.168.10.2), 30 hops max, 60 byte packets
1 host02-priv.localdomain (192.168.10.2) 0.292 ms 0.166 ms 0.268 ms
[oracle@host01 oswprvtnet]$ cat host01.localdomain_prvtnet_14.09.06.0400.dat
zzz ***Sat Sep 6 04:27:41 CDT 2015
traceroute to host02-priv (192.168.10.2), 30 hops max, 60 byte packets
1 host02-priv.localdomain (192.168.10.2) 0.363 ms 0.182 ms 0.108 ms
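Because every snapshot begins with a "zzz ***" line, the archive files are easy to slice with standard text tools. The following is a small sketch with two hypothetical helper functions (neither ships with OSWatcher): one lists the snapshot timestamps in a file, and the other prints the body of the Nth snapshot. It assumes the timestamp marker starts at the beginning of a line, as in the output above.

```shell
#!/bin/sh
# list_snapshots FILE
# Print the timestamp of every snapshot in an OSWatcher archive file.
list_snapshots() {
    sed -n 's/^zzz \*\*\*//p' "$1"
}

# show_snapshot FILE N
# Print the command output that follows the Nth "zzz ***" marker.
show_snapshot() {
    awk -v n="$2" '/^zzz \*\*\*/ { c++; next } c == n' "$1"
}
```

For example, list_snapshots host01.localdomain_prvtnet_14.09.06.0300.dat | wc -l counts the snapshots taken in that hour, and show_snapshot with the same file and 1 as the second argument prints the first traceroute result.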
OSWatcher includes a File Manager process that will
run once per hour to clean up any data files older than the retention
period. The collection interval and retention period can be changed
with the first two parameters, respectively, to the shell script that
starts OSWatcher. The following starts OSWatcher to collect metrics
every 120 seconds and store the data for 72 hours.
[oracle@host01 oswbb]$ ./startOSWbb.sh 120 72
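To get a feel for what a given pair of parameters means on disk, the arithmetic can be sketched as follows. This is a rough estimate only, assuming one data file per collector per hour and a full retention window. The function name osw_footprint is hypothetical.

```shell
#!/bin/sh
# osw_footprint INTERVAL_SECONDS RETENTION_HOURS
# Hypothetical helper: estimate snapshots per hourly file and the number
# of files and snapshots retained for each collector directory.
osw_footprint() {
    interval=$1 hours=$2
    per_file=$((3600 / interval))   # snapshots written into one hourly file
    echo "snapshots per hourly file: $per_file"
    echo "hourly files kept per collector: $hours"
    echo "snapshots kept per collector: $((per_file * hours))"
}

# The 120-second / 72-hour example from the text:
osw_footprint 120 72
```

For 120 seconds and 72 hours this works out to 30 snapshots per file, 72 files, and 2160 snapshots per collector. The default 30-second interval would write four times as many snapshots per file.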
This section has provided the information for the
database administrator to get the OSWatcher utility up and running.
While very similar to the Cluster Health Monitor, OSWatcher has
enough differences to warrant using both tools. Just as CHM has a
utility to help analyze the data, OSWatcher has its own analyzer utility,
which is discussed in the next section.
This is an excerpt from the landmark book
Oracle RAC Performance tuning,
a book that provides real world advice for resolving
the most difficult RAC performance and tuning issues.