Oracle to capture UNIX/Linux vmstat information

Home
E-mail Us
Oracle Articles
New Oracle Articles

Oracle Training
Oracle Tips
Oracle Forum
Class Catalog

Remote DBA
Oracle Tuning
Emergency 911
RAC Support
Apps Support
Analysis
Design
Implementation
Oracle Support

SQL Tuning
Security
Oracle UNIX
Oracle Linux
Monitoring
Remote support
Remote plans
Remote services
Application Server
Applications
Oracle Forms
Oracle Portal
App Upgrades
SQL Server
Oracle Concepts
Software Support
Remote Support
Development
Implementation

Consulting Staff
Consulting Prices
Help Wanted!

Oracle Posters
Oracle Books
Oracle Scripts
Ion
Excel-DB

Don Burleson Blog

Oracle to capture UNIX/Linux vmstat information

Don Burleson

It is a simple matter to create an Oracle table to store this information and use a script to populate the table. Creating the automated vmstat monitor begins by creating an Oracle table to contain the vmstat output:

cr_vmstat_tab.sql

connect perfstat/perfstat;

drop table stats$vmstat;
create table stats$vmstat
(
     start_date          date,
     duration            number,
     server_name         varchar2(20),
     runque_waits        number,
     page_in             number,
     page_out            number,
     user_cpu            number,
     system_cpu          number,
     idle_cpu            number,
     wait_cpu            number
)
tablespace perfstat
storage (initial   10m
         next       1m
         pctincrease 0)
;

Now that we have defined an Oracle table to capture the vmstat information, we need to write a UNIX script that will execute vmstat, capture the vmstat output, and place it into the Oracle table.

The main script to collect the vmstat information is a Korn shell script called get_vmstat.ksh. As we noted earlier, each dialect of UNIX displays vmstat information in different columns, so we need slightly different scripts for each type of UNIX.

The idea is to write a script that continually runs the vmstat utility and then directs the results into our Oracle table.

The script shows the vmstat capture utility script for the Linux operating system.

Note that you must change this script in several places to make it work for you:

· You must set the ORACLE_HOME to your directory:

ORACLE_HOME=/usr/app/oracle/admin/product/8/1/6

· You must set your ORACLE_SID in the sqlplus command:

$ORACLE_HOME/bin/sqlplus -s perfstat/perfstat@testsys1<<EOF

· You can change the duration of samples by re-setting SAMPLE_TIME UNIX variable:

SAMPLE_TIME=300

get_vmstat.ksh (Linux version)

#!/bin/ksh

# This is the Linux version

ORACLE_HOME=/usr/app/oracle/admin/product/8/1/6
export ORACLE_HOME

PATH=$ORACLE_HOME/bin:$PATH
export PATH
SERVER_NAME=`uname -a|awk '{print $2}'`
typeset -u SERVER_NAME
export SERVER_NAME

# sample every five minutes (300 seconds) . . . .
SAMPLE_TIME=300

while true
do
   vmstat ${SAMPLE_TIME} 2 > /tmp/msg$$

# run vmstat and direct the output into the Oracle table . . .
cat /tmp/msg$$|sed 1,3d | awk '{ printf("%s %s %s %s %s %s\n", $1, $8, $9,
SEE CODE DEPOT FOR FULL SCRIPT
14, $15, $16) }' | while read RUNQUE PAGE_IN PAGE_OUT USER_CPU SYSTEM_CPU
DLE_CPU
   do

      $ORACLE_HOME/bin/sqlplus -s perfstat/perfstat@testsys1<<EOF
      insert into perfstat.stats\$vmstat
                           values (
                             sysdate,
                             $SAMPLE_TIME,
                             '$SERVER_NAME',
                             $RUNQUE,
                             $PAGE_IN,
                             $PAGE_OUT,
                             $USER_CPU,
                             $SYSTEM_CPU,
                             $IDLE_CPU,
                             0
                                  );
      EXIT
EOF
   done
done

rm /tmp/msg$$

Because of the differences in implementations of vmstat, the first task is to identify the columns of the vmstat output that contain the information that we want to capture. Once we know the columns that we want to capture, we can add these columns to the vmstat script to put the output into our table.

	Run queue	Page-in	Page-out	User	System	Idle	Wait
Dialect	column	column	column	column	column	column	column
HP/UX	1	8	9	16	17	18	NA
AIX	1	6	7	14	15	16	17
Solaris	1	8	9	20	21	22	NA
Linux	1	8	9	14	15	16	NA

Using this table, you can adjust the capture script according to your operating system. You customize the script by changing the line in the script that reads the vmstat output and places it into the stats$vmstat table. Here is a summary of the UNIX dialect changes to this line.

HP/UX vmstat Columns

cat /tmp/msg$$|sed 1,3d |\

awk '{ printf("%s %s %s %s %s %s\n", $1, $8, $9, $16, $17, $18) }' |\

while read RUNQUE PAGE_IN PAGE_OUT USER_CPU SYSTEM_CPU IDLE_CPU

IBM AIX vmstat Columns

cat /tmp/msg$$|sed 1,3d |\

awk '{ printf("%s %s %s %s %s %s\n", $1, $6, $7, $14, $15, $16, $17) }' |\

while read RUNQUE PAGE_IN PAGE_OUT USER_CPU SYSTEM_CPU IDLE_CPU WAIT_CPU

Sun Solaris vmstat Columns

cat /tmp/msg$$|sed 1,3d |\

awk '{ printf("%s %s %s %s %s %s\n", $1, $8, $9, $20, $21, $22) }' |\

while read RUNQUE PAGE_IN PAGE_OUT USER_CPU SYSTEM_CPU IDLE_CPU

Linux vmstat columns

cat /tmp/msg$$|sed 1,3d |\

awk '{ printf("%s %s %s %s %s %s\n", $1, $8, $9, $14, $15, $16) }' |\

while read RUNQUE PAGE_IN PAGE_OUT USER_CPU SYSTEM_CPU IDLE_CPU

Internals of the vmstat Capture Script

It is important to understand how the get_vmstat.ksh script functions, so let's examine the steps in this script:

1. It executes the vmstat utility for the specified elapsed-time interval (SAMPLE_TIME=300).

2. The output of the vmstat is directed into the /tmp directory

3. The output is then parsed using the awk utility, and the values are inserted into the mon_vmstats table.

Once started, the get_vmstat.ksh script will run continually and capture the vmstats into your stats$vmstat table. This script is an example of a UNIX daemon process, and it will run continually to sample the server status. However, the script may be terminated if you server is rebooted, so it is a good idea to place a crontab entry to make sure that the get_vmstat script is always running. Below is a script called run_vmstat.ksh that will ensure that the vmstat utility is always running on your server.

Note that you must make the following changes to this script:

· Set the file location variable vmstat to the directory that contains your get_vmstat.ksh script:

vmstat=`echo ~oracle/vmstat`

· Create a small file in your UNIX file directory ($vmstat) called mysid. This file will contain one line and specify the name of your ORACLE_SID.

ORACLE_SID=`cat ${vmstat}/mysid`

run_vmstat.ksh

#!/bin/ksh

# First, we must set the environment . . . .
vmstat=`echo ~oracle/vmstat`
export vmstat
ORACLE_SID=`cat ${vmstat}/mysid`
export ORACLE_SID

ORACLE_HOME=`cat /etc/oratab|grep $ORACLE_SID:|cut -f2 -d':'`
export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$PATH
export PATH

#----------------------------------------
# If it is not running, then start it . . .
#----------------------------------------
check_stat=`ps -ef|grep get_vmstat|grep -v grep|wc -l`;
SEE CODE DEPOT FOR FULL SCRIPT
oracle_num=`expr $check_stat`
if [ $oracle_num -le 0 ]
then nohup $vmstat/get_vmstat_linux.ksh > /dev/null 2>&1 &
fi

The run_vmstat.ksh script can be scheduled to run hourly on the server. As we can see by examining the code, this script checks to see if the get_vmstat.ksh script is executing. If it is not executing, the script resubmits it for execution. In practice, the get_vmstat.ksh script will not abort, but if the server is shut down and restarted, the script will need to be restarted.

Here is an example of the UNIX crontab file. For those not familiar with cron, the cron facility is a UNIX scheduling facility that allows tasks to be submitted at specific times. Note that it schedules the run_vmstat.ksh script every hour, and runs a vmstat exception report every day at 7:00 a.m.

00 * * * * /home/vmstat/run_vmstat.ksh > /home/vmstat/r.lst

00 7 * * * /home/vmstat/run_vmstat_alert.ksh prodb1 > /home/vmstat/v.lst

Now that we see how to monitor the Oracle database server, let's examine how we can use this technique to report on other Oracle-related server. This technique is very handy for reporting on Oracle Web servers and application servers.

Reporting vmstat Information on Other Oracle Servers

To get a complete picture of the performance of your total Oracle system, you must also monitor the behavior of all of the servers that communicate with Oracle. For example, many Oracle environments have other servers:

· Oracle Applications In Oracle Applications products, you generally have separate application servers communicating with the database server.

· SAP with Oracle In SAP, you have separate application servers that communicate with Oracle.

· Real Application Clusters (Oracle Parallel Server) With RAC, you have multiple Oracle database servers, all sharing the same database.

· Oracle Web Applications When using Oracle databases on the Web, you have separate WebServers that direct the communications into the database.

This technique in get_vmstat.ksh can easily be extended to measure the performance of other servers in your Oracle environment. Note that the stats$vmstat table has a column to store the server name. Since we can separate vmstat metrics by server, we simply need to create a remote vmstat script that will capture the performance of the other servers and send the data to a central database. Because only the database server contains an Oracle database, the vmstat data will be sent to the database from the remote server using database links. Any server that has a Net8 client can be used to capture vmstat information.

If we take a close look at the get_vmstat script from above, we see that this script can be executed on a remote server. The script will send the vmstat data to the server that contains our Oracle database using a database link. Note where the script enters sqlplus using "sqlplus perfstat/perfstat@prod".

By collecting the data remotely, we can capture a complete picture of the performance of all of the components of the Oracle environment, not just the database server. This is important in cases where you need to track slow performance of ecommerce systems. Using this vmstat information, you can go back to the time of the slowdown and see which Web servers may have been overloaded and also examine the load on the database server.

vmstat Reporting

The SQL script vmstat_alert.sql can quickly give a complete exception report on all of the servers in our Oracle environment. This report will display times when the CPU and RAM memory exceed your predefined thresholds:

set lines 80;
set pages 999;
set feedback off;
set verify off;

column my_date heading 'date       hour' format a20
column c2      heading runq   format 999
column c3      heading pg_in format 999
column c4      heading pg_ot format 999
column c5      heading usr    format 999
column c6      heading sys    format 999
column c7      heading idl    format 999
column c8      heading wt     format 999

ttitle 'run queue > 2|May indicate an overloaded CPU|When runqueue exceeds
the number of CPUs| on the server, tasks are waiting for service.';

select
server_name,
to_char(start_date,'YY/MM/DD    HH24') my_date,
avg(runque_waits)       c2,
avg(page_in)            c3,
avg(page_out)           c4,
avg(user_cpu)           c5,
avg(system_cpu)         c6,
avg(idle_cpu)           c7
from
perfstat.stats$vmstat
WHERE
runque_waits > 2
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD    HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD    HH24')
;

ttitle 'page_in > 1|May indicate overloaded memory|Whenever Unix performs
a page-in, the RAM memory | on the server has been exhausted and swap pages are being used.';

select
server_name,
to_char(start_date,'YY/MM/DD    HH24') my_date,
avg(runque_waits)       c2,
avg(page_in)            c3,
avg(page_out)           c4,
avg(user_cpu)           c5,
avg(system_cpu)         c6,
avg(idle_cpu)           c7
from
perfstat.stats$vmstat
WHERE
page_in > 1
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD    HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD    HH24')
;

ttitle 'user+system CPU > 70%|Indicates periods with a fully-loaded CPU subssystem.|Periods of 100% utilization are only a | concern when runqueue values exceeds the number of CPs on the server.';

select
server_name,
to_char(start_date,'YY/MM/DD    HH24') my_date,
avg(runque_waits)       c2,
avg(page_in)            c3,
avg(page_out)           c4,
avg(user_cpu)           c5,
avg(system_cpu)         c6,
avg(idle_cpu)           c7
from
perfstat.stats$vmstat
WHERE
(user_cpu + system_cpu) > 70
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD    HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD    HH24')
;

The standard vmstat alert report is used to alert the Oracle DBA and systems administrator to out-of-bounds conditions on each Oracle server. These conditions include:

· CPU waits > 40% (AIX version only) This may indicate I/O-based contention. The solution is to spread files across more disks or add buffer memory.

· Run queue > xxx - (where xxx is the number of CPUs on the server, 2 in this example) This indicates an overloaded CPU. The solution is to add additional processors to the server.

· Page_in > 2 with correlated scan rates Page-in operations can indicate overloaded memory. The solution is to reduce the size of the Oracle SGA, PGA, or add additional RAM memory to the server.

· User CPU + System CPU > 90% This indicates periods where the CPU is highly utilized.

While the SQL here is self-explanatory, let's look at a sample report and see how it will help our systems administrator monitor the server's behavior:

SQL> @vmstat_alert 7

Wed Dec 20                                                             page    1
                                 run queue > 2
                         May indicate an overloaded CPU.
                    When runqueue exceeds the number of CPUs
                  on the server, tasks are waiting for service.

SERVER_NAME     date       hour      runq pg_in pg_ot usr sys idl
--------------- -------------------- ---- ----- ----- ---- ---- ----
AD-01            01/13/04   17          3     0     0   87    5    8

Wed Dec 20                                                             page    1
                                  page_in > 1
                         May indicate overloaded memory.
               Whenever Unix performs a page-in, the RAM memory
         on the server has been exhausted and swap pages are being used.

SERVER_NAME          date       hour      runq pg_in pg_ot usr sys idl
-------------------- -------------------- ---- ----- ----- ---- ---- ----
AD-01                01/12/13    16          0     5     0    1    1   98
AD-01                01/12/14    09          0     5     0   10    2   88
AD-01                01/12/15    16          0     6     0    0    0 100
AD-01                01/12/19    20          0    29     2    1    2   98
PROD1DB              01/12/13    14          0     3    43    4    4   93
PROD1DB              01/12/19    07          0     2     0    1    3   96
PROD1DB              01/12/19    11          0     3     0    1    3   96
PROD1DB              01/12/19    12          0     6     0    1    3   96
PROD1DB              01/12/19    16          0     3     0    1    3   96
PROD1DB              01/12/19    17          0    47    68    5    5   91

Wed Dec 20                                                             page    1
                               user+system > 70%
             Indicates periods with a fully-loaded CPU sub-system.
                    Periods of 100% utilization are only a
      concern when runqueue values exceeds the number of CPUs on the server.

SERVER_NAME          date       hour      runq pg_in pg_ot usr sys idl
-------------------- -------------------- ---- ----- ----- ---- ---- ----
AD-01                01/12/13    14          0     0     2   75    2   22
AD-01                01/12/13    17          3     0     0   87    5    8
AD-01                01/12/15    15          0     0     0   50   29   22
AD-01                01/12/15    16          0     0     0   48   33   20
AD-01                01/12/19    07          0     0     0   77    4   19
AD-01                01/12/19    10          0     0     0   70    5   24
AD-01                01/12/19    11          1     0     0   60   17   24
PROD1                01/12/19    12          0     0     1   52   30   18
PROD1                01/12/19    13          0     0     0   39   59    2
PROD1                01/12/19    14          0     0     0   39   55    6
PROD1                01/12/19    15          1     0     0   57   23   20

You may notice that this exception report gives the hourly average for the vmstat information. If you look at the get_vmstat.ksh script, you will see that the data is captured in intervals of every 300 elapsed seconds (5-minute intervals). Hence, if you see an hour where your server is undergoing stress, you can modify your script to show the vmstat changes every five minutes. You can also run this report in conjunction with other STATSPACK reports to identify what tasks may have precipitated the server problem. The stats$sql_summary table is especially useful for this purpose.

��