|
 |
|
Reporting on Server Statistics
Oracle Database Tips by Donald Burleson |
Once the data is captured in
the stats$vmstat table, there is a wealth of reports that can be generated.
Because all of the server statistics exist inside a single Oracle table, it is
quite easy to write SQL*Plus queries to extract the data.
The vmstat data can be used to generate all types of
interesting reports. There are four classes of vmstat reports:
-
Exception reports These reports show the
time period where predefined thresholds are exceeded.
-
Daily trend reports These
reports are often run and used with Excel spreadsheets to produce trending
graphs.
-
Hourly trend reports These
reports show the average utilization, averaged by the hour of the day. These
reports are very useful for showing peak usage periods in a production
environment.
-
Long-term predictive
reports These reports generate a long-term trend line for performance. The
data from these reports is often used with a linear regression to predict when
additional RAM memory or CPU power is required for the server. We will cover
this report in Chapter 16.
Let's now examine the script that can be used to generate
these server reports and see how this information can help us tune our Oracle
database.
Server
Exception Reports
The SQL script
vmstat_alert.sql can quickly give a complete exception report on all of the
servers in our Oracle environment. This report will display times when the CPU
and RAM memory exceed your predefined thresholds:
L 6-13
set lines 80;
set pages 999;
set feedback off;
set verify off;
column my_date heading 'date hour' format a20
column c2 heading runq format 999
column c3 heading pg_in format 999
column c4 heading pg_ot format 999
column c5 heading usr format 999
column c6 heading sys format 999
column c7 heading idl format 999
column c8 heading wt format 999
ttitle 'run queue > 2|May indicate an overloaded CPU|When runqueue exceeds
the number of CPUs| on the server, tasks are waiting for service.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in) c3,
avg(page_out) c4,
avg(user_cpu) c5,
avg(system_cpu) c6,
avg(idle_cpu) c7
from
perfstat.stats$vmstat
WHERE
runque_waits > 2
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
ttitle 'page_in > 1|May indicate overloaded memory|Whenever Unix performs
a page-in, the RAM memory | on the server has been exhausted and swap pages
are being used.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in) c3,
avg(page_out) c4,
avg(user_cpu) c5,
avg(system_cpu) c6,
avg(idle_cpu) c7
from
perfstat.stats$vmstat
WHERE
page_in > 1
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
ttitle 'user+system CPU > 70%|Indicates periods with a fully-loaded CPU
subssystem.|Periods of 100% utilization are only a | concern when runqueue
values exceeds the number of CPs on the server.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in) c3,
avg(page_out) c4,
avg(user_cpu) c5,
avg(system_cpu) c6,
avg(idle_cpu) c7
from
perfstat.stats$vmstat
WHERE
(user_cpu + system_cpu) > 70
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
The standard vmstat alert report is used to alert the
Oracle DBA and systems administrator to out-of-bounds conditions on each Oracle
server. These conditions include:
-
CPU waits > 40% (AIX version only) This
may indicate I/O-based contention. The solution is to spread files across more
disks or add buffer memory.
-
Run queue > xxx - (where xxx is the number of CPUs on the server, 2 in this example) This
indicates an overloaded CPU. The solution is to add additional processors to
the server.
-
Page_in > 2 with
correlated scan rates Page-in operations can indicate overloaded memory.
The solution is to reduce the size of the Oracle SGA, PGA, or add additional
RAM memory to the server.
-
User CPU + System CPU >
90% This indicates periods where the CPU is highly utilized.
While the SQL here is self-explanatory, let's look at a
sample report and see how it will help our systems administrator monitor the
server's behavior:
L 6-14
SQL> @vmstat_alert 7
Wed Dec 20 page
1
run queue > 2
May indicate an overloaded CPU.
When runqueue exceeds the number of CPUs
on the server, tasks are waiting for service.
SERVER_NAME date hour runq pg_in pg_ot usr sys idl
--------------- -------------------- ---- ----- ----- ---- ---- ----
AD-01 01/12/13 17 3 0 0 87 5 8
Wed Dec 20 page
1
page_in > 1
May indicate overloaded memory.
Whenever Unix performs a page-in, the RAM memory
on the server has been exhausted and swap pages are being used.
SERVER_NAME date hour runq pg_in pg_ot usr sys idl
-------------------- -------------------- ---- ----- ----- ---- ----
----
AD-01 01/12/13 16 0 5 0 1 1
98
AD-01 01/12/14 09 0 5 0 10 2
88
AD-01 01/12/15 16 0 6 0 0 0
100
AD-01 01/12/19 20 0 29 2 1 2
98
PROD1DB 01/12/13 14 0 3 43 4 4
93
PROD1DB 01/12/19 07 0 2 0 1 3
96
PROD1DB 01/12/19 11 0 3 0 1 3
96
PROD1DB 01/12/19 12 0 6 0 1 3
96
PROD1DB 01/12/19 16 0 3 0 1 3
96
PROD1DB 01/12/19 17 0 47 68 5 5
91
Wed Dec 20 page
1
user+system > 70%
Indicates periods with a fully-loaded CPU sub-system.
Periods of 100% utilization are only a
concern when runqueue values exceeds the number of CPUs on the server.
SERVER_NAME date hour runq pg_in pg_ot usr sys idl
-------------------- -------------------- ---- ----- ----- ---- ----
----
AD-01 01/12/13 14 0 0 2 75 2
22
AD-01 01/12/13 17 3 0 0 87 5
8
AD-01 01/12/15 15 0 0 0 50 29
22
AD-01 01/12/15 16 0 0 0 48 33
20
AD-01 01/12/19 07 0 0 0 77 4
19
AD-01 01/12/19 10 0 0 0 70 5
24
AD-01 01/12/19 11 1 0 0 60 17
24
PROD1 01/12/19 12 0 0 1 52 30
18
PROD1 01/12/19 13 0 0 0 39 59
2
PROD1 01/12/19 14 0 0 0 39 55
6
PROD1 01/12/19 15 1 0 0 57 23 20
You may notice that this exception report gives the
hourly average for the vmstat information. If you look at the get_vmstat.ksh
script, you will see that the data is captured in intervals of every 300 elapsed
seconds (5-minute intervals). Hence, if you see an hour where your server is
undergoing stress, you can modify your script to show the vmstat changes every
five minutes. You can also run this report in conjunction with other STATSPACK
reports to identify what tasks may have precipitated the server problem. The
stats$sql_summary table is especially useful for this purpose.
This is an excerpt from "Oracle9i
High Performance tuning with STATSPACK" by Oracle Press.
 |
If you like Oracle tuning, you may enjoy the new book "Oracle
Tuning: The Definitive Reference", over 900 pages
of BC's favorite tuning tips & scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |
|