 |
|
Oracle UNIX
Monitoring Goals Administration
Oracle UNIX/Linux Tips by Burleson Consulting |
UNIX Monitoring Goals
The monitoring of the UNIX environment
involves monitoring disk, RAM, CPU and network components. For
this book, we consider server monitoring to be limited to CPU and
RAM memory monitoring. We have divided the total UNIX
monitoring tasks into three general areas:
* Server Monitoring ? Monitor CPU and RAM
memory
* Disk Monitoring ? Monitor the I/O
subsystem ? Chapter 4
* Network monitoring ? Protocol and packet
monitoring ? Chapter 5
Let?s begin with an overview of UNIX CPU
monitoring goals.
UNIX CPU Monitoring
CPU consumption on an Oracle server is a
simple matter because the server manages all CPU transactions
automatically. All servers are configured to use CPU cycles on an
as-needed basis, and the Oracle database will use CPU resources
freely. The internal machine code will manage the assignment of
processors to active tasks and ensure that the maximum amount of
processing power is applied to each task.
CPU shortages are evidenced in cases where
the CPU run queue is greater than the number of CPUs, as shown in
Figure 3-1. In these cases, the only solutions are to increase the
number of CPUs on the processor or reduce the CPU demands on Oracle.
You can decrease CPU demands on Oracle by turning off Oracle
Parallel Query, replacing the standard Oracle listener with the
multithreaded server (MTS), and other actions that would reduce the
processing demands on the hardware.
Figure 1: CPU overload on an Oracle server
with 12 CPUs
Tasks are serviced in UNIX according to
their internal dispatching priority. Important tasks such as the
UNIX operating system tasks will always have a more favorable
dispatching priority because the UNIX system tasks drive the
operating system
CPU overload is usually evidenced by high
values in the vmstat runqueue column. Whenever the runqueue value
exceeds the number of CPUs of the server, some task may be waiting
for service. When we see a CPU overload, we have several
alternatives:
1. Add additional processors? This is
usually the best solution, because an Oracle server that is
overloading the CPU will always run faster with additional
processors.
2. Reduce server load? If the CPU overload
is not constant, task load balancing may be the solution. For
example, it is not uncommon to see a server overloaded during peak
work hours, and then return to 80-percent idle in the evenings. In
these cases, batch tasks can be rescheduled to execute when there
are more idle CPU resources available.
3. Alter task dispatching priorities? Most
all UNIX operating systems allow the root user to change the
dispatching priority for tasks. As a general rule, the online
database background tasks are given more priority (a smaller
priority value), while less critical batch processes are placed with
less priority (a higher priority value). However, altering the
default dispatching priorities is not a good long-term solution, and
it should only be undertaken in emergency situations.
Upgrading an Entire Server
On mission-critical databases where speed is
a primary concern, adding additional processors may not be the best
solution. Oracle tuning professionals will sometimes recommend
upgrading to a faster server architecture. For example, many of the
new 64-bit CPU processors will handle Oracle transactions an order
of magnitude faster than their 32-bit predecessors. For example, in
the IBM AIX environment, the IBM SP2 processors run on 32 bits.
IBM's next generation of processors utilize a 64-bit technology, and
these systems can process information far faster than their 32-bit
ancestors. The new IBM Blackbird servers will often double the
overall processing speed of an Oracle database.
When making recommendations for upgrades of
entire servers, many Oracle tuning professionals use the analogy of
the performance of a 16-bit PC compared to the performance of 32-bit
PC. In general, moving to faster CPU architecture can greatly
improve the speed of Oracle applications, and many vendors such as
IBM will allow you to actually load your production system onto one
of the new processors for speed benchmarks prior to purchasing the
new servers.
Adding Additional CPU Processors
Most symmetric multiprocessor (SMP)
architectures for Oracle databases servers are expandable, and
additional processors can be added at any time. Once added, the
processor architecture will immediately make the new CPUs available
to the Oracle database.
The problem with adding additional
processors is the high cost that can often outweigh the cost of a
whole new server. Adding additional processors to an existing server
can commonly cost over $100,000, and most managers require a
detailed cost-benefit analysis when making the decision to buy more
CPUs. Essentially, the cost-benefit analysis compares the lost
productivity of the end users (due to the response time latency)
with the additional costs of the processors.
Another problem with justifying additional
processors is the sporadic nature of CPU overloads. Oracle database
servers often experience ?transient? overloads, and there will be
times when the processors are heavily burdened and other times when
the processors are not at full utilization. Before recommending a
processor upgrade, most Oracle tuning professionals will perform a
load-balancing analysis to ensure that any batch-oriented tasks are
presented to the server at non-peak hours.
Load Balancing of Server Tasks
When CPU overload is experienced, the DBA
will generally see periods during the days when the runqueue gets
quite long and other periods at night when the processors are mostly
idle. (Figure 3-2) A common question asked by a systems
administrator is ?The CPU is 40-percent idle for 16 hours a day, so
why should we add more processors??
Figure 2: Average Oracle server run queue
values before task load balancing
However, there are times when it makes sense
to add more processors, even if the processors are idle during
off-peak times. For example, if you are working in an online
environment, the only response time that matters is the time between
7:00 A.M. and 8:00 P.M. when your online users are active. The fact
that the server is largely idle during the middle of the night has
no bearing on the decision to upgrade with additional CPUs.
Once we identify the times when the CPU
activity is excessive, we need to go to STATSPACK and examine the
activity at the times of the overload. Once we have identified the
times when the processors are overloaded, we must then see if it is
possible to reschedule batch tasks to run at off-peak hours. On an
Oracle database server, tasks may be scheduled in many ways:
* The dbms_job utility
* The UNIX cron utility
* A TP monitor such as Tuxedo
* Oracle Concurrent Manager (for Oracle
Applications)
* SAPGUI if we are running SAP
Regardless of the method of scheduling
Oracle tasks, the idea is to find large batch tasks that may be
scheduled during peak processing times. Our task is to find a large
regularly scheduled SQL task that runs during these times.
STATSPACK Solutions for Finding
High-Impact Tasks
Within STATSPACK, we can examine the
stats$sql_summary table, looking for SQL statements that are
executed during the peak hours. The high-impact tasks will generally
be associated with SQL statements that have a high value for rows
processed. Here is an easy STATSPACK table query to find the SQL:
rpt_top_sql.sql
set lines 80;
set pages 999;
set heading off;
select
to_char(snap_time,'yyyy-mm-dd hh24'),
substr(sql_text,1,50)
from
stats$sql_summary a,
stats$snapshot sn
where
a.snap_id = sn.snap_id
and
to_char(snap_time,'hh24') = 10
or
to_char(snap_time,'hh24') = 15
order by
rows_processed desc;
Here is a sample of the output from this
report, showing the beginning of all SQL statements, ordered in
descending order of rows_processed. This tells the DBA which
SQL statements are causing the most I/O resources.
Yr.
Mo dy Hr SQL_TEXT
------------- -------------------------------------------------
2000-09-20 10 begin :retCode := toc_maint . toc_insert_entry ( :
2000-09-20 10 begin :retCode := toc_maint . toc_insert_entry ( :
2000-09-20 15 INSERT INTO TOC_ENTRY ( ISBN,TOC_SEQ_NBR,VISUAL_PA
2000-09-20 15 INSERT INTO TOC_ENTRY ( ISBN,TOC_SEQ_NBR,VISUAL_PA
2000-09-20 15 SELECT PAGE_SEQ_NBR FROM PAGE WHERE
(ISBN = :b1
2000-09-20 15 SELECT PAGE_SEQ_NBR FROM PAGE WHERE
(ISBN = :b1
2000-09-21 10 select 'ALTER ' || substr(object_type,1,20) || ' '
2000-09-21 10 DECLARE job BINARY_INTEGER := :job; next_date DATE
2000-09-21 10 DECLARE job BINARY_INTEGER := :job; next_date DATE
2000-09-21 10 DECLARE job BINARY_INTEGER := :job; next_date DATE
2000-09-20 15 SELECT IMAGE_BLOB FROM PAGE_IMAGE
WHERE (ISBN =
2000-09-20 15 SELECT IMAGE_BLOB FROM PAGE_IMAGE
WHERE (ISBN =
2000-09-20 15 begin pageimages_curs . page_get_image ( :myisbn:i
2000-09-20 15 SELECT IMAGE_BLOB FROM PAGE_IMAGE
WHERE (ISBN =
2000-09-20 15 BEGIN sys.dbms_ijob.remove(:job); END;
2000-09-20 15 begin pageimages_curs . page_get_image ( :myisbn:i
2000-09-20 15 BEGIN sys.dbms_ijob.remove(:job); END;
If we are diligent, we can locate the online
or batch SQL tasks that are overwhelming the server. Once we
reschedule the offending tasks to run during off-peak hours, our
average runqueue for the server falls below the number of CPUs, as
shown in Figure 3-3.
Figure 3: Average Oracle server runqueue
values after task load balancing
Of course, load balancing Oracle tasks is
often not as trivial is it might appear. The Oracle DBA generally
has no control over when members of the end-user community submit
batch-oriented work against the Oracle database. Unless you are
using scheduling tools such as Oracle's Concurrent Manager, the end
users are free to submit large resource-intensive reports against
Oracle anytime they feel like it. In shops where end users generate
ad hoc SQL statements, there are three Oracle features that can
govern end-user query resources:
* The dbms_resource_manager PL/SQL package
can be used to govern end-user resources. In the example below,
dbms_resource_manager is called to assign CPU, SQL plan, and
Parallel Query options for the ADHOC_Group of end users:
SQL> execute
dbms_resource_manager.update_plan_directive(
plan => 'SINGLE_LEVEL_PLAN',
group_or_subplan => 'ADHOC_Group',
new_comment => 'ADHOC day users sessions at level 1',
new_cpu_p1 => 10,
new_parallel_degree_limit_p1 => 0);
* Oracle Applications systems have a profile
feature to restrict ad hoc queries. For ad hoc users, their Oracle
user ID is associated with a group, and the group is linked to a
profile that governs the amount of server resources the end users
are allowed to use.
* SQL*Plus profilesare for end users who
execute queries from SQL*Plus. The product_user_profile table can be
used to restrict access.
Hence, it is up to the Oracle professional
to become a detective, and hunt through the library cache in order
to see when high-resource statements are being added to the Oracle
system.
The stats$sql_summary table is one STATSPACK
table that has a great deal of information to help you find
offending SQL statements. SQL statements within the
stats$sql_summary table can be sorted according to a number of
resource utilization metrics, including rows processed, buffer gets,
disk reads, and executions.
Even though Oracle developers write most
SQL, it is still the duty of the Oracle DBA to monitor the behavior
of the SQL within the system in order to determine those SQL
statements that are creating excessive load upon the server
processor. The Oracle DBA must also load balance those intensive SQL
statements by requiring the end users to submit them during less
active times.
 |
If you like Oracle tuning, see the
book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |