So how well does this simple three-step process
for database tuning and optimization work in the virtual world?
Review the steps:
1. Ask a lot of high-level, dumb questions and verify the basics
2. Perform Method-R-like database application trace file analysis
3. Perform a final database health check using diagnostic software
The first step relies on your wisdom or
experience-based insights. Since virtualization, and particularly
Oracle databases on virtual servers, is still relatively immature, this
step is not as reliable as it once was. We do not yet have sufficient empirical
evidence to support making too many preemptive tuning strikes. But
here are some best practices worth checking at this stage of the
process, i.e., the high-return, low-hanging fruit:
- Hardware Sizing
  - Better to have more memory than the fastest CPUs
  - Consider adding an inexpensive NIC per virtual machine
  - Ponder spreading virtual machine I/O across controllers
- Workload Related
  - Are the cumulative co-hosted virtual machines' workload characteristics compatible and well balanced?
  - Is each co-hosted virtual machine's configuration proportioned for concurrency?
  - Are too many of the co-hosted virtual machines configured with multiple virtual processors? (an estimated 20-30% overhead per VM doing so)
- Database Related
  - Do the database configuration parameters match the virtual machine's current properties (e.g. SGA size, CPU count, DB Writers, I/O Slaves, threads/CPU, etc.)? See the sketch following this list.
  - Are the cumulative co-hosted virtual machines' database configuration parameters reasonably defined?
  - Are the cumulative co-hosted virtual machines' database workloads compatible and sensibly balanced?
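To make the first Database Related check concrete, here is a minimal sketch that compares a few key instance parameters against what the VM actually presents to the guest. It assumes the python-oracledb driver and a Linux guest; the connection details are placeholders, and the parameter names and the 80% memory threshold are illustrative choices, not prescriptions.

# Sketch: compare key instance parameters against what the VM presents.
# Assumes python-oracledb and a Linux guest; connection details are
# placeholders and the parameter list is illustrative, not exhaustive.
import os
import oracledb

def vm_visible_resources():
    """Return the vCPU count and total memory (bytes) the guest OS can see."""
    cpus = os.cpu_count()
    mem_kb = 0
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                mem_kb = int(line.split()[1])
                break
    return cpus, mem_kb * 1024

def instance_parameters(conn, names):
    """Fetch selected initialization parameters from v$parameter."""
    binds = ",".join(":{}".format(i + 1) for i in range(len(names)))
    sql = "select name, value from v$parameter where name in ({})".format(binds)
    with conn.cursor() as cur:
        cur.execute(sql, names)
        return dict(cur.fetchall())

if __name__ == "__main__":
    # Placeholder credentials and service name.
    conn = oracledb.connect(user="system", password="...", dsn="vmhost/orclpdb")
    params = instance_parameters(conn, ["cpu_count", "sga_max_size",
                                        "db_writer_processes"])
    vcpus, mem_bytes = vm_visible_resources()

    # Flag the obvious mismatches between what the database assumes and
    # what the virtual machine currently provides.
    if int(params["cpu_count"]) != vcpus:
        print("cpu_count={} but the VM presents {} vCPUs".format(
            params["cpu_count"], vcpus))
    if int(params["sga_max_size"]) > 0.8 * mem_bytes:
        print("sga_max_size uses >80% of VM memory; leave headroom for the guest OS")
    print("db_writer_processes =", params["db_writer_processes"])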
The second step, Method-R-based
instrumentation analysis, is essentially unchanged. However, there
are two small wrinkles to take into account. First, treat
multi-database virtual machines on a single physical server as you
would have historically treated single machines running multiple
instances, i.e., examine the trace in its multi-database context.
While there is no truly reliable direct correspondence between these
two scenarios, the basic logic and corrections for multi-instance
tuning often apply to the virtual world, or at least provide a sound
starting point. This lets you keep and apply more of your historical
know-how.
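As a rough illustration of reading traces in their multi-database context, the following sketch folds the extended SQL trace (event 10046) files of several co-hosted databases into one combined wait profile. It assumes the familiar "WAIT #n: nam='...' ela= ..." line format, with ela in microseconds (10g and later); the database names and trace file paths are placeholders.

# Sketch: fold several extended SQL trace (event 10046) files -- one per
# co-hosted database -- into a single wait-time profile, so the traces are
# read in their multi-database context rather than in isolation.
import re
from collections import defaultdict

WAIT_RE = re.compile(r"WAIT #\S+: nam='([^']+)' ela= (\d+)")

def wait_profile(trace_path):
    """Sum elapsed wait time (microseconds) per wait event in one trace file."""
    totals = defaultdict(int)
    with open(trace_path) as f:
        for line in f:
            m = WAIT_RE.search(line)
            if m:
                totals[m.group(1)] += int(m.group(2))
    return totals

if __name__ == "__main__":
    traces = {"salesdb": "sales_ora_1234.trc",    # placeholder paths
              "hrdb":    "hr_ora_5678.trc"}
    combined = defaultdict(int)
    for db, path in traces.items():
        for event, ela in wait_profile(path).items():
            combined[(db, event)] += ela
    # Rank the co-hosted databases' wait components side by side.
    for (db, event), ela in sorted(combined.items(), key=lambda kv: -kv[1]):
        print("{:10s} {:35s} {:>12.3f} s".format(db, event, ela / 1e6))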
Second and more critically, Method-R attempts to
eliminate skew during database performance analysis. Skew is
a measure of non-uniformity in a set of performance-related data;
it is often non-obvious and uncorrelated, and it can lead to incorrect
tuning and optimization conclusions. The problem is that
virtualization, by its very nature, complicates process monitoring in
general and instrumentation in particular. The
virtual machine's system clock will not be 100% accurate due to the time
drift inherent in this architecture, i.e., the guest OS system clock
actually references abstracted hardware and is further time-sliced by
the host. Thus, the actual run and wait time values may
be slightly off. However, their proportional ratios should be fairly
reliable, so Method-R should generally work as expected.
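A small, made-up numeric example shows why those proportional ratios hold up: if the guest clock runs uniformly fast or slow, every component time is scaled by the same factor, so the percentages in the resulting profile are unchanged. The component times and the 5% drift factor below are purely illustrative.

# Sketch: a uniform clock error cancels out of a response time profile.
measured = {"CPU": 4.0, "db file sequential read": 3.0,
            "log file sync": 2.0, "other waits": 1.0}   # seconds (made up)
drift = 1.05   # pretend the guest clock runs 5% fast

def profile(times):
    """Express each component as a percentage of total response time."""
    total = sum(times.values())
    return {k: 100.0 * v / total for k, v in times.items()}

base = profile(measured)
drifted = profile({k: v * drift for k, v in measured.items()})
for k in measured:
    # Both columns print identical percentages; only absolute seconds differ.
    print("{:28s} measured {:5.1f}%  drifted {:5.1f}%".format(k, base[k], drifted[k]))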
The third step, monitoring and diagnosing by
computing ratios from aggregate performance numbers, is the most
severely challenged on virtual machines. We know the Method-R people
will tell us that this method already suffers skew due to the very
aggregate nature of the numbers examined. However,
with the run and wait times suffering drift (thereby making the
OS-level times off), the database sitting a level above that, and the
performance data being aggregate in nature anyhow, the cumulative
effect is that the numbers are not only off, they may well be
disproportionately incorrect. So the fancy dashboards and complex
scripts could well produce unreliable observations and flawed
conclusions, even more so than the Method-R advocates normally
attribute to them. Of the three steps, this one should be given even
less weight than usual. That is not to say that
the tools will not work, only that they are even more challenged
than normal. So use such tools with an additional level of
skepticism and doubt in the virtual world.
Special Note: This time drift issue on
virtual machines may also skew Oracle Statspack and ADDM/AWR
reports. It may also negatively affect other Oracle
items, such as TKPROF, trace files, SQL*Plus timing, Performance
Analyzer, Real Application Testing and Database Replay. Essentially,
any performance metric, indicator, measurement, ratio or calculation
based upon the system clock may be slightly off. Only hard numbers
remain totally unaffected, such as run queue length, memory used,
memory free, I/O counts, paging occurrences, swapping occurrences,
etc. So be extra careful.
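If you want to gauge how badly the guest clock is drifting before trusting any of these clock-based figures, a quick sanity check along the following lines can help. This is a sketch assuming the third-party ntplib package and outbound NTP access; the server name and one-minute sampling interval are arbitrary choices.

# Sketch: sample the guest clock's offset against an NTP reference twice,
# a minute apart, to get a rough feel for drift before relying on any
# clock-based Oracle timing.
import time
import ntplib

def clock_offset(server="pool.ntp.org"):
    """Return the guest clock's offset from the NTP reference, in seconds."""
    return ntplib.NTPClient().request(server, version=3).offset

if __name__ == "__main__":
    first = clock_offset()
    time.sleep(60)                      # sample again a minute later
    second = clock_offset()
    print("offset now: {:+.3f} s".format(second))
    print("drift over the interval: {:+.3f} ms/min".format((second - first) * 1000))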
This is an excerpt from
Oracle on VMWare:
Expert tips for database virtualization
by Rampant TechPress.