Finding Oracle materialized view opportunities
Oracle Database Tips by Donald Burleson
Oracle says that a well-defined set
of materialized views can be made to cover a whole
schema, but that's easier said than done. I know about the
SQL Tuning Advisor, but I'm always looking for techniques to find
all opportunities for materialized views. Also see my notes
on how to identify opportunities for Materialized Views.
Materialized views are one
of the most important SQL tuning tools, and they are a true silver bullet,
allowing you to pre-join complex views and pre-compute summaries for super-fast
response time. I've devoted over a hundred pages to SQL tuning with
materialized views in my book
"Oracle
Tuning: The Definitive Reference"; also see "Oracle
Replication", a deeply technical book on creating and managing materialized views.
Let's divide this up into sections.
Automating
materialized view creation
Today we have the 10g SQL Tuning Advisor,
which recommends materialized view opportunities, but its major shortcoming
is that it only examines a "set" of problematic SQL supplied by the DBA.
The 10g SQL Tuning Advisor does a decent job identifying MVs, but even our
most sophisticated scripts often miss opportunities. The decision rules are
far too complex to easily quantify.
A program to detect MV opportunities at the
workload-level would be a major software engineering challenge, as large as
many artificial intelligence projects. The scope of such a program
would be massive as it would have to consider millions of distinct SQL
requests over the baseline period.
The materialized view workload duration and cohorts
An appropriate workload for an MV should
acknowledge your repeating "cycles" in processing. For example, many
shops run OLTP during the day, processing fully-indexed rows. However,
on weekends they run giant batch jobs to collect and aggregate business
reports. Some shops drop and re-create materialized views manually
(using the dbms_mview package) for several reasons:
- The DML rate is too high to support real-time materialization, or
- The data is only needed during specific time windows
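The manual drop-and-re-create cycle might look like this sketch (the sales table, the sales_sum_mv view name, and the columns are all hypothetical):

```sql
-- During the high-DML work week, the MV is dropped so base-table
-- updates pay no materialization overhead:
drop materialized view sales_sum_mv;

-- Just before the weekend batch window, re-create it:
create materialized view sales_sum_mv
  build immediate
  refresh complete on demand
as
  select cust_id, sum(amount) as total_sales
  from   sales
  group  by cust_id;

-- A manual complete refresh can also be driven from dbms_mview:
execute dbms_mview.refresh('SALES_SUM_MV', method => 'C');
```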
Every workload is different, and given the high overhead of keeping
materialized views synchronized with their component tables (Oracle uses
snapshot replication to keep the materializations fresh), we must carefully
choose a representative workload and take a single, deep sample covering
many weeks of processing.
The factors involved in the decision to
create a materialized view
As a quick review, the benefit of a Materialized View is its ability to
pre-join tables together (a real-time denormalization technique!), and to
pre-summarize frequently-requested analytics (frequent SQL with min(), max(),
sum(), rollup, etc.). An automated program to detect Materialized View
opportunities must perform these steps:
- Collection - Determine your universe of SQL statements
- Assimilation - Examine the v$sql table for new statements every nn seconds
- Recommendation - Recommend MV contents based on empirical
workload evidence
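A rough first pass at the "collection" and "assimilation" steps could scan the library cache for frequently executed aggregate SQL, which statements are natural materialized view candidates. This is only a sketch; the thresholds are arbitrary and a real tool would parse the statements far more carefully:

```sql
-- Scan v$sql for heavily executed statements containing aggregations.
-- High buffer_gets with high executions suggests a pre-computed
-- summary could pay off.
select sql_id,
       executions,
       buffer_gets,
       sql_text
from   v$sql
where  (   upper(sql_text) like '%SUM(%'
        or upper(sql_text) like '%GROUP BY%')
and    executions > 100
order  by buffer_gets desc;
```

Sampling this query every few minutes (the "every nn seconds" step above) and persisting the results would build the empirical workload history that a recommendation step needs.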
But we must also address the all-important question of stale tolerance for
materialized views.
How stale can you take
it?
The term "stale" has a different meaning on a TV show like Fear Factor, but to
Oracle, it is a super-important consideration. The instant that a
materialized view is created, it may become "stale". The concept of stale
tolerance is central to the decision to create a materialized view, since
low-volume updates are ideal, and real-time refreshing of materialized views in a
busy system can cause giant bottlenecks. Always remember:
END-USERS CONTROL THE STALENESS OF THEIR DATA, NOT YOU
I put this in all-caps because a good DBA will NEVER allow stale data without
the explicit understanding of the end-users. Serving "wrong" data is a
capital offense to an Oracle DBA, but we don't mind providing the illusion of
instantaneous response time so long as the end-user CLEARLY understands that
their data is only "fresh" as of the last day (or whatever degree of staleness
they accept).
The problem is the overhead of synchronizing the materializations, and
end-users often don't know how "fresh" they need to keep their MVs. For
example, a shop running Oracle Business Analytics may not know whether its
reports require hourly or daily freshness.
It's always nice to err on the side of a frequent refresh, but we must always
remember that the "magic" of MVs is the fact that they are built from pieces of
other tables! Each and every table that participates in a fast-refreshable MV
will have a materialized view log to maintain (named mlog$_tablename in the
data dictionary), and the overhead of an ON COMMIT or FAST refresh can cripple
the I/O subsystem as hundreds of refresh requests swamp the engine.
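To make the overhead concrete, here is a sketch of what fast refresh requires (table and MV names hypothetical): a materialized view log on every base table, which every DML statement on that table must also write to.

```sql
-- Every insert/update/delete on SALES now also writes a log row,
-- so the refresh machinery can apply just the deltas:
create materialized view log on sales
  with rowid, sequence (cust_id, amount)
  including new values;

-- An ON COMMIT MV then refreshes inside every committing transaction;
-- note that a fast-refreshable aggregate MV needs count(*):
create materialized view sales_sum_mv
  refresh fast on commit
as
  select cust_id, sum(amount) as total_sales, count(*) as cnt
  from   sales
  group  by cust_id;
```

On a busy OLTP table, that per-transaction log write plus on-commit delta apply is exactly the I/O overhead described above.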
So, how do we ensure that our end-users get data with the freshness they
need, without risking "bad" results from stale tables?
Now that we understand that the refresh overhead is huge, we need to guide
our end-users through the decision process. The end-user question can be
put quite simply:
"I can make your 5 minute OLAP query run with sub-second response
time, if it's OK that the data is only current to yesterday"
Is it a difference that makes a difference?
Say that your end-users are doing monthly longitudinal studies. It is
highly unlikely that today's "latest" data would make any difference in the
validity of their statistical results.
Once your end-user agrees to allow stale data, always "CYA" and ask the
developer to display a warning on the end-user screen like "This data is only
current as of yesterday", and you get to enjoy the 24x lower overhead of a
single nightly refresh, rather than 24 expensive hourly refreshes.
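A single nightly complete refresh can be scheduled with dbms_scheduler in 10g; this is a sketch with hypothetical job and MV names:

```sql
-- One complete refresh per night at 2:00 a.m., instead of 24
-- hourly refreshes:
begin
  dbms_scheduler.create_job(
    job_name        => 'nightly_mv_refresh',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin dbms_mview.refresh(''SALES_SUM_MV'', method => ''C''); end;',
    start_date      => trunc(sysdate) + 1 + 2/24,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => true);
end;
/
```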
In sum, try to talk the end-user into making the trade: lightning-fast
response time in return for allowing staleness. If the MV has to be rebuilt
every 5 minutes to support a query that runs every 10 minutes, you get far less
benefit than if an MV is rebuilt daily. Of course, the end-users must
understand the tradeoff between super-fast response time and "stale" results.
John Garmany notes:
In replication, the materialized view is normally on a
different server, and the refresh is executed by the remote database rather than
the local database. This leaves you with a scheduled job (refresh group), as the
remote server does not know when data is committed on the local server. If you
are replicating inside the database, that is a different issue, and you can look
at ON COMMIT refresh. Likewise, on a medium- or high-transaction table, ON
COMMIT refresh can bring the database to its knees, negating any benefit of a
materialized view or a normal view. Still, in the case of local database
replication, you can grant rights on the actual table and eliminate the overhead
of the materialized view completely, if that fits your requirements.
As replication continues to improve, the use of rowids is
going away in all but the simplest implementations. When any replication
uses rowids, there is a requirement to match the row to the remote site. Since
both the local and remote databases are using heap-organized tables, rowids may
not match, sometimes forcing a FULL refresh rather than a FAST refresh. Advanced
Replication all but requires the use of primary keys (PKs) on the replicated tables.
This way both databases (local and remote) know exactly which row to update, delete, etc.
If you are considering replication, you need to ensure that your tables have PKs.
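The PK-based approach above can be sketched as follows (the orders table and the master_site database link are hypothetical):

```sql
-- On the master site: the MV log records the primary key rather than
-- the rowid, so local and remote rows match reliably:
create materialized view log on orders with primary key;

-- On the remote site: a fast-refreshable MV pulled over a database
-- link, matched by PK instead of rowid:
create materialized view orders_mv
  refresh fast
as
  select * from orders@master_site;
```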
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning tips and
scripts. You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts.