Displaying Oracle Trace Event files
Oracle Tips by Robert Freeman
Note: Here are related Oracle
10046 trace file notes:
Accessing Oracle Trace Records with external tables
After turning in this paper to
meet the deadline, I did some more playing with this feature.
Here is what I did:
I created a view to make
accessing the trace records easier:
Create or replace view vw_10046_view as
select trace_row_num, event_type, cursor_number, wait_event_name,
       to_char(total_elapsed_time) wait_time,
       -1 command
from   ext_10046_table_wait_events
union
select trace_row_num, event_type, cursor_number,
       null, to_char(pef_tim), -1
from   ext_10046_table_pef_events
union
select distinct b.trace_row_num, null, null, null, sql_text,
       to_number(cur_oct) command
from   v$sql a, ext_10046_table_cursor_events b
where  upper(b.cur_ad) = a.address (+)
order  by 1;
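The view pulls from three external tables (ext_10046_table_wait_events, ext_10046_table_pef_events and ext_10046_table_cursor_events) that are built over the parsed trace file, as described in the related note above. Purely as a rough sketch, and assuming the raw trace has first been pre-processed into a comma-delimited file (wait_events.txt is a made-up name here), one of those definitions might look something like this, with columns mirroring those referenced in the view:
-- Hypothetical sketch only: assumes a pre-processed, comma-delimited
-- dump of the trace WAIT lines in wait_events.txt under ext_tab_dir.
create table ext_10046_table_wait_events (
    trace_row_num       integer
  , event_type          varchar2(10)
  , cursor_number       integer
  , wait_event_name     varchar2(64)
  , total_elapsed_time  integer
)
organization external (
  type oracle_loader
  default directory ext_tab_dir
  access parameters (
    records delimited by newline
    fields terminated by ',' lrtrim
  )
  location ('wait_events.txt')
);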
The view made things easier, but there was still a problem: performance. External table access to
large 10046 trace files is just very slow. To deal with that, I
simply created a table and dumped the 10046 trace data into it.
Create table tab_10046 as
Select * from vw_10046_view;
Create index ix_tab_10046_01 on tab_10046(trace_row_num);
Create index ix_tab_10046_02 on tab_10046(command);
Create index ix_tab_10046_03 on tab_10046(event_type);
Create index ix_tab_10046_04 on tab_10046
(event_type, trace_row_num, cursor_number, wait_time);
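Gathering optimizer statistics on the freshly loaded table also helps the optimizer take advantage of these indexes; a minimal example:
-- Gather statistics on tab_10046 (cascade covers its indexes) so the
-- optimizer can cost the indexed access paths accurately.
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'TAB_10046',
    cascade => true );
end;
/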
The indexes help tremendously
with overall access times for queries. Next, I wanted to write
some more queries that would help with real-world 10046 trace
problem solving. The first query uses Oracle's analytic functions
to provide some insight into the run times of various
operations:
Column time_between_events heading 'Time|Between|Events'
Column wait_time format 9999999999
select event_type, cursor_number, to_number(wait_time) wait_time,
(wait_time/1000000)-lag(wait_time/1000000, 1)
over (order by trace_row_num) "time_between_events"
from tab_10046
where event_type in ('EXEC','FETCH','PARSE')
order by trace_row_num;
This gives me a listing of each
individual EXEC, FETCH and PARSE operation, along with the time
that elapsed between it and the preceding operation. For example, in the
big 10046 trace file I ran this against, it produced over 5 million records.
Here is an example of this report:
                                             Time
               Cursor                     Between
EVENT_TYPE     Number   WAIT_TIME          Events
---------- ---------- ----------- ----------
PARSE 3 3889179528
EXEC 3 3889180002 .000474
PARSE 3 3889182412 .00241
EXEC 3 3889182891 .000479
PARSE 3 3889185178 .002287
EXEC 3 3889185634 .000456
PARSE 3 3889188020 .002386
EXEC 3 3889188437 .000417
PARSE 3 3923555525 34.367088
EXEC 3 3923556292 .000767
PARSE 3 3923559154 .002862
EXEC 3 3923559754 .0006
PARSE 3 3923562154 .0024
EXEC 3 3923562690 .000536
PARSE 3 3923565068 .002378
EXEC 3 3923565564 .000496
So, five million plus records
seems a bit extreme. How do we reduce this number? To deal with
this, I modified the query to add some run-time criteria:
select trace_row_num, event_type, cursor_number,
       to_number(wait_time) wait_time, event_time
from  (select trace_row_num, event_type, cursor_number, wait_time,
              (wait_time/1000000)-lag(wait_time/1000000, 1)
                 over (order by trace_row_num) "EVENT_TIME"
       from   tab_10046
       where  event_type in ('EXEC','FETCH','PARSE')
       order  by trace_row_num )
where trace_row_num in
      (select trace_row_num
       from  (select trace_row_num,
                     (wait_time/1000000)-lag(wait_time/1000000, 1)
                        over (order by trace_row_num) "EVENT_TIME"
              from   tab_10046
              where  event_type in ('EXEC','FETCH','PARSE') )
       where  event_time > .25 );
So, here I am limiting my
result to those events that experienced a lag time of greater
than .25 seconds. This significantly reduced my row set, and
gave me something manageable to work with. Here is a partial
result of this query:
                                 Cursor
TRACE_ROW_NUM EVENT_TYPE         Number   WAIT_TIME EVENT_TIME
------------- ---------- ---------- ----------- ----------
2298019 PARSE 3 2881178580 32.95
2425666 PARSE 3 2939522206 34.07
2553299 PARSE 3 2994920152 31.08
2680751 PARSE 3 3054787045 33.54
2808832 PARSE 3 3110536030 32.24
2936778 PARSE 3 3167765847 31.58
3064266 PARSE 3 3227249876 33.48
3191935 PARSE 3 3284241535 32.12
3319636 PARSE 3 3341259394 31.84
3447092 PARSE 3 3400515679 33.46
3574826 PARSE 3 3457276827 33.02
3702529 PARSE 3 3515651502 33.09
3829993 PARSE 3 3575157371 34.83
3957662 PARSE 3 3630381367 31.68
4085349 PARSE 3 3689283680 33.62
4106254 EXEC 3 3695989891 .45
4213359 PARSE 3 3750297980 35.12
4341185 PARSE 3 3805701373 31.61
4468887 PARSE 3 3865174711 33.78
4596361 PARSE 3 3923555525 34.37
This has really made it easier
to look for specific problems. It would be even easier if I
could see the operation prior to this operation, the one that
the lag time is actually predicated on. This way I can determine
a range of records that are of interest. I modified the query
as follows to provide this information:
select 'E' operation, trace_row_num, event_type, cursor_number,
       wait_time, event_time
from  (select trace_row_num, event_type, cursor_number,
              to_number(wait_time) wait_time,
              (wait_time/1000000)-lag(wait_time/1000000, 1)
                 over (order by trace_row_num) "EVENT_TIME"
       from   tab_10046
       where  event_type in ('EXEC','FETCH','PARSE')
       order  by trace_row_num )
where trace_row_num in
      (select trace_row_num
       from  (select trace_row_num,
                     (wait_time/1000000)-lag(wait_time/1000000, 1)
                        over (order by trace_row_num) "EVENT_TIME"
              from   tab_10046
              where  event_type in ('EXEC','FETCH','PARSE') )
       where  event_time > .25 )
UNION
select 'B' operation, trace_row_num, event_type, cursor_number,
       to_number(wait_time) wait_time, event_time
from  (select trace_row_num, event_type, cursor_number, wait_time,
              (wait_time/1000000)-lag(wait_time/1000000, 1)
                 over (order by trace_row_num) "EVENT_TIME"
       from   tab_10046
       where  event_type in ('EXEC','FETCH','PARSE')
       order  by trace_row_num )
where rowid in
      (select q_rowid
       from  (select trace_row_num,
                     (wait_time/1000000)-lag(wait_time/1000000, 1)
                        over (order by trace_row_num) "EVENT_TIME",
                     lag(rowid) over (order by trace_row_num) q_rowid
              from   tab_10046
              where  event_type in ('EXEC','FETCH','PARSE') )
       where  event_time > .25 )
order by trace_row_num;
This query gives us this dandy report:
                                   Cursor
O TRACE_ROW_NUM EVENT_TYPE         Number   WAIT_TIME EVENT_TIME
- ------------- ---------- ---------- ----------- ----------
B 4661621 PARSE 3 3936743204 .01
E 4661624 EXEC 3 3937120555 .38
B 4724009 EXEC 3 3948848444 .00
E 4724035 PARSE 3 3981191796 32.34
B 4851699 EXEC 3 4006419477 .00
E 4851727 PARSE 3 4040119254 33.70
B 4979646 EXEC 3 4063701927 .00
E 4979674 PARSE 3 4096337840 32.64
B 5033307 PARSE 3 4107294796 .01
E 5033311 EXEC 3 4107692394 .40
Note that this report gives me
the beginning operation (denoted with the B) and the ending
operation (denoted with the E). The ending operation is
the one reporting the lag; the beginning operation is the one
where the lag timing started. So, the bottom line is that
somewhere between these operations, trouble has occurred.
I can run another report to see
all of the waits that occurred in between those events, which
leads me to possible culprits:
select trace_row_num, event_type, wait_event_name,
       cursor_number, wait_time/1000000 wait_time
from   tab_10046
where  trace_row_num between 4851699 and 4851727
order  by trace_row_num;
                                                         Cursor
TRACE_ROW_NUM EVENT_TYPE WAIT_EVENT_NAME                 Number  WAIT_TIME
------------- ---------- ------------------------------ ------- ----------
      4851699 EXEC                                            3       4006
      4851700 WAIT       'SQL*Net message to client'          3          0
      4851701 WAIT       'SQL*Net message from client'        3          0
      4851703 WAIT       'log file sync'                      0          0
      4851704 WAIT       'SQL*Net message to client'          0          0
      4851705 WAIT       'SQL*Net message from client'        0          0
      4851708 WAIT       'SQL*Net message to client'          3          0
      4851709 WAIT       'SQL*Net message from client'        3          0
      4851710 WAIT       'SQL*Net message to client'          0          0
      4851712 WAIT       'SQL*Net message from client'        0         34
      4851713 WAIT       'SQL*Net message to client'          0          0
      4851714 WAIT       'SQL*Net message from client'        0          0
      4851715 WAIT       'SQL*Net message to client'          3          0
      4851716 WAIT       'SQL*Net message from client'        3          0
      4851718
      4851727 PARSE                                           3       4040
In this case, we find that we
had a wait event, SQL*Net message from client, that lasted for
34 seconds. This may be a normal situation (waiting for user
feedback, perhaps), or it might indicate some sort of problem
with the application or the network.
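To see whether a long wait like this is an isolated event or part of a pattern, it helps to summarize total wait time by event name across the whole trace. The following is a rough sketch, assuming (as in the report above) that wait_time on the WAIT rows holds the elapsed wait in microseconds:
-- Total and maximum wait time per wait event across the whole trace,
-- assuming wait_time on WAIT rows is the elapsed wait in microseconds.
select wait_event_name,
       count(*)                          waits,
       sum(to_number(wait_time))/1000000 total_secs,
       max(to_number(wait_time))/1000000 max_secs
from   tab_10046
where  event_type = 'WAIT'
group  by wait_event_name
order  by total_secs desc;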
Also note that there are third-party tools that format 10046 trace
events to make them more readable.
About the Author
Robert G.
Freeman is an independent database technologist who works
closely with Burleson Consulting on database training,
development, design and administration throughout the world.
Robert speaks frequently at a number of different user groups
including IOUG-A and the UKOUG.
Randolf Geist has this Unix command that can be used to transform a
raw trace file (in this case, one containing a heap dump) into suitable input for an Oracle external table:
cat $TRACE_FILE | awk '
  /^HEAP DUMP heap name=/ { split($0,ht,"\""); HTYPE=ht[2]; doPrintOut = 1; }
  /Chunk/                 { if ( doPrintOut == 1 ) {
                              split($0,sf,"\"");
                              printf "%10d , %16s, %16s, %16s\n", $4, HTYPE, $5, sf[2];
                            } }
  /Total heap size/       { doPrintOut = 0; }
' > $EXT_TAB_DIR/heapdump.txt
This is how the external table is defined over that file:
create table heapdump (
    chunk_size   integer
  , heap_type    varchar2(16)
  , chunk_type   varchar2(16)
  , alloc_reason varchar2(16)
)
organization external (
  type oracle_loader
  default directory ext_tab_dir
  access parameters (
    records delimited by newline
    fields terminated by ',' lrtrim
  )
  location ('heapdump.txt')
);
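Once the external table is in place, the heap dump can be summarized with plain SQL; for example, rolling the chunks up by heap type and allocation reason:
-- Summarize the heap dump: chunk count and total bytes per heap type
-- and allocation reason, largest consumers first.
select heap_type,
       alloc_reason,
       count(*)        chunks,
       sum(chunk_size) total_bytes
from   heapdump
group  by heap_type, alloc_reason
order  by total_bytes desc;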