Jul 2: Session Level ASH Reports
I noticed a post on H.Tonguç Yılmaz's blog about filtering ASH data to look at the actions of a specific instrumented query. There are a few strange things that I was going to comment on but the blog requires me to use a Wordpress account before I can post. I remember this has stopped me from commenting on a few interesting posts there in the past so I've decided to post some comments here and hopefully they'll appear as a trackback or pingback or some such modern thing
I think the post is really showing two different things, one more successfully than the other.
1) Using DBMS_APPLICATION_INFO to instrument code so that we can analyse what it's doing. It is incredibly useful and Oracle's tools are all geared up to use the info if it's there. But that's not really about ASH as such, because the information would also be written to trace files too and would prove just as useful there. You could use trcsess with module or client_id, service, action or module and you would have a consolidated trace file with the same application-aware view of things, without the sampling gaps inherent in ASH data. Of course, you'd need to know about the problem in advance or be able to recreate it.
2) Filtering ASH data to see what a specific user or application is doing. In this case it's a parallel query but if I wanted to look at pq activity with ASH, I think I'd want to include the QC_SESSION_ID and possibly QC_INSTANCE_ID to tie things together. I don't think it's necessary for what the example's trying to show, but it's worth knowing about if you didn't already.
However, I have a couple of real problems with the ASH query shown. (Recreated here with some of the white space removed from the results to make it fit the width of this template.)
SELECT session_id, client_id, event, SUM(wait_time + time_waited) total_wait_time FROM v$active_session_history WHERE client_id = 'your_identifier' AND sample_time BETWEEN SYSDATE - 30 / 1440 AND SYSDATE GROUP BY session_id, client_id, event ORDER BY 2; SESSION_ID CLIENT_ID EVENT TOTAL_WAIT_TIME ---------- ------------------------ ------------------- --------------- 223 your_identifier latch free 3969275 292 your_identifier direct path read 5304169 241 your_identifier direct path read 1133055 273 your_identifier 111542310 235 your_identifier direct path read 1052545 235 your_identifier latch free 3969283 223 your_identifier direct path read 1003455 241 your_identifier latch free 3969486 8 rows selected
I'll summarise what I think this query is meant to be returning as 'All activity for a given client id in the last 30 minutes, showing the total time waited for each event by each session used'.
I see more problems every time I look at this, but a few off the top of my head ...
1) Probably the most important is the SUM(wait_time + time_waited) as a measure of total wait time. It's not total wait time. First, ASH is sampled data so it doesn't contain all of the events. SUM-ing the data might give the illusion of something useful, but we have no idea what happened within the one second sample points! That's why the Top Sessions section of the Oracle-supplied ASH report doesn't report the amount of time spent on various events, but the percentage of samples over the period. Here's an example from one of my course slides.
Second, why WAIT_TIME + TIME_WAITED?
2) There's no filtering on session_state so it's not obvious that the group with no data returned in the EVENT column is ON CPU.
3) If I was going to ORDER BY something here, I suppose CLIENT_ID might be in there, but wouldn't I be interested in 'Most Active'? In which case, I would order by that TIME_WAITED column, if it wasn't flawed. In fact, the smart thing to do here would be to COUNT the number of samples as a proxy for time. That's what the supplied reports do and there are other examples over at ashmasters.com.
My final tip, though, would be this. If you run $ORACLE_HOME/rdbms/admin/ashrpti.sql (note the i, it's important), not only will it allow you to specify instance in a RAC cluster, but you can also limit the scope of the report in many interesting ways like this. (Although why anyone would want to report on WATI_CLASS is beyond me )
Specify SESSION_ID (eg: from V$SESSION.SID) report target: Defaults to NULL: Enter value for target_session_id: SESSION report target specified: Specify SQL_ID (eg: from V$SQL.SQL_ID) report target: Defaults to NULL: (% and _ wildcards allowed) Enter value for target_sql_id: SQL report target specified: Specify WATI_CLASS name (eg: from V$EVENT_NAME.WAIT_CLASS) report target: [Enter 'CPU' to investigate CPU usage] Defaults to NULL: (% and _ wildcards allowed) Enter value for target_wait_class: WAIT_CLASS report target specified: Specify SERVICE_HASH (eg: from V$ACTIVE_SERVICES.NAME_HASH) report target: Defaults to NULL: Enter value for target_service_hash: SERVICE report target specified: Specify MODULE name (eg: from V$SESSION.MODULE) report target: Defaults to NULL: (% and _ wildcards allowed) Enter value for target_module_name: MODULE report target specified: Specify ACTION name (eg: from V$SESSION.ACTION) report target: Defaults to NULL: (% and _ wildcards allowed) Enter value for target_action_name:
In other words, Oracle already supply a report that does everything that query is trying to do and in my opinion, much better. I hope I don't seem too critical but I'm trying to help people here and SUMs of timing information in ASH data has become a particular bug-bear of mine.
Tracked: Sep 15, 11:10
You correctly point out that this query is fundamentally flawed. Event wait times in ASH are useful really only to identify long "spike" events - if they last longer than 1 sec they will definitely be captured, and the "fixup" should tell how long they actually lasted.
BASIC ASH MATH: count(*) = Time (seconds)