AWR Differences Report

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Jul 30: AWR Differences Report

Some features in this post require a Diagnostics Pack license.

This morning I had an opportunity to use one of my favourite AWR tools, the differences report. Our system has a fairly involved overnight batch schedule consisting of multiple concurrent job streams that starts at 02:00 and usually completes at about 5:40. This morning it finished at 6:40 so I needed to work out what had gone wrong. (Better still, because this wasn't a problem with specific SQL statements, I can blog about it without using any system-specific information that needs to be obscured and that doesn't happen too often.)

Because I don't have access to the server to run $ORACLE_HOME/rdbms/admin/awrddrpt.sql, I used a SQL-only approach.

First find out the snapshots I'm interested in.

select snap_id, end_interval_time 
from dba_hist_snapshot 
where end_interval_time > trunc(sysdate-1) 
order by snap_id; 

8945                                   28/07/2009 12:00:18.555 AM          
8946                                   28/07/2009 1:00:30.647 AM           
8947                                   28/07/2009 2:00:42.740 AM           
8948                                   28/07/2009 3:00:55.045 AM           
8949                                   28/07/2009 4:00:07.050 AM           
8950                                   28/07/2009 5:00:19.198 AM           
8951                                   28/07/2009 6:00:31.596 AM           
8952                                   28/07/2009 7:00:43.751 AM           
8953                                   28/07/2009 8:00:55.820 AM           
8954                                   28/07/2009 9:00:07.679 AM           
8955                                   28/07/2009 10:00:19.891 AM          
8956                                   28/07/2009 11:00:31.994 AM          
8957                                   28/07/2009 12:00:44.094 PM          
8958                                   28/07/2009 1:00:56.239 PM           
8959                                   28/07/2009 2:00:08.123 PM           
8960                                   28/07/2009 3:00:20.204 PM           
8961                                   28/07/2009 4:00:32.556 PM           
8962                                   28/07/2009 5:00:44.682 PM           
8963                                   28/07/2009 6:00:56.788 PM           
8964                                   28/07/2009 7:00:08.679 PM           
8965                                   28/07/2009 8:00:20.759 PM           
8966                                   28/07/2009 9:00:32.851 PM           
8967                                   28/07/2009 10:00:45.004 PM          
8968                                   28/07/2009 11:00:57.713 PM          
8969                                   29/07/2009 12:00:09.744 AM          
8970                                   29/07/2009 1:00:21.840 AM           
8971                                   29/07/2009 2:00:33.939 AM           
8972                                   29/07/2009 3:00:46.268 AM           
8973                                   29/07/2009 4:00:58.378 AM           
8974                                   29/07/2009 5:00:10.702 AM           
8975                                   29/07/2009 6:00:23.183 AM           
8976                                   29/07/2009 7:00:35.604 AM           
8977                                   29/07/2009 8:00:48.040 AM       

Now that I have those snapshot ids, I can generate the AWR differences report, passing in the DBID, Instance Number, Begin and End Snapshot IDs for the first period and then the second period.

select * from TABLE(DBMS_WORKLOAD_REPOSITORY.awr_diff_report_html(4034329550,1,8947,8951,
                                                                  4034329550,1,8971,8975)); 

The report shows information for both periods, layed out for easy comparison. What immediately jumped out at me was the much higher DB Time over the first period than the second, which is probably more easily expressed as Average Active Sessions over the period. (Some people might say AAS is the magic metric)



Yes, this is over a long 4 hour period but I have the benefit of knowing that this is the same batch process that runs every night and it's performance and workload profile is more or less the same night after night. There's still no substitute for knowing your systems, after all. So to have over 4 times the number of sessions active, averaged over a long period, is a sure sign that something weird is going on and I had an immediate suspicion about what it might be.

Let's look at the Top 5 Timed Events section.



Yes, as I suspected the first period with many more active sessions on average is showing a lot of time spent on the parallel execution-related "PX Deq Credit: send blkd' event which doesn't show up at all in the second period. In fact, that's why it has an asterisk next to it in the first period numbers, to show it's an event which doesn't appear in the top 5 over the second period. That's what I love about this report. It takes the particular strength of AWR/Statspack - comparing 'good' and 'bad' periods and lays it out in readable format. No more sitting there at a desk with print-outs of two Statspack reports, poring over the details, or flicking between two vi sessions paging through the results. Those who've done that will know precisely what I mean ;-)

We have 4 times as many sessions active in the first period and there's a lot of parallel wait time that doesn't appear in the second period so I suspect that PX wasn't being used properly in the second period. I can check various statistics for that. This is a small section of the Instance Activity section of the report.



Yes, every parallel operation in the second period was downgraded to serial. You can see that the use of parallelism isn't very successful in the first 'good' period either, but that's a whole other story and possibly a blog post or two. That's where the big difference in DB Time has come from though - lots of parallel slave activity that wasn't there last night.

I think this is an interesting example because this particular report is showing a 'good' period with higher DB Time than a 'bad' period, which isn't what you'd normally expect to see - that's a side effect of looking at consolidated DB time - there's lots of additional wait time from the various slaves but they're actually helping push more work through the system, so they're a good thing. To put it another way, the instance was much busier during the good period, that's why it was able to push more work through and complete more quickly. Even though that might make sense, if we look at the graphical representation in OEM, here's the good period



... and here's the bad



I'm not convinced that people wouldn't look at those two pictures and think 'Mmmmm, things look much worse in the first period'. Oh, I should also point out that that bump at 22:00 each day is the auto stats gather job which ran for the entire maintenance window on the first night.

The root cause was a little more difficult to find out, but I got there in the end. TOAD has a nasty habit of keeping hold of PX slaves long after a parallel query has finished. (Well, that's my experience, using version 9.6.1.1, post a comment if you know different.) Unfortunately we have parallel DEGREE settings of 8 across our logging and monitoring tables (we shouldn't) and so if I query those using TOAD and forget to disable parallel query at the session level, Oracle decides to run them with parallel plans. I had one such query which used 16 slaves - our parallel_max_servers setting - which meant that there were no slaves for any of our batch processes! Result - batch was delayed by an hour.

The punchline, though, is that although the ETL process sets DEGREE to 1 on most of the tables on completion, it doesn't set the indexes to 1 (which is an oversight). The end result is that our users use parallel query during the day for their reports and (this is definitely another blog post) it makes them run more slowly. So now that I have all slaves assigned to my query, it's helped all the user queries run more quickly, so I'm sure they're happy. In fact, maybe I should just run one of those queries each morning from TOAD ;-)

The AWR differences report is very handy sometimes and I know it can't be that well known because when I had to request one prior to having the privileges myself, the DBA denied it's existence!
Posted by Doug Burns Comments: (10) Trackbacks: (2)

Trackbacks
Trackback specific URI for this entry

PingBack
Weblog: coskan.wordpress.com
Tracked: Aug 01, 02:54
PingBack
Weblog: dboptimizer.com
Tracked: May 18, 18:48

Comments
Display comments as (Linear | Threaded)

#1 - Frits Hoogland said:
2009-07-30 08:35 - (Reply)

Nice blog doug! Truly shows the power of the differences report.

I noticed the strange habit of TOAD to keep the PX slaves atached last week. I used 'select * from dual' to cut them loose. Do you know if sql developer does the same?

Also, this post shows that PX is a feature which only is effective if it's explicitly tailored for the environment, to use available resources.

#1.1 - Doug Burns said:
2009-07-30 09:02 - (Reply)

Do you know if sql developer does the same?

I can't say I do, but I'll make a note to check. To be honest, I was surprised that TOAD does.

PX is a feature which only is effective if it's explicitly tailored for the environment, to use available resources.

Amen to that

#1.1.1 - Frits Hoogland said:
2009-07-30 10:28 - (Reply)

I've noticed it very recently, your blog reminded me about the behavior (the big pink cloud on my waits after execution is ready)

The PX statement makes a lot of implications, which is not very often communicated towards PX users:
-If your OLTP is (quite) unpredictable in resource usage, PX will kill performance
-'More is better' is not always true
-If your nightbatches are subject to change due to company requirements or other reasons, PX can become ineffective

At a certain rate, I encounter clients who totally abuse PX, with the main reason they couldn't find documentation about the do nots of PX.

#2 - Christian Antognini said:
2009-07-30 13:37 - (Reply)

Hi Doug

> TOAD has a nasty habit of keeping hold
> of PX slaves long after a parallel
> query has finished.

AFAIK there is no "feature" to keep PX slaves when an execution has finished. The problem of TOAD and similar tools, in my opinion (I'm almost never use those kind of tools... so, I might be wrong...), is that by default they only fetch few dozen or rows. Therefore, the cursor is still open and the database engine is waiting for the subsequent fetches. So, a user that ignore that could casually have a look to few rows in a table, do nothing else, and therefore keep the PX slaves "busy".

Cheers,
Chris

#2.1 - Doug Burns said:
2009-07-30 15:06 - (Reply)

Of course, that's what it is. You're right.

Makes complete sense now that you mention it.

Thanks, Chris.

#3 - Pavel Ruzicka 2009-07-30 21:14 - (Reply)

I am slightly confused by this part:
"Because I don't have access to the server to run $ORACLE_HOME/rdbms/admin/awrddrpt.sql.."
I believe this is script is part of Oracle client installation as well (together with awrrpt.sql).
BTW - the fact that AWR usage (e.g. awrrpt.sql) is subject to Diagnostic Pack licensing still p* me off..

#3.1 - Doug Burns said:
2009-07-31 08:45 - (Reply)

I actually thought about that line before I typed it because in some cases the script will be on the client. However, it's not on the client installation at the site I'm currently working at (I've just checked) and people often don't have local admin rights to install a new client.

Even if it is part of your client installation they may be slightly different versions and I've run into small problems running the client version in the past, so I'd use that with caution.

#4 - Kerry Osborne said:
2009-08-11 19:05 - (Reply)

Nice post Doug. Although I routinely compare AWR reports, I never use the diff report. I usually just pop the two reports I'm comparing into a couple of windows and scroll them side by side. I had forgotten that the diff report does provide some additional info (the calculations of percentage differences in values for example). Anyway, thanks for reminding me of this report. I've used it already this morning.

Kerry

#5 - Rajendra Sunkara 2010-03-23 14:33 - (Reply)

Thanks Doug. Diff report really helps as we got a problem after applying 2010 January security patches. We see increase of direct path reads (double) resulting doubling the query times.

#6 - Amardeep Sidhu said:
2012-01-31 07:23 - (Reply)

Cool Stuff !

Doing the same but here "gc" is the culprit ;-)


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

Comments

Doug Burns about 10053 Trace Files - Different Plan in Different Environments
Tue, 02.04.2013 08:57
You're welcome. Now I just nee d to pull my finger out and ac tually come up [...]
Howard Rogers about 10053 Trace Files - Different Plan in Different Environments
Mon, 01.04.2013 23:08
Makes a big difference, so tha nks for that! With two brow ser windows, o [...]
stelioscharalambides.com about 10053 Trace Files
Sat, 30.03.2013 16:28

Upcoming Presentations

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml DBMS_STATS
xml Direct Path Reads
xml Fun
xml grid control
xml hotsos 2010
xml listener
xml Locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml Partitions
xml Patching
xml swingbench
xml The Reality Gap
xml time matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl