Dec 6: A More Complex Statspack Example - Part 1
Because a Statspack report contains a lot of information and this particular exercise threw up a fewcommon problems, I'm going to break it up into several postings. In fact, my overall analysis was prefixed with the following.
"I'm sorry this reply is so long and sprawling and doesn't contain any easy answers."
Mmmm, quite.
When you're a DBA for a large organisation, it's only a matter of time before one of your software suppliers claims that the performance they're getting in their test environment is substantially better than your test environment. The initial stages of this discussion usually involve sending them lists of init.ora parameters (this is *very* important
Eventually, though, the only sensible solution is to analyze the performance of the same job in the two different environments. My preference here is to use event 10046 or DBMS_MONITOR/DBMS_SESSION to trace the actions and wait events (in fact, there's an example from the same application here) but that can be a little tricky if the application isn't instrumented and uses connection pools. Sometimes it's difficult to narrow down which session you should be tracing. Without wanting to be too disparaging, I also need to think about how experienced the vendor's technical people are with tracing. Bitter experience leans towards 'not very'.
In those cases Statspack can be useful, particularly if we're using test environments and we can guarantee that the only job running is the one we're interested in. This is what Mogens Norgaard means when he says :-
"The one exception is when only one thing was running... then the summary becomes just a summary of that one form, report or whatever."
in this blog which focusses on the limitations of Statspack. That's what I suggested to the vendor - we would both enable Statspack snapshot collection, run the same job with a before and after snapshot and compare the resulting reports.
Anyway, let's start delving into these reports. I usually open the pair in different windows and work my way through them, flicking back and forward between the two. Most of the information in this blog will be obfuscated, but I've kept the important details intact.
The first thing I noticed about the reports was that we're running Oracle 9.2.0.7 and the vendor is running 9.2.0.1. That difference might turn out to be minor, but I've hit enough 9.2.0.x bugs to know that it might not be. Having looked through both reports completely, I don't think it's the cause of the performance difference in this case but I've suggested they might want to upgrade to a later version.
Here's the next section that shows the report period. At first I was a little confused by this because we have 15-minute snapshots, so I took a snapshot before the job was running and then the nearest 15 minutes after it finished. On our system the job took about 12 hours.
Our System
Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 346 22-Nov-06 08:53:55 62 .7
End Snap: 395 22-Nov-06 21:00:02 46 1.0
Elapsed: 726.12 (mins)The job took 6 hours 44 minutes on the vendor's system, but the report is for a longer period of time.
Vendor System
Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 4 03-Dec-06 17:30:11 18 2.8
End Snap: 7 04-Dec-06 08:05:45 14 4.6
Elapsed: 875.57 (mins)The reason for the anomaly was that they had left the job running and had run a closing snapshot hours after the job finished, the next morning. However, as their system was running nothing after the job completed, I could make allowances for this. I would have preferred them to have a report covering the exact period of the job, though.
In fact, you might think at this point that I've broken one of the cardinal rules of sensible Statspack use. A 12 hour report?!? What good could that be? Think of the detail hidden in those 12 hours! Well, not really because the job cycles round a few different sections multiple times and, even though it's a very long business task, it is a single business task and I want to compare the entire task running on two different environments. If I do want to drill down to a narrower period, I can use the multiple 15 minute snapshots.
Next the Cache Sizes. This section is one of a few (including the list of non-default parameters at the end of the report) that don't help me solve any performance problem, but offer a quick sanity check of the two environments.
Our System
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 1,968M Std Block Size: 8K
Shared Pool Size: 576M Log Buffer: 20,480KVendor's System
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 512M Std Block Size: 8K
Shared Pool Size: 128M Log Buffer: 20,480KWhy are our caches so much bigger than the vendor's? Because we've been working on this problem for a little while now and the vendors just keep asking us to increase cache sizes in the hope that everything will magically perform better. If in doubt, bump everything up a bit! (I'll leave it to the reader to guess how successful this has proved.)
On to the Load Profiles
Our System
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 95,650.78 9,958.63
Logical reads: 3,317.07 345.35
Block changes: 650.44 67.72
Physical reads: 45.52 4.74
Physical writes: 11.59 1.21
User calls: 423.49 44.09
Parses: 188.57 19.63
Hard parses: 0.46 0.05
Sorts: 15.96 1.66
Logons: 0.06 0.01
Executes: 118.06 12.29
Transactions: 9.60
% Blocks changed per Read: 19.61 Recursive Call %: 10.37
Rollback per transaction %: 0.00 Rows per Sort: 7.61
Vendor's System
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 63,599.53 433,182.61
Logical reads: 2,124.32 14,468.98
Block changes: 433.31 2,951.29
Physical reads: 100.97 687.73
Physical writes: 6.52 44.39
User calls: 253.44 1,726.23
Parses: 106.31 724.09
Hard parses: 0.01 0.04
Sorts: 1.17 7.94
Logons: 0.01 0.05
Executes: 55.26 376.40
Transactions: 0.15
% Blocks changed per Read: 20.40 Recursive Call %: 1.12
Rollback per transaction %: 0.00 Rows per Sort: 7.12
This is the stage where I started to have doubts. Look at the difference in redo size per second and per transaction.
Now that I'm writing down how I tend to use Statspack, I realise that Redo size is a figure I tend to focus on because I think it gives a pretty good indication of workload. If both these databases are running the same job, why the difference, particularly the massive difference in redo size per transaction? (In fact, it looks like our system is doing about 50% more work than the vendors when you look at the various load profile values)
More on that in the next blog.
#1 - Noons said:
2006-12-07 01:32 - (Reply)
recursive calls are also very high (roughly 10X more) in yours?
#2 - Sudhi 2006-12-07 05:10 - (Reply)
"The initial stages of this discussion usually involve sending them lists of init.ora parameters..."
Wonder whether a RDA report would help in reducing email exchange of data... any thoughts?
#3 - Doug Burns said:
2006-12-07 06:56 - (Reply)
Indeed, and I know of at least one very simple reason for this (next blog) but there may be others and I think that's something that would benefit from specific tracing activity to check.
Hey, this is going to be pretty useful - extra free consultancy from blog readers! Cool ![]()
#4 - Doug Burns said:
2006-12-07 06:59 - (Reply)
Interesting idea and I confess I'd never thought of that. I think the reason is that, given most external vendors struggle with anything more complex than a list of init.ora parameters, they would just be overwhelmed by an RDA report!
Seriously, it takes them a long time to swallow even a well-explained Statspack analysis.
The other thing is that it's very little effort to give them what they want with a brief comment that I don't think it will help them and why ![]()
#5 - Mathew Butler 2006-12-07 12:30 - (Reply)
I find the report slightly confusing as some your figures are than the vendors worse per second, but then better per transaction. I don't really understand why.
Clearly though by the measure you describe your system is doing more work. I'd be interested to see what the recursive SQL is captured later in the report.
Very interesting post!
Regards,
#6 - Mathew Butler 2006-12-07 12:33 - (Reply)
Finger trouble, that should read:
"I find the report slightly confusing as some of your figures are worse that the vendors (per second), but then better per transaction. I don't really understand why."
#7 - Doug Burns said:
2006-12-07 13:19 - (Reply)
The next (delayed posting) installment should turn up at about 13:50 today (UK time).
I'm not sure that will answer your question, but I hope one of the later entries will.
#8 - vinod ladda 2007-09-19 10:23 - (Reply)
No of transactions per second are different in both systems. Vendor's system has very less number of transactions per second so those figures are worse in that report.

