A More Complex Statspack Example - Part 1

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Dec 6: A More Complex Statspack Example - Part 1

Following on from the last Statspack example, up popped an example at work this week of another common reason I use Statspack - comparing the performance of different environments. It's also a nice illustration of some of Statspack's limitations.

Because a Statspack report contains a lot of information and this particular exercise threw up a fewcommon problems, I'm going to break it up into several postings. In fact, my overall analysis was prefixed with the following.
"I'm sorry this reply is so long and sprawling and doesn't contain any easy answers."

Mmmm, quite.

When you're a DBA for a large organisation, it's only a matter of time before one of your software suppliers claims that the performance they're getting in their test environment is substantially better than your test environment. The initial stages of this discussion usually involve sending them lists of init.ora parameters (this is *very* important ;-)), lists of indexes, export dumps of your schema and so it goes on.

Eventually, though, the only sensible solution is to analyze the performance of the same job in the two different environments. My preference here is to use event 10046 or DBMS_MONITOR/DBMS_SESSION to trace the actions and wait events (in fact, there's an example from the same application here) but that can be a little tricky if the application isn't instrumented and uses connection pools. Sometimes it's difficult to narrow down which session you should be tracing. Without wanting to be too disparaging, I also need to think about how experienced the vendor's technical people are with tracing. Bitter experience leans towards 'not very'.

In those cases Statspack can be useful, particularly if we're using test environments and we can guarantee that the only job running is the one we're interested in. This is what Mogens Norgaard means when he says :-
"The one exception is when only one thing was running :-) ... then the summary becomes just a summary of that one form, report or whatever."

in this blog which focusses on the limitations of Statspack. That's what I suggested to the vendor - we would both enable Statspack snapshot collection, run the same job with a before and after snapshot and compare the resulting reports.

Anyway, let's start delving into these reports. I usually open the pair in different windows and work my way through them, flicking back and forward between the two. Most of the information in this blog will be obfuscated, but I've kept the important details intact.

The first thing I noticed about the reports was that we're running Oracle 9.2.0.7 and the vendor is running 9.2.0.1. That difference might turn out to be minor, but I've hit enough 9.2.0.x bugs to know that it might not be. Having looked through both reports completely, I don't think it's the cause of the performance difference in this case but I've suggested they might want to upgrade to a later version.

Here's the next section that shows the report period. At first I was a little confused by this because we have 15-minute snapshots, so I took a snapshot before the job was running and then the nearest 15 minutes after it finished. On our system the job took about 12 hours.

Our System
              Snap Id     Snap Time      Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 346 22-Nov-06 08:53:55 62 .7
End Snap: 395 22-Nov-06 21:00:02 46 1.0
Elapsed: 726.12 (mins)

The job took 6 hours 44 minutes on the vendor's system, but the report is for a longer period of time.

Vendor System
            Snap Id     Snap Time      Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 4 03-Dec-06 17:30:11 18 2.8
End Snap: 7 04-Dec-06 08:05:45 14 4.6
Elapsed: 875.57 (mins)

The reason for the anomaly was that they had left the job running and had run a closing snapshot hours after the job finished, the next morning. However, as their system was running nothing after the job completed, I could make allowances for this. I would have preferred them to have a report covering the exact period of the job, though.

In fact, you might think at this point that I've broken one of the cardinal rules of sensible Statspack use. A 12 hour report?!? What good could that be? Think of the detail hidden in those 12 hours! Well, not really because the job cycles round a few different sections multiple times and, even though it's a very long business task, it is a single business task and I want to compare the entire task running on two different environments. If I do want to drill down to a narrower period, I can use the multiple 15 minute snapshots.

Next the Cache Sizes. This section is one of a few (including the list of non-default parameters at the end of the report) that don't help me solve any performance problem, but offer a quick sanity check of the two environments.

Our System
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 1,968M Std Block Size: 8K
Shared Pool Size: 576M Log Buffer: 20,480K

Vendor's System
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 512M Std Block Size: 8K
Shared Pool Size: 128M Log Buffer: 20,480K

Why are our caches so much bigger than the vendor's? Because we've been working on this problem for a little while now and the vendors just keep asking us to increase cache sizes in the hope that everything will magically perform better. If in doubt, bump everything up a bit! (I'll leave it to the reader to guess how successful this has proved.)

On to the Load Profiles

Our System
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 95,650.78 9,958.63
Logical reads: 3,317.07 345.35
Block changes: 650.44 67.72
Physical reads: 45.52 4.74
Physical writes: 11.59 1.21
User calls: 423.49 44.09
Parses: 188.57 19.63
Hard parses: 0.46 0.05
Sorts: 15.96 1.66
Logons: 0.06 0.01
Executes: 118.06 12.29
Transactions: 9.60

% Blocks changed per Read: 19.61 Recursive Call %: 10.37
Rollback per transaction %: 0.00 Rows per Sort: 7.61

Vendor's System

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 63,599.53 433,182.61
Logical reads: 2,124.32 14,468.98
Block changes: 433.31 2,951.29
Physical reads: 100.97 687.73
Physical writes: 6.52 44.39
User calls: 253.44 1,726.23
Parses: 106.31 724.09
Hard parses: 0.01 0.04
Sorts: 1.17 7.94
Logons: 0.01 0.05
Executes: 55.26 376.40
Transactions: 0.15

% Blocks changed per Read: 20.40 Recursive Call %: 1.12
Rollback per transaction %: 0.00 Rows per Sort: 7.12

This is the stage where I started to have doubts. Look at the difference in redo size per second and per transaction.

Now that I'm writing down how I tend to use Statspack, I realise that Redo size is a figure I tend to focus on because I think it gives a pretty good indication of workload. If both these databases are running the same job, why the difference, particularly the massive difference in redo size per transaction? (In fact, it looks like our system is doing about 50% more work than the vendors when you look at the various load profile values)

More on that in the next blog.
Posted by Doug Burns Comments: (8) Trackbacks: (0)

Trackbacks
Trackback specific URI for this entry

No Trackbacks

Comments
Display comments as (Linear | Threaded)

#1 - Noons said:
2006-12-07 01:32 - (Reply)

recursive calls are also very high (roughly 10X more) in yours?

#2 - Sudhi 2006-12-07 05:10 - (Reply)

"The initial stages of this discussion usually involve sending them lists of init.ora parameters..."

Wonder whether a RDA report would help in reducing email exchange of data... any thoughts?

#3 - Doug Burns said:
2006-12-07 06:56 - (Reply)

Indeed, and I know of at least one very simple reason for this (next blog) but there may be others and I think that's something that would benefit from specific tracing activity to check.

Hey, this is going to be pretty useful - extra free consultancy from blog readers! Cool ;-)

#4 - Doug Burns said:
2006-12-07 06:59 - (Reply)

Interesting idea and I confess I'd never thought of that. I think the reason is that, given most external vendors struggle with anything more complex than a list of init.ora parameters, they would just be overwhelmed by an RDA report!

Seriously, it takes them a long time to swallow even a well-explained Statspack analysis.

The other thing is that it's very little effort to give them what they want with a brief comment that I don't think it will help them and why ;-)

#5 - Mathew Butler 2006-12-07 12:30 - (Reply)

I find the report slightly confusing as some your figures are than the vendors worse per second, but then better per transaction. I don't really understand why.

Clearly though by the measure you describe your system is doing more work. I'd be interested to see what the recursive SQL is captured later in the report.

Very interesting post!

Regards,

#6 - Mathew Butler 2006-12-07 12:33 - (Reply)

Finger trouble, that should read:

"I find the report slightly confusing as some of your figures are worse that the vendors (per second), but then better per transaction. I don't really understand why."

#7 - Doug Burns said:
2006-12-07 13:19 - (Reply)

The next (delayed posting) installment should turn up at about 13:50 today (UK time).

I'm not sure that will answer your question, but I hope one of the later entries will.

#8 - vinod ladda 2007-09-19 10:23 - (Reply)

No of transactions per second are different in both systems. Vendor's system has very less number of transactions per second so those figures are worse in that report.


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

Comments

Doug Burns about 10053 Trace Files - Different Plan in Different Environments
Tue, 02.04.2013 08:57
You're welcome. Now I just nee d to pull my finger out and ac tually come up [...]
Howard Rogers about 10053 Trace Files - Different Plan in Different Environments
Mon, 01.04.2013 23:08
Makes a big difference, so tha nks for that! With two brow ser windows, o [...]
stelioscharalambides.com about 10053 Trace Files
Sat, 30.03.2013 16:28

Upcoming Presentations

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml DBMS_STATS
xml Direct Path Reads
xml Fun
xml grid control
xml hotsos 2010
xml listener
xml Locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml Partitions
xml Patching
xml swingbench
xml The Reality Gap
xml time matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl