Dec 7: A More Complex Statspack Example - Part 2
So far we're barely into the comparison of the Statspack reports from the two different test environments and they look quite different already.
Next up, the Instance Efficiency Percentages. I have to confess that I rarely find these very useful because the numbers are always near 100% on most databases I look at and, even when they aren't, they're aggregating so much detail that who knows what they're telling us? (Actually, this is a good point to say that there's a lot of opinion in these blogs and this is my personal approach in this particular case. I'm very interested in hearing how others approach these reports. For example, here is Jonathan Lewis discussing Statspack instance efficiency percentages.)
This particular example proves an exception to the rule, though, largely because of the Execute to Parse ratio.
Our Environment
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 98.63 In-memory Sort %: 100.00
Library Hit %: 99.81 Soft Parse %: 99.76
Execute to Parse %: -59.73 Latch Hit %: 99.99
Parse CPU to Parse Elapsd %: 98.32 % Non-Parse CPU: 96.73Vendor Environment
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 95.25 In-memory Sort %: 100.00
Library Hit %: 100.00 Soft Parse %: 100.00
Execute to Parse %: -92.37 Latch Hit %: 100.00
Parse CPU to Parse Elapsd %: 97.01 % Non-Parse CPU: 95.48Isn't it horrific? That means the application is parsing many more statements than it's executing. At first this seems so bizarre as to be impossible (and, as Alex Gorbachev points out in a comment to Dan Fink's blog, it's always worth considering that Statspack itself has bugs). However I know this ratio reflects reality because I've already traced this application. There's more on why you might see an negative Execute to Parse ratio in this AskTom thread. Suffice to say I've already expressed my dismay to the vendor and warned that this is likely to limit scalability, but the application won't be re-written any time soon.
Further confirmation of the problem is available by jumping down a few sections in the report to SQL ordered by Parse Calls section (just our environment, this time)
% Total
Parse Calls Executions Parses Hash Value
------------ ------------ -------- ----------
1,777,416 1,777,416 21.63 1081982467
Module: dllhost.exe
INSERT INTO W6FORECAST_UNITS_MEASURES (W6Key,W6SubKey_1,Name,Raw
Value,ActualValue,MinError,MaxError,TrendValue,FittedValue,Descr
iption) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10) 1,777,416 0 21.63 3225036475
Module: dllhost.exe
SELECT W6Key,W6SubKey_1,Name,RawValue,ActualValue,MinError,MaxEr
ror,TrendValue,FittedValue,Description FROM W6FORECAST_UNITS_MEA
SURES
444,354 0 5.41 956924307
Module: dllhost.exe
SELECT RelatedPlan,PlanScenario,Region,District,JobType,ParentUn
it,JobCategory FROM W6FORECAST_UNITS
The first statement is using bind variables, so you would think that Oracle wouldn't need to parse it each time it executes, but that's not the case here. Worse, the next two statements aren't even executed, just parsed! I've seen this before with certain development tools that parse a statement in order to validate it, but never use the results. (You need to bear in mind the scale of the problem, though. The parse operations for those three statements, divided by the number of minutes that the report covers, is 1100 parse operations per minute, or 18 per second.)
Okay, on to the Top 5 Timed Events, which is often one of the most useful sections.
Our Environment
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 23,832 63.56
log file sync 423,557 6,909 18.43
log file parallel write 837,369 2,164 5.77
sbtwrite2 75,108 1,366 3.64
sbtbackup 6 1,230 3.28Vendor Environment
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 2,342 64.40
db file scattered read 331,760 631 17.35
db file sequential read 152,702 466 12.81
db file parallel write 7,430 72 1.97
log file parallel write 76,262 42 1.15In this case, it's actually not too useful, but let's start with what I can take from it.
1) There's an RMAN backup running at some point during our test - signified by the sbt events.
2) There's a similar percentage of time spent running on the CPU, as opposed to waiting, in both reports.
3) All the other wait events are i/o related, which isn't too surprising in a batch job which is moving data around.
How about the problems, though? Well our system has 23,832 seconds of CPU available over the report period, the vendor's only 2,342. Considering that the vendor's Statspack report covers a longer period, how could that be? Well, the servers have completely different CPU configurations and because Statspack is a system-wide report (well, more accurately, an instance-wide report) it's going to aggregate the CPU resource as well as the wait events. Once you realise this, then it's obvious, but I've seen more than one person wonder how they can get more seconds of CPU than the number of seconds in the snapshot interval.
Here is an excellent Jonathan Lewis article that looks into this difficulty in more detail.
In any case, the interval for this report is 12 hours, which is far too long because I don't know whether all of those wait events are occuring in a short burst, or throughout the reporting period. That's why I would normally run a report for a 15 minute interval rather than such a long period, except when I'm comparing the performance of a single task and want the report to cover it's entire progress.
Another problem showing up in my current investigations is that our environment seems to have a real problem with log file sync events that isn't present on the vendor's environment. My initial reaction to this was that we might have a problem with the disks that our online redo log files sit on. To check this, I flicked down to the next section, containing all the wait events and not just the Top 5, so that I could see the log file sync waits in the vendor's environment too.
Our Environment
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 423,557 4,462 6,909 16 1.0Vendor Environment
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 7,874 3 11 1 1.0Okay, so our log file syncs do take a lot longer, but the stand-out numbers for me here are that our test suffered 423,557 log file sync waits to the vendor's 7,874. Are we *sure* these two environments are running the same workload?!?!?
#1 - amit poddar 2006-12-07 14:48 - (Reply)
"I've seen this before with certain development tools that parse a statement in order to validate it, but never use the results."
With bind variable peeking I thought that oracle postpones the optimizing part to EXEC step ?
If thats true then execeesive parsing should not be a problem ?
Where am I going wrong ?
#1.1 - Doug Burns said:
2006-12-07 15:08 - (Reply)
... and why would bind variable peeking have anything to do with this statement that doesn't use any literals or bind variables?
----------------------------------------
1,777,416 0 21.63 3225036475
Module: dllhost.exe
SELECT W6Key,W6SubKey_1,Name,RawValue,ActualValue,MinError,MaxEr
ror,TrendValue,FittedValue,Description FROM W6FORECAST_UNITS_MEA
SURES
#2 - Doug Burns said:
2006-12-07 14:57 - (Reply)
Where am I going wrong ?
Maybe you're not, maybe I am?
If thats true then execeesive parsing should not be a problem ?
Even if Oracle postpones the optimisation stage until execution and these statements are never executed I *guess* they might never be optimised. I still don't see the point of parsing them though and I very much doubt that there's no overhead associated with parsing them just because the optimisation overhead is eliminated.
But I'd very much welcome someone more knowledgeable than me coming along to correct us both ![]()
#3 - Gabe 2006-12-07 19:13 - (Reply)
The first statement is using bind variables, so you would think that Oracle wouldn't need to parse it each time it executes, but that's not the case here.
Looks like they call dbms_sql.parse for the INSERT statement for every execution (and one dbms_sql.parse call for the SELECT statement too).
So Oracle doesn't have to parse that statement more than once ... unless it is specifically told so.
#3.1 - Doug Burns said:
2006-12-07 19:29 - (Reply)
Thanks Gabe. So, just to make sure I've got this right - even though the call to DBMS_SQL.PARSE will increment the number of parses in the Statspack statistics, that doesn't necessarily represent a parse operation if Oracle decides it doesn't need to parse it?
#3.1.1 - Gabe 2006-12-07 20:36 - (Reply)
[I hope the formatting comes out right ... if not tell me what tags to use]
No. The number of parses will go up with every actual parse call. My _guess_ here is that they deliberately [wrongly] make that parse call for each insert. Oracle has no choice but to oblige. They use binds, that is true and hence the shared pool doesn’t get flooded with non-unique INSERTs … but they burn more CPU than they should etc.
It is interesting you have the same number of parses for the INSERT and the SELECT.
The SELECT FROM W6FORECAST_UNITS_MEASURES with 0 executions suggests they make dbms_sql.parse calls without actually executing the cursor. By extension they might be doing the same thing with the INSERT INTO W6FORECAST_UNITS_MEASURES … that is, specifically calling dbms_sql.parse every single time
Tracing a bit of modified code shamesly lifted out of AskTom …
… and the relevant portion from the trace file …
#3.1.1.1 - Doug Burns said:
2006-12-07 20:40 - (Reply)
Gabe,
Thanks for that. I picked you up wrong the first time, thought I'd read the situation incorrectly, but it looks like they are doing what I thought they were!
Sorry about the comment formatting, it's a constant pain in the neck for me. Your effort was surprisingly good, should look fine, but doesn't ![]()
I'll see what I can do to clean it up.
Cheers,
Doug
#3.1.1.2 - Doug Burns said:
2006-12-07 20:59 - (Reply)
Apparently, the annoying tags will disappear when I upgrade my blogging software. May not happen too soon, but hopefully your example is just about readable for now.
#3.2 - Yas said:
2006-12-08 12:18 - (Reply)
Looks like they call dbms_sql.parse for the INSERT statement for every execution (and one dbms_sql.parse call for the SELECT statement too).
So Oracle doesn't have to parse that statement more than once ... unless it is specifically told so
There is a bug in 9.2.0.7 that can cause insert (with the values clause) statements with bind variables to produce many versions. Using a noappend hint solves the problem and prevents many versions of the insert statement being produced and parsed. I do not remember the bug number but I can look it up if anyone wants to take a look.
#4 - Gary 2006-12-07 22:05 - (Reply)
Given the difference in proportion between 'redo per second' and 'redo per transaction' from you system and the vendors from the first posting, I was tempted to suggest that, for some reason, your system is committing a LOT more frequently than theirs. They are processing at a rate of one 'transaction' per six/seven seconds, and yours is close to 10 'transactions' per second. (That's database transactions, not necessarily business transactions.)
The log file sync would also suggest that waiting for commits to flag as okay is an issue in your system. Is there any configurable commit frequency in the application (eg some form of load process) ?
Those awful parse/execute ratios suggest this isn't a PL/SQL app. Maybe there's some (maybe Java) component that isn't disabling autocommit.
#4.1 - Doug Burns said:
2006-12-07 22:29 - (Reply)
Gary,
Your comments are spot-on regarding the symptoms. There'll be more about the underlying cause in the next blog or two ![]()
#5 - Simon Kelsey 2006-12-08 09:03 - (Reply)
It's not another case of SRDF synchronously copying online redo logs to a remote site, and not being able to keep up, is it?
#5.1 - Doug Burns said:
2006-12-08 09:17 - (Reply)
Something far simpler than that I'm afraid! But I must admit I'm developing a worrying knee-jerk reaction to log file sync waits ![]()


The more I compare these two reports, the more the workload looks different. In particular, as Gary pointed out in his comment to the last blog, our environment is committing a lot more transactions than the vendor's. (Remember those log file sync waits?
Tracked: Dec 08, 12:42