A More Complex Statspack Example - Part 2

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Dec 7: A More Complex Statspack Example - Part 2

So far we're barely into the comparison of the Statspack reports from the two different test environments and they look quite different already.

Next up, the Instance Efficiency Percentages. I have to confess that I rarely find these very useful because the numbers are always near 100% on most databases I look at and, even when they aren't, they're aggregating so much detail that who knows what they're telling us? (Actually, this is a good point to say that there's a lot of opinion in these blogs and this is my personal approach in this particular case. I'm very interested in hearing how others approach these reports. For example, here is Jonathan Lewis discussing Statspack instance efficiency percentages.)

This particular example proves an exception to the rule, though, largely because of the Execute to Parse ratio.

Our Environment

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:    100.00
            Buffer  Hit   %:   98.63    In-memory Sort %:    100.00
            Library Hit   %:   99.81        Soft Parse %:     99.76
         Execute to Parse %:  -59.73         Latch Hit %:     99.99
Parse CPU to Parse Elapsd %:   98.32     % Non-Parse CPU:     96.73

Vendor Environment

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:   95.25    In-memory Sort %:  100.00
            Library Hit   %:  100.00        Soft Parse %:  100.00
         Execute to Parse %:  -92.37         Latch Hit %:  100.00
Parse CPU to Parse Elapsd %:   97.01     % Non-Parse CPU:   95.48

Isn't it horrific? That means the application is parsing many more statements than it's executing. At first this seems so bizarre as to be impossible (and, as Alex Gorbachev points out in a comment to Dan Fink's blog, it's always worth considering that Statspack itself has bugs). However I know this ratio reflects reality because I've already traced this application. There's more on why you might see an negative Execute to Parse ratio in this AskTom thread. Suffice to say I've already expressed my dismay to the vendor and warned that this is likely to limit scalability, but the application won't be re-written any time soon.

Further confirmation of the problem is available by jumping down a few sections in the report to SQL ordered by Parse Calls section (just our environment, this time)

                           % Total
 Parse Calls  Executions   Parses  Hash Value
------------ ------------ -------- ----------
   1,777,416    1,777,416    21.63 1081982467
Module: dllhost.exe
INSERT INTO W6FORECAST_UNITS_MEASURES (W6Key,W6SubKey_1,Name,Raw
Value,ActualValue,MinError,MaxError,TrendValue,FittedValue,Descr
iption) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)

   1,777,416            0    21.63 3225036475
Module: dllhost.exe
SELECT W6Key,W6SubKey_1,Name,RawValue,ActualValue,MinError,MaxEr
ror,TrendValue,FittedValue,Description FROM W6FORECAST_UNITS_MEA
SURES

     444,354            0     5.41  956924307
Module: dllhost.exe
SELECT RelatedPlan,PlanScenario,Region,District,JobType,ParentUn
it,JobCategory FROM W6FORECAST_UNITS

The first statement is using bind variables, so you would think that Oracle wouldn't need to parse it each time it executes, but that's not the case here. Worse, the next two statements aren't even executed, just parsed! I've seen this before with certain development tools that parse a statement in order to validate it, but never use the results. (You need to bear in mind the scale of the problem, though. The parse operations for those three statements, divided by the number of minutes that the report covers, is 1100 parse operations per minute, or 18 per second.)

Okay, on to the Top 5 Timed Events, which is often one of the most useful sections.

Our Environment

Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time                                                       23,832    63.56
log file sync                                     423,557       6,909    18.43
log file parallel write                           837,369       2,164     5.77
sbtwrite2                                          75,108       1,366     3.64
sbtbackup                                               6       1,230     3.28

Vendor Environment

Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time                                                        2,342    64.40
db file scattered read                            331,760         631    17.35
db file sequential read                           152,702         466    12.81
db file parallel write                              7,430          72     1.97
log file parallel write                            76,262          42     1.15

In this case, it's actually not too useful, but let's start with what I can take from it.

1) There's an RMAN backup running at some point during our test - signified by the sbt events.
2) There's a similar percentage of time spent running on the CPU, as opposed to waiting, in both reports.
3) All the other wait events are i/o related, which isn't too surprising in a batch job which is moving data around.

How about the problems, though? Well our system has 23,832 seconds of CPU available over the report period, the vendor's only 2,342. Considering that the vendor's Statspack report covers a longer period, how could that be? Well, the servers have completely different CPU configurations and because Statspack is a system-wide report (well, more accurately, an instance-wide report) it's going to aggregate the CPU resource as well as the wait events. Once you realise this, then it's obvious, but I've seen more than one person wonder how they can get more seconds of CPU than the number of seconds in the snapshot interval.

Here is an excellent Jonathan Lewis article that looks into this difficulty in more detail.

In any case, the interval for this report is 12 hours, which is far too long because I don't know whether all of those wait events are occuring in a short burst, or throughout the reporting period. That's why I would normally run a report for a 15 minute interval rather than such a long period, except when I'm comparing the performance of a single task and want the report to cover it's entire progress.

Another problem showing up in my current investigations is that our environment seems to have a real problem with log file sync events that isn't present on the vendor's environment. My initial reaction to this was that we might have a problem with the disks that our online redo log files sit on. To check this, I flicked down to the next section, containing all the wait events and not just the Top 5, so that I could see the log file sync waits in the vendor's environment too.

Our Environment

                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync                     423,557      4,462      6,909     16      1.0

Vendor Environment

                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync                       7,874          3         11      1      1.0

Okay, so our log file syncs do take a lot longer, but the stand-out numbers for me here are that our test suffered 423,557 log file sync waits to the vendor's 7,874. Are we *sure* these two environments are running the same workload?!?!?

Posted by Doug Burns Comments: (13) Trackback: (1)

Trackbacks
Trackback specific URI for this entry

A More Complex Statspack Example - Part 3
The more I compare these two reports, the more the workload looks different. In particular, as Gary pointed out in his comment to the last blog, our environment is committing a lot more transactions than the vendor's. (Remember those log file sync waits?
Weblog: Doug's Oracle Blog
Tracked: Dec 08, 12:42

Comments
Display comments as (Linear | Threaded)

#1 - amit poddar 2006-12-07 14:48 - (Reply)

"I've seen this before with certain development tools that parse a statement in order to validate it, but never use the results."

With bind variable peeking I thought that oracle postpones the optimizing part to EXEC step ?

If thats true then execeesive parsing should not be a problem ?

Where am I going wrong ?

#1.1 - Doug Burns said:
2006-12-07 15:08 - (Reply)

... and why would bind variable peeking have anything to do with this statement that doesn't use any literals or bind variables?

----------------------------------------

1,777,416 0 21.63 3225036475
Module: dllhost.exe
SELECT W6Key,W6SubKey_1,Name,RawValue,ActualValue,MinError,MaxEr
ror,TrendValue,FittedValue,Description FROM W6FORECAST_UNITS_MEA
SURES

#2 - Doug Burns said:
2006-12-07 14:57 - (Reply)

Where am I going wrong ?

Maybe you're not, maybe I am?

If thats true then execeesive parsing should not be a problem ?

Even if Oracle postpones the optimisation stage until execution and these statements are never executed I *guess* they might never be optimised. I still don't see the point of parsing them though and I very much doubt that there's no overhead associated with parsing them just because the optimisation overhead is eliminated.

But I'd very much welcome someone more knowledgeable than me coming along to correct us both ;-)

#3 - Gabe 2006-12-07 19:13 - (Reply)

The first statement is using bind variables, so you would think that Oracle wouldn't need to parse it each time it executes, but that's not the case here.

Looks like they call dbms_sql.parse for the INSERT statement for every execution (and one dbms_sql.parse call for the SELECT statement too).

So Oracle doesn't have to parse that statement more than once ... unless it is specifically told so.

#3.1 - Doug Burns said:
2006-12-07 19:29 - (Reply)

Thanks Gabe. So, just to make sure I've got this right - even though the call to DBMS_SQL.PARSE will increment the number of parses in the Statspack statistics, that doesn't necessarily represent a parse operation if Oracle decides it doesn't need to parse it?

#3.1.1 - Gabe 2006-12-07 20:36 - (Reply)

[I hope the formatting comes out right ... if not tell me what tags to use]

No. The number of parses will go up with every actual parse call. My _guess_ here is that they deliberately [wrongly] make that parse call for each insert. Oracle has no choice but to oblige. They use binds, that is true and hence the shared pool doesn’t get flooded with non-unique INSERTs … but they burn more CPU than they should etc.

It is interesting you have the same number of parses for the INSERT and the SELECT.

The SELECT FROM W6FORECAST_UNITS_MEASURES with 0 executions suggests they make dbms_sql.parse calls without actually executing the cursor. By extension they might be doing the same thing with the INSERT INTO W6FORECAST_UNITS_MEASURES … that is, specifically calling dbms_sql.parse every single time

Tracing a bit of modified code shamesly lifted out of AskTom …

CODE:
create table abc ( n number not null , v varchar2(10) not null); create or replace procedure p1 as     l_cursor    int;     l_status    int; begin     for i in 1 .. 100     loop         l_cursor := dbms_sql.open_cursor;         dbms_sql.parse( l_cursor,                        'select n,v from abc',                         dbms_sql.native );         dbms_sql.parse( l_cursor,                        'insert into abc (n,v)                         values (:x,:y)',                         dbms_sql.native );         dbms_sql.bind_variable( l_cursor, ':x', i );         dbms_sql.bind_variable( l_cursor, ':y', to_char(i) );         l_status := dbms_sql.execute( l_cursor );         dbms_sql.close_cursor(l_cursor);     end loop; end; /


… and the relevant portion from the trace file …

CODE:
select n,v  from  abc call     count       cpu    elapsed       disk      query    current        rows ------- ------  -------- ---------- ---------- ---------- ----------  ---------- Parse      100      0.04       0.09          0          1          0           0 Execute      0      0.00       0.00          0          0          0           0 Fetch        0      0.00       0.00          0          0          0           0 ------- ------  -------- ---------- ---------- ---------- ----------  ---------- total      100      0.04       0.09          0          1          0           0 Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: 36     (recursive depth: 1) ******************************************************************************** insert into abc (n,v)                         values (:x,:y) call     count       cpu    elapsed       disk      query    current        rows ------- ------  -------- ---------- ---------- ---------- ----------  ---------- Parse      100      0.01       0.00          0          0          0           0 Execute    100      0.03       0.07          0        100        304         100 Fetch        0      0.00       0.00          0          0          0           0 ------- ------  -------- ---------- ---------- ---------- ----------  ---------- total      200      0.04       0.07          0        100        304         100 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 36     (recursive depth: 1)

#3.1.1.1 - Doug Burns said:
2006-12-07 20:40 - (Reply)

Gabe,

Thanks for that. I picked you up wrong the first time, thought I'd read the situation incorrectly, but it looks like they are doing what I thought they were!

Sorry about the comment formatting, it's a constant pain in the neck for me. Your effort was surprisingly good, should look fine, but doesn't :-(

I'll see what I can do to clean it up.

Cheers,

Doug

#3.1.1.2 - Doug Burns said:
2006-12-07 20:59 - (Reply)

Apparently, the annoying tags will disappear when I upgrade my blogging software. May not happen too soon, but hopefully your example is just about readable for now.

#3.2 - Yas said:
2006-12-08 12:18 - (Reply)

Looks like they call dbms_sql.parse for the INSERT statement for every execution (and one dbms_sql.parse call for the SELECT statement too).

So Oracle doesn't have to parse that statement more than once ... unless it is specifically told so


There is a bug in 9.2.0.7 that can cause insert (with the values clause) statements with bind variables to produce many versions. Using a noappend hint solves the problem and prevents many versions of the insert statement being produced and parsed. I do not remember the bug number but I can look it up if anyone wants to take a look.

#4 - Gary 2006-12-07 22:05 - (Reply)

Given the difference in proportion between 'redo per second' and 'redo per transaction' from you system and the vendors from the first posting, I was tempted to suggest that, for some reason, your system is committing a LOT more frequently than theirs. They are processing at a rate of one 'transaction' per six/seven seconds, and yours is close to 10 'transactions' per second. (That's database transactions, not necessarily business transactions.)

The log file sync would also suggest that waiting for commits to flag as okay is an issue in your system. Is there any configurable commit frequency in the application (eg some form of load process) ?
Those awful parse/execute ratios suggest this isn't a PL/SQL app. Maybe there's some (maybe Java) component that isn't disabling autocommit.

#4.1 - Doug Burns said:
2006-12-07 22:29 - (Reply)

Gary,

Your comments are spot-on regarding the symptoms. There'll be more about the underlying cause in the next blog or two ;-)

#5 - Simon Kelsey 2006-12-08 09:03 - (Reply)

It's not another case of SRDF synchronously copying online redo logs to a remote site, and not being able to keep up, is it?

#5.1 - Doug Burns said:
2006-12-08 09:17 - (Reply)

Something far simpler than that I'm afraid! But I must admit I'm developing a worrying knee-jerk reaction to log file sync waits ;-)


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

Comments

Doug Burns about 10053 Trace Files - Different Plan in Different Environments
Tue, 02.04.2013 08:57
You're welcome. Now I just nee d to pull my finger out and ac tually come up [...]
Howard Rogers about 10053 Trace Files - Different Plan in Different Environments
Mon, 01.04.2013 23:08
Makes a big difference, so tha nks for that! With two brow ser windows, o [...]
stelioscharalambides.com about 10053 Trace Files
Sat, 30.03.2013 16:28

Upcoming Presentations

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml DBMS_STATS
xml Direct Path Reads
xml Fun
xml grid control
xml hotsos 2010
xml listener
xml Locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml Partitions
xml Patching
xml swingbench
xml The Reality Gap
xml time matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl