Dec 8: A More Complex Statspack Example - Part 3

The more I compare these two reports, the more the workload looks different. In particular, as Gary pointed out in his comment to the last blog, our environment is committing a lot more transactions than the vendor's. (Remember those log file sync waits? The extra redo per second?)
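As a cross-check, the commit and redo figures can be pulled straight from the Statspack repository rather than eyeballed from the two reports. This is a minimal sketch, assuming the standard PERFSTAT schema; the snap IDs are placeholders for the begin and end snapshots of the run, and a repository holding more than one instance would also need to match on DBID and instance number.

-- Delta of commit and redo statistics between two snapshots.
-- Snap IDs 100 and 101 are example values only.
SELECT b.name,
       e.value - b.value AS delta
FROM   perfstat.stats$sysstat b,
       perfstat.stats$sysstat e
WHERE  b.snap_id = 100   -- begin snapshot (example value)
AND    e.snap_id = 101   -- end snapshot (example value)
AND    e.name    = b.name
AND    b.name IN ('user commits', 'redo size');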

Let's take a look at the various SQL statements that are running during this batch job to see if the differences are apparent. (This is a long section of the report, so I'll take it a few key statements at a time, focussing on the top 5 SQL statements ordered by Gets first.)
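For anyone following along on a live system, a rough equivalent of the 'SQL ordered by Gets' section can be pulled from v$sql. Bear in mind this is only a sketch, and that v$sql is cumulative since instance startup rather than an interval between snapshots like Statspack.

-- Top SQL by buffer gets since startup (cumulative figures).
SELECT buffer_gets,
       executions,
       ROUND(buffer_gets / GREATEST(executions, 1), 1) AS gets_per_exec,
       hash_value,
       sql_text
FROM   v$sql
ORDER  BY buffer_gets DESC;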

Our System

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     21,165,424    1,777,416           11.9   14.6   188.42    217.73 1081982467
Module: dllhost.exe
INSERT INTO W6FORECAST_UNITS_MEASURES (W6Key,W6SubKey_1,Name,Raw
Value,ActualValue,MinError,MaxError,TrendValue,FittedValue,Descr
iption) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)

     16,515,271      444,354           37.2   11.4   119.72    145.70 1221790051
Module: dllhost.exe
update W6FORECAST_UNITS set RelatedPlan=:1,PlanScenario=:2,Regio
n=:3,District=:4,JobType=:5,ParentUnit=:6,JobCategory=:7 where W
6Key = :8

      9,838,684       13,662          720.1    6.8   370.97    375.31  302416274
Module: dllhost.exe
SELECT W6Key, Revision FROM W6FORECAST_UNITS WHERE ( ( ( History
 = :1 AND Region = :2 AND District = :3 AND JobCategory = :4 ) )
 AND ( ((StartTime <= :5) AND (FinishTime >= :6)) ) ) AND ( ( (
RelatedPlan = :7 AND PlanScenario IS NULL ) ) OR ( ( RelatedPlan
 IS NULL AND PlanScenario IS NULL ) ) ) ORDER BY History , Regio

      8,916,132      444,354           20.1    6.2    55.49     52.92 3116000638
Module: dllhost.exe
INSERT INTO W6FORECAST_UNITS (W6Key,Revision,RelatedPlan,PlanSce
nario,Region,District,JobType,StartTime,FinishTime,History,Total
Unit,ParentUnit,JobCategory) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:
10,:11,:12,:13)

      4,011,657          693        5,788.8    2.8    18.64     18.77 3585738665
Module: dllhost.exe
SELECT W6ASSIGNMENTS.W6Key,W6ASSIGNMENTS.Revision,W6ASSIGNMENTS.
CreatedBy,W6ASSIGNMENTS.TimeCreated,W6ASSIGNMENTS.CreatingProces
s,W6ASSIGNMENTS.ModifiedBy,W6ASSIGNMENTS.TimeModified,W6ASSIGNME
NTS.ModifyingProcess,W6ASSIGNMENTS.Task,W6ASSIGNMENTS.StartTime,
W6ASSIGNMENTS.FinishTime,W6ASSIGNMENTS.CommentText,W6ASSIGNMENTS

Vendor's System

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     50,736,460       13,870        3,658.0   45.5  1016.56 ######### 3617180858
SELECT W6Key, Revision FROM W6PLAN_ACTIONS  WHERE (  (  RelatedP
lan  = :1  AND PlanScenario IS NULL )  )  OR (  (  RelatedPlan I
S NULL  AND PlanScenario IS NULL )  )  ORDER BY  RelatedPlan , P
lanScenario

     23,867,527    1,796,528           13.3   21.4   250.22    374.91 2597570457
INSERT INTO W6FORECAST_UNITS_MEASURES (W6Key,W6SubKey_1,Name,Raw
Value,ActualValue,MinError,MaxError,TrendValue,FittedValue,Descr
iption) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)

     15,507,602      449,132           34.5   13.9   120.33    231.30 2081669994
update W6FORECAST_UNITS set RelatedPlan=:1,PlanScenario=:2,Regio
n=:3,District=:4,JobType=:5,ParentUnit=:6,JobCategory=:7 where W
6Key = :8

      5,020,204      449,132           11.2    4.5    49.48     77.33 2235729426
INSERT INTO W6FORECAST_UNITS (W6Key,Revision,RelatedPlan,PlanSce
nario,Region,District,JobType,StartTime,FinishTime,History,Total
Unit,ParentUnit,JobCategory) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:
10,:11,:12,:13)

      4,701,942       13,662          344.2    4.2   244.66    271.37 1861931583
SELECT W6Key, Revision FROM W6FORECAST_UNITS  WHERE ( (  (  Hist
ory  = :1  AND Region  = :2  AND District  = :3  AND JobCategory
 = :4 )  )  AND ( ((StartTime <= :5) AND (FinishTime >= :6)) )
)  AND ( (  (  RelatedPlan  = :7  AND PlanScenario IS NULL )  )
 OR (  (  RelatedPlan IS NULL  AND PlanScenario IS NULL )  )  )

Most of the statements are duplicated in the two environments, but the top statement in the vendor's environment (the SELECT from W6PLAN_ACTIONS, hash value 3617180858) doesn't appear in ours at all. As the top SQL by number of gets, that's a little worrying. What's even more confusing is that I expected to see more workload in *our* environment. Well, I've only looked at the top 5 SQL statements, so I'll keep looking further down, but to save us all from a 12-part blog series I'll narrow it down a bit.

All the evidence so far is that our environment is experiencing a higher transaction throughput, so I'll focus on INSERT, UPDATE and DELETE statements to try to work out where the additional activity is coming from. Note - the output below is limited to those statements.
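If you want to produce a similar DML-only cut from a live system instead of editing a report by hand, one approach is to filter v$sql on command type. A rough sketch (2 = INSERT, 6 = UPDATE, 7 = DELETE):

-- DML-only view of the library cache, ordered by buffer gets.
SELECT buffer_gets,
       executions,
       hash_value,
       sql_text
FROM   v$sql
WHERE  command_type IN (2, 6, 7)   -- INSERT, UPDATE, DELETE
ORDER  BY buffer_gets DESC;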

Our System

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
      2,031,055      219,574            9.2    1.4    47.75     48.39 2867215530
Module: dllhost.exe
insert into W6GIS_CACHE (Origin_Lat,Origin_Long,Destination_Lat,
Destination_Long,TravelTime,Distance,GISDataSource,UpdateStatus)
values(:1,:2,:3,:4,:5,:6,:7,:8)

      2,193,162       93,612           23.4    1.5    58.90     57.60  713686856
Module: dllhost.exe
DELETE FROM W6OPERATION_LOG WHERE OperationTime < :1

                                                     CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
            976       23,254            0.0    0.0     7.23     23.81 4083657034
Module: dllhost.exe
INSERT INTO W6AGENTS_OPERATIONS (W6Key,Revision,CreatedBy,TimeCr
eated,CreatingProcess,ModifiedBy,TimeModified,ModifyingProcess,S
erver,AgentName,StartTime,Status,FinishTime,Integer1,Integer2,In
teger3,Integer4,Integer5,String1,String2,String3,Boolean1,Boolea
n2,Boolean3,Boolean4,Boolean5,LastError,Integer6,Integer7,Intege



Vendor's System

No statements

Okay, so at this stage it looks like our environment is handling activity that the vendor's system isn't. I'd already told the vendor that it looked like our workload was higher, so we set up a conference call to discuss what those additional statements were doing. It turned out that when the vendor performs the test, the database works in isolation on this job (which is what we want), but when we run the job, our test environment has 10 application servers connected to it, performing normal maintenance activities. In other words, we're handling the normal 'background noise' of this application, whereas the vendor's test system is simplified.
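A difference like that is easy to confirm with a quick look at what's actually connected during each test. A minimal sketch (the program names you see will obviously be site-specific):

-- Count connected user sessions by client program to spot the
-- application servers generating the background workload.
SELECT program,
       COUNT(*) AS sessions
FROM   v$session
WHERE  type = 'USER'
GROUP  BY program
ORDER  BY sessions DESC;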

I could wade through the rest of the report, but I think I'm going to leave it there for now. Problem solved, although this one took at least half a day of comparing the reports and sending an analysis to the vendor, followed by a meeting. Now we await their fixes to make our system perform better - but the fixes won't be to the main batch job, they'll be to the background processes.

Posted by Doug Burns


Comments

#1 - Daryl 2006-12-08 13:39

50 million gets taking 1,016s of CPU time is excessive. Do some SQL tuning on the statement to see if there are better index options to be found (a bit tricky since there is an OR and NULL in the SQL). (Note that even 3,000+ gets per exec is not that bad.) Re-sort your data based on gets per exec.

Discard the various SQL with low gets per exec.

#2 - Doug Burns said:
2006-12-08 14:10

The thing is, that statement isn't even running on our system. I'm reliably informed it was an ad hoc job, so I'm not convinced I should be fixing the vendor's code ;-)

#3 - Daryl 2006-12-08 16:09

Oh ya... looking at your Statspack clip, you don't have a problem :-)

#4 - Jonathan Lewis said:
2006-12-08 16:14

Daryl,

Without knowing the application, you cannot say that "50M gets taking 1,016 CPU seconds is excessive". For a start, you don't know how many rows were acquired, sorted and returned per execution. 3,000 gets per execution may be perfectly reasonable if the result set was a few hundred rows each time.

As it is, based on apparent column names, the query probably has to do a tablescan - and tablescans can soak a lot of CPU, especially when they are a few thousand blocks long.

#5 - Gary 2006-12-09 11:48

Okay, can you give us a hint on the "SELECT W6Key, Revision FROM W6FORECAST_UNITS..." statement? 13,662 executions in both systems, but yours has over twice the number of buffer gets.

#6 - Doug Burns said:
2006-12-11 00:48

Gary,

Sorry for the sluggish reply - busy weekend ;-)

One possible reason for the large difference in resource usage of that particular statement is that the vendor is testing on a database populated from an export file we sent them. This table and its indexes, which suffer large-scale inserts and deletes, might not be as efficiently organised in our database as in the vendor's. I suspect that's the cause of some of the more subtle variations in the statistics.

It's only one possibility, though, which I planned to mention at some point, but the difference is quite dramatic so it could be something else entirely.

The problem is that, until we can be sure we're testing the same workloads, the situation is too confusing, so I've shelved this for now. That statement is definitely worth investigating further, though - cheers.

I'm going to try to summarise some of this story in the next blog.

#7 - Jonathan Lewis said:
2006-12-11 08:21

Another possibility, also relating to the export/import cycle and possibly affected by the extra workload on the production box, is the cost of both read-consistency and delayed block cleanout. A rough indication of this effect would come from cross-checking the three statistics "%undo records applied%".

#8 - Doug Burns said:
2006-12-11 11:00

Very small, though different, values in the reports (note for anyone looking at their own reports, you need to search for "undo records" because Statspack truncates the statistic name)
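(On a live system, you can sidestep the truncation entirely by going straight to v$sysstat; a quick sketch:)

CODE:
-- Full, untruncated statistic names straight from the instance.
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE '%undo records applied%';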

I'll apologise up-front for the formatting. Comments are not the ideal place for code :-(

Our Environment

CODE:
Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
rollback changes - undo records a                976            0.0          0.0


Vendor's Environment
CODE:
rollback changes - undo records a                310            0.0          0.0


By 'delayed block cleanout', do you mean if the tests were run immediately after the import? I suspect the job had been run a couple of times between the import and the test that the report covers.

However, there are much more significant differences in undo usage between the instances; for example, these waits only exist in our environment.

CODE:
                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
undo segment extension          1,285,971  1,285,907          0      0      3.1


This is partly down to the additional transaction workload, as confirmed by other undo stats, but I think there's also something strange going on with the undo tablespace in our environment.
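One place to dig further on the live system is v$undostat, which summarises undo usage in ten-minute intervals (assuming automatic undo management). A rough sketch:

CODE:
-- Recent undo usage, one row per ten-minute interval.
SELECT begin_time,
       end_time,
       undoblks,      -- undo blocks consumed in the interval
       txncount,      -- transactions in the interval
       maxquerylen    -- longest-running query (seconds)
FROM   v$undostat
ORDER  BY begin_time;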

The more I think about this, the more frustrating it is not to have control over both tests: control over the workload, whether the instance was restarted, and what level of Statspack snapshot was used.

#9 - Jonathan Lewis said:
2006-12-11 16:57

Doug, I'd forgotten about the truncation; there are actually three stats to check:

transaction tables consistent reads - undo records applied

data blocks consistent reads - undo records applied

rollback changes - undo records applied

The first one won't even show the U of undo; the second gets to the UN.

Another point to check (only as a clue) is the number of 'consistent gets - examination' which indicate access to undo blocks for read consistency (plus a couple of other phenomena, unfortunately).

#10 - Doug Burns said:
2006-12-11 18:04

"there are actually three stats to check"

Which I should have worked out for myself, of course :-(

Our System

CODE:
Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
consistent gets - examination             36,933,862          847.8         88.3
data blocks consistent reads - un             38,492            0.9          0.1
rollback changes - undo records a                976            0.0          0.0
transaction tables consistent rea     (DOESN'T APPEAR)


Vendor's System
CODE:
Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
consistent gets - examination              1,651,581           31.4        214.1
data blocks consistent reads - un             77,686            1.5         10.1
rollback changes - undo records a                310            0.0          0.0
transaction tables consistent rea                  0            0.0          0.0


I'm at work at the moment, so I need to go and run some release scripts :-(

Just at a glance, I think the big difference in 'consistent gets - examination' reflects the fact that our environment has a constant stream of background activity on objects completely unrelated to the batch job we're tuning.

#11 - Jonathan Lewis said:
2006-12-11 22:37

Doug, given the very small number of undo records applied, compared to the huge volume of examinations, I suspect that most of the examinations are actually related to index root block accesses - possibly for foreign key referential integrity checks (unless you have some perfect single table hash clusters).

#12 - Doug Burns said:
2006-12-11 22:46

"unless you have some perfect single table hash clusters"

No, we don't.

The FK integrity checks would fit in with the background maintenance agents that are running in our environment.
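(A quick way to sanity-check that is to list the foreign key constraints on one of the hot tables from the report; a hedged sketch against the usual dictionary views:)

CODE:
-- Referential (FK) constraints on one of the hot tables above.
SELECT constraint_name, table_name, r_constraint_name
FROM   user_constraints
WHERE  constraint_type = 'R'
AND    table_name = 'W6FORECAST_UNITS';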


