Dec 8: A More Complex Statspack Example - Part 3
The more I compare these two reports, the more the workload looks different. In particular, as Gary pointed out in his comment to the last blog, our environment is committing a lot more transactions than the vendor's. (Remember those log file sync waits? The extra redo per second?)
Let's take a look at the various SQL statements that are running during this batch job to see if the differences are apparent. (This is a long section of the report, so I'll take it a few key statements at a time, focussing first on the top 5 SQL statements ordered by Gets.)
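(If you want to pull a similar ranking yourself without running the full report, a rough sketch against v$sql will do it - bearing in mind that v$sql shows cumulative figures since each statement was loaded, not Statspack's snapshot-to-snapshot deltas:)

-- Sketch: top 5 SQL by buffer gets, straight from the shared pool.
-- Cumulative since load, so only an approximation of a snapshot delta.
SELECT *
FROM  (SELECT buffer_gets, executions,
              ROUND(buffer_gets / GREATEST(executions, 1), 1) AS gets_per_exec,
              hash_value, module, sql_text
       FROM   v$sql
       ORDER  BY buffer_gets DESC)
WHERE ROWNUM <= 5;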
Our System
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
21,165,424 1,777,416 11.9 14.6 188.42 217.73 1081982467
Module: dllhost.exe
INSERT INTO W6FORECAST_UNITS_MEASURES (W6Key,W6SubKey_1,Name,Raw
Value,ActualValue,MinError,MaxError,TrendValue,FittedValue,Descr
iption) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)
16,515,271 444,354 37.2 11.4 119.72 145.70 1221790051
Module: dllhost.exe
update W6FORECAST_UNITS set RelatedPlan=:1,PlanScenario=:2,Regio
n=:3,District=:4,JobType=:5,ParentUnit=:6,JobCategory=:7 where W
6Key = :8
9,838,684 13,662 720.1 6.8 370.97 375.31 302416274
Module: dllhost.exe
SELECT W6Key, Revision FROM W6FORECAST_UNITS WHERE ( ( ( History
= :1 AND Region = :2 AND District = :3 AND JobCategory = :4 ) )
AND ( ((StartTime <= :5) AND (FinishTime >= :6)) ) ) AND ( ( (
RelatedPlan = :7 AND PlanScenario IS NULL ) ) OR ( ( RelatedPlan
IS NULL AND PlanScenario IS NULL ) ) ) ORDER BY History , Regio
8,916,132 444,354 20.1 6.2 55.49 52.92 3116000638
Module: dllhost.exe
INSERT INTO W6FORECAST_UNITS (W6Key,Revision,RelatedPlan,PlanSce
nario,Region,District,JobType,StartTime,FinishTime,History,Total
Unit,ParentUnit,JobCategory) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:
10,:11,:12,:13)
4,011,657 693 5,788.8 2.8 18.64 18.77 3585738665
Module: dllhost.exe
SELECT W6ASSIGNMENTS.W6Key,W6ASSIGNMENTS.Revision,W6ASSIGNMENTS.
CreatedBy,W6ASSIGNMENTS.TimeCreated,W6ASSIGNMENTS.CreatingProces
s,W6ASSIGNMENTS.ModifiedBy,W6ASSIGNMENTS.TimeModified,W6ASSIGNME
NTS.ModifyingProcess,W6ASSIGNMENTS.Task,W6ASSIGNMENTS.StartTime,
W6ASSIGNMENTS.FinishTime,W6ASSIGNMENTS.CommentText,W6ASSIGNMENTS
Vendor's System
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
50,736,460 13,870 3,658.0 45.5 1016.56 ######### 3617180858
SELECT W6Key, Revision FROM W6PLAN_ACTIONS WHERE ( ( RelatedP
lan = :1 AND PlanScenario IS NULL ) ) OR ( ( RelatedPlan I
S NULL AND PlanScenario IS NULL ) ) ORDER BY RelatedPlan , P
lanScenario
23,867,527 1,796,528 13.3 21.4 250.22 374.91 2597570457
INSERT INTO W6FORECAST_UNITS_MEASURES (W6Key,W6SubKey_1,Name,Raw
Value,ActualValue,MinError,MaxError,TrendValue,FittedValue,Descr
iption) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10)
15,507,602 449,132 34.5 13.9 120.33 231.30 2081669994
update W6FORECAST_UNITS set RelatedPlan=:1,PlanScenario=:2,Regio
n=:3,District=:4,JobType=:5,ParentUnit=:6,JobCategory=:7 where W
6Key = :8
5,020,204 449,132 11.2 4.5 49.48 77.33 2235729426
INSERT INTO W6FORECAST_UNITS (W6Key,Revision,RelatedPlan,PlanSce
nario,Region,District,JobType,StartTime,FinishTime,History,Total
Unit,ParentUnit,JobCategory) VALUES(:1,:2,:3,:4,:5,:6,:7,:8,:9,:
10,:11,:12,:13)
4,701,942 13,662 344.2 4.2 244.66 271.37 1861931583
SELECT W6Key, Revision FROM W6FORECAST_UNITS WHERE ( ( ( Hist
ory = :1 AND Region = :2 AND District = :3 AND JobCategory
= :4 ) ) AND ( ((StartTime <= :5) AND (FinishTime >= :6)) )
) AND ( ( ( RelatedPlan = :7 AND PlanScenario IS NULL ) )
OR ( ( RelatedPlan IS NULL AND PlanScenario IS NULL ) ) )
Most of the statements are duplicated in the two environments, but the top statement in the vendor's environment (the SELECT against W6PLAN_ACTIONS) doesn't appear in ours. As the top SQL by number of gets, that's a little worrying. What's even more confusing is that I expected to see more workload in *our* environment. Well, I've only looked at the top 5 SQL statements, so I'll keep looking further down, but to save us all from a 12-part blog series I'll narrow things down a bit.
All the evidence so far is that our environment is experiencing higher transaction throughput, so I'll focus on the INSERT, UPDATE and DELETE statements to try to work out where the additional activity is coming from. Note - the output below is limited to those statements.
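(Again as a rough sketch, this time restricted to DML - the filters are crude and assume the statement text starts with the keyword:)

-- Sketch: the same ranking, limited to INSERT/UPDATE/DELETE.
-- Crude filters; PL/SQL blocks and MERGE statements would slip through.
SELECT buffer_gets, executions, hash_value, sql_text
FROM   v$sql
WHERE  UPPER(LTRIM(sql_text)) LIKE 'INSERT%'
   OR  UPPER(LTRIM(sql_text)) LIKE 'UPDATE%'
   OR  UPPER(LTRIM(sql_text)) LIKE 'DELETE%'
ORDER  BY buffer_gets DESC;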
Our System
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
2,031,055 219,574 9.2 1.4 47.75 48.39 2867215530
Module: dllhost.exe
insert into W6GIS_CACHE (Origin_Lat,Origin_Long,Destination_Lat,
Destination_Long,TravelTime,Distance,GISDataSource,UpdateStatus)
values(:1,:2,:3,:4,:5,:6,:7,:8)
2,193,162 93,612 23.4 1.5 58.90 57.60 713686856
Module: dllhost.exe
DELETE FROM W6OPERATION_LOG WHERE OperationTime < :1
CPU Elapsd
Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
976 23,254 0.0 0.0 7.23 23.81 4083657034
Module: dllhost.exe
INSERT INTO W6AGENTS_OPERATIONS (W6Key,Revision,CreatedBy,TimeCr
eated,CreatingProcess,ModifiedBy,TimeModified,ModifyingProcess,S
erver,AgentName,StartTime,Status,FinishTime,Integer1,Integer2,In
teger3,Integer4,Integer5,String1,String2,String3,Boolean1,Boolea
n2,Boolean3,Boolean4,Boolean5,LastError,Integer6,Integer7,Intege
Vendor's System
No statements
Okay, so at this stage it looks like our environment is handling activity that the vendor's system isn't. I'd already told the vendor that our workload looked higher, so we set up a conference call to discuss what those additional statements were doing. It turned out that when the vendor performs the test, the database works on this job in isolation (which is what we want), but when we run it, our test environment has 10 application servers connected to it, performing normal maintenance activities. In other words, we're handling the normal 'background noise' of this application, whereas the vendor's test system is simplified.
I could wade through the rest of the report, but I think I'm going to leave it there for now. Problem solved, although this one took at least half a day of comparing the reports and sending an analysis to the vendor, followed by a meeting. Now we await their fixes to make our system perform better - but the fixes won't be to the main batch job, they'll be to the background processes.
#1 - Daryl 2006-12-08 13:39
50 million gets taking 1016s of CPU time is excessive. Do some SQL tuning on the statement to see if there are better index options to be found (a bit tricky since there is an OR and a NULL in the SQL). (Note that even 3,000+ gets per exec is not that bad.) Re-sort your data based on gets per exec.
Discard the various SQL with low gets per exec.
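(Daryl's re-sort, expressed as a sketch against v$sql - cumulative figures again, and the cut-off is arbitrary:)

-- Sketch: rank by gets per execution and discard the low scorers.
-- The 1,000 cut-off is arbitrary; pick one that suits your system.
SELECT ROUND(buffer_gets / executions, 1) AS gets_per_exec,
       buffer_gets, executions, hash_value
FROM   v$sql
WHERE  executions > 0
AND    buffer_gets / executions > 1000
ORDER  BY gets_per_exec DESC;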
#2 - Doug Burns said:
2006-12-08 14:10
The thing is, that statement isn't even running on our system. I'm reliably informed it was an ad hoc job, so I'm not convinced I should be fixing the vendor's code.
#3 - Daryl 2006-12-08 16:09
Oh yeah... looking at your Statspack clip, you don't have a problem.
#4 - Jonathan Lewis said:
2006-12-08 16:14
Daryl,
Without knowing the application, you cannot say that "50M gets taking 1,016 CPU seconds is excessive". For a start, you don't know how many rows were acquired, sorted and returned per execution. 3,000 gets per execution may be perfectly reasonable if the result set was a few hundred rows each time.
As it is, based on the apparent column names, the query probably has to do a tablescan - and tablescans can soak up a lot of CPU, especially when they are a few thousand blocks long.
#5 - Gary 2006-12-09 11:48
Okay, can you give us a hint on the "SELECT W6Key, Revision FROM W6FORECAST_UNITS..." statement. 13,662 executions in both systems, but yours has over twice the number of buffer gets.
#6 - Doug Burns said:
2006-12-11 00:48
Gary,
Sorry for the sluggish reply - busy weekend.
One possible reason for the large difference in resource usage of that particular statement is that the vendor is testing on a database populated from an export file we sent them. It's likely that this table and its indexes, which suffer large-scale inserts and deletes, might not be as efficiently organised in our database as in the vendor's. I suspect that's the cause of some of the more subtle variations in the statistics.
It's only one possibility, though, which I planned to mention at some point, but the difference is quite dramatic so it could be something else entirely.
The problem is that, until we can be sure we're testing the same workloads, the situation is too confusing, so I've shelved this for now - but that statement is definitely worth investigating further. Cheers.
I'm going to try to summarise some of this story in the next blog.
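(On the organisation point above: one standard, if intrusive, way to gauge how hollow an index has become is VALIDATE STRUCTURE - the index name here is hypothetical, since the real ones aren't shown:)

-- Sketch: measure deleted leaf-row space in an index after heavy
-- insert/delete activity. Hypothetical index name. Beware: VALIDATE
-- STRUCTURE locks the underlying table while it runs.
ANALYZE INDEX w6forecast_units_ix VALIDATE STRUCTURE;

-- INDEX_STATS holds a single row, visible only to the session that
-- ran the ANALYZE above.
SELECT height, lf_rows, del_lf_rows,
       ROUND(100 * del_lf_rows / GREATEST(lf_rows, 1), 1) AS pct_deleted
FROM   index_stats;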
#7 - Jonathan Lewis said:
2006-12-11 08:21
Another possibility, also relating to the export/import cycle and possibly affected by the extra workload on the production box, is the cost of both read-consistency and delayed block cleanout. A rough indication of this effect would come from cross-checking the three statistics "%undo records applied%".
#8 - Doug Burns said:
2006-12-11 11:00
Very small, though different, values in the reports (note for anyone looking at their own reports: you need to search for "undo records" because Statspack truncates the statistic name).
I'll apologise up-front for the formatting. Comments are not the ideal place for code.
Our Environment
Vendor's Environment
By 'delayed block cleanout', do you mean if the tests were run immediately after the import? I suspect the job had been run a couple of times between the import and the test that the report covers.
However, there are much more significant differences in undo usage between the instances; for example, these waits only exist in our environment.
This is partly down to the additional transaction workload, as confirmed by other undo stats, but I think there's also something strange going on with the undo tablespace in our environment.
The more I think about this, the more frustrating it is not to have control over both tests: the workload, whether the instance was restarted, what level of Statspack snapshot was taken.
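(On the undo tablespace question above, a first place to look - assuming automatic undo management, which I haven't confirmed here - is v$undostat:)

-- Sketch: undo activity in 10-minute buckets. High UNDOBLKS relative
-- to the other instance, or any NOSPACEERRCNT, would point at undo
-- pressure.
SELECT begin_time, end_time, undoblks, txncount, maxquerylen, nospaceerrcnt
FROM   v$undostat
ORDER  BY begin_time;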
#9 - Jonathan Lewis said:
2006-12-11 16:57
Doug, I'd forgotten about the truncation; there are actually three stats to check:
transaction tables consistent reads - undo records applied
data blocks consistent reads - undo records applied
rollback changes - undo records applied
The first one won't even show the U of undo, the second gets to the UN.
Another point to check (only as a clue) is the number of 'consistent gets - examination' which indicate access to undo blocks for read consistency (plus a couple of other phenomena, unfortunately).
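(A sketch that sidesteps the report truncation completely, reading all four statistics straight from v$sysstat:)

-- Sketch: the three 'undo records applied' statistics plus
-- 'consistent gets - examination', with their full names.
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE '%undo records applied%'
   OR  name = 'consistent gets - examination'
ORDER  BY name;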
#10 - Doug Burns said:
2006-12-11 18:04
"there are actually three stats to check"
Which I should have worked out for myself, of course.
Our System
Vendor's System
I'm at work at the moment, so I need to go and run some release scripts.

Just at a glance, I think the big difference in 'consistent gets - examination' reflects the fact that our environment has a constant stream of background activity on objects completely unrelated to the batch job we're tuning.
#11 - Jonathan Lewis said:
2006-12-11 22:37
Doug, given the very small number of application of undo records, compared to the huge volume of examinations, I suspect that most of the examinations are actually related to index root block accesses - possibly for foreign key referential integrity checks (unless you have some perfect single table hash clusters).
#12 - Doug Burns said:
2006-12-11 22:46
"unless you have some perfect single table hash clusters"
No, we don't.
The FK integrity checks would fit in with the background maintenance agents that are running in our environment.