Sep 1: Time Matters - DB CPU
"... 'DB time' fully contains 'DB CPU' and both are measured values (which means in particular that 'DB CPU' is not inflated by runq time)"
I'd like to talk about the end of that sentence in particular. About a year and a half ago, I wrote a paper (that's a PDF link) about building response time profiles for sessions that use Parallel Execution. One of the problems I highlighted in that paper was that PX jobs are prone to overload the available CPU resources simply because of the number of PX slave processes that are running concurrently. It's mentioned first in the section on "More Lost Time" on page 4 but most of the rest of the paper goes on to discuss this particular problem. I wasn't very successful coming up with any solutions beyond my usual approach of combining Oracle and O/S tools.
Essentially, as the queue of runnable processes that are waiting to execute on a CPU grows (I'll just refer to it as the run queue from now on), the Oracle server connection processes appear to lose time because they're unable to get on to a CPU in order to record the end time of an event. In the Response Time profilers I've seen, this time is recorded as unaccounted-for and it's what JB's referring to when he talks about runq time. Actually, time spent in the run queue is not the only cause of unaccounted-for time but it's the one I'll focus on here. (Here's another plug for Cary and Jeff's book - other reasons are discussed in chapter 7 on Oracle Kernel Timings.)
As JB stated, DB CPU does not include this time spent waiting in the run queue but the ASH samples for the session are aware that the server connection process is on the CPU or in the run queue. As a consequence, you can see apparently contradictory graphs in the various OEM (DB Control or Grid Control) Performance pages. Although I've tried this with the PX tests and confirmed the same behaviour, I'm going to use Swingbench for the examples here to continue the approach so far and so that you can try them out yourself if you want. (I might post PX examples later.)
Swingbench is supplied with several example applications, one of which is the Sales History (SH) application which uses the SH example schema supplied with recent versions of Oracle. The key attribute of this test for my purpose is that it's read only and CPU intensive, so that the limited I/O bandwidth on a home PC isn't the bottleneck.
I ran 8 concurrent sessions of the SH application for a total duration of 5 minutes with no delay between transactions, which was pretty unrealistic but I was only interested in over-loading the single dual-core CPU in the PC. Soon after the test started, this was how the database instance's home page looked.
CPU is at 100% and there are about 7 sessions active in the database on average, just under 2 of which are running on the CPU, while the rest are waiting. That looks about right. Now, if we look at the performance page at the end of the test, the results are quite different.
It looks like there were only ever about 2 sessions active on average! However, there are a couple of hints as to what's going on. If we look at the graph above as well as the Average Active Sessions graph, we can see that the Host's run queue is far in excess of the two available CPU cores.
Things become even more interesting if we look at the Top Activity screen for the same period.
In that view, the sessions waiting in the run queue are shown, so there's an average of about 7 active sessions during the test. Take a good look at the difference in the key text for the two graphs as well. The Average Active Sessions graph on the Performance Home page shows 'CPU Used', whereas the Top Activity page shows 'CPU'. It's a small but significant and appropriate difference. The first graph shows you the average number of sessions that are actually using CPU and is related to DB CPU in the Time Model Stats, but the Top Activity screen is really showing 'CPU + Wait for CPU', which is the event you'll see throughout an ASH report covering the same period. e.g.
- '# Samples Active' shows the number of ASH samples in which the session was found waiting for that particular event. The percentage shown in this column is calculated with respect to wall clock time and not total database activity.
- 'XIDs' shows the number of distinct transaction IDs sampled in ASH when the session was waiting for that particular event
- For sessions running Parallel Queries, this section will NOT aggregate the PQ slave activity into the session issuing the PQ. Refer to the 'Top Sessions running PQs' section for such statistics.
Sid, Serial# % Activity Event % Event User Program # Samples Active XIDs 136, 2716 12.75 CPU + Wait for CPU 12.16 SH java.exe 227/300 [ 76%] 0 137, 120 12.64 CPU + Wait for CPU 12.32 SH java.exe 230/300 [ 77%] 0 140, 164 12.59 CPU + Wait for CPU 12.00 SH java.exe 224/300 [ 75%] 0 144, 240 12.53 CPU + Wait for CPU 11.94 SH java.exe 223/300 [ 74%] 0 133, 29 12.37 CPU + Wait for CPU 11.62 SH java.exe 217/300 [ 72%] 0
The more I think about this, the more I think that there's probably a Rule Of Thumb here (I'm joking, really, but maybe it's something I'll look for during further tests). Maybe when DB Time is much bigger than the sum of the wait times on various timed events plus CPU time, it's indicative of CPU overload? Then again, maybe it's just easier to check the process states and run queue?
Finally, I couldn't resist checking what ADDM makes of all this, specially for Graham Wood (that's another PDF link). Well, in this case, ADDM is smart enough to work out what's going on.
DETAILED ADDM REPORT FOR TASK 'TASK_46' WITH ID 46 -------------------------------------------------- Analysis Period: 28-AUG-2008 from 23:22:54 to 23:27:56 Database ID/Instance: 1533705496/1 Database/Instance Names: TEST1020/TEST1020 Host Name: DESKTOP Database Version: 10.2.0.4.0 Snapshot Range: from 26 to 27 Database Time: 2006 seconds Average Database Load: 6.6 active sessions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FINDING 1: 100% impact (2006 seconds) ------------------------------------- Host CPU was a bottleneck and the instance was consuming 98% of the host CPU. All wait times will be inflated by wait for CPU. RECOMMENDATION 1: Host Configuration, 100% benefit (2006 seconds) ACTION: Consider adding more CPUs to the host or adding instances serving the database on other hosts. ACTION: Also consider using Oracle Database Resource Manager to prioritize the workload from various consumer groups. ADDITIONAL INFORMATION: Host CPU consumption was 99%. CPU runqueue statistics are not available from the host's OS. This disables ADDM's ability to estimate the impact of this finding. The instance spent significant time on CPU. However, there were no predominant SQL statements responsible for the CPU load.
Tracked: Feb 06, 21:13
Hi Doug -
A few things to comment on in this post, but first a note of appreciation for your diligence and passion for getting the DB Time concept right.
Just so others are clear, when you say DB CPU in this thread you refer to statistic values from V$SYS_TIME_MODEL and DBA_HIST_SYS_TIME_MODEL.
On to the issues. What you are seeing here are the effects of runq on metrics combined with displays that have differing data sources and thus different sensitivities to "the runq problem".
Your first screenshot is from database home page in EM and the Active Sessions widget composes a 3-tier breakdown of DB time as follows: take total DB time from time model, subtract measured CPU from time model, subtract measured User I/O wait from waitclass metrics, allocate the remainder as WAIT time. So here we see accurate DB Time (6.7 active sessions), accurate DB CPU (1.8+ active sessions), and computed WAIT which includes all that runq time.
In the second screenshot (EM Performance Page) we see that total Active Sessions is 2.3 or so made up of 1.8+ active sessions on CPU and approx 0.5 sessions in WAIT. This page is built up from DB CPU and the waitclass metrics and does not proceed top-down from measured DB Time as above. Most of the runq in your cpu-heavy workload is incurred by sessions already on cpu, so that accounts for the lost time. However, the WAIT time in this chart is actually also inflated by runq time, which can be seen by comparing to the third screenshot of Top Activity page.
The Top Activity page is data-sourced primarily from ASH. In ASH the sessions get counted on CPU if they are not known to be on a WAIT event, which can be a big problem when new events go un-instrumented. Here, however, we see in comparison to the Performance Page that sampled WAIT time is much less than measured WAIT time. I believe this reflects that in your test runq inflates measured wait event times much more than sampled ones.
So runq time significantly affects sampled CPU and WAIT times, as well as measured WAIT times, and especially displays that combine these data sources. Proper data is nice to have, but taking proper action much more important. With respect to DB Time tuning, the basic question is whether to tune for CPU or wait, with CPU always trumping WAIT if there is CPU issue. In your case, even though the displays seem inconsistent for all the reasons cited, hopefully a real user would recognize that this is a CPU issue and proceed on that basis (or better, consult ADDM)
That said, we are very actively working to remove these inconsistencies and improve CPU reporting in EM (proper corporate disclaimers apply to this statement of course)
That explains things more or less as I understood them, but your explanation is much clearer so thanks for that.
When I was writing this post, I didn't really think of these discrepancies as a big problem, just something that could cause confusion. As you know, in most cases the Performance and Top Activity pages can look so similar that it's easy to expect that they're equivalent.
In fact, I peversely like the discrepancy because, along with the load profile, it's a useful indicator and your explanation of the numbers behind the pictures adds to that.
I also meant to say ...
In your case, even though the displays seem inconsistent for all the reasons cited, hopefully a real user would recognize that this is a CPU issue and proceed on that basis (or better, consult ADDM)
Agreed. I wouldn't want an obsession with the detailed data to obscure the overall strength of time-based tuning.
Interesting stuff. John, I need to reread your comment before commenting on that.
I am reading this as DB CPU providing the user session Service time of the CPU (time actually spent on CPU).
CPU+Wait for CPU is the Residence time at the CPU ( time on CPU plus wait time). But where does the Wait for CPU time come from? I did some (non-extensive) searching in the data dictionary but couldn't find anything.
Avg. Active Sessions is the average queue length (sessions waiting and currently on CPU)
All of this ignores CPU used by background sessions such as DBWR and LGWR, so I would expect DB CPU to always be less than total CPU available. Isn't there a background CPU stat? I wonder how this matches up with v$sess_time_model for DB CPU for the background processes?
But where does the Wait for CPU time come from?
I think the answer to that one is in JB's comment here :-
In ASH the sessions get counted on CPU if they are not known to be on a WAIT event, which can be a big problem when new events go un-instrumented.
Although I'm sure he'll correct me if I'm wrong.
As for the background CPU, that is in the time model stats (see the earlier DB Time post). One of the additions I almost made to this post was to show the background data during the test period, but I thought it was getting long enough already!
Yes, DB CPU is exactly user (foreground) session service time at CPU resource. This is measured by the process itself and does not include time waiting for CPU.
Avg Active Sessions is average queue length at the database when considered as a black-box queuing center. This is not avg queue length waiting for CPU service, but rather avg queue length waiting for database services (which includes CPU, I/O, enqueues, etc.) So you could see nontrivial AAS even though CPU utilization is low, e.g. in a very inefficient application with locking issues.
Wait for CPU is what I referred to as runq and this quantity in particular has (so far) defied accurate measurement in platform-independent way.
As Doug points out, background process CPU is also measured and recorded in time model statistics. However for application tuning and other purposes, we (generally) ignore background time and resource consumption as they do not (generally) impact foreground response time.
Am I correct in that 'Wait for CPU' is determined by assuming that runq is the major source of uninstrumented waits?
I was also wondering if DB Time is calculated positively or negatively. (that's a clear question, isn't it?). In Doug's earlier blog he used a definition of DB Time stating that it is
Time spent in the database by foreground sessions
Includes CPU time, IO time and wait time
Excludes idle wait time
So I could see DB Time per session being calculated either by summing all CPU, I/O, and wait times (excluding idle waits) or by subtracting idle waits from the session's elapsed time.
Your above explanation which concludes that computed WAIT (which) includes all that runq time implies that DB Time is Elapsed time - idle waits. This would include the uninstrumented waits. Is that correct?
Hi Henry -
Uninstrumented waits are bugs, but unfortunately bugs do happen. So there is really no intended connection between runq and uninstrumented waits.
I believe the ASH report that says "CPU + Wait for CPU" is simply acknowledging that we don't know (from ASH) whether the session was on CPU or in the runq. Sessions in unistrumented waits will also show up in this category, but again those are bugs.
My explanation about computed WAIT was specifically about the DB home page widget in EM and how what it displays is computed.
Time Model DB Time is measured, not calculated. The mechanism is new (10g) instrumentation that begins timing at the entry point of a database call and stops when the call exits. So what is measured is exactly "session in-call time" which is the definition of DB time (when limited to foreground sessions.)
Graham points out that my definition above neglects to subtract idle-wait time from the "session in-call time", so for instance waiting on "dbms pipe get" is subtracted from session in-call time to arrive at DB time for the session.
I am not sure how Oracle measure DB CPU.
We have a 6 nodes RAC running, the other day (Monday 21st Sep) we turned 3 nodes off and we compared DB CPU for that day and the previous Monday we noticed that DB CPU was almost doubled! One of node were thrasing (90% to 100% CPU Usage all the time).
So can thrashing affect how Oracle measure DB CPU and shows wrong numbers?
I've read your comment a few times now and still find myself a little confused.
1) How are you measuring DB CPU?
2) What period of time did you measure it over?
3) Are you measuring across all nodes in the cluster, or just a specific instance?
4) What number did you expect to see? Maybe I'm misunderstanding, but I wouldn't find it surprising if I switched off half the nodes in a cluster, the workload was the same and resource usage on a specific instance doubled.
Sorry not explaining too well.
We measure the DB CPU using DB CPU from DBA_HIST_SYS_TIME_MODEL.
This is from a single instance point of view:
Normally DB CPU in a normal day uses around 21% of total CPU but when half of the nodes were turned off the CPU usage of one of the nodes jumped from 21% to 60%. Since that node was so loaded I was wondering if Oracle can report wrong DB CPU statistic if the server is very loaded (for example does it count CPU when it is in the runqueue?)
We were expecting a 45% CPU usage maximum but it jumped to 60%
From all instances point of view the total DB CPU increased from 16% to 25% which doesnt seem right because the business workload did not increase. We expected same 16%.
Apologies - limited internet access at the moment so difficult to get time to reply, but I did notice this thread that might be of interest.
Arggh! Sorry, David, totally lost track of this one in the midst of general busy-ness later. Posting a comment to remind myself ...
hello , after reading your thread, i do a test .
in four parallel session run pl/sql like :
this block do not generate any other wait but on cpu.
as test in 10 minutes:
i got the statistics in awr:
Elapsed: 9.85 (mins)
DB Time: 38.71 (mins)
sql execute elapsed time 2,322.81 100.01
PL/SQL execution elapsed time 2,305.12 99.25
DB CPU 1,155.34 49.74
and ash statistics:
Elapsed Time: 10.0 (mins)
Sample Count: 2,412
Average Active Sessions: 4.02
Avg. Active Session per CPU: 2.01
the actually db cpu has no problem. but the db time do not equal db cpu time(because of no other wait event), the db time equal sql execution time. I assume the (sql execution time)=(sql cpu+cpu on queue+wait event).
So as a consequence, (db time)=(db cpu+cpu on queue+wait event) but !=(db cpu+wait event).
I've read this comment a couple of times and I think I know where you're coming from, but am not sure.
Does it help any to say that DB CPU is measured CPU usage but that it won't include time spent in the run queue?
Or does this earlier comment from JB cover it?
"Yes, DB CPU is exactly user (foreground) session service time at CPU resource. This is measured by the process itself and does not include time waiting for CPU."
Could anyone please clarify this? According to the information provided, there are 4.02 Average Active Sessions. I assume that Average Active Sessions per CPU is Average Active Sessions divided by number of CPU's in the machine. So, in this case, there must be 2 CPU's on the machine running this test. Based on this assumption, how can DB Time be more than two times the duration of the test? Should DB Time not be a portion of the total available CPU time for a time slice? So, in this case, the maximum DB Time should be 20 minutes. As expected, DB CPU is just under 20 minutes which does make good sense.
So, now, I am very puzzled by what actually goes into DB Time. Could it be that DB Time is really the total time spent by all processes trying to execute something even though there is not enough CPU to go around to service these requests?
I assume that Average Active Sessions per CPU is Average Active Sessions divided by number of CPU's in the machine.
Should DB Time not be a portion of the total available CPU time for a time slice?
No. DB *CPU* should be a portion of the total available CPU time. DB Time is a completely different thing which also includes wait time.
So, in this case, the maximum DB Time should be 20 minutes.
No, I can make DB Time as high as I want to make it by just connecting more and more sessions to the database and making them work. The underlying principle here is that systems have an infinite capacity for waiting and DB Time measures all of the CPU and Wait Time for all active foreground sessions.
Does that make things any clearer?
You can see more about DB Time in this presentation
Yes, this is excellent thank you for the quick reply. So, based on this, why is DB CPU reported as a percentage of DB Time in AWR reports?
People involved in writing the reports can answer that question best, but I think it's useful to know how much of the DB Time is being spent running on CPU and how much is spent waiting on something.
If almost all of your DB Time is DB CPU Time, then Waits aren't your problem and there's not much you can do to make things quicker except tune your application.
But there are many numbers in an AWR report many of which won't be useful very often! LOL
This sounds like 2 CPU machine with hyperthreading enabled. DB CPU from time model only measures actual CPU consumed whereas ASH shows "ON CPU" when process is really in run-queue, so time model (DB CPU) shows about 2 active sessions of CPU and ASH shows 4 (2 real CPU and 2 run queue "CPU")
How about this one?
As shown below, DB CPU is 9,125 seconds...
Statistic Name Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time 114,952.5 99.7
DB CPU 9,125.4 7.9
PL/SQL execution elapsed time 5,052.2 4.4
RMAN cpu time (backup/restore) 539.3 .5
parse time elapsed 167.3 .1
connection management call elapsed time 33.0 .0
hard parse elapsed time 26.0 .0
hard parse (sharing criteria) elapsed time 9.7 .0
inbound PL/SQL rpc elapsed time 2.8 .0
sequence load elapsed time 0.8 .0
PL/SQL compilation elapsed time 0.6 .0
repeated bind elapsed time 0.5 .0
hard parse (bind mismatch) elapsed time 0.2 .0
failed parse elapsed time 0.0 .0
DB time 115,328.1 N/A
background elapsed time 8,760.2 N/A
background cpu time 745.1 N/A
...on an eight processor box over a 30 minute AWR window...
Operating System Statistics DB/Inst: WCPROD/wcprod1 Snaps: 51218-51219
How do I have 9,100 seconds of DB CPU and 10,347 seconds of IDLE_TIME in a 14,400 second (eight processors * 30 minutes * 60 seconds)? I also confirmed with sar on the host that we were about 20% utilized during this window (about 55% including I/O wait time). DB CPU would lead you to believe that the host should have been at *least* 62% bust (9125 / 14400 seconds available)
Unless I am missing something major, me thinks this statistic is not very reliable
Here is another example where the discrepancy is even more dramatic.
First, we show our snap time is for a half, as well as the snap_id's and the time frame...
WCPROD 1509354295 wcperf2 2 10.2.0.4.0 YES oh1xcwcdb02
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 51413 30-Jan-11 23:30:08 135 20.3
End Snap: 51414 31-Jan-11 00:00:10 123 20.3
Elapsed: 30.03 (mins)
DB Time: 2,492.11 (mins)
...then we show the amount of DB CPU recorded by Oracle...
Time Model Statistics DB/Inst: WCPROD/wcperf2 Snaps: 51413-51414
-> Total time in database user-calls (DB Time): 149526.8s
-> Statistics including the word "background" measure background process
time, and so do not contribute to the DB time statistic
-> Ordered by % or DB time desc, Statistic name
Statistic Name Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time 149,513.1 100.0
DB CPU 6,718.3 4.5
PL/SQL execution elapsed time 16.4 .0
parse time elapsed 3.4 .0
hard parse elapsed time 1.7 .0
hard parse (sharing criteria) elapsed time 1.6 .0
sequence load elapsed time 0.3 .0
connection management call elapsed time 0.2 .0
PL/SQL compilation elapsed time 0.1 .0
failed parse elapsed time 0.0 .0
repeated bind elapsed time 0.0 .0
DB time 149,526.8 N/A
background elapsed time 270.5 N/A
background cpu time 223.7 N/A
...and finally, the OS statistics gathered by Oracle...
Operating System Statistics DB/Inst: WCPROD/wcperf2 Snaps: 51413-51414
...and the sar output for the same time period...
oh1xcwcdb02:oracle:wcperf2:/u01/app/oracle/product/10.2.0/rdbms/admin>sar -f /var/log/sa/sa30 -u -s 23:30:00 -e 23:59:59
Linux 18.104.22.168-0.66.1-smp (oh1xcwcdb02) 01/30/2011
11:30:01 PM CPU %user %nice %system %iowait %steal %idle
11:31:01 PM all 6.55 0.00 1.65 0.86 0.00 90.94
11:32:01 PM all 4.80 0.00 0.61 0.32 0.00 94.27
11:33:01 PM all 4.92 0.00 0.59 0.78 0.00 93.71
11:34:01 PM all 4.87 0.00 0.57 0.03 0.00 94.53
11:35:01 PM all 4.74 0.00 0.62 0.34 0.00 94.29
11:36:01 PM all 5.60 0.00 1.34 0.05 0.00 93.02
11:37:01 PM all 4.57 0.00 0.57 0.16 0.00 94.70
11:38:01 PM all 4.55 0.00 0.69 0.09 0.00 94.67
11:39:01 PM all 4.59 0.00 0.55 0.11 0.00 94.75
11:40:01 PM all 4.86 0.00 0.68 0.10 0.00 94.36
11:41:01 PM all 6.18 0.00 1.63 0.09 0.00 92.09
11:42:01 PM all 5.32 0.00 0.80 0.35 0.00 93.54
11:43:01 PM all 4.76 0.00 0.59 0.06 0.00 94.59
11:44:01 PM all 4.82 0.00 0.67 0.04 0.00 94.47
11:45:01 PM all 4.87 0.00 0.66 0.06 0.00 94.41
11:46:01 PM all 5.76 0.00 1.55 0.34 0.00 92.35
11:47:01 PM all 4.72 0.00 0.69 0.24 0.00 94.34
11:48:01 PM all 5.34 0.00 0.72 1.16 0.00 92.78
11:49:01 PM all 4.97 0.00 0.69 0.52 0.00 93.81
11:50:01 PM all 5.02 0.00 0.71 0.04 0.00 94.23
11:51:01 PM all 6.23 0.00 1.64 0.09 0.00 92.05
11:52:01 PM all 5.19 0.00 0.71 0.46 0.00 93.65
11:53:01 PM all 4.86 0.00 0.68 0.56 0.00 93.89
11:54:01 PM all 4.95 0.00 0.73 0.17 0.00 94.15
11:55:01 PM all 4.80 0.00 0.62 0.06 0.00 94.51
11:56:01 PM all 5.82 0.00 1.51 0.04 0.00 92.62
11:57:01 PM all 4.92 0.00 0.67 0.17 0.00 94.23
11:58:01 PM all 4.58 0.00 0.61 0.08 0.00 94.73
11:59:01 PM all 4.77 0.00 0.77 0.07 0.00 94.39
Average: all 5.10 0.00 0.84 0.26 0.00 93.80
If you do the math, the DB CPU recorded suggests the host CPU should have been at least 46% utilized for the period between 11:30PM and 12AM on January 30th. However, as both sar and the Oracle gathered OS stats show, it isn't even close to that.
This is SLES 10 with Oracle 10.2.0.4 RAC. My assumption is DB CPU is a simple getrusage() call at certain points, but this difference is beyond anything I can explain.
The impact to us is that we are trying to consolidate databases in a cluster, and would like to be able to identify the amount of CPU each uses. "DB CPU" sounded perfect, and 99% of the time is very close. However, when it is off, it is *really* off
They're very interesting examples of weird numbers. Don't suppose you've opened an SR to drill down into the detail? No, I bet I couldn't be bothered either
Worth knowing to be suspicious of DB CPU though.