"... I can see that once you start to define workload, or transactions, in business terms (I need to get all these things done, what works best overall?), then workload response time* does make sense, both as a metric and, more pertinently, as a tuning target. I think, and have done for a while, that for much of the last 10 years the tuning dialog has been around query or session optimization with either an explicit or implicit assumption that optimization of particular queries will result in workload optimisation from a business perspective. In many cases this is true; I suspect, though, that in many cases it isn't. Response time will still be the core metric, but perhaps some of the tuning methods will vary. It may be as simple as redefining stage 1 of the tuning process from
'define the task of interest to the business'
to
'define the workload of interest to the business'"
I agree with Niall, particularly with regard to the current emphasis on optimising individual sessions or queries, rather than the overall workload.
I'll state this another way. Maybe ...
1) the fastest individual session response time
... isn't always the most important optimisation goal if you're trying to achieve ...
2) the highest workload throughput?
(That's why I have a few slides early on in the course I wrote for Oracle that define a few ways of looking at system performance, including response time, throughput and scalability.)
I'll try to illustrate these two different tuning goals using Dominic Giles' Swingbench utility. I executed the Sales Order Entry (SOE) benchmark with different numbers of concurrent users for a period of 3 minutes. The Oracle instance, hardware configuration (dual-core laptop with the usual lousy hard drive) and application code were identical for all of the tests. The only significant variable was the number of concurrent sessions. Here are the results (which Swingbench helpfully displays in the Output tab after each test run).
| Concurrent Sessions | Avg. Response Time (ms) | Transactions Completed |
Without question, the average response time is best when only one user is running the test, and gets steadily worse as the number of active sessions grows. However, having only one active session also delivers the fewest completed transactions during the test. If we want to process as many transactions as possible in those 3 minutes, then it looks like we should have around 28 concurrent execution streams.
Do we care that the average transaction completion time has increased to 587 milliseconds? Well, if the transaction is part of a user interaction, then the answer is undoubtedly yes. I'm sure your users like the fastest possible response, and if we have 28 active sessions, then an individual user is going to see response times more than five times slower than if they were the only user. Oh, and that's just the average response time - the maximum will be much worse!
If, however, this is all part of an overnight batch process and the tuning goal is to get through all of the workload as quickly as possible, then the number of transactions completed is our focus, and who cares if each transaction takes a little longer? In this case, it's better to load the system more heavily and accept a longer response time for individual transactions. Scheduling the workload so that transaction response time increases might seem counter-intuitive. For example, if I were to trace one session at random during each test run and analyse the trace files, wouldn't 'performance' look much 'better' in the first test than the last, because I'm only focussing on one session?
Is performance better in the first test than the last? It all depends on what you mean by performance.
Now, let me address what I might appear to be implying here, that tracing and tuning individual sessions is flawed. In fact, all of these tests would have yielded even more impressive results if I had tuned the SOE application at the session level *first* so that each transaction uses fewer resources and completes more quickly. That's a good thing. I do understand that analysis and tuning at the session scope is vital but, if each individual session's work has already been optimised, there are other aspects we need to consider, and looking at a single session trace will not give me all of the information I need. (Gosh, I'd better be careful because I'm getting close to suggesting that Statspack and AWR might be useful tools for some situations after all!)
However, be very careful when looking at the useful improvement in throughput I got on my laptop by increasing the load during a basic test.
- Whilst this particular 'batch process' processed more work in the same period of time with more concurrent sessions, it also slowed response time for *everything else* on my laptop (server). That's why it's not sensible to mix batch and interactive workloads.
- As the number of sessions increases past a certain point, the number of completed transactions goes *down* as the system starts to suffer contention, so we get further away from our goal whilst using more resources. Achieving the correct balance is the key.
- Looking at the results again, is the best balance to run 28 concurrent sessions when we can deliver almost the same throughput from 16? Well, I'd plump for 16 for this job on this configuration every time, if it meets the system requirements. Squeezing every last ounce of performance from a system is likely to lead to a serious performance problem one day because we're reducing the breathing room to cope with unplanned events.
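To put a rough number on that breathing room, here is a minimal sketch using the textbook single-server (M/M/1) queueing approximation, where mean response time R = S / (1 - U) for service time S and utilisation U. It's an assumption that such a simple model says anything precise about a laptop running Swingbench; the point is only the shape of the curve as utilisation approaches 100%, and the 50 ms service time is made up.

```python
def mm1_response_time(service_time_ms: float, utilisation: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U).

    Only defined for 0 <= U < 1; at U >= 1 the queue grows without bound.
    """
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time_ms / (1.0 - utilisation)


# Hypothetical 50 ms service time; watch R climb as the server gets busier.
SERVICE_MS = 50.0
for u in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"U={u:.2f}  R={mm1_response_time(SERVICE_MS, u):7.1f} ms")
```

In this model, going from 80% to 90% busy doubles the response time, and each extra percent after that costs more than the last, which is one reason to stop short of the absolute peak.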
All I'm really saying in this post is that performance analysis, response time tuning, workload management (call it what you will) can't and shouldn't be limited to looking at the performance of an individual session in isolation.
* Is it just me, or does 'Workload Response Time' sound weird? It's the same idea as session response time - how quickly do we get from the beginning to the end - but it seems different to me. That might be my lack of understanding, reading, or rigour, but I find the idea of a system, workload or group of users having an aggregated response time strange and hard to get my head around. I understand start - do lots of work - stop. But is the time that takes a response time or a throughput? Have I just got too used to Response Time being directly related to a user sitting and waiting for a response?
1) When you measure time to run a business process, and there is going to be a human waiting for the results, you want to tune the 90th percentile response time, because users usually start complaining when around 10% of the transactions take too long. Convincing them that everything is fine on average doesn't normally have the desired effect. I'd love to see the "90th percentile response time" in this benchmark. I suspect that variance gets quite bad very early in the game, and that averages don't tell you the whole story. A less loaded system is not just a faster one, it is also a more predictable one.
2) There is also the "refresh effect". Any click that takes over 10 seconds to respond will cause a user to refresh his browser window. If your application transmits this to the database as a second request, you are pretty much doomed. Any temporary slowdown turns into something you cannot recover from. A queueing theory nightmare: the longer the wait time, the higher the arrival rate.
3) Keeping the workload at 16 vs cranking it up to 24 will leave some capacity to handle unexpected requests, which are statistically quite expected.
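To see why an average can flatter a loaded system, here is a small sketch comparing the mean with nearest-rank percentiles on a made-up sample of response times (the numbers are hypothetical, not from the Swingbench runs). Nearest-rank is just one of several percentile definitions, and `fraction_within` is an illustrative helper for "X% of transactions under T" style targets.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def fraction_within(values, threshold):
    """Share of samples at or below threshold, e.g. for an '80% under T' target."""
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical response times in ms: mostly quick, with a slow tail.
samples = [80] * 85 + [400] * 10 + [3000] * 5

print("mean:", statistics.mean(samples))
print("p50 :", percentile(samples, 50))
print("p90 :", percentile(samples, 90))
print("p99 :", percentile(samples, 99))
print("share under 500 ms:", fraction_within(samples, 500))
```

Here the mean of 258 ms matches none of the actual transactions, while the 99th percentile exposes the 3-second tail that the average hides.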
Chen's right. The last serious round of user-acceptance performance testing/tuning I ran was premised on being able to perform X amount of a specified mix of user transactions with 90% of transactions in less than T seconds (where T depended on the criticality and to some extent the difficulty of the function - so wild card queries were allowed to take much longer than moving to the next screen in a workflow).
(Actually, the - government - customer initially wanted 100% of transactions to have response time below target, but we managed to educate them a little on queueing theory.)
Of course we tuned to the sweet spot where (business) transaction throughput is maximised within the response time constraint.
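The selection rule described there (maximise business throughput subject to a percentile response-time target) is simple to express. A sketch, assuming each test run has been summarised as sessions, throughput and 90th percentile response time; the `RunResult` type and all the figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunResult:
    sessions: int
    throughput_tps: float   # business transactions completed per second
    p90_response_s: float   # 90th percentile response time in seconds


def sweet_spot(runs: List[RunResult], p90_target_s: float) -> Optional[RunResult]:
    """Highest-throughput run whose 90th percentile meets the target, or None."""
    eligible = [r for r in runs if r.p90_response_s <= p90_target_s]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.throughput_tps)


# Hypothetical load-test summaries, not measured data.
runs = [
    RunResult(sessions=4, throughput_tps=30.0, p90_response_s=0.4),
    RunResult(sessions=8, throughput_tps=52.0, p90_response_s=0.9),
    RunResult(sessions=16, throughput_tps=61.0, p90_response_s=1.8),
    RunResult(sessions=24, throughput_tps=63.0, p90_response_s=3.5),
]
print(sweet_spot(runs, p90_target_s=2.0))
```

With a 2-second target the 16-session run wins; the 24-session run squeezes out slightly more throughput but breaks the response time constraint.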
Yes, I realise there was a lot more I could have said about response time and the average wasn't meant to represent the whole picture. I could also have said that there was no random think time programmed into the test or anything useful like that.
But I think there's been a *lot* said about response time in recent years and rather less about batch throughput.
I suppose I was hoping to just focus on the fact that the lowest response time is not the only metric.
The 90th percentile comment is an interesting one, especially given that most of the smart places I've seen express their requirements as an 80/20 split (we want 80% of our transactions to behave acceptably). I'd be interested to know if your 90% is empirical or (like the 80/20) a 'best guess'. That's not meant as a criticism, Chen; it sounds like a highly plausible hypothesis, and a potentially quite interesting explanation for the large number of sites where users are unhappy with response times despite the metrics showing things as fine. If the SLA doesn't reflect reality and this can be evidenced, then the SLA can likely be changed.
My premise on variance was derived from this issue. I needed to improve performance in a data warehouse, but my questions about 'Which processes were most critical?' or 'Which processes are the performance issues?' met with blank stares. The only target metric anyone could verbalize was that batch processing needed to complete by 2:00 am. Digging deeper into the nights that failed to meet the target led me to a handful of processes that performed inconsistently, i.e. processes with high levels of variance.
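One way to sketch that search, assuming you have a history of nightly run times per batch step: rank steps by coefficient of variation (standard deviation divided by mean) rather than by average duration. The step names, run times and the 0.5 threshold are all hypothetical.

```python
import statistics


def inconsistent_processes(runtimes_by_process, cv_threshold=0.5):
    """Flag processes whose run times vary a lot relative to their mean.

    Uses the coefficient of variation (sample stdev / mean). Processes with
    fewer than two runs, or a zero mean, are skipped.
    """
    flagged = {}
    for name, runtimes in runtimes_by_process.items():
        if len(runtimes) < 2:
            continue
        mean = statistics.mean(runtimes)
        if mean == 0:
            continue
        cv = statistics.stdev(runtimes) / mean
        if cv >= cv_threshold:
            flagged[name] = round(cv, 2)
    return flagged


# Hypothetical nightly run times (minutes) for three batch steps.
history = {
    "load_sales": [42, 40, 44, 41, 43],
    "build_aggregates": [15, 90, 18, 120, 16],
    "refresh_mviews": [30, 31, 29, 32, 30],
}
print(inconsistent_processes(history))
```

Only `build_aggregates` is flagged: it is usually quick but occasionally takes 90 minutes or more, which is exactly the kind of step that makes some nights miss the deadline.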
Most of the problems I'm asked to solve are throughput issues. (all processing must complete in X hours) Investigation usually leads to finding the individual processes with the response time issue. Once the individual response time issue is improved, the overall throughput also improves. If the overall throughput is satisfactory, the fact that individual process response time is slightly slower tends to go unnoticed. (but this last corollary doesn't apply to actively watched processes.)
The two targets are interwoven, and ignoring one to optimize the other would be silly. And I like the comment about AWR and Statspack - both tools are geared toward system performance as a whole, and they are best used to monitor systems that have already been optimized at the process level. Any real improvement I've ever made in a system has come from improving individual response times within the workload. Fortunately we don't have to choose just one tool or one approach.
you want to tune the 90th percentile response time. Because usually they start complaining when around 10% of the transactions take too long.
Whilst I know that that's what's commonly used, why be so fixed on 90? I've worked on systems where people wouldn't accept one in 10 responses being too long! It also depends on how much longer we're talking about.
But I know what you mean and 11g DB/Grid Control has some really useful additions in this area that I'll blog about soon.
I just worry about statements that sound certain when I'm not sure they apply in all situations.
I'd love to see the "90th percentile response time" in this benchmark.
Sadly, it's not available from Swingbench, but the coming blog post will use 11g DB Control, which is better at showing that.
I suspect that variance gets quite bad very early in the game, and that averages don't tell you the whole story. A less loaded system is not just a faster one, it is also a more predictable one.
Yes, of course, and quoting the average response time made things look better than they really were. In the context of the blog, though, when I was trying to illustrate that bad response time *might* be good, I would have thought I was erring on the side of caution and doing the response time side of the argument a favour.
That's the thing, I was trying to write a blog to say - 'look - here is a case where lousy response time = best results' and everyone just wants to talk about response time all over again! LOL
But, but, but ....
Didn't I just show that response time could be awful, maybe for as many as 15, 20 or actually 100% of the transactions, but that the business objective was met more effectively, because I'm talking about batch throughput, without any users waiting for a response?
Maybe I made a mistake here by using a transaction that was originally designed as a user interaction and then putting it into a 'batch schedule' but I was really just trying to indicate that loading a system so that response time increases might be a good thing.
I don't know how much you've looked at 11g DB/GC, but if not much, you're in for a treat. It's Variance City!
Investigation usually leads to finding the individual processes with the response time issue. Once the individual response time issue is improved, the overall throughput also improves.
Mmm, yes, that's often the case (and I did say 'all of these tests would have yielded even more impressive results if I had tuned the SOE application at the session level *first*'), but it's not what I've illustrated here. All sessions are running precisely the same transactions. None of them had their performance improved by tuning. Which were the 'problem' transactions was completely variable, by which I mean they were all equally bad at different times, because the system was hopelessly overloaded.
... and yet, the results from a business perspective got better. Didn't they?
Any real improvement I've ever made in a system has come from improving individual response times within the workload.
I almost took issue with that statement, but it depends on what you mean by 'system'. If you mean application then I probably agree but, if you mean system in the widest sense, including the workload, the performance of the infrastructure etc., then didn't I just 'improve' the system in a very real way without improving the response times? I made them *worse* and I didn't tune the application one iota.
The thing is, I know as well as the next person that all the real gains tend to come from application tuning and individual session response time tuning, but I think it's a pretty limited view of performance that I was trying to challenge a little bit.
I'm not sure I did a very good job, though
You know, having read back the comments, maybe if I said *Individual Session* or *Individual Transaction* response time isn't the only target, it might make it clearer.
Because in the example I was using to illustrate it, a higher number of transactions completed in the three minutes implies a reduced workload response time (the time taken for the entire workload to complete).
Think ALL_ROWS hint - get me to the end as efficiently as possible - because I need the entire workload to complete as quickly as possible and don't need an interactive response.
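In arithmetic terms, the workload response time for a fixed batch is roughly the amount of work divided by the throughput achieved. A minimal sketch with made-up figures, chosen so that each scenario is consistent with Little's Law for a closed test with no think time (throughput = sessions / average response time):

```python
def workload_completion_time_s(total_transactions: int, throughput_tps: float) -> float:
    """Time for a fixed batch of work to finish at a steady throughput."""
    if throughput_tps <= 0:
        raise ValueError("throughput must be positive")
    return total_transactions / throughput_tps


# Hypothetical figures; each throughput equals sessions / average response time.
BATCH = 1_000_000
scenarios = [
    ("1 session", 1, 0.080),     # 80 ms average response, about 12.5 trx/s
    ("16 sessions", 16, 0.400),  # 400 ms average response, about 40 trx/s
]
for label, sessions, avg_response_s in scenarios:
    tps = sessions / avg_response_s
    hours = workload_completion_time_s(BATCH, tps) / 3600
    print(f"{label}: {avg_response_s * 1000:.0f} ms per transaction, batch done in {hours:.1f} h")
```

With these hypothetical numbers each transaction takes five times longer at 16 sessions, yet the whole batch finishes in under a third of the time.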
Your comment about AWR and Statspack makes the use of these tools sound like guilty pleasures. At the risk of falling on the wrong side of the extended tracing brigade, I've used db time on a project which involved the benchmarking of batch processes and I found it really useful. Because of the sheer number of things that could be tuned and monitored across the software stack (which included a J2EE app server), db time worked well as a high level metric and stopped us from drowning in statistics. As you are probably acutely aware, you tune one thing and the bottleneck usually moves somewhere else. The batch tuning work I did was a bit like pushing down on one end of a see-saw, i.e. if the load on the database was reduced it would increase on the app server and vice versa. Using db time made this effect easy to see.
I wondered how long it would be before DB Time put in an appearance
I, like you, have had nightmares trying to tune batch throughput using a single session view. It's not impossible, and Robyn highlighted what sounded like just the right approach, but it gets more difficult when you've tuned your application to death and there's still variable SAN performance or some other mystical factor (that is meant to be a joke - there should be no mysteries) that screws up your carefully laid plans.
As for Statspack and AWR - don't start me! It hurts my head that apparently they shouldn't help me solve performance problems but they still do. Every time I solve a problem and haven't traced a session, it makes me doubt my sanity and I know I must be doing something wrong.
Good stuff Doug. A few thoughts:
The retrograde behavior of your throughput swingbench tests is fairly typical. It comes from coherency issues (as opposed to contention). Neil Gunther does a good job explaining this.
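For reference, Gunther's Universal Scalability Law models relative capacity as C(N) = N / (1 + α(N - 1) + βN(N - 1)), where α captures contention and β captures coherency (crosstalk) costs. The β term is what makes throughput go retrograde; with β = 0 the curve only flattens out. A sketch with arbitrary, hypothetical coefficients, not fitted to the results in this post:

```python
import math


def usl_relative_capacity(n, alpha, beta):
    """Gunther's Universal Scalability Law: relative capacity C(N).

    alpha models contention (serialisation), beta models coherency (crosstalk).
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_concurrency(alpha, beta):
    """Concurrency at which C(N) peaks; beyond it throughput goes retrograde."""
    if beta <= 0:
        raise ValueError("no peak without a coherency (beta) term")
    return math.sqrt((1 - alpha) / beta)


# Hypothetical coefficients, chosen only to show the shape of the curve.
ALPHA, BETA = 0.05, 0.002
for n in (1, 4, 8, 16, 22, 28, 32):
    print(n, round(usl_relative_capacity(n, ALPHA, BETA), 2))
print("peak near N =", round(usl_peak_concurrency(ALPHA, BETA), 1))
```

With these particular coefficients the peak lands at about 22 sessions, and capacity at 32 sessions is lower than at 22.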
I think the most important point you make here is the need to ask the right question before just going off and gathering data. Do I care more about throughput or response times? As far as Statspack and AWR go, that also comes down to the question. If I have a mixed workload and one squeaky user, the averaging of AWR won't be of much help. On the other hand, with your homogeneous swingbench test (or some load testing I have worked on), AWR is great, and makes things much easier (the average of 5,5,5,5,5 is 5).
Robyn mentioned that
"any real improvement I've ever made in a system has come from improving individual response times within the workload."
I have used AWR together with some queueing theory to show pretty quickly whether or not the database is actually the system bottleneck. This has allowed me to save time by not tuning at all when the bulk of the time was outside the database (most of this was done just using DB Time, Elapsed Time, txn/sec from AWR reports)
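A minimal sketch of that kind of back-of-envelope check, assuming you have DB Time, elapsed time and a transaction count for the same AWR interval, plus an average response time measured at the application tier. Treating AWR's transaction count as equivalent to business transactions is an assumption that won't always hold, and all the figures below are made up.

```python
def average_active_sessions(db_time_s, elapsed_s):
    """DB Time divided by wall-clock elapsed time over the same interval."""
    return db_time_s / elapsed_s


def db_share_of_response_time(db_time_s, transactions, end_to_end_response_s):
    """Fraction of an average transaction's end-to-end time spent in the database."""
    db_time_per_txn_s = db_time_s / transactions
    return db_time_per_txn_s / end_to_end_response_s


# Hypothetical one-hour interval and an application-measured response time.
elapsed_s = 3600.0
db_time_s = 1800.0       # 30 minutes of DB Time
txns = 36_000            # roughly 10 transactions per second
app_response_s = 1.25    # measured at the application tier

print("avg active sessions:", average_active_sessions(db_time_s, elapsed_s))
print("DB share of response time:", db_share_of_response_time(db_time_s, txns, app_response_s))
```

With these figures only about 4% of each transaction's time is spent in the database, so even perfect SQL tuning couldn't improve response time by more than that.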
Thanks for the blog Doug.
I don't have any special attachment to 90% specifically.
However, we do agree that it will be impossible to tune for 100% of the queries, and that tuning for 50% is probably not good enough.
It'll be nice to do an empirical study of when users get frustrated
And I'm sorry for derailing the discussion away from throughput...
(BTW. Did you see that the guys in Pythian say you are old?)
Interesting blog post and discussion. I'd like to raise a couple of issues from the original post.
Firstly, there is the issue of the average response time increasing from 79ms to 587ms (not metres/seconds, Doug). Frankly, an end user is not likely to notice a 0.5sec increase in the response time of a transaction. The actual response time seen by the user will, of course, be longer; there are likely to be multiple networks and probably middle tiers involved. I do agree that in the fully loaded system there will be a much higher variance in the response time, and users will notice increased response times once it gets beyond a couple of seconds for outlier transactions that are at or near the max transaction time. I have in fact been involved with systems where data was actually delayed in being returned to the users so that a more consistent user response time was perceived, i.e. slow down the fast stuff, rather than speeding up the slow stuff. But again we were talking of providing response times of a small number (less than 2) of seconds.
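The comment doesn't describe how that system delayed its fast responses, so this is only one hypothetical way to sketch the idea: a Python wrapper (the names `with_response_floor` and `lookup_order` are made up) that pads any call finishing faster than a chosen floor, so fast and slow calls look more alike to the user.

```python
import functools
import time


def with_response_floor(func, floor_s):
    """Return a wrapper that never returns sooner than floor_s seconds after being called.

    Calls that finish early are padded with a sleep; slower calls return unchanged.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        result = func(*args, **kwargs)
        remaining = floor_s - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
        return result

    return wrapper


def lookup_order(order_id):
    # Hypothetical fast operation standing in for a quick query.
    return {"order_id": order_id, "status": "SHIPPED"}


padded_lookup = with_response_floor(lookup_order, floor_s=0.2)
print(padded_lookup(42))
```

The trade-off is deliberate: the average response time gets worse so that the variance gets better.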
Secondly, the results for the throughput of the system are not consistent. Why would throughput drop from 16 users to 20 users to 24 users but then rise quite dramatically at 28 users before dropping again? It seems that it is likely that there were other factors involved here. Typically throughput will start to drop when we are approaching the limit of a resource, disk, CPU, software etc. This is just simple queuing theory. Were these results from a single run of each concurrent user count? I would not expect the surge in throughput at 28 users to reproduce.
(BTW. Did you see that the guys in Pythian say you are old?)
I know! Pesky kids!
However, we do agree that it will be impossible to tune for 100% of the queries, and that tuning for 50% is probably not good enough.
Yes, we agree about that.
I'm sorry for derailing the discussion away from throughput...
Don't be. Any frustration on my part might include an aversion to anything that sounds like dogma, but it's also largely frustration with myself, because I obviously didn't communicate what I was trying to say very well.
Do I care more about throughput or response times?
That's what I was trying to show: that there is more than one tuning goal, not just individual session response time, and that they might conflict with each other.
with your homogeneous swingbench test
That's a very important point. I'd hoped that nobody thought this looked like any realistic system, but it's probably to other people's credit that most of the comments (including one or two emails) started by trying to apply my comments to their own real-world systems and pointed out that this was all a bit, erm, artificial. It was meant to be, but I didn't realise how much that would undermine what I was trying to say. I think it's safe to say I was wrong to over-simplify things, too, but that was a deliberate approach (albeit an ineffective one! LOL)
It was precisely *because* the workload was so limited, and repeated across sessions that all started and ran for the same length of time, that I thought it acceptable to use high level averages.
Then again, the nature of the workload also means it would have been safer to look at one of the sessions individually, too!
This has allowed me to save time by not tuning at all when the bulk of the time was outside the database (most of this was done just using DB Time, Elapsed Time, txn/sec from AWR reports)
Very interesting point because it reflects my personal experience - that DB Time is often an extremely effective measure of when the database workload is *not* the problem. Funny how often that is the case these days
Oh, and I noticed you're on the Hotsos agenda this year. Good luck with that and I'll be interested to hear how it goes.
Graham - your less than sign cut off your comment initially, which confused me for a moment. It's fixed now.
(not metres/seconds, Doug)
Don't Oracle guys use metres/sec, then? You don't know what you're missing - better than that DB Time nonsense.
Good catch and fixed.
Frankly, an end user is not likely to notice a 0.5sec increase in response time of a transaction.
Oh, ok, it won't matter to a user, but if I'd picked a different transaction with a higher single user response time then that 0.5 sec would, in turn, have been much higher.
there will be a much higher variance
There was. My original table had the maximum transaction response time, too, but I took it out because I thought it would add more detail that would detract from what I was trying to say. Probably a mistake, in retrospect, because it would have shown the increasingly erratic behaviour of the system.
It seems that it is likely that there were other factors involved here.
I don't doubt that there were.
This is running on a laptop where, with all my best efforts to hold my breath, close other programs, not touch the thing, avoid AWR snapshots, etc, etc ... there are no guarantees that some background service or other wasn't a factor.
But that's why ...
Were these results from a single run of each concurrent user count?
They weren't based on a single run. I tend to be quite paranoid about these things before showing numbers and probably perform even more runs than necessary, just to build my own level of confidence.
Now, whilst not every run showed the same surge at the same workload levels, if there was a large discrepancy, I repeated runs to check that.
In the end, I decided that
a) At those higher workload levels, the laptop is, frankly, on its *rse. (I'm sure I could put it more delicately, but think you'll appreciate the description.) That's going to cause such variable madness that I decided it was a lost cause to try to get consistent looking figures.
b) I thought I had illustrated clearly enough that increased transaction response times are the consequence of the increased workload but that *might be what we want*! Sorry to sound like a stuck record, but I think it's necessary in a world where people think the only tuning is single session response time tuning.
I should know better than to try to simplify any performance testing discussion too much, though, but it was worth it for the discussion in the comments.
Nice post. I think you've done a great job of putting into a few simple sentences quite a complex issue. I'm sure I would have written twice as much and been half as clear.
I went to the last UKOUG Oracle RAC and HA SIG, the one in Slough. At the beginning of the session, a chap (I can't remember his name) conducted a straw poll on what people were using: operating systems, storage arrays and middleware. The show of hands confirmed something that I have long suspected: most people have some sort of application server sitting above the database. Best performance practice is to use some form of connection pooling when connecting to the database, so the relationship between a session in the application server and a session in the database can become very fluid. In the good old days, when all the business logic resided in the database, one Oracle session would be used for the end-to-end process. With an application server this will not necessarily be the case, so tracing a single session will not necessarily give you trace files that map to an individual end-to-end process. Some people might argue that you can use trcsess to aggregate trace files and that instrumentation in the application should help associate specific SQL statements with what the application is doing. However, how many applications are well architected and designed, let alone decently instrumented? Dan Fink covers this sort of stuff in his presentation at the UKOUG conference: DBAs having to deal with performance issues because of poor decisions the business, designers, coders etc. have made. I'm not saying that extended tracing does not have its place or value, but I'm alluding to the old adage of "different horses for different courses". Also, as we now live in a day and age of composite applications, i.e. applications linked together using middleware, tuning can become an exercise akin to measuring throughput and resource usage flowing around a system of pipes. When doing this, one of the most valuable tools is a one-stop-shop metric for measuring database utilisation.
In "Why You Didn't See The Performance Problems", Cary Millsap mentions application profiles, i.e. measuring where the time is going in individual components of the application / software stack. Despite the fact that it is well documented that db time is not equal to wall clock time, db time does help in these situations. I've got a presentation on slideshare.net: http://www.slideshare.net/chris1adkin/j2ee-batch-processing-presentation which covers the use of db time for this sort of work, including some graphs. There is some stuff at the beginning around J2EE and batch process frameworks that will be of little interest to you, but the graphs and the process of shifting the load between the app server and database might be.
Comes back to those lovely discussions you see SysAdmins having with users:
User: My screen is slow...
SysAdmin: CPU is at 63% which is within targets...Thanks for your call
Still seeing *lots* of that.
In fact, we're going through something a bit like that at the moment ...
I just had a look at the slides and they were very interesting. In fact, all of them were, as I'm interested in performance from the wider perspective, not just Oracle. As you've identified, that's essential in a modern-day environment.
I know only too well the difficulty in tuning systems that use app server connection pooling. Yes, if the code was instrumented, then DBMS_MONITOR/trcsess are terrific but that doesn't apply to most of the apps I have to look at.
But, as you say ...
I'm not saying that extended tracing does not have its place or value, but I'm alluding to the old adage of "different horses for different courses"
For example, it's a very effective way of identifying 'chatty' applications as I blogged about here a while ago.
Thanks for the link to the slides - I like seeing others' tuning efforts.
Peter Stadler used the results in this post to investigate how well Gunther's Universal Scalability Law applies to them. He's posted his Hotsos Symposium 2010 slides here. Definitely worth a look.
Thanks to Peter for sharing.
How did you measure avg response time? It seems not to agree with throughput: 4203 trx/3mins --> 23.4 trx/s --> 42.8 ms/trx. I see an average delta of about 60 ms between calculated and measured response time, with 2 peaks at 16 and 32 concurrent sessions. Do you also have CPU and disk busy percentages for each run? Just curious how these data fit Gunther's USL.
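For reference, the arithmetic in that comment can be sketched as below. Dividing elapsed time by transactions gives the time per transaction for a single stream; with N concurrent sessions and no think time, Little's Law for a closed system (N = X × R) predicts an average response time of R = N / X. The comment doesn't say which session count produced the 4203 figure, and the original results are gone, so the session counts in the loop are purely illustrative.

```python
def throughput_tps(transactions, elapsed_s):
    """Completed transactions per second over the test interval."""
    return transactions / elapsed_s


def littles_law_response_ms(sessions, tps, think_time_s=0.0):
    """Closed-system Little's Law, R = N / X - Z, returned in milliseconds."""
    return (sessions / tps - think_time_s) * 1000.0


x = throughput_tps(4203, 180.0)  # 4203 transactions in a 3-minute run
print(f"throughput: {x:.2f} trx/s")
for n in (1, 8, 16):
    print(f"{n:>2} session(s): predicted avg response {littles_law_response_ms(n, x):.1f} ms")
```

For a single session this reproduces the 42.8 ms in the comment.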
I have had a number of follow-up comments asking for more detailed metrics from this test, but the test, the hardware and the results have long since disappeared into the ether, so I can't help you, I'm afraid.
However, I do know that the results were both accurate and illustrated my main point, which was that response time is not the only way to measure performance.
How this fits Gunther's USL I couldn't say, but as it's just data and not a model, maybe it doesn't count for much anyway
nice clear discussion as usual, plus fun comments:
"I have in fact been involved with systems where data was actually delayed in being returned to the users so that a more consistent user response time was perceived i.e. slow down the fast stuff,"
"User: My screen is slow...
SysAdmin: CPU is at 63% which is within targets...Thanks for your call"