Jul 29: Oracle Workload Metrics
Someone, usually a manager, wants to see the workload history for each instance in the company. There are several reasons why this might be useful. Here are just a few off the top of my head.
- To identify workload peaks and troughs so that you have a good feel for what is normal or not.
- To identify workload that's increasing steadily and might require a future hardware upgrade.
- To help define service level agreements so that, if the workload rises above a certain threshold, all bets are off.
- Because it makes a pretty picture when you graph it in Excel.
- What do you mean by 'workload'?
- How much detail are you looking for?
- Over what time intervals?
Why do you want this information and what is the question you're trying to answer? Most times the purpose of the information will exclude many possible approaches. For example, you might only want to measure the workload of a specific application, which would defeat instance-wide approaches.
You might even discover that the question being asked doesn't make sense (at least to me or you).
The most recent case was one of those. The requirement was for a single number per day for an entire database instance. Think about that. A single number to describe a day's workload? What does that really tell you? Even more worrying, the information about which number might be most useful was fuzzy.
In the end, after all the warnings have been issued (and I never hesitate in that role), the business gets what the business wants. So here are a few examples of statistics I've used in the past. But, as you're reading them, consider how useful one value per day would be?
- Redo Size, as an indication of the volume of insert/update/delete activity during a given interval.
- User Commits (and possibly rollbacks) for transactions per minute/second/hour or whatever.
- Switching on auditing to measure i/o per session. When I suggested this the other week, I searched the net to remind myself of the precise meaning of one of the AUD$ columns and lo-and-behold, Jonathan Lewis has written about this in the past. Which was somewhat disappointing as I'd just re-written almost the same document in order to illustrate it to a colleague and possibly blog about it Regardless, this one has worked well for me in the past, particularly when you're interested in the workload for a particular user account (as we were in this case).
- The type of question that crops up in day to day DBA life.
- The compromises we face.
- The fact that Oracle usually offers many different approaches. The best will depend on the requirement.
Oh, what did we go with in the end? Well, we use Quest's Foglight tool and one of the reports it offers is transactions per minute (I think it is, I only played a consultative role on this and stuck to the native Oracle approaches). It hardly seemed worth working on a new solution when we've got a report that does this already. My guess is that it just uses user commits over time.
Oh, and I still think a single number per day is virtually meaningless
Yesterday was the best day so far, particularly the morning. To start with, I was feeling a little more human (although still out of bed at 4am). I managed to squeeze in breakfast with APC, three conference sessions and a lovely burger for lunch First w
Tracked: Nov 15, 17:42
Well I like gets as a read or a write will issue a get, None of your metrics cover reads.
Interesting stuff and interesting problem.
I sometimes do mathematical modeling for capacity planing purposes and I am then routinely confronted to workload definitions, so I'm a little biased towards this activity.
- In the "usefulness" category I would therefore add workload definition for forecasting. The usual case is "we expect October 1st this business activity to be multiplied by x". The easiest approach (but not always good enough) is to divide the DB activity between what's going to change and what's not, which makes forecasting easier.
- In the "how to define a workload section": most of the times, companies(don't put the blame on the admins alone) are just unable to express business activity in terms of IT usage (i.e. resource consumption per transaction in a given category). Different techniques exists to extrapolate workloads out of a given perimeter (a machine, a set of machines, an IP or FC network, etc.) based on resource consumption patterns. Daniel Ménascé describes some of them in "Capacity Planning for Web Performance", a very good and interesting reading (the book goes far beyond web performance).
The beauty of a metric like "Number of Transactions per Day" is that you can so easily "improve productivity" _and_ worsen performance by increasing the rate at which you commit, or write a script that just issues commits until you get the number you want.
Of course, that would be irresponsible ...
My manager also demands the same meaningless number.
My answer: 100 (buffer cache hit ratio)
Seriously though, I am interested in user adoption (or otherwise) so I would probably be interested in tracking user logins and length of session.
I think standard Oracle session auditing (underrated IMHO) would cover this.
I am also interested in users with multiple sessions. In Siebel implementations, this normally means a a user issuing a query from hell, who then kills the browser session and logs in again (although the query persists, consuming resources).
Sorry, perhaps I should have been specific and said 'activity' in this sentence instead of 'i/o'?
to measure i/o per session
The original plan I had for this blog included recording session activity via auditing and looking at the *various* measurements available. Once I realised JL had covered it on his own website, I merely included a link to that and assumed people would read it. It needn't be i/o as such.
companies(don't put the blame on the admins alone) are just unable to express business activity in terms of IT usage
Ha, tell me about it!
Even if you don't limit this to IT usage, it's extremely difficult to define (together) what to measure. As soon as I get close to asking the questions of 'why' and 'what' and 'which' when I first hear these requests, the boundaries and time available for any such exercise narrow rapidly. I often think that businesses know they *should* be doing some capacity planning (and often instigate the exercise) but when they realise it will require some effort, their appetite disappears. Which is a shame because there'll always be some related crisis around the corner one day.
Bear in mind, too, that I'm talking about retrospective action on production systems running third-party applications that haven't had any performance or capacity testing performed on them - ever - so we have no idea how they work or what they do
So, *that's* why those third party applications commit so frequently
You hit on the two most important points I was trying to make
1) There are genuine questions to be answered. (e.g. how many sessions/transactions/queries in a given time frame at peak time; response times; workload per session etc.)
2) Oracle session auditing is under-rated.
I wish I'd shown the auditing example now, but didn't see the point when there was an article out there already. I hope people will read that in case it solves a problem for them one day.