Doug's Oracle Blog

Entries tagged as Swingbench


Sep 19: Alternative Pictures Demo

Note - features in this post require the Diagnostics Pack license

Not long after I'd finished the last post, I realised I could reinforce the points I was making with a quick post showing another one of the example tests supplied with Swingbench - the Calling Circle (CC) application. Like the Sales Order Entry application, CC is a mixed read/write test consisting of small transactions. As always, there's more information at Dominic's website.

One of the main differences to the SOE test is that the CC test consumes data so you need to generate a new set of data before each test using the supplied ccwizard utility. I won't show you the entire workflow here but enough to give you a flavour of the process. The utility is the same one used to create the necessary CC schema in the first place but the option I'm looking for here is "Generate Data for Benchmark Run".



I'd already populated my CC schema with data for 1 million customers when I created it, so I just need to specify the number of transactions for the next test. I happen to know that on my particular configuration, a 1000 transaction test will take around 5 minutes to run.



I ran the test twice. I've highlighted the first run here in the Top Activity page.



It should be clear that CC suffers significant log file sync waits on my particular test platform, just like the SOE test. Therefore I regenerated the test data set, enabled asynchronous commits and re-ran the test. Here I've highlighted the second test run.



As well as seeing a similar change in the activity profile according to the ASH samples (the log file sync activity has disappeared, as has the LGWR System I/O), there's a significant difference from the SOE test. Because this test run is based on a specific workload volume, as defined by the size of the test data, rather than a fixed time period, the second test run completed more quickly than the first run. Its activity only partially fills the 5 minute activity bar, whereas the first test filled the whole bar.

If you test a specific and limited workload volume it is much clearer from the Top Activity page which test is processing transactions more quickly, based on the Time axis. That's why I didn't pick this example the first time - it's too obvious what's going on!
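The difference between the two test designs can be sketched in a few lines of Python. This is only an illustration: the two transaction rates are hypothetical values picked to make the arithmetic obvious, not measurements from either run, and the function names are my own.

```python
def transactions_in_fixed_time(rate_tps: float, duration_s: float) -> float:
    """Fixed-time test: the duration is set, the transaction count varies."""
    return rate_tps * duration_s


def elapsed_for_fixed_volume(rate_tps: float, transactions: int) -> float:
    """Fixed-volume test: the transaction count is set, the elapsed time varies."""
    return transactions / rate_tps


# Hypothetical rates, chosen only for illustration (not measured values).
RATES_TPS = {"slow": 4.0, "fast": 8.0}

# Fixed time (SOE-style): both runs fill the same 300 second window.
fixed_time = {name: transactions_in_fixed_time(tps, 300)
              for name, tps in RATES_TPS.items()}

# Fixed volume (CC-style): the faster run finishes early.
fixed_volume = {name: elapsed_for_fixed_volume(tps, 1000)
                for name, tps in RATES_TPS.items()}

print(fixed_time)    # transaction counts differ, elapsed time is identical
print(fixed_volume)  # elapsed times differ, transaction count is identical
```

In the fixed-time case the improvement only shows up in the transaction count, which Top Activity doesn't display. In the fixed-volume case it shows up on the Time axis.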
Posted by Doug Burns Comments: (0) Trackbacks: (0)
Defined tags for this entry: ash, awr, grid control, swingbench, time matters

Sep 12: That Pictures demo in full

Note - features in this post require the Diagnostics Pack license

With so many potential technical posts in my pile, it was initially difficult to decide where to start again but I figured I should avoid the stats series until I'm back into the swing of things ;-) Instead I decided to fulfill a commitment I made to myself (and others, whether they knew about it or not) almost three months ago.

When I gave the evening demo session in the Amis offices I think the 2 hours went pretty well but, as usual with the OEM presentations, I got a little carried away and didn't conclude the demo properly. (This is also the demo I *would* have done at Hotsos last year if the damn thing had worked first time ;-)) It was a shame because as well as showing the neat and useful side of OEM Performance Pages, it also illustrates one of the common pitfalls in interpreting what the graphs are showing you.

I began by running a 4 concurrent user Sales Order Entry (SOE) test using Dominic Giles' Swingbench utility. I won't go into the details of the SOE test because I don't think it's particularly relevant here but you can always download and/or read about Swingbench for yourself at Dominic's website.

I ran the test for a fixed period of 5 minutes using no think-time delay.




Using the capability to look at ASH data in the recent past, the OEM Top Activity page looks like this.




- There was a fairly consistent average of 4-5 active sessions over the 5 minute period and, looking at the Top Sessions panel in the bottom right of the screen, these were four SOE sessions of similar activity levels and the LGWR process.

- The majority of time was spent on User I/O, System I/O and Commit Wait Class activity, with a little CPU.

- Three PL/SQL blocks were responsible for most of the Commit activity.

- The LGWR process was responsible for most of the System I/O activity.

I'll leave it there for now and won't drill down into any more detail.

In terms of optimising the performance of this test, what might I consider doing?

The most important aspect is to optimise the application to reduce the resource consumption to the minimum required to achieve our objectives. There's a whole bunch of User I/O activity that could perhaps be eliminated? But I'm going to ask you to accept the big assumption here that this application has been optimised and that I'm just using a Swingbench test as an illustration of the type of system-wide problem you could see. In that case, my eye is drawn to the Commit activity.

When I'm teaching this stuff, I'm usually deliberately simplistic (at least at the end of the process) and highlight that what I'm interested in 'tuning' is whatever most sessions are waiting on according to the ASH samples this screen uses. I used to explain how I'd look for the biggest areas of colour, drill down into those and identify what's going on. Sadly, I later heard that someone (I think it was JB at Oracle*) had already come up with a nifty acronym for this - COBS. Click on the Big Stuff! One day I will come up with a nifty acronym for something too, but you shouldn't hold your breath waiting.

So, if I click on the big stuff here, I can see that the Commit Class waits are log file sync.



How might I reduce the time that sessions are waiting for log file sync? Here are a few reasons why the test sessions might be waiting on log file sync more often or for longer than I'd like.

- Application design - committing too frequently
- CPU starvation
- Slow I/O to online redo log files

Whether waits are predominantly the result of CPU overload or slow I/O can be determined by looking at the underlying log file parallel write wait times on the LGWR process but that's a bigger subject for another time.
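As a very rough sketch of that comparison, the Python below takes an average log file sync wait and an average log file parallel write wait and reports which explanation looks more likely. The function name, the 0.5 threshold and the sample timings are all assumptions of mine for illustration. They are not values from this test, and a real diagnosis needs more than one ratio.

```python
def likely_log_file_sync_cause(avg_sync_ms: float,
                               avg_lgwr_write_ms: float,
                               io_share_threshold: float = 0.5) -> str:
    """Crude heuristic: what share of the log file sync wait is redo write time?

    avg_sync_ms       - average 'log file sync' wait seen by foreground sessions
    avg_lgwr_write_ms - average 'log file parallel write' wait on LGWR
    The threshold is an arbitrary illustrative value, not an Oracle rule.
    """
    if avg_sync_ms <= 0:
        raise ValueError("avg_sync_ms must be positive")
    io_share = avg_lgwr_write_ms / avg_sync_ms
    if io_share >= io_share_threshold:
        # Most of the wait is LGWR writing to the online redo logs.
        return "slow redo log I/O"
    # LGWR writes are quick, so the time is going somewhere else,
    # for example waiting to get on a CPU.
    return "CPU starvation or scheduling delay"


# Hypothetical timings in milliseconds.
print(likely_log_file_sync_cause(10.0, 8.0))
print(likely_log_file_sync_cause(10.0, 1.0))
```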

You can look into all of these in more depth - and should - but as this is designed to be a fun demo of the pretty pictures (it used to be 'the USB stick demo'), I'll simply try to eliminate that activity and re-run the same test. Here's how Top Activity looks now.



Oh. Maybe that wasn't what you expected? OK, the LGWR activity has disappeared, but the system seems almost as busy as it was before, and the main bottleneck is now User I/O activity. That's often the way, though - you eliminate one bottleneck in a system and it just shows up somewhere else. It must be good to get rid of log file sync waits though, right? User I/O seems like more productive work and I've managed to make the LGWR activity disappear completely.

But then if you were to look at this graph in terms of Average Active Sessions or DB Time or (as it's more likely to be expressed) how big that spike looks, the two tests would look similarly busy from a system-wide perspective. They were but the real question is - busy doing *what*? There's some important information missing here and Swingbench is able to provide it.



TotalCompletedTransactions 22,868

Mmmm, so I wonder what that value was for the first run?



TotalCompletedTransactions 12,232

Woo-hoo! *That's* what I call tuning a benchmark - processing almost twice the number of transactions in the same 5 minute period.
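For anyone who wants the arithmetic behind "almost twice", here it is as a small Python sketch. The transaction counts and the 5 minute duration come from the two test runs above. The labels and variable names are mine.

```python
TEST_DURATION_S = 5 * 60  # both SOE runs lasted a fixed 5 minutes

# TotalCompletedTransactions as reported by Swingbench for each run.
completed = {
    "run 1 (waiting on log file sync)": 12_232,
    "run 2 (log file sync eliminated)": 22_868,
}

# Average transactions per second over the whole test.
tps = {run: count / TEST_DURATION_S for run, count in completed.items()}
for run, rate in tps.items():
    print(f"{run}: {rate:.1f} transactions/sec")

first, second = completed.values()
improvement = second / first
print(f"Throughput ratio: {improvement:.2f}x")
```

That works out at roughly 41 transactions per second for the first run against 76 for the second, a ratio of about 1.87.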

So it turns out that the sessions in the database *were* just as busy during the second run (not too surprising seeing as the test has no user think time so keeps hammering the database with as many requests as it can handle) but that they were busy doing the more productive work of reading and processing data rather than just waiting for COMMITs to complete.
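One way to see "just as busy, but doing different work" is to divide the session time available during the test by the number of transactions completed. The sketch below assumes that all four SOE sessions were active for the entire 5 minutes (reasonable with no think time, but still an assumption) and it ignores LGWR and other background activity. It is not the response time that Swingbench reports, just a back-of-an-envelope figure.

```python
SESSIONS = 4          # concurrent SOE users in both runs
DURATION_S = 5 * 60   # fixed test length in seconds


def session_ms_per_transaction(transactions: int) -> float:
    """Session time available per completed transaction, in milliseconds.

    Assumes every session is active for the whole test (no think time).
    """
    return SESSIONS * DURATION_S * 1000 / transactions


before = session_ms_per_transaction(12_232)  # first run
after = session_ms_per_transaction(22_868)   # second run

print(f"First run:  ~{before:.0f} ms of session time per transaction")
print(f"Second run: ~{after:.0f} ms of session time per transaction")
```

The same amount of session time is spread over nearly twice as many transactions, which is why the Top Activity spike looks the same size for both runs.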

I raised this issue of DB Time not showing activity details with Graham Wood* at Oracle in relation to a previous blog post. I think he made the point that this is why the OEM Performance Home Page is *not* the Top Activity page. If I take a look at that home page, it shows me the same information as the Swingbench results output did, albeit not as clearly.




Looking at the Throughput graph, I can see that the second test processed around double the number of Transactions per second for the same test running on the same system.

To wrap up (and be a little defensive) ...

- Yes, I could have traced one or more sessions and generated a complete and detailed response time profile that should have led me to the same conclusion.

- Yes, as this is a controlled test environment and I'm the only 'user', AWR/Statspack would have been an even more powerful analysis tool in the right hands.

- The Top Activity page is not the most appropriate tool for this job but it is handy for illustrating concepts.

- Lest I seem a slavish pictures fan, I'm showing how people might misuse or misunderstand ASH/Top Activity. In this case, the Home Performance Page is a much better tool because we're looking at system-wide data and not drilling into session or SQL details.

Oh, and what is my Top Secret Magic Silver Bullet Tuning Tip for OLTP-type applications? (only to be used by Advanced Oracle Performance Wizards)

alter system set commit_write='BATCH, NOWAIT';

Done! In fact, why not just use this on all of your systems, just in case people are waiting on log file sync?

(Leaves space below for angry responses and my withering humorous retorts)

* This is not name-dropping, this is giving due credit to the people who really know what they're talking about
Posted by Doug Burns Comments: (6) Trackbacks: (2)
Defined tags for this entry: ash, awr, grid control, swingbench, time matters

Mar 28: Time Matters: Throughput vs. Response Time - Part 2

In one of my Hotsos Symposium 2010 posts I mentioned that Peter Stalder had plugged some test results from an earlier blog post into Neil Gunther's Universal Scalability Law to see how well the model applied. Peter's posted his slides now and I've added the URL to the comments thread of the original post so people can see another perspective.

He also pointed out a recent blog post discussing similar subjects at Neil Gunther's blog although I must admit I've only had a quick glance at it because I'm up to my eyeballs in mail at the moment :-(
Posted by Doug Burns Comments: (0) Trackbacks: (0)
Defined tags for this entry: hotsos 2010, swingbench, time matters

Feb 8: Time Matters: Throughput vs. Response Time

Niall Litchfield made an interesting comment in an email thread that prompted this post.

"... I can see that once you start to define workload, or transactions, in business terms (I need to get all these things done, what works best overall?) then workload response time* does make sense, both as a metric, and more pertinently as a tuning target. I think, and have done for a while, that for much of the last 10 years the tuning dialog has been around query or session optimization with either an explicit or implicit assumption that optimization of particular queries will result in workload optimisation from a business perspective. In many cases this is true, I suspect though that for many cases it isn't. Response time will still be the core metric, but perhaps some of the tuning methods will vary. It may be as simple as redefining stage 1 of the tuning process from

'define the task of interest to the business'
to
'define the workload of interest to the business'"

I agree with Niall, particularly with regard to the current emphasis on optimising individual sessions or queries, rather than the overall workload.

I'll state this another way. Maybe ...

1) the fastest individual session response time

... isn't always the most important optimisation goal if you're trying to achieve ...

2) the highest workload throughput?

(That's why I have a few slides early on in the course I wrote for Oracle that define a few ways of looking at system performance, including response time, throughput and scalability.)

I'll try to illustrate these two different tuning goals using Dominic Giles' Swingbench utility. I executed the Sales Order Entry (SOE) benchmark with different numbers of concurrent users for a period of 3 minutes. The Oracle instance, hardware configuration (dual-core laptop with the usual lousy hard drive) and application code were identical for all of the tests. The only significant variable was the number of concurrent sessions. Here are the results (which Swingbench helpfully displays in the Output tab after each test run).

Concurrent Sessions   Avg. Response Time (ms)   Transactions Completed
                  1                        79                    4,203
                  2                       108                    6,772
                  4                       133                   10,481
                  8                       198                   13,346
                 12                       244                   13,639
                 16                       310                   14,798
                 20                       337                   14,749
                 24                       369                   14,176
                 28                       428                   15,181
                 32                       563                   13,278
                 36                       533                   14,151
                 40                       587                   13,302


Without question, the average response time is best when only one user is running the test, and gets worse steadily as the number of active sessions grows. However, having only one active session also delivers the worst number of completed transactions during the test. If we want to process as many transactions as possible in those 3 minutes, then it looks like we should have around 28 concurrent execution streams.
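To put the two goals side by side, here's a short Python sketch that takes the figures from the results table and shows, for each level of concurrency, the throughput and response time relative to the single-session run. The data is copied from the table. The derived columns and variable names are my own.

```python
TEST_DURATION_S = 3 * 60  # each test ran for 3 minutes

# (concurrent sessions, avg response time in ms, transactions completed)
RESULTS = [
    (1, 79, 4203), (2, 108, 6772), (4, 133, 10481), (8, 198, 13346),
    (12, 244, 13639), (16, 310, 14798), (20, 337, 14749), (24, 369, 14176),
    (28, 428, 15181), (32, 563, 13278), (36, 533, 14151), (40, 587, 13302),
]

# The single-session run is the baseline for both comparisons.
_, base_ms, base_txns = RESULTS[0]

print(f"{'sessions':>8} {'tps':>6} {'throughput x':>13} {'resp. time x':>13}")
for sessions, avg_ms, txns in RESULTS:
    tps = txns / TEST_DURATION_S
    print(f"{sessions:>8} {tps:>6.1f} {txns / base_txns:>13.2f} "
          f"{avg_ms / base_ms:>13.2f}")

best_throughput = max(RESULTS, key=lambda row: row[2])
best_response = min(RESULTS, key=lambda row: row[1])
print(f"Most transactions completed: {best_throughput[0]} sessions")
print(f"Fastest average response:    {best_response[0]} session(s)")
```

The two 'best' rows are at opposite ends of the range, which is the whole point.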

Do we care that the average transaction completion time has increased to 428 milliseconds? Well, if the transaction is part of a user interaction, then the answer is undoubtedly yes. I'm sure your users like the fastest possible response and if we have 28 active sessions, then an individual user is going to experience performance more than 5 times slower than if they were the only user. Oh, and that's just the average response time - the maximum will be much worse!

If, however, this is all part of an overnight batch process and the tuning goal is to get through all of the workload as quickly as possible, then the number of transactions completed is our focus and who cares if each transaction takes a little longer? In this case, it's better to load the system more heavily and increase the response time of individual transactions. Scheduling the workload such that the transaction response time increases might seem counter-intuitive. For example, if I were to trace one of the sessions at random during each of the test runs and I analysed the trace files, wouldn't 'performance' look much 'better' in the first test than the last, because I'm only focussing on one session?

Is performance better in the first test than the last? It all depends on what you mean by performance.

Now, let me address what I might appear to be implying here, that tracing and tuning individual sessions is flawed. In fact, all of these tests would have yielded even more impressive results if I had tuned the SOE application at the session level *first* so that each transaction uses fewer resources and completes more quickly. That's a good thing. I do understand that analysis and tuning at the session scope is vital but, if each individual session's work has been optimised already, there are other aspects we need to consider and looking at a single session trace will not give me all of the information I need. (Gosh, I'd better be careful because I'm getting close to suggesting that Statspack and AWR might be useful tools for some situations after all!)

However, when looking at this useful improvement in throughput on my laptop by increasing the load during a basic test, be very careful.

- Whilst this particular 'batch process' processed more work in the same period of time with more concurrent sessions, it also slowed response time for *everything else* on my laptop (server). That's why it's not sensible to mix batch and interactive workloads.

- As the number of sessions increases past a certain point, the number of completed transactions goes *down* as the system starts to suffer contention so we get further away from our goal whilst using more resources. Achieving the correct balance is the key.

- Looking at the results again, is the best balance to run 28 concurrent sessions when we can deliver almost the same throughput from 16? Well, I'd plump for 16 for this job on this configuration every time, if it meets the system requirements. Squeezing every last ounce of performance from a system is likely to lead to a serious performance problem one day because we're reducing the breathing room to cope with unplanned events.
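The '16 rather than 28' choice in the last point can be expressed as a simple rule: pick the smallest number of sessions that gets within some tolerance of the peak throughput. The sketch below uses the figures from the results table. The 5% tolerance and the function name are my assumptions, not a recommendation.

```python
def smallest_load_near_peak(results, tolerance=0.05):
    """Return the lowest session count whose throughput is within
    `tolerance` (as a fraction) of the best throughput observed."""
    peak = max(txns for _, txns in results)
    threshold = peak * (1 - tolerance)
    return min(sessions for sessions, txns in results if txns >= threshold)


# (concurrent sessions, transactions completed in 3 minutes)
THROUGHPUT = [
    (1, 4203), (2, 6772), (4, 10481), (8, 13346), (12, 13639), (16, 14798),
    (20, 14749), (24, 14176), (28, 15181), (32, 13278), (36, 14151), (40, 13302),
]

choice = smallest_load_near_peak(THROUGHPUT)
print(f"Smallest load within 5% of peak: {choice} sessions")
print(f"Load at absolute peak: {smallest_load_near_peak(THROUGHPUT, 0.0)} sessions")
```

With a 5% tolerance the answer is 16 sessions, delivering about 97% of the peak throughput while leaving a lot more breathing room.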

All I'm really saying in this post is that performance analysis, response time tuning, workload management (call it what you will) can't and shouldn't be limited to looking at the performance of an individual session in isolation.

* Is it just me, or does 'Workload Response Time' sound weird? It's the same idea as session response time - how quickly do we get from the beginning to the end - but it seems different to me. That might be my lack of understanding, reading, or rigour, but I find the idea of a system, workload or multiple users having an aggregated response time a strange idea to get my head around. I understand start - do lots of work - stop. But is the time that that takes a response time or throughput? Have I just got too used to Response Time being directly related to a user sitting, waiting for a response?
Posted by Doug Burns Comments: (27) Trackbacks: (6)
Defined tags for this entry: Swingbench, time matters

Apr 19: Swingbench

A short post in praise of Swingbench. (There's much more information at that link.)

Without this utility, demonstrating performance problems during the course would have been much more difficult so I owe Dominic Giles. Setup is simple, using one small configuration file to define environment variables, a JVM and an Oracle client installation. It's supplied with 4 usable benchmark applications which are useful to demonstrate different aspects of Instance behaviour.

- SH - Sales History application based on the Oracle-supplied SH schema. A read-only benchmark.
- CC and SOE - OLTP-type applications.
- Stress Test - basic INSERT/UPDATE/DELETE/SELECT test.

It's easy to change the number of sessions, think time, load of different parts of the application and as it's a framework, you can use it to test your own PL/SQL-based apps.

The only problem I've run into is getting the Overview screen to work properly on all benchmarks but I suspect that the problem lurks in one of the XML files describing the benchmarks or my JVM. Still working on that one.

As I said, I owe Dominic, so take a look as it's very easy to try and if it serves you as well as it has me, there's a Paypal donation link on Dominic's site ;-) Anyone who comes to any of my presentations this year is likely to get a chance to see Swingbench in action.


Posted by Doug Burns Comments: (0) Trackbacks: (0)
Defined tags for this entry: Swingbench
