Hotsos 2010 - Day 4

Doug's Oracle Blog

Mar 11: Hotsos 2010 - Day 4

First up was Cary Millsap's Lessons Learned, Version 2010.03. As Cary pointed out, they always try to put the best speakers in the toughest slots - 8:30 in the morning post-party. I think local guys are slightly more reliable too because they might have actually gone home the night before! He started with a quick Hangover Survey (me - check!) and then pressed on talking about how we test system performance.

He showed a video of Boeing stress-testing the wing of the 787 and, as he pointed out, aircraft manufacturers really know how to stress-test! (Of course whether that reassures you as it does me, or makes you wish no-one would talk about wings disintegrating, as it probably would Mads, is personal.) The video showed Boeing test equipment, which is complicated, expensive and non-revenue generating. Those tests are expensive but when people's lives are on the line, what choice is there? Boeing knows that it has to test the analytic models used in the design. He spent a lot of time talking about good test design. A few thoughts that stood out to me ...

- Some stress tests are a waste of time. Will the Boeing 787 land on the moon? If this test fails, what has it proven? If it passes, then it's awesome but it would be a very expensive way to prove it can cope with commercial flights in Earth's atmosphere.

- Why test for more than you will see in Production? Because you don't really know for sure what you'll see in Production.

- At some point, but I can't remember the context, he used a Scottish phrase that he'd heard Billy Connolly shout (although the Big Yin was only fully credited later in the day) ...

    "There's no such thing as bad weather, just the wrong clothes"

... looked over at me and said - "I'd love to hear you say that, with the proper accent". I declined politely.

- Most people try to prove only that their systems will work.

- Most tests of systems that are destined to fail never prove it in advance.

- Test to destruction

    a) Test
    b) Until the system melts
    c) Decide whether your real requirements are likely to be lower or higher than melting point.
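
The three steps above could be sketched as a toy load ramp. This is only an illustration under loud assumptions: the system under test is simulated with a textbook M/M/1 queueing formula, and the function names, the capacity of 100 requests/sec, the 2-second "melting" threshold and the expected peak of 40 requests/sec are all made up for the example, not taken from Cary's talk.

```python
# Toy "test to destruction" loop. Everything here is hypothetical:
# the system under test is simulated with a simple M/M/1 queueing formula.


def simulated_response_time(load, capacity=100.0, service_time=0.01):
    """Response time (seconds) of an imaginary system at `load` requests/sec.

    Uses the M/M/1 formula R = S / (1 - utilisation) and returns infinity
    once the offered load reaches the (assumed) capacity.
    """
    utilisation = load / capacity
    if utilisation >= 1.0:
        return float("inf")
    return service_time / (1.0 - utilisation)


def find_melting_point(max_acceptable=2.0, start=10, step=10, limit=1000):
    """Steps (a) and (b): raise the load until response time is unacceptable."""
    load = start
    while load <= limit:
        if simulated_response_time(load) > max_acceptable:
            return load  # first load level at which the system "melts"
        load += step
    return None  # never melted within the tested range


def verdict(expected_peak, melting_point):
    """Step (c): compare the real requirement with the melting point."""
    if melting_point is None:
        return "no melting point found; test harder"
    headroom = melting_point / expected_peak
    return f"melts at {melting_point} req/s, {headroom:.1f}x expected peak"


melting_point = find_melting_point()
print(verdict(expected_peak=40, melting_point=melting_point))
```

In a real test the simulated function would be replaced by measurements from an actual load generator; the useful part is that the loop only stops when something breaks, so the result is a measured limit rather than a pass/fail against a guessed requirement.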

There was a small amount of time for questions and once it looked like they were done, I granted Cary's wish (never thought I'd say that), stuck my hand up and repeated The Big Yin's words. It was only after the laughter had stopped that I realised I might have ruined his big closing, but I think he was ok about it ;-)

Next was Tanel Poder talking about LGWR, log file sync waits and COMMIT performance and, shock, horror, I was actually going to say that this was one of the least rewarding sessions of the week for me. What?!? Tanel? But he's, like, an Oracle God! LOL. But there were reasons ...

- I realise that I know a *lot* about how log file sync and log file parallel write work, how they relate to each other and some of the problems they might help you identify. Because it's a subject I'm *so* familiar with, I didn't learn much.

- His main demo didn't quite show what he wanted it to because it didn't run multiple sessions but, frankly, I'm in no position to talk about demos this week!

By the end, the presentation turned out ok, not least because there was another unexpected appearance from Bob Sneed to talk about the I/O components involved in redo log management, including a suggestion that LGWR be put into a higher scheduling class (but not Real Time!). Updated later - make sure you read Bob and Kevin's comments below. I'll try to find a link to his slides and let you take a look yourself.

I loved Tanel's Big Log File Sync Tuning Secret, though ...

    COMMIT LESS!

It was particularly relevant to me because I had a Big Log File Sync Tuning Secret as the closing moment of my own presentation. The problem was I couldn't use it after the demos went wrong!

    USE ASYNCHRONOUS COMMITS

But, in my case, that was supposed to be funny, too.
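
As a rough illustration of the "commit less" advice, the sketch below loads the same 1,000 rows twice and counts the commits issued. It is not Oracle code: an in-memory SQLite database stands in purely to keep the example self-contained, there is no redo log or LGWR involved, and the table `t`, the helper names and the batch size of 100 are assumptions made for the example. The point it tries to make is only that, in Oracle, each of those commits would normally be an opportunity for a log file sync wait, so fewer commits means fewer waits.

```python
import sqlite3


def make_connection():
    """Create a throwaway in-memory database with a hypothetical table `t`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


def load_rows(conn, rows, batch_size):
    """Insert `rows`, committing once per `batch_size` rows.

    Returns the number of commits issued, which is the number we want to
    keep small.
    """
    commits = 0
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO t (id, payload) VALUES (?, ?)", row)
        if i % batch_size == 0:
            conn.commit()
            commits += 1
    if len(rows) % batch_size != 0:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits


rows = [(i, f"row {i}") for i in range(1, 1001)]
row_by_row = load_rows(make_connection(), rows, batch_size=1)
batched = load_rows(make_connection(), rows, batch_size=100)
print(f"commit per row: {row_by_row} commits; batches of 100: {batched} commits")
```

Whether batching is acceptable depends on the transaction boundaries the application actually needs; the sketch only shows how much the commit count changes, not what the right batch size is.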

I ran off to try and use the free breakfast voucher that Marco had given me but I was just too late. No food again, then :-( Well, I had a couple of slices of cold meat at lunchtime, but mainly to catch up with Alex G before he had to present and then head back to Ottawa. I managed to skip one session at this stage but, after a quick call home, I decided to go along to Alex's RAC Connection Management presentation after all (a little late). Although I have seen some of this stuff before, I always enjoy watching Alex's demos and was particularly impressed by the fact that he'd managed to write his own RAC connection load balancer! I was waiting for the applause in the room but either people didn't quite get it or there was just a lack of energy post-lunch on the last day. I suspect the latter.

Of course, once I'd said goodbye to Alex properly (don't see him nearly enough), I was a little late for whichever session was going to be my final one of the conference and I was hopelessly torn between Kyle Hailey's modern SQL performance tools presentation (Kyle's done a lot of cool work in the area of Oracle Performance Visualisation) and Chris Antognini's Diagnosing Parallel Executions Performance. In the end I plumped for the latter because I thought it was going to be like something I'd unsuccessfully attempted a couple of years ago and I wanted to see if Chris had a different angle on it and had been more successful. In the end, I probably made the wrong choice because although Chris' presentation was great, it was really all stuff I already knew. Definitely my bad call, though. Hopefully I'll get a chance to catch up with Kyle's presentation at some point in the future too!

After that there was just the usual short farewell and thanks from Gary Goodman of Hotsos. Although the thanks were appreciated, I'm glad they were spread around everybody because the attendees are one of the things that make this conference great and Becky and Rhonda did their usual sterling job of organising everything.

Then it was time for some Fajitas with a few friends (actually, a whopping great number of friends who practically filled the Mexican restaurant!) and a few very sedate beers. (We are old men (and women) now and the night before was a big one!) While we were waiting to go to the Mexican, I had one great surprise left - Alex's flights weren't going to get him home, so he came back from the airport and had to check in overnight! At least I got a chance to talk to him properly when I wasn't hopelessly drunk and didn't try to seduce him this time.

Now I need to stop blogging and get back to listening to Tanel's Training Day (good stuff, too, but more about that later)
Posted by Doug Burns Comments: (10) Trackbacks: (0)
Defined tags for this entry: conferences, hotsos 2010

Comments
#1 - Kevin Closson said:
2010-03-12 18:52

LGWR, LGWR, when will they ever learn. I don't care who's talking about elevating LGWR priority. It makes no difference on a busy system. Here's a little problem for you all to consider. When a foreground process is waiting for LGWR to post it (signaling the session's redo has been written), what state and mode is it in? It's sleeping in kernel mode. When a kernel mode process becomes runnable what gets CPU first? A runnable user mode process (with elevated priority) or a runnable kernel mode process? Yep, you guessed right.

So, if you have, say, 8 cores and LGWR is servicing a group commit for 11 sessions, uh, what do you think happens when he posts all of them and what do you think an elevated user mode priority for LGWR has to do with any of that? :-)

Must be time for me to blog all that again.

#2 - Doug Burns said:
2010-03-12 23:48

"I don't care who's talking about elevating LGWR priority."

Neither do I ;-) In fact, I spent some time questioning this at my current client a few months ago, to no avail. (Well, it probably does make a bit of a difference who's saying it, if I'm honest)

But, yeah, I think it is time for you to blog again ;-)

It was great to see you again, albeit through fairly drunken eyes on that last evening!

#3 - Bob Sneed said:
2010-03-13 20:59

Who said anything about *elevating* LGWR priority? The point was to keep its priority from being inappropriately reduced and otherwise prevent its operation from being degraded by competition.

I'll have my LGWR slides posted soon; after I add a bit about "exactly what command do I type?" (for implementing the widely-proven FX 60 strategy). I'll also add that to my "CPU QoS" slide set.

The importance of this topic is not conjecture, fellows. It has been dramatically demonstrated in innumerable customer scenarios and consideration of the topic is pretty much routine in benchmarking activities.

#4 - Doug Burns said:
2010-03-14 19:51

Hi Bob,

Thanks for stopping by. I'm glad you noticed Kevin's comment and I was going to drop you a mail to mention it.

Sincere apologies for completely misrepresenting what you said. My excuses are

a) I was trying to avoid specifics until I could post a link to your slides and people could read your own words. So I over-generalised and then changed what you were saying, too!

b) These posts during conferences are often a little rushed and I need to decide whether it's worth persevering with them or not.

Anyway, I've updated the post to hopefully reflect your actual words more effectively and be sure to post a link to your slides here when they're online.

As I find myself repeating fairly regularly, one of the things I like about blogging is that people can correct me when I screw up.

#5 - Doug Burns said:
2010-03-14 19:56

Actually, re-reading the post and comments properly (sigh, I am *so* jet-lagged and busy on top!) I'm not sure I did misrepresent you that much, so I won't modify the blog, but just draw attention to the comments and hopefully your slides, too.

#6 - Bob Sneed said:
2010-03-15 11:35

No worries, mate! Sorry if I came off huffy!

There are certainly circumstances where the benefit of assuring LGWR CPU service may go unnoticed, especially on throughput-oriented benchmarks with ample CPU to spare. However, even in those cases, I'd expect to see reduced average response times and reduced response time variance by taking some measures to assure that LGWR does not experience degraded service.

For production workloads on systems that approach 100% CPU utilization, LGWR priority inversion will probably be calamitous. Other effects such as 'interrupt pinning' of LGWR are also root causes of misery when LGWR lands often-enough on interrupt-hot CPUs. My mission to prevent these things is the result of being called into innumerable escalations at Sun where they proved to be crucial.

Anyways - I'll look forward to discussing my "Brute Force Parallelism" material with you sometime between now and Hotsos 2011! Cheers!

#7 - Doug Burns said:
2010-03-16 05:05

"Sorry if I came off huffy!"

Oh, not at all! I was already twitchy about trying to blog about presentations and reducing the full story to a couple of bullet points. But (stuck record) at least blogging allows people to expand on and/or correct the comments in the same place they were made. i.e. Tom Kyte's 'yes, but what about ...' point he reiterated in his keynote.

#8 - Tanel Poder said:
2010-03-23 02:24

Hm, noticed this reply. Not sure whether you were in the room and heard the real story (about just committing less and preventing priority decay as the main things).

So, what do you think about your own blog article then? :-)

http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/

"It is a scheduling storm. For this reason you should always set LGWR’s scheduling priority to the highest possible (with renice for instance)."

#9 - Doug Burns said:
2010-03-23 05:43

... and that's the thing about blogging quickly about presentations - highlighting the wrong things.

I thought you might want to know that I used your slides yesterday to explain log file sync and log file parallel write wait times and the diagrams really did the trick, so thanks!

#10 - Kevin Closson said:
2010-03-23 17:22

Tanel,

I wasn't in the room so I'm not taking a position against anything you've said. As if I'd argue with you anyway. Good grief. In my comment above I explained the perfect case in point where elevated LGWR priority makes no difference. LGWR scheduling priority makes no difference if there are 11 runnable kernel-mode processes and 8 CPUs. Kernel mode trumps user mode. That is the point I'm making. That is one of the main points I was making in my Manly Man LGWR SSD post (the one you quoted).

Yep, you found a blog post where I said to elevate LGWR priority. That's because it can't hurt. Neither can it change the fact that kernel-mode trumps user mode and that's why anyone that reads my blog entry (http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/) would be better served by paying particular attention to the main points I made in that blog entry which were:

1. Process mode (kernel/user) is more important than process priority
2. Simple processes (e.g., the "noise" process) can get in the way of LGWR and add to LFS.
3. Log file sync waits are generally not LGWR I/O related.
4. Linux offers no lightweight callable preemption protection APIs as did best of breed Unix back in the olden days.

I suppose I should edit that nearly three year old blog post and change:

"For this reason you should always set LGWR’s scheduling priority to the highest possible (with renice for instance)."

to:
"[...] highest possible (with renice for instance). It can't hurt."

Having revisited this topic I realize now that my Manly Men LGWR SSD post morphed into more of a conversation about priority than it should have (in the comment section). Ugh. All that talk about process mode and we got stuck talking about the less interesting topic of process priority.

I have an upcoming fresh post about the topic. We'll see how the saga progresses.

