Aug 5: Get Up Offa That Thing
(Otherwise known as #E42014 to the Twitterati. Note to the casual reader ... like a lot of conference posts, this is more personal diary entry than having any tech content whatsoever. You have been warned.)
Yes, so this post is hopelessly late, but I really have been busy this time!
Although I feel like I've done quite a few presentations over the past year, a lot of them have been at client sites and the conferences have been a little more spaced out, so it felt good to be back in the wild a couple of months ago, particularly as it was going to be my last conference before moving to Singapore for work (more on that later). It was supposed to be a packed week with E4 covering the first few days and OUGF the last few days. Happily for me, @HeliFromFinland was good enough to understand that one conference would steal a lot less Singapore preparation time and the refundable travel made it attractive too. Thanks, Heli! (Although when reading the #OUGF14 tweets in between packing, I wasn't so sure!)
I was lucky enough to be able to use some frequent flyer miles to upgrade both my flights to DFW, which was nice. But with no slides to work on and with an unusually low desire to watch movies, I mainly ate, read, slept and drank and the flight seemed to zip by. I believe I might have had one cocktail too many so that, by the time Jason Osborne picked me up in his extremely nice sports car, I could barely form a sentence. (I may be exaggerating slightly.) I was nursing a Mojito slowly by the time everyone else was ready to meet at the hotel bar. It turned out to be the first of quite a few (no idea how that happened!) and although I was in bed relatively early, I felt like death when I woke up. I think that's the first time that's ever happened.
Sunday was all about registering and getting a nice Enkitec speaker goodie bag before settling in for an afternoon of Tanel Poder presenting on Exadata Internals and Advanced Performance Metrics - two 90 minute sessions - and although he was splendid as usual, I ended up only managing the first couple of hours before I really had to *get some food* and *sleep*. I got the first half done but the second consisted largely of me lying in a hotel room in a zombified state trying to rouse myself for the speakers dinner that Enkitec had laid on.
By the time they were done, though, I was able to catch up with them as they returned for a far more sedate last few beers and an early night in bed.
One of the aspects I like most about E4 is that it's available as a webinar for virtual attendees at a reasonable cost and, because it's recorded, I can watch presentations that I might have missed the first time later, when I have some down-time. It means it's much easier to attend for those who can't get travel permission and also that, if you are particularly interested in presentations I just touch on here, anyone can register and see them all for themselves! (and no, I'm not on commission, I just think some of this stuff should have a wider audience than it already has ...)
One of the presentations that you're probably not likely to see at too many other conferences (I'm certainly not familiar with it) was the initial keynote - Exadata: The Untold Story of a Startup within Oracle with Kodi Umamageswaran who is VP Exadata Product Development at Oracle. I know that lining up this particular speaker took a lot of work from the organisers and it was well worth it for some classic keynote stuff. An entertaining and wide-ranging look at the pre-history of Exadata (SAGE) development, some of the key stages since and a look at the most recent developments. I thought it hit just the right level of being technical enough to be entertaining for a tech crowd, but without getting too bogged down in any one area.
Next up was Tom Kyte's keynote, titled What Needs to Change, during which he talked about some of the many changes in performance expectations and system capabilities in the time he's been working with Oracle and a good sprinkling of the content from the Real World Performance Days he does with Andrew Holdsworth and Graham Wood. My favourite bit of this is always when he talks about how 100% CPU usage is a bad idea for an OLTP-type system (essentially a very bad idea for response times) because I've had so many people at customer sites quote me some Tom Kyte thread or another suggesting that 100% CPU usage is a great thing, because you've paid for that capacity after all. Erm, yes, sure it is for some workloads but possibly a more important Tom Kyte quote is ... 'it depends'!
I wanted to attend Exadata Resource Manager Deep Dive by Akshay Shah of Oracle, but felt I needed to skip at least one session to eat something (Fajitas!) and prepare for my own. Having lunch with Cary Millsap, I was surprised to hear that he would be an Enkitec employee from that day (with a nice sideline in books and tools and training too, of course). It strikes me that this is a good move for all concerned. Anybody sane would want Cary on board and those Accenture people can help sell the Method-R skills into as many customers as possible. Good move by Kerry Osborne, if you ask me.
I always look forward to watching Tyler Muth present and this time it was a continuation of one of his central areas of interest these days - High Throughput Computing on Exadata - which was a collection of tips and a sense of the approach he takes when working with large ETL processes and the like. Encouragingly for me, they were very similar to some client work I did a year or two ago and presented at last year's OUG Finland conference, and it's always good to know you're heading in the right direction. I remember saying I would write some blog posts about that too! Maybe some day ... I could watch Tyler present all day, whether he sticks to his planned ideas or goes a little off-piste because he always has something interesting to say in an entertaining way and feels like a kindred spirit. (With less swearing perhaps!) One take-away I noticed was that he recommended the ODA sizing document as a nice guide to stop people over-consolidating their workloads, e.g. do you really want 42 databases on an 8 core server? I think it was the tables in this section that he was talking about.
Next up was my presentation which I think went well from a presentation perspective and contained some hopefully useful real world DBaaS experiences but I think the problem with that particular presentation is that it's not really technical enough on the one hand and on the other I *want* to keep it light. The truth is that I could probably talk sensibly about the subject for three hours so I've walked away thinking about what I didn't say! Still, it wasn't too bad and Martin Bach seemed to like it, which was comforting :-)
Not too long later, I had somehow been roped into taking part in a Hadoop vs. Oracle Database Panel with my old mate Alex Gorbachev as the moderator and Tanel Poder, Eric Sammer and Kerry Osborne debating the strengths and weaknesses of the two approaches and whether the RDBMS is on its way out as a useful technology for most data analysis tasks. I must confess I'm not the greatest fan of panel sessions because you can never get into enough detail and argue the case properly but at least not everyone agreed, although that might have been the beer and jetlag combining in my case. Eventually we split into two groups to try to illustrate the different design approaches to a problem from an Oracle or a Hadoop perspective, so I roped in an impossibly young and smart backup team of Martin Bach and Karl Arao. I probably still managed to talk over them, though. I suppose it was all a good bit of knockabout fun and came with a free beer attached, so I mustn't grumble! I guess I walked away still thinking it's horses for courses ...
That finished off day one nicely for me as I felt I'd both learned a few things and had fun at the same time. The fun is always guaranteed but I rarely learn as much as I do at E4.
The keynote the following morning was probably worth the price of admission alone. The Exadata SmartScan Deep Dive delivered by Roger MacNicol (who is a Consulting Engineer working on Smart Scan at Oracle) was precisely the type of presentation techies are looking for but with the added value that his presentation style and slides are as excellent as his content. It's so long ago now that I'm a bit short on details but plan to watch it again and, if you get a chance to hear Roger speak at a future conference (are you listening UKOUG?) then you should grab it with both hands, feet and whatever else you have at your disposal!
Maria Colgan, on the other hand, was far too busy working on the upcoming launch of Oracle In-Memory or something to bother about actually attending conferences to deliver her keynotes in person! Which wasn't a big deal for me personally as I've already heard a *lot* about IMO but it was a good opportunity for me to take the p*** out of her on Twitter!
My last full presentation of the day before I had time for a few beers and to head to the airport was Think Exa! with Martin Bach & Frits Hoogland, which was a collection of a handful of subject areas they highlighted as being worth thinking carefully about during initial implementations on Exadata servers. At first I thought Mr. Bach was going to do all the talking, but they did take it in lengthy alternating sections, which worked really well and set me up nicely for a last few beers with Martin, Frits and soon-to-be-Oak-Table-Network-member Alex Fatkulin, who I've known electronically for a while but only get to see at E4.
It's the second time I've been able to attend E4 in person (the other I attended remotely) and it's quickly become one of my favourite conferences. Sure, it's organised by good people doing a great job and some are friends, but I think as well as taking good care of people, Kerry Osborne's contacts within Oracle on top of Enkitec's consultants and customers merge together to help create a truly special agenda. I'd thoroughly recommend attending if you have the opportunity, even virtually.
Thanks again to everyone at Enkitec for another top job organising this thing and for giving me the opportunity for another trip to Dallas!
However, the complete highlight of the conference for me was getting to spend some decent time with a happy and healthy-looking Peter Bach, which amazed me after the health problems he's faced this year. Good to have you back, Peter, and I bet Oracle are glad to have you back working for them too! LOL
(Reminder, just in case we still need it, that the use of features in this post requires a Diagnostics Pack license.)
Damn me for taking so long to write blog posts these days. By the time I get around to them, certain very knowledgeable people have commented on part 1 and given the game away!
I finished the last part by suggesting that a narrow AWR interval makes less sense in a post-10g Diagnostics Pack landscape than it used to when we used Statspack.
Why do people argue for a Statspack/AWR interval of 15 or 30 minutes on important systems? Because when they encounter a performance problem that is happening right now or didn’t last for very long in the past, they can drill into a more narrow period of time in an attempt to improve the quality of the data available to them and any analysis based on it. (As an aside, I’m sure most of us have generated additional Statspack/AWR snapshots manually to *really* reduce the time scope to what is happening right now on the system, although this is not very smart if you’re using AWR and Adaptive Thresholds!)
However, there are better tools for the job these days.
If I have a user complaining about system performance then I would ideally want to narrow down the scope of the performance metrics to that user’s activity over the period of time they’re experiencing a slow-down. That can be a little difficult on modern systems that use complex connection pools, though. Which session should I trace? How do I capture what has already happened as well as what’s happening right now? Fortunately, if I’ve already paid for Diagnostics Pack then I have *Active Session History* at my disposal, constantly recording snapshots of information for all active sessions. In which case, why not look at
- The session or sessions of interest (which could also be *all* active sessions if I suspect a system-wide issue)
- For the short period of time I’m interested in
- To see what they’re actually doing
Rather than running a system-wide report for a 15 minute interval that aggregates the data I’m interested in with other irrelevant data? (To say nothing of having to wait for the next AWR snapshot or take a manual one and screwing up the regular AWR intervals ...)
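To make that narrowing of scope concrete, here is a toy sketch in Python rather than a query against the real view. The field names `sample_time`, `session_id` and `event` are borrowed from columns of V$ACTIVE_SESSION_HISTORY, but the sample rows, the `top_activity` function and the session numbers are invented for illustration, and the "ON CPU" label is just a convention for samples with no wait event.

```python
from collections import Counter
from datetime import datetime

def top_activity(samples, start, end, session_ids=None):
    """Count samples per event for a time window and an optional set of sessions.

    Passing session_ids=None means "all active sessions", which is the wide
    starting scope when we do not yet know which sessions matter.
    """
    counts = Counter()
    for s in samples:
        if not (start <= s["sample_time"] < end):
            continue  # outside the period of interest
        if session_ids is not None and s["session_id"] not in session_ids:
            continue  # not one of the sessions of interest
        counts[s["event"]] += 1
    return counts.most_common()

# Hypothetical one-second samples; real ASH data has many more dimensions.
samples = [
    {"sample_time": datetime(2014, 8, 5, 2, 13, 1), "session_id": 101, "event": "db file sequential read"},
    {"sample_time": datetime(2014, 8, 5, 2, 13, 2), "session_id": 101, "event": "db file sequential read"},
    {"sample_time": datetime(2014, 8, 5, 2, 13, 2), "session_id": 202, "event": "log file sync"},
    {"sample_time": datetime(2014, 8, 5, 2, 13, 3), "session_id": 101, "event": "ON CPU"},
    {"sample_time": datetime(2014, 8, 5, 2, 40, 0), "session_id": 101, "event": "enq: TX - row lock contention"},
]

window_start = datetime(2014, 8, 5, 2, 13, 0)
window_end = datetime(2014, 8, 5, 2, 14, 0)

everyone = top_activity(samples, window_start, window_end)
one_session = top_activity(samples, window_start, window_end, session_ids={101})
```

Starting with `everyone` and then narrowing to `one_session` mirrors the approach of beginning with all sessions and tightening the focus once the interesting ones are known.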
When analysing system performance, it’s important to use the most appropriate tool for the job and, in particular, focus your data collection on what is *relevant to the problem under investigation*. The beauty of ASH is that if I’m not sure what *is* relevant yet, I can start with a wide scope of all sessions to help me find the session or sessions of interest and gradually narrow my focus. It has the history that AWR has, but with finer granularity of scope (whether that be sessions, sql statements, modules, actions or one of the many other ASH dimensions). Better still, if the issue turns out to be one long-running SQL statement, then a SQL Monitoring Active Report probably blows all the other tools out of the water!
With all that capability, why are experienced people still so obsessed with the Top 5 Timed Events section of an AWR report as one of their first points of reference? Is it just because they’ve become attached to it over the years of using Statspack? AWR has its uses (see JB’s comments for some thoughts on that and I’ve blogged about it extensively in the past) but analysing specific performance issues on Production databases is not its strength. In fact, if we’re going to use AWR, why not just use ADDM and let software perform automatically the same type of analysis most DBAs would do anyway (and in many cases, not as well!)?
Remember, there’s a reason behind these Recurring Conversations posts. If I didn’t keep finding myself debating these issues with experienced Oracle techies, I wouldn’t harbour doubts about what seem to be common approaches. In this case, I still think there are far too many people using AWR where ASH or SQL Monitoring are far more appropriate tools. I also think that if we stick with a one hour interval rather than a 15 minute interval, we can retain four times as much *history* in the same space! When it comes to AWR – give me long retention over a shorter interval every time!
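The "four times as much history" arithmetic can be sketched in a few lines of Python. This assumes, as a simplification, that every snapshot costs roughly the same space whatever the interval; the budget of 192 snapshots and the `days_of_history` helper are hypothetical and only there to make the numbers easy to follow.

```python
MINUTES_PER_DAY = 24 * 60

def days_of_history(snapshot_budget, interval_minutes):
    """Days of history covered by a fixed number of equally sized snapshots."""
    return snapshot_budget * interval_minutes / MINUTES_PER_DAY

budget = 192  # hypothetical number of snapshots we are willing to store

hourly_days = days_of_history(budget, 60)
quarter_hourly_days = days_of_history(budget, 15)
ratio = hourly_days / quarter_hourly_days
```

With that budget, hourly snapshots cover 8 days while 15 minute snapshots cover 2, which is where the factor of four comes from.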
P.S. As well as thanking JB for his usual insightful comments, I also want to thank Martin Paul Nash. When I was giving an AWR/ASH presentation at this spring’s OUGN conference, he noticed the bullet point I had on the slide suggesting that we *shouldn’t* change the AWR interval and asked why. Rather than going into it at the time, I asked him to remind me at the end of the presentation and then because I had no time to answer, I promised I’d be blogging about it that weekend. That was almost 4 months ago! Sigh. But at least I got there in the end!
That subject has been covered well enough, in my opinion. (To pick one example, this post and its comments are around 5 years old.) Diagnostics Pack customers should almost always increase the default AWR retention period for important systems, even allowing for any additional space required in the SYSAUX tablespace.
However, I've found myself talking about the best default AWR snapshot *interval* several times over recent months and years and realising that I'm slightly out of step with the prevailing wisdom on the subject, so let's talk about intervals.
I'll kick off by saying that I think people should stick to the default 1 hour interval, rather than the 15 or 30 minute intervals that most of my peers seem to want. Let me explain why.
Initially I was influenced by some of the performance guys working in Oracle and I remember being surprised by their insistence that one hour is a good interval, which is why they picked it. Hold on, though - doesn't everyone know that a 1 hour AWR report smoothes out detail too much?
Then I got into some discussions about Adaptive Thresholds and it started to make more sense. If you want to compare performance metrics over time and trigger alerts automatically based on apparently unusual performance events or workload profiles, then comparing specific hours today to specific hours a month ago makes more sense than getting down to 15 minute intervals which would be far too sensitive to subtle changes. Adaptive Thresholds would become barking mad if the interval granularity was too fine. But when nobody used Adaptive Thresholds much, even though they seemed like a good idea (sorry JB), this argument started to make less sense to me.
However, I still think that there are very solid reasons to stick to 1 hour and they make more sense when you understand all of the metrics and analysis tools at your disposal and treat them as a box of tools appropriate to different problems.
Let's go back to why people think that a 1 hour interval is too long. The problem with AWR, Statspack and bstat/estat is that they are system-wide reporting tools that capture the difference (or deltas) between the values of various metrics over a given interval. There are at least a couple of problems with that that come to mind.
1) Although a bit of a simplification, almost all of the metrics are system-wide which makes them a poor data source for analysing an individual user's performance experience or an individual batch job because systems generally have a mixture of different activities running concurrently. (Benchmarks and load tests are notable exceptions.)
2) Problem 1 becomes worse when you are looking at *all* of the activity that occurred over a given period of time (the AWR Interval), condensed into a single data set or report. The longer the AWR period you report on, the more useless the data becomes. What use is an AWR report covering a one week period? So much has happened during that time and we might only be interested in what was happening at 2:13 am this morning.
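A quick illustration of that smoothing effect, as a sketch in Python with made-up numbers rather than real AWR data: the `load` list stands in for some system-wide metric measured per minute, with a five minute spike, and `interval_rates` is a hypothetical helper that reports the average rate per snapshot interval, which is roughly what a delta-based report gives you.

```python
def interval_rates(per_minute, interval_minutes):
    """Average per-minute rate that a delta-based report shows for each interval."""
    rates = []
    for start in range(0, len(per_minute), interval_minutes):
        chunk = per_minute[start:start + interval_minutes]
        rates.append(sum(chunk) / len(chunk))
    return rates

# One hour of invented activity: a steady 10 units per minute...
load = [10] * 60
# ...with a five minute spike to 110 units per minute.
for minute in range(40, 45):
    load[minute] = 110

hourly = interval_rates(load, 60)          # one figure for the whole hour
quarter_hourly = interval_rates(load, 15)  # four figures, one per 15 minutes
```

The single hourly figure comes out a little over 18, the 15 minute figure covering the spike a little over 43, and neither shows the real peak of 110. A narrower interval helps, but it is still an average over everything that happened in the interval.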
In other words, AWR reports combine a wide activity scope (everything on the system) with a wide time scope (hours or days if generated without thought). Intelligent performance folks reduce the impact of the latter problem by narrowing the time scope and reducing the snapshot interval so that if a problem has just happened or is happening right now, they can focus on the right 15 minutes of activity [1].
Which makes complete sense in the Statspack world they grew up in, but makes a lot less sense since Oracle 10g was released in 2004! These days there are probably better tools for what you're trying to achieve.
But, as this post is already getting pretty long, I'll leave that for Part 2.
[1] The natural endpoint to this narrowing of time scope is when people use tools like Swingbench for load testing and select the option to generate AWR snapshots immediately before and after the test they're running. Any AWR report of that interval will only contain the relevant information if the test is the only thing running on the system. At last year's OpenWorld, Graham Wood and I also covered the narrowing of the Activity scope by, for example, running the AWR SQL report (awrsqrpt.sql) to limit the report to a single SQL statement of interest. It's easy for people to forget - it's a *suite* of tools and worth knowing the full range so that you pick the appropriate one for the problem at hand.
May 28: OUG Scotland
As always, kudos to Thomas Presslie and all the good people in the UKOUG office who work so hard putting this together.