Topsy-Turvy

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Blog

Apr 6: Topsy-Turvy

It's been a topsy-turvy couple of days. I had my last night shift on Tuesday night/Wednesday morning, which was the busiest of them all, although there was only one small problem in the last 8 hours of it.

My favourite bit, earlier in the evening, was improving the performance of one of the weekly batch jobs from 5 hours down to less than a minute. The original job used 48Gb of temporary space too - in each of the 6 databases that run the job. As the customer said - cancel those disk deliveries ;-) I don't want to produce the query here because I'm not sure my bosses would be keen, but the source of the problem was that the SQL statement included a reference to a table in a central repository via a database link. Here's a small section

AND pe.open_item_id IN (SELECT int_value
FROM v_extn_system_parameters
WHERE process_key = 'CAFTE003'
AND parameter_name LIKE 'ACCRUED%'))

The problem was that the CBO decided to take this subquery and convert it into a Hash Join against one of the other tables. What you can't see here is that this was a long statement, joining many tables together. The subquery itself should always return 4 numeric values and if those values were plugged in directly, like this, the problem disappeared.

AND pe.open_item_id IN (1, 2, 10, 25))

That's not a reasonable solution for a production system, though - what happens when a new code appears in 18 months?


When you see the solution, the words 'Silver' and 'Bullet' might pop into your mind. If so, let me point out that I worked it out by poring over execution plans, trying different approaches in full-sized test environments and, when you look at the problem, it's actually a very localised problem with a large query (although it was having a dramatic effect on the whole query). The problem is that Oracle was joining this remote table to a local table at a very early stage of the query, impacting later stages of the execution plan. All I really wanted to do was to stop it doing that. Here was the particular solution that worked for this particular query

AND pe.open_item_id IN (SELECT /*+ NO_UNNEST */ int_value
FROM v_extn_system_parameters
WHERE process_key = 'CAFTE003'
AND parameter_name LIKE 'ACCRUED%'))

This hint made the CBO treat this subquery as a seperate subquery and not a candidate for a join. i.e. It didn't unnest the subquery.

In a bad situation like this (the developers were frantic with the next weekend run approaching - the word embarassing was being thrown around) we were just happy to get the damn thing fixed. It still needs to be tested, change requests completed and the change rolled out, but I must admit I was quite pleased. It is my job, but it's very rare to get such a dramatic performance improvement from a small change that is genuinely valuable to the business.

I went home and went to bed satisfied. Later on, I woke up and logged in to have a quick look at my emails from home. (This practice seems to be the subject of some ridicule at my work - I'm off work and at home, why am I checking my emails? Well, it only takes 10 minutes and I'd rather know everything is okay. Doesn't everyone do that during critical work periods?) Regardless, I was absolutely (sarcasm on) delighted (sarcasm off) to discover that the night-shifts have been cancelled. Just my luck - being the last person to have to do that shift. Having said that, it's a sign of how well things have gone overall and I wouldn't really wish those shifts on anyone.

Since then I've been sleeping. Really, I've never slept so much in my life!
Posted by Doug Burns Comments: (22) Trackbacks: (0)

Trackbacks
Trackback specific URI for this entry

No Trackbacks

Comments
Display comments as (Linear | Threaded)

#1 - David Aldridge said:
2006-04-06 13:13 - (Reply)

Nice one Doug.

Was the CBO getting the cardinality of the subquery wrong, or something?

#2 - Pete_S said:
2006-04-06 13:20 - (Reply)

Nice result, Doug!

I often see real problems when trying to do complex (and not so complex) stuff with remote tables. Far too much data is dragged around the network and accessed through sub-optimal plans

One thing that we sometimes get good results from is to wrap the remote selection into a view on the remote database and re-write the local query to join to the view, that way only the results are moved between the databases

#3 - Doug Burns said:
2006-04-06 13:20 - (Reply)

Precisely.

I hesitated to post this because I didn't want to give too much detail, but wanted to pat myself on the back ;-) I also used the phrase 'sledgehammer to crack a nut' when I was talking to the developers. Using NO_UNNEST was a pretty blunt tool because I didn't say - 'here's what that table really looks like' - I just said - 'don't even think about joining this'.

Maybe this would be a better summary of what was tuned here ...

The CBO was getting the cardinality of the remote table wrong, deciding it was a good join candidate, when it wasn't, so I forced it not to consider it.

Maybe I'll add the plans later ...

#4 - Doug Burns said:
2006-04-06 13:24 - (Reply)

Ooops, Pete S jumped in while I was commenting ;-)

Nice idea, Pete. I must admit, there was an element of 'Oh dear, there's a remote access there - I bet that's the problem' prior to the in-depth analysis. I worried about that because it's similar to saying 'Oh dear, there's a full table scan there' when we both know that often isn't a problem.

Having said that, I've had similar previous experiences to yours and it did turn out that that was (at least part of) the source of the problem. So there was a mixture of prior experience and analysis here.

#5 - -brady- said:
2006-04-06 16:01 - (Reply)

The word, 'Blackberry' comes to mind. You can check your email without ever leaving the warm, soft bed.

#6 - Gary Myers said:
2006-04-06 23:30 - (Reply)

"One thing that we sometimes get good results from is to wrap the remote selection into a view on the remote database and re-write the local query to join to the view, that way only the results are moved between the databases"

You can get burned doing exactly that.
I've done some investigations on views over DB links (http://igor-db.blogspot.com/2005/09/views-db-links-and-cbo.html and http://igor-db.blogspot.com/2005/09/views-db-links-cbo-and-indexes.html)
Basically the upshot is that, when there's a view on the other side of the DB link, the local optimizer gets a very poor level of information of the remote objects stats (because its a view and doesn't really have stats).
In fact, given the identifier "v_extn_system_parameters", I wouldn't be surprised if that remote object is a view.

#7 - Pete_S said:
2006-04-07 10:41 - (Reply)

True, Gary
But I did say sometimes ;-)

#8 - Doug Burns said:
2006-04-07 11:30 - (Reply)

Interesting comments.

I think it'll be worth posting another blog on this once I'm back at work and can gather some more information.

#9 - Mathew Butler said:
2006-04-07 13:16 - (Reply)

Very interesting.

I'm curious, was the issue reproducible with the table on the local system with db stats from the remote site?

Do you think that refreshing the stats on the remote system *may* have resolved the issue? ( clearly not a good permanent solution if changes to the execution plan can so drastically impact the query performance ).

Just curious whether the root cause is stale optimiser stats on the remote object or the assumptions coded into the optimizer. I believe that the optimizer costs remote access the same as if the table was on the local system. ie: there is no cost for the network overhead.

So, if this was caused by an optimizer problem hinting the query was probably the lowest cost solution in this case.

Is this a good example of when hinting should be used?

#10 - Doug Burns said:
2006-04-07 17:30 - (Reply)

All,

There have been so many good comments on this blog that I'll probably do a follow-up once I'm back online at work

Cheers,

Doug

#11 - David Aldridge said:
2006-04-07 19:39 - (Reply)

Oooh, I have another idea -- slapping the definition of the remote view into a subquery factoring clause, and optionally materializing the result set on the local database.

#12 - Doug Burns said:
2006-04-07 19:43 - (Reply)

The Sponge said...

"Oooh, I have another idea -- slapping the definition of the remote view into a subquery factoring clause, and optionally materializing the result set on the local database. "

Tee hee ... too late my bearded friend. That was the first thing I went for ;-) Having said that, I did give up on it before testing it when the UNNEST approach popped into my head ;-)

#13 - Anonymous 2006-04-08 15:46 - (Reply)

A "where rownum > 0" might have achieved the same thing too.

#14 - Anonymous 2006-04-08 16:57 - (Reply)

Actually some method to materialize the subquery may be just half of the fix.

The other half, assuming the CBO doesnât have good visibility into what comes out of the subquery, would be to add a âcardinalityâ hint â so the plan for the big query takes into account this small intermediate result set(approx. 4 rows). Maybe few more things for you to try â

#15 - Anonymous 2006-04-08 22:22 - (Reply)

It's possible that the comments about cardinality, materilizing, subquery factoring etc. are the wrong way round.

Doug's fix was to use the NO_UNNEST hint, which means the remote subquery was probably executed as a late running filter which (a) had been transformed from an "IN" to an "EXISTS", (b) avoided an early multiplication of data in the hash join, (c) probably benefited from the scaler subquery run-time optimisation that meant the remote query only ran 4 times (once for each of the item_id values listed).

I suspect that the problem appeared when Oracle instantiated the subquery through unnesting and did a hash join with it - and all the 'alternative strategies' mentioned look as if they are just different ways of making that happen.


Regards

Jonathan Lewis

#16 - Anonymous 2006-04-09 04:55 - (Reply)

Jonathan: "I suspect that the problem appeared when Oracle instantiated the subquery through unnesting and did a hash join with it"

Iâm not clear what "it" refers to â the "v_extn_system_parameters" object itself? â or the subquery (predicates included)?

The CBO replaced the "IN" with a join.
The programmer couldâve easily done that too (written the query that way in the first place).

So, if for the resulting query, the CBO chose a highly suboptimal plan, wouldnât that be an indication of an issue related to the remote table/statistics or with the CBO itself? Could it be that, if the CBO knew the result set out of that subquery was in fact 4, it wouldnât have chosen to hash join with "v_extn_system_parameters" (or, at least, not have done it at the point in the execution tree it actually chose)?

Hope Doug will show us some plans.

Thanks.

#17 - Jonathan Lewis said:
2006-04-09 07:28 - (Reply)

In this case "it" refers to the result of instantiating the subquery.

The problem may have been a bad estimate of filter cardinality on the remote view - or on the join cardinality with the local table (and the latter could mean a bad estimate locally, not just remotely).

But 9i does not cost unnesting, so even perfect statistics would have resulted in an unnest - and sometimes the optimum plan is to keep the filter.

#18 - Anonymous 2006-04-09 21:35 - (Reply)

Jonathan â thank you. I think I understand your point.
One last uncertainty, if you could explain/comment on it.

The sql coerced with the no_unnest hint â would this plan be resilient to changes in data (volume/distribution) on the local side? The remote data seems fairly static (volume/distribution)â the local data maybe more volatile. Wouldnât the scalar subquery runtime optimization ensure, at best, a number of executions equal to the numer of distinct values in the local column âpe.open_item_idâ rather than the distinct values from the âv_extn_system_parameters.int_valueâ remote column? This subquery as a late running filter with only 4 runs seems indicative of not much filtering actually taking place â that is, most/all values in âpe.open_item_idâ are in fact 1, 2, 10 or 25.

Since this is a batch job, wouldnât a safer alternative be to completely take the subquery out into a separate statement building the list of ids and implement the main query as dynamic sql? Take the hard parse hit but avoid the need for the hint.

Thanks Doug for the interesting case.

#19 - Doug Burns said:
2006-04-09 22:09 - (Reply)

Gabe (and others)

Just to let you know that I have grabbed the plans and I think there'll be enough detail to talk about then which will make more sense.

The follow-up blog was actually finished some hours ago but I'm thinking how to format it best.

Cheers,

Doug

#20 - Doug Burns said:
2006-04-09 22:11 - (Reply)

Gabe said

"Since this is a batch job, wouldnât a safer alternative be to completely take the subquery out into a separate statement building the list of ids and implement the main query as dynamic sql? Take the hard parse hit but avoid the need for the hint."

On that particular point, that was the original plan - to use dynamic SQL. However, this short term fix was easier to develop, test and deploy and was an improvement on a hard-coded list of values. However, seperate statements might be the final choice here.

#21 - Doug Burns said:
2006-04-12 21:21 - (Reply)

There's a new blog relating to this blog, including the execution plans, at

http://tinyurl.com/jp4yb

#22 - Doug Burns said:
2006-04-16 13:25 - (Reply)

Gabe,

To add extra information on your specific point

"This subquery as a late running filter with only 4 runs seems indicative of not much filtering actually taking place â that is, most/all values in âpe.open_item_idâ are in fact 1, 2, 10 or 25."

SQL> select open_item_id, count(*) from product_elements

2 group by open_item_id;



OPEN_ITEM_ID COUNT(*)


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications
BBCode format allowed
 
 

Upcoming Appearances

Hotsos Symposium 2010 - 7th-11th March

Comments

Doug Burns about Advert: Symposium Countdown
Tue, 09.02.2010 09:03
Well I'll be running stuff in VMs, that's for sure, and people have gone on and [...]
Pete Scott about Advert: Symposium Countdown
Tue, 09.02.2010 08:55
It is such a relief to get the paper in (so well done, Doug).... I dispatched [...]
Doug Burns about Parallel Query and 11g
Sun, 07.02.2010 10:09
That could be a long reply, so [...]
Links in Comments

It's a minor source of frustration to me that you can't just paste a Hypertext link into the comments form here but, should you ever want to include a link, all you need to do is use the BBCode format, as mentioned below the comment form.

Here is a link to the relevant part of the document that explains how.

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ash
xml Audit Vault
xml AWR
xml Blogging
xml Cuddly Toys
xml Database Refresh
xml Direct Path Reads
xml Fun
xml listener
xml locking
xml oow
xml oow2009
xml OTN
xml Parallel
xml Patching
xml Swingbench
xml The Reality Gap
xml Time Matters
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl