Diagnosing Locking Problems using ASH - Part 3

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Mar 30: Diagnosing Locking Problems using ASH - Part 3

Some features in this post require a Diagnostics Pack license.

In the last part I described a locking scenario where the blocking session had only executed one SQL statement that was quick enough to avoid being sampled by ASH and is now inactive. As a result, there is no ASH data for the blocking session, so diagnosing the root cause of the problem after the event is next to impossible. In fairness, it would be impossible anyway without ASH or similar instrumentation.

However, as I suggested in that post, I would argue that's a fairly unusual situation arising out of the nature of the demo and that the blocking session would normally appear in ASH. So let's try a more realistic situation where the blocking session does more work.

Session 1 - logged in as TESTUSER
TESTUSER@TEST1020> select count(*) from test_tab1;

  COUNT(*)
----------
   1024000

TESTUSER@TEST1020> select pk_id from test_tab1 where rownum=1 for update;

     PK_ID
----------
      4051

So this time, I ran a simple query before the SELECT FOR UPDATE. Session 1 holds a lock on that row and I'll use the PK_ID that was returned from the query to make sure I try to lock the same row in the next session.

Session 2 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where pk_id=4051 for update;

Session 2 hangs, waiting for session 1 to release the lock

Session 1 - commits, to release the lock
TESTUSER@TEST1020> commit;

Commit complete.

Session 2 - Is now able to acquire the lock, so locks the row and returns the result
     PK_ID
----------
      4051

So everything about this test is the same as the last post, but Session 1 is doing a little more work than the simple SELECT FOR UPDATE. Let's look at the ASH data for all TESTUSER sessions.

SYS@TEST1020> select a.sample_id,a.sample_time,a.session_id,
  2             a.session_state,a.event,a.sql_id,
  3     a.blocking_session,a.blocking_session_status
  4  from v$active_session_history a, dba_users u
  5  where u.user_id = a.user_id
  6* and u.username = 'TESTUSER';

 SAMPLE_ID
----------
SAMPLE_TIME
---------------------------------------------------------------------------
SESSION_ID SESSION
---------- -------
EVENT                            SQL_ID        BLOCKING_SESSION BLOCKING_SE
-------------------------------- ------------- ---------------- -----------
    445637
30-MAR-09 22.23.59.263
       147 ON CPU
                                 0ut22yp7mh229                  NOT IN WAIT

    445636
30-MAR-09 22.23.58.253
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445635
30-MAR-09 22.23.57.253
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445634
30-MAR-09 22.23.56.253
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445633
30-MAR-09 22.23.55.252
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445611
30-MAR-09 22.23.33.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER

    445610
30-MAR-09 22.23.32.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER

    445609
30-MAR-09 22.23.31.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER

    445608
30-MAR-09 22.23.30.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER


9 rows selected.

This time I'm going to work my way from the top of the data to the bottom (or from the most recent data to the oldest).

The most recent sample shows session 147 on CPU but immediately prior to that, I can see that it was stuck waiting for a TX row lock since some time between 22:23:54 and 22:23:55 and that the blocking session was 148. Things are a little better this time because there are samples for the blocking session 148 but, if I look at the ASH sample immediately prior to the locking problem, session 148 is executing SQL_ID "cda14zb83bb5u'. If I check which SQL statement that is

SYS@TEST1020> select sql_text from v$sql where sql_id='cda14zb83bb5u';

SQL_TEXT
--------------------------------------------------------------------------------
select count(*) from test_tab1

Which isn't the statement that was holding the lock. I've hit the same problem as the previous example because the SELECT FOR UPDATE activity wasn't sampled.

The up-side to this example is that I can at least see some information about the blocking session in the lead-up to the locking problem. That could be very useful in identifying who the user was, what application they were running, which module they were executing and so on. (Then again, who is to say that that session 148 is the same session as the blocking session 148 and not an earlier session that has disconnected? Updated later - as Graham Wood pointed out, I could have used SESSION_SERIAL# to ensure that I am looking at the right session 148.)

The down-side is that it's easier to fool yourself into thinking you're looking at the root cause of the problem, when you aren't. In this case, it wouldn't take long to realise that the simple query is not blocking the other session and that there must be something missing, but imagine if the SQL had been a transactional statement that could be the source of the problem, but wasn't?

Conclusion
I started off the first post by saying "There are few better illustrations of the utility of ASH data than the ability to detect and diagnose locking problems, even after they've occurred" and I still think that's true. Diagnosing concurrency problems without having access to information on all sessions is next to impossible and ASH's low-cost sampling approach to recording session activity is the very reason it can be switched on all the time, waiting for problems to occur. Of course, the nature of sampling implies an approximate view of reality (and I've deliberately picked examples here to illustrate the gaps in the approximation).

- If the problem is happening right now, then you will have all the information you need, whether you have ASH data or not

- If the blocking SQL statement runs for more than a second because it's resource intensive or is experiencing it's own bottleneck, then you'll have good samples of the root cause.

- If the blocking SQL statement runs in less than a second then you might have an ASH sample of it or you might not, but then you might have samples covering previous actions that could help diagnose the problem.

- Or you could be unlucky, as I was last week, and have no samples at all :-(
I also said in the first post that ASH data "has proved very useful to me in solving several production performance problems in the past" and that still holds true. It just works better with some problems than others and you need to be aware of it's limitations if you're going to use it.
Posted by Doug Burns Comments: (0) Trackbacks: (5)
Defined tags for this entry: ash, locking
Related entries by tags:
Network Events in ASH
Alternative Pictures Demo
That Pictures demo in full
Diagnosing Locking Problems using ASH/LogMiner – The End
Diagnosing Locking Problems using ASH/LogMiner – Part 9
Diagnosing Locking Problems using ASH/LogMiner – Part 8
Diagnosing Locking Problems using ASH/LogMiner – Part 7
Diagnosing Locking Problems using ASH – Part 6
Diagnosing Locking Problems using ASH – Part 5
Diagnosing Locking Problems using ASH – Part 4

Trackbacks
Trackback specific URI for this entry

Diagnosing Locking Problems using ASH – Part 4
No sooner had I finished part 3 with some conclusions than I thought of another example I should have included and then someone else made a comment in an email which suggested another. (Thanks, JB!)"It's probably worth pointing out that NO TOOL can
Weblog: Doug's Oracle Blog
Tracked: Apr 13, 17:29
Diagnosing Locking Problems using ASH – Part 6
"Toto, I've a feeling we're not in Kansas any more."What started as a simple write-up of a course demo gone wrong (or right, depending the way you look at these things) has grown arms and legs and staggered away from the original intention to ta
Weblog: Doug's Oracle Blog
Tracked: Apr 20, 20:20
PingBack
Weblog: jonathanlewis.wordpress.com
Tracked: May 15, 10:24
PingBack
Weblog: karlarao.wordpress.com
Tracked: Aug 15, 12:59
PingBack
Weblog: www.oraclerealworld.com
Tracked: Jul 25, 16:32

Comments
Display comments as (Linear | Threaded)

No comments


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

A couple of posts about Incremental Stats confusion

Part 1
Part 2

Comments

personal blog about Moving Sideways
Wed, 01.06.2016 17:34
That is a good tip particularl y to those fresh to the blogos phere. Short [...]
odziezprestige.pl about Moving Sideways
Wed, 01.06.2016 16:07
Please let me know if you're l ooking for a article writer fo r your site. [...]
Doug Burns about Moving Sideways
Tue, 10.05.2016 21:43
Oh, I won't give it that long unless I enjoy it ;-)

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml dbms_stats
xml Direct Path Reads
xml Fun
xml Grid Control
xml hotsos 2010
xml listener
xml locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml partitions
xml Patching
xml Swingbench
xml The Reality Gap
xml Time Matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl