Doug's Oracle Blog

Entries tagged as Locking

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Entries tagged as Locking

Related tags
ash

Apr 17: Diagnosing Locking Problems using ASH – Part 5

... and so it continues.

Coincidentally, the subject of tracking down locking problems after they occurred cropped up on the Oak Table mailing list just after the last post. Several suggestions were offered but I think the nearest and most detailed suggestion was that proposed by Kyle Hailey, ex-Oracle and now working Embarcadero. There were one or two problems with the original email which I've fixed up here ...

For the wait event class I was teaching I wanted to show an example of taking ASH data and finding the blocking statement in the log miner data, but found this was less than reliable. Sometimes it would work sometimes it didn't. Not sure what the issue was. I think dumping the redo log itself would be more dependable. I think logminer might not report everything - ie I think there is some filtering going on.

Anyway here are some steps that might be of use to people. I get  XID but you probably don't need it, but I recall the XID being useful in doing the redo log dumps


Setup

SYS@TEST1020> set lines 120
SYS@TEST1020> column member format a60
SYS@TEST1020>
SYS@TEST1020> alter system switch logfile;

System altered.

SYS@TEST1020> select group#, thread#, status from v$log;

    GROUP#    THREAD# STATUS
---------- ---------- ----------------
         1          1 CURRENT
         2          1 ACTIVE
         3          1 ACTIVE

SYS@TEST1020> select group#, member from v$logfile;

    GROUP# MEMBER
---------- ------------------------------------------------------------
         1 C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO01.LOG
         2 C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO02.LOG
         3 C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO03.LOG

SYS@TEST1020> exec dbms_logmnr.add_logfile('&&my_member');
Enter value for my_member: C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO01.LOG

PL/SQL procedure successfully completed.

SYS@TEST1020> exec dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog);

PL/SQL procedure successfully completed.

Test Case


Session 1
TESTUSER@TEST1020> create table test_tab1
                      as select object_id pk_id, object_name from all_objects;

Table created.

TESTUSER@TEST1020> select rowid, object_name from test_tab1 where pk_id=313;

ROWID              OBJECT_NAME
------------------ ------------------------------
AAAN2NAAKAAAaBLAEX SYSTEM_PRIVILEGE_MAP

TESTUSER@TEST1020> update test_tab1 set object_name='SESSION 1' where pk_id=313;

1 row updated.

Session 2
TESTUSER@TEST1020> update test_tab1 set object_name='SESSION 2' where pk_id=313;

So Session 1 is now blocking session 2 ... have a look in V$TRANSACTION
SYS@TEST1020> select start_time, xid, xidusn, xidslot, xidsqn, start_scn, 
              to_char(start_scn, 'XXXXXXXXXX') 
              from v$transaction order by start_time;

START_TIME           XID                  XIDUSN    XIDSLOT     XIDSQN  START_SCN TO_CHAR(STA
-------------------- ---------------- ---------- ---------- ---------- ---------- -----------
04/20/09 20:46:39    0A00240064180000         10         36       6244          0           0

Commit Session 1
TESTUSER@TEST1020> commit;

Commit complete.

... so that Session 2 is released.
1 row updated.

and take a look in V$TRANSACTION at the new transaction details
SYS@TEST1020> select start_time, xid, xidusn, xidslot, xidsqn, start_scn, to_cha
r(start_scn, 'XXXXXXXXXX') from v$transaction order by start_time;

START_TIME           XID                  XIDUSN    XIDSLOT     XIDSQN  START_SCN TO_CHAR(STA
-------------------- ---------------- ---------- ---------- ---------- ---------- -----------
04/20/09 20:46:54    08002000681F0000          8         32       8040    9061094      8A42E6

So the blocking lock problem has now cleared. Let's start gathering information to see if we can identify the problem.

Analysis

Parameters for 'enq: TX - Row Lock Contention'
    P2=usn<<16||slot
    P3=Sequence
     =>  XID

Let's find our WAITER and the BLOCKING XID
SYS@TEST1020> select * from
  2     (select lpad(ltrim(to_char(p2,'XXXXXX')),6,'0')||
  3             '00'||
  4             ltrim(to_char(mod(p3,256),'XX'))||
  5             ltrim(to_char(trunc(p3/256),'XX'))||
  6             '0000' block_xid,
  7             to_char(p2,'XXXXXXXX') p2hex,
  8             to_char(p3,'XXXXXXXX') p3hex,
  9             trunc(p2/65536) usn,
 10             mod(p2,65536) slot,
 11             p3 sqn, xid wait_xid
 12     from v$active_session_history
 13     where event like 'enq: T%'
 14     order by sample_time desc) where rownum < 2;

BLOCK_XID          P2HEX     P3HEX            USN       SLOT        SQN WAIT_XID

------------------ --------- --------- ---------- ---------- ---------- --------
0A00240064180000       A0024      1864         10         36       6244


Now let's use logminer to find out what the BLOCKING SQL was
SYS@TEST1020> select username, session# sid, serial# , sql_redo from v$logmnr_co
ntents where  XID = '&&blocking_xid'
  2  /
Enter value for blocking_xid: 0A00240064180000
old   1: select username, session# sid, serial# , sql_redo from v$logmnr_content
s where  XID = '&&blocking_xid'
new   1: select username, session# sid, serial# , sql_redo from v$logmnr_content
s where  XID = '0A00240064180000'

USERNAME                              SID    SERIAL#
------------------------------ ---------- ----------
SQL_REDO
--------------------------------------------------------------------------------
UNKNOWN                                 0          0
commit;

Oh :-(

Well, how about trying a different approach?
SYS@TEST1020> select distinct xid , xidusn, xidslt, xidsqn, username, session# s
id, serial# , sql_redo
  2  from v$logmnr_contents
  3  where timestamp > sysdate- &minutes/(60*24)
  4  and xidusn=10
  5  and xidslt=36;
Enter value for minutes: 10
old   3: where timestamp > sysdate- &minutes/(60*24)
new   3: where timestamp > sysdate- 10/(60*24)

XID                  XIDUSN     XIDSLT     XIDSQN USERNAME
        SID    SERIAL#
---------------- ---------- ---------- ---------- ------------------------------
 ---------- ----------
SQL_REDO
--------------------------------------------------------------------------------
----------------------------------------
0A00240064180000         10         36       6244 UNKNOWN
          0          0
commit;

Which isn't too helpful - I can't see any information about Session 1's row update. If anyone has any ideas why, I'm all ears.

In the next post I'll look at a redo log file dump to see if that has any more information.
Posted by Doug Burns Comments: (5) Trackbacks: (4)
Defined tags for this entry: ASH, Locking

Apr 9: Diagnosing Locking Problems using ASH – Part 4

Some features in this post require a Diagnostics Pack license.

No sooner had I finished part 3 with some conclusions than I thought of another example I should have included and then someone else made a comment in an email which suggested another. (Thanks, JB!)
"It's probably worth pointing out that NO TOOL can claim to identify the blocking SID->SQL in such a case as there is no relationship kept in Oracle between locks acquired by users and the SQL that acquired the locks"

I’ll try to deal with that point and some other thoughts I had here in a further example. For this example, I decided to suggest how to create a quick table so that you can test this out for yourself.

SQL> create user testuser identified by testuser
  2  default tablespace USERS
  3  temporary tablespace TEMP;

User created.

SQL> grant create session, create table to testuser;

Grant succeeded.

SQL> connect testuser/testuser

Connected.

SQL> create table test_tab1 as select object_id pk_id, object_name from all_objects;

Table created.

As easy as that.

In the last part I showed that the SQL_ID for the last sample of a blocking session immediately before it went idle might not be the SQL statement holding the lock. In that case, it was fairly obvious that it wasn’t because it was a simple query that couldn’t have been responsible. However, it’s likely to be even more confusing if the SQL_ID captured by the ASH sample *looks* like it’s responsible, but isn’t, and the previous example didn't illustrate that clearly.

Session 1 – Connected as TESTUSER
SQL> select pk_id, object_name from test_tab1 order by pk_id desc for update;

<Trimmed>

       890 DBMS_STANDARD
       889 DBMS_STANDARD

     PK_ID OBJECT_NAME
---------- ------------------------------
       887 STANDARD
       886 RE$NV_LIST
       605 MAP_OBJECT
       319 STMT_AUDIT_OPTION_MAP
       317 STMT_AUDIT_OPTION_MAP
       316 TABLE_PRIVILEGE_MAP
       314 TABLE_PRIVILEGE_MAP
       313 SYSTEM_PRIVILEGE_MAP
       311 SYSTEM_PRIVILEGE_MAP
       259 DUAL
       258 DUAL

4477 rows selected.

SQL> rollback;

Rollback complete.

SQL> select pk_id from test_tab1 where object_name='SYSTEM_PRIVILEGE_MAP' for update;

     PK_ID
----------
       311
       313

SQL> rollback;

Rollback complete.

SQL> select pk_id from test_tab1 where pk_id=313 for update;

     PK_ID
----------
       313

SQL>

So session 1 has executed three different queries, all of which lock one or more rows including the row with PK_ID=313, has rolled back the first two (releasing the locks) and has just that row locked now.

Session 2 – Connected as TESTUSER
SQL> select pk_id from test_tab1 where pk_id=313 for update;

Session 2 hangs, waiting for the lock.

Session 1 – Connected as TESTUSER
SQL> rollback;

Rollback complete.

So the lock is released and session 2 acquires the lock and then releases it

Session 2 – Connected as Testuser
     PK_ID
----------
       313

SQL> rollback;

Rollback complete.

OK, time to look at the ASH data for the last two minutes for TESTUSER sessions. (Note that I have added each session’s serial# to the query. Graham Wood corrected a statement I made in the last post …

“who is to say that that session 148 is the same session as the blocking session 148 and not an earlier session that has disconnected?”

… by pointing out that session_serial# would allow me to differentiate between different sessions where SID=148. I’ve updated the previous post to that effect.)

break on MY_TIME
column my_sid format 999
column my_ser format 99999
column my_state format a30
column my_blkr format 999
select to_char(a.sample_time, 'HH24:MI:SS') MY_TIME,a.session_id MY_SID,a.session_serial# MY_SER,
        DECODE(a.session_state, 'WAITING' ,a.event, a.session_state) MY_STATE,a.xid, a.sql_id,
        a.blocking_session MY_BLKR
from v$active_session_history a, dba_users u
where u.user_id = a.user_id
and u.username = 'TESTUSER'
and a.sample_time > SYSTIMESTAMP-(2/1440);

MY_TIME  MY_SID MY_SER MY_STATE                       XID              SQL_ID        MY_BLKR
-------- ------ ------ ------------------------------ ---------------- ------------- -------
12:53:01    300   2739 enq: TX - row lock contention                   2pa7rnqf65gj3     304
12:53:00    300   2739 enq: TX - row lock contention                   2pa7rnqf65gj3     304
12:52:59    300   2739 enq: TX - row lock contention                   2pa7rnqf65gj3     304
12:52:50    304   1445 ON CPU                         0006000E0000884D 95k0s60jznz76

So what statement is SQL_ID ‘95k0s60jznz76’?

SQL> select sql_text from v$sql where sql_id='95k0s60jznz76';

SQL_TEXT
-----------------------------------------------------------------------------
select pk_id, object_name from test_tab1 order by pk_id desc for update

Let’s face it, if you were to look at this data, how likely is it that you would think that that statement was responsible for blocking session 2, when it wasn't?
Posted by Doug Burns Comments: (0) Trackbacks: (2)
Defined tags for this entry: ash, locking

Mar 31: Diagnosing Locking Problems using ASH - Part 3

Some features in this post require a Diagnostics Pack license.

In the last part I described a locking scenario where the blocking session had only executed one SQL statement that was quick enough to avoid being sampled by ASH and is now inactive. As a result, there is no ASH data for the blocking session, so diagnosing the root cause of the problem after the event is next to impossible. In fairness, it would be impossible anyway without ASH or similar instrumentation.

However, as I suggested in that post, I would argue that's a fairly unusual situation arising out of the nature of the demo and that the blocking session would normally appear in ASH. So let's try a more realistic situation where the blocking session does more work.

Session 1 - logged in as TESTUSER
TESTUSER@TEST1020> select count(*) from test_tab1;

  COUNT(*)
----------
   1024000

TESTUSER@TEST1020> select pk_id from test_tab1 where rownum=1 for update;

     PK_ID
----------
      4051

So this time, I ran a simple query before the SELECT FOR UPDATE. Session 1 holds a lock on that row and I'll use the PK_ID that was returned from the query to make sure I try to lock the same row in the next session.

Session 2 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where pk_id=4051 for update;

Session 2 hangs, waiting for session 1 to release the lock

Session 1 - commits, to release the lock
TESTUSER@TEST1020> commit;

Commit complete.

Session 2 - Is now able to acquire the lock, so locks the row and returns the result
     PK_ID
----------
      4051

So everything about this test is the same as the last post, but Session 1 is doing a little more work than the simple SELECT FOR UPDATE. Let's look at the ASH data for all TESTUSER sessions.

SYS@TEST1020> select a.sample_id,a.sample_time,a.session_id,
  2             a.session_state,a.event,a.sql_id,
  3     a.blocking_session,a.blocking_session_status
  4  from v$active_session_history a, dba_users u
  5  where u.user_id = a.user_id
  6* and u.username = 'TESTUSER';

 SAMPLE_ID
----------
SAMPLE_TIME
---------------------------------------------------------------------------
SESSION_ID SESSION
---------- -------
EVENT                            SQL_ID        BLOCKING_SESSION BLOCKING_SE
-------------------------------- ------------- ---------------- -----------
    445637
30-MAR-09 22.23.59.263
       147 ON CPU
                                 0ut22yp7mh229                  NOT IN WAIT

    445636
30-MAR-09 22.23.58.253
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445635
30-MAR-09 22.23.57.253
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445634
30-MAR-09 22.23.56.253
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445633
30-MAR-09 22.23.55.252
       147 WAITING
enq: TX - row lock contention    0ut22yp7mh229              148 VALID

    445611
30-MAR-09 22.23.33.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER

    445610
30-MAR-09 22.23.32.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER

    445609
30-MAR-09 22.23.31.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER

    445608
30-MAR-09 22.23.30.181
       148 WAITING
db file scattered read           cda14zb83bb5u                  NO HOLDER


9 rows selected.

This time I'm going to work my way from the top of the data to the bottom (or from the most recent data to the oldest).

The most recent sample shows session 147 on CPU but immediately prior to that, I can see that it was stuck waiting for a TX row lock since some time between 22:23:54 and 22:23:55 and that the blocking session was 148. Things are a little better this time because there are samples for the blocking session 148 but, if I look at the ASH sample immediately prior to the locking problem, session 148 is executing SQL_ID "cda14zb83bb5u'. If I check which SQL statement that is

SYS@TEST1020> select sql_text from v$sql where sql_id='cda14zb83bb5u';

SQL_TEXT
--------------------------------------------------------------------------------
select count(*) from test_tab1

Which isn't the statement that was holding the lock. I've hit the same problem as the previous example because the SELECT FOR UPDATE activity wasn't sampled.

The up-side to this example is that I can at least see some information about the blocking session in the lead-up to the locking problem. That could be very useful in identifying who the user was, what application they were running, which module they were executing and so on. (Then again, who is to say that that session 148 is the same session as the blocking session 148 and not an earlier session that has disconnected? Updated later - as Graham Wood pointed out, I could have used SESSION_SERIAL# to ensure that I am looking at the right session 148.)

The down-side is that it's easier to fool yourself into thinking you're looking at the root cause of the problem, when you aren't. In this case, it wouldn't take long to realise that the simple query is not blocking the other session and that there must be something missing, but imagine if the SQL had been a transactional statement that could be the source of the problem, but wasn't?

Conclusion
I started off the first post by saying "There are few better illustrations of the utility of ASH data than the ability to detect and diagnose locking problems, even after they've occurred" and I still think that's true. Diagnosing concurrency problems without having access to information on all sessions is next to impossible and ASH's low-cost sampling approach to recording session activity is the very reason it can be switched on all the time, waiting for problems to occur. Of course, the nature of sampling implies an approximate view of reality (and I've deliberately picked examples here to illustrate the gaps in the approximation).

- If the problem is happening right now, then you will have all the information you need, whether you have ASH data or not

- If the blocking SQL statement runs for more than a second because it's resource intensive or is experiencing it's own bottleneck, then you'll have good samples of the root cause.

- If the blocking SQL statement runs in less than a second then you might have an ASH sample of it or you might not, but then you might have samples covering previous actions that could help diagnose the problem.

- Or you could be unlucky, as I was last week, and have no samples at all :-(
I also said in the first post that ASH data "has proved very useful to me in solving several production performance problems in the past" and that still holds true. It just works better with some problems than others and you need to be aware of it's limitations if you're going to use it.
Posted by Doug Burns Comments: (0) Trackbacks: (2)
Defined tags for this entry: ash, locking

Mar 30: Diagnosing Locking Problems using ASH - Part 2

Some features in this post require a Diagnostics Pack license.

I tried the graphical approach to tracking down the root cause of a locking problem in the last part. Now let's look at the underlying ASH data. I ran the same test after bouncing my test instance to start from a clean slate. So, here it is again.

Session 1 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where rownum=1 for update;

     PK_ID
----------
      4051

Session 1 holds a lock on that row and I'll use the PK_ID that was returned from the query to make sure I try to lock the same row in the next session.

Session 2 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where pk_id=4051 for update;

Session 2 hangs, waiting for session 1 to release the lock. So I've just created 'a locking problem'. Let's get some details about the two sessions.

SYS@TEST1020> select s.sid, s.state, s.event, s.sql_id, l.type, l.lmode
  2  from v$session s, v$lock l
  3  where s.sid = l.sid  (+)
  4  and s.username='TESTUSER'
  5  order by s.sid;

       SID STATE       EVENT                            SQL_ID        TY LMODE
---------- ----------- -------------------------------- ------------- -- -----
       150 WAITING     enq: TX - row lock contention    0ut22yp7mh229 TM     3
       150 WAITING     enq: TX - row lock contention    0ut22yp7mh229 TX     0
       157 WAITING     SQL*Net message from client                    TX     6
       157 WAITING     SQL*Net message from client                    TM     3

Session 1 is Session ID 157 and currently has the TX lock against the table (mode 6), whilst Session 2 is Session ID 150 and is waiting for the lock to be released.

Session 1 - commits, to release the lock
TESTUSER@TEST1020> commit;

Commit complete.

Session 2 - Is now able to acquire the lock, so locks the row and returns the result
     PK_ID
----------
      4051

... and we can see that Session 2 (SID 150) now has the TX lock, session 157 is holding no locks and neither session is waiting for the other any more - they're both idle (SQL*Net message from client).

SYS@TEST1020> /

       SID STATE       EVENT                            SQL_ID        TY LMODE
---------- ----------- -------------------------------- ------------- -- -----
       150 WAITING     SQL*Net message from client                    TM     3
       150 WAITING     SQL*Net message from client                    TX     6
       157 WAITING     SQL*Net message from client

How does the ASH data look? I'll query all ASH rows for TESTUSER sessions.

SYS@TEST1020> select a.sample_id,a.sample_time,a.session_id,a.event,
  2             a.session_state,a.event,a.sql_id,
  3     a.blocking_session,a.blocking_session_status
  4  from v$active_session_history a, dba_users u
  5  where u.user_id = a.user_id
  6* and u.username = 'TESTUSER';

 SAMPLE_ID
----------
SAMPLE_TIME
---------------------------------------------------------------------------
SESSION_ID EVENT                            SESSION
---------- -------------------------------- -------
EVENT                            SQL_ID        BLOCKING_SESSION BLOCKING_SE
-------------------------------- ------------- ---------------- -----------
    432483
30-MAR-09 14.59.37.509
       150                                  ON CPU
                                 0ut22yp7mh229                  NOT IN WAIT

    432482
30-MAR-09 14.59.36.509
       150                                  ON CPU
                                 0ut22yp7mh229                  NOT IN WAIT

    432481
30-MAR-09 14.59.35.509
       150 db file scattered read           WAITING
db file scattered read           0ut22yp7mh229                  NO HOLDER

    432480
30-MAR-09 14.59.34.509
       150 db file scattered read           WAITING
db file scattered read           0ut22yp7mh229                  NO HOLDER

    432479
30-MAR-09 14.59.33.499
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432478
30-MAR-09 14.59.32.499
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432477
30-MAR-09 14.59.31.499
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432476
30-MAR-09 14.59.30.489
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432475
30-MAR-09 14.59.29.489
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432474
30-MAR-09 14.59.28.489
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432473
30-MAR-09 14.59.27.489
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432472
30-MAR-09 14.59.26.479
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID

    432471
30-MAR-09 14.59.25.479
       150 enq: TX - row lock contention    WAITING
enq: TX - row lock contention    0ut22yp7mh229              157 VALID


13 rows selected.

Working my way from bottom to top (oldest data to most recent), I can see that Session ID 150 (that's Session 2, remember) started off waiting for the row level lock and that lasted for 9 samples and then it started to read data (db file scattered read). I can also see that the session ID of the blocking session was 157. Excellent - just what I need to get to the bottom of this.

But there's a problem, isn't there? Where are the samples for SID 157, which is the root cause of the problem? If we can't see any samples for that session, how can we know what it was doing?

That's exactly what happened when I was demonstrating this last week and, once I'd pondered where the ASH data was for a little while, it hit me. The SQL statement that the blocking session executed was over so quickly, that the ASH sampling process missed it completely. The query started, selected the data and locked the row in less than a second, so it never appeared as an Active session as far as ASH is concerned. In fact, it was connected throughout the demo but was waiting on SQL*Net message from client and, as an Idle session, it wasn't included in the ASH data. Yes, it was Idle, but it was also the root cause of this problem. Of course if I was looking at this problem while it was happening then I could use the BLOCKING_SESSION information from the blocked session's ASH samples to look at V$SESSION, V$LOCK etc and work out what was going on, but that would be useless for diagnosing past locking problems with sessions that are now disconnected, which was the point of the demo.

It was very obvious once it hit me and ASH is behaving exactly as designed and advertised. Just so long as you know that when ASH excludes sessions because they're Idle at sample points, it might exclude the very sessions that are at the root cause of the problem.

During the class, I suggested that this was a slightly artificial test because *all* that the blocking session had done was login, run a very quick SELECT FOR UPDATE and then sit Idle. In practise, most genuine user application sessions are likely to be generating enough activity that they would have appeared somewhere in the ASH samples. In fact, as I've been running tests while writing this, ASH captured the activity of the blocking session three or four times in a row, which was a little frustrating! ;-)

The design of ASH depends on the likelihood that there is enough activity to capture the essence of each session's activity (and at very low cost) so maybe this isn't such a big problem? To put it another way, the blocking session has hardly consumed any Database Time to speak of so it doesn't really need to be optimised, does it? It's the blocked session that needs to be 'fixed' and there are plenty of samples for that. So maybe we just shouldn't use the data this way?

If the blocking session had been active for a little longer - if it was experiencing it's own bottleneck or was a long-running statement or transaction - then there would have been ASH samples for it. The next post will look at an example of that.
Posted by Doug Burns Comments: (3) Trackbacks: (2)
Defined tags for this entry: ash, locking

Mar 30: Diagnosing Locking Problems using ASH - Part 1

Some features in this post require a Diagnostics Pack license.

There are few better illustrations of the utility of ASH data than the ability to detect and diagnose locking problems*, even after they've occurred. Because ASH is sampling all active sessions all of the time, you should have information both for the blocked and blocking sessions, even when problems are first reported after they have disappeared and the sessions have disconnected. This has proved very useful to me in solving several production performance problems in the past, so I usually try to demonstrate it during the course.

Something cropped up when one of the demos went wrong last week and it took me a minute to work out what the problem was. I made a simple mistake in the way I approached the demo, but I decided to blog about it because I think it reinforces the nature and inherent limitations of ASH data.

This is a recreation of the demo, which captures the gist of what went 'wrong'.

Session 1 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where rownum=1 for update;

     PK_ID
----------
      4051

Session 1 holds a lock on that row and I'll use the PK_ID that was returned from the query to make sure I try to lock the same row in the next session.

Session 2 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where pk_id=4051 for update;

Session 2 hangs, waiting for session 1 to release the lock

Session 1 - commits, to release the lock
TESTUSER@TEST1020> commit;

Commit complete.

Session 2 - Is now able to acquire the lock, so locks the row and returns the result
     PK_ID
----------
      4051

So I've just created 'a locking problem' that occurred in the past but has been resolved now. At this point, there are a few different things we could look at. In last week's class I dived straight into the contents of V$ACTIVE_SESSION_HISTORY at the command line but I'll take the GUI approach first here. Here is how the Top Activity screen in DB/Grid Control looks. (Click on the thumbnails to see the full-sized images)



The period of the test is highlighted and I can see the spike for one active session, which is reflected in the Top Sessions panel (Session 137 is responsible for most of the activity) and I can see from the Top SQL panel that a single SQL statement is responsible for most of the activity. The majority of samples are waiting on "Application" events, with some "User I/O" as well. Looking at the chart, I can see that the User I/O activity followed the Application waits.

If I drill down into the Application events they are all "enq: TX - row lock contention", all associated with the same Session and SQL statement. (Note the more descriptive 10g event description, rather than "enqueue" in previous versions.)



Next I'll drill down into the SQL statement.



As well as the lock waits followed by db file scattered reads, I can see the SQL statement is the one I ran in session 2.

Next I'll drill down into the Session details for the blocked session and we'll see the first sign of limitations.



On the Blocking Tree tab, I find 'No sessions found to be currently blocking other sessions". That's true now because Session 1 released the lock and Session 2 (SID 137) was able to acquire it, but for the period of time the ASH data covers, there was a session blocking this one.

Note that if I had run this demo *before* Session 1 commited, this screen would have highlighted the problem, like this.



This was taken from another run of the test, but it should be obvious that the idle Session 1 (SID 150) is blocking Session 2 (SID 147) and that there's a handly "Kill Session" button! (Actually, that "Idle" is a giveaway to what's coming next.)

I can look at the ASH data for the blocked session in more detail by selecting the "Activity" tab and then changing the view to "Show Raw Data". In reverse time order (the way ASH data is returned if you don't order it), it's clear that there were db file scattered reads against TESTUSER.TEST_TAB1, preceded by those lock waits.



But that's only showing the ASH data for one session and doesn't include the useful BLOCKING_SESSION% columns that help diagnose locking problems, so in the next part I'll switch to looking at the raw data in V$ACTIVE_SESSION_HISTORY.

*Why am I still seeing locking problems? I think it's because I often have to support Third-Party "Database-independent" (yuck!) applications, some of which insist on implementing their own locking mechanisms. Sigh.
Posted by Doug Burns Comments: (0) Trackbacks: (3)
Defined tags for this entry: ash, locking
« previous page   (Page 2 of 2, totaling 10 entries)   next page »

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

Comments

Doug Burns about 10053 Trace Files - Different Plan in Different Environments
Tue, 02.04.2013 08:57
You're welcome. Now I just nee d to pull my finger out and ac tually come up [...]
Howard Rogers about 10053 Trace Files - Different Plan in Different Environments
Mon, 01.04.2013 23:08
Makes a big difference, so tha nks for that! With two brow ser windows, o [...]
stelioscharalambides.com about 10053 Trace Files
Sat, 30.03.2013 16:28

Upcoming Presentations

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml DBMS_STATS
xml Direct Path Reads
xml Fun
xml grid control
xml hotsos 2010
xml listener
xml Locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml Partitions
xml Patching
xml swingbench
xml The Reality Gap
xml time matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl