Mar 30: Diagnosing Locking Problems using ASH - Part 1
Some features in this post require a Diagnostics Pack license.
There are few better illustrations of the utility of ASH data than the ability to detect and diagnose locking problems*, even after they've occurred. Because ASH is sampling all active sessions all of the time, you should have information both for the blocked and blocking sessions, even when problems are first reported after they have disappeared and the sessions have disconnected. This has proved very useful to me in solving several production performance problems in the past, so I usually try to demonstrate it during the course.
Something cropped up when one of the demos went wrong last week and it took me a minute to work out what the problem was. I made a simple mistake in the way I approached the demo, but I decided to blog about it because I think it reinforces the nature and inherent limitations of ASH data.
This is a recreation of the demo, which captures the gist of what went 'wrong'.
Session 1 - logged in as TESTUSER
Session 1 holds a lock on that row and I'll use the PK_ID that was returned from the query to make sure I try to lock the same row in the next session.
Session 2 - logged in as TESTUSER
Session 2 hangs, waiting for session 1 to release the lock
Session 1 - commits, to release the lock
Session 2 - Is now able to acquire the lock, so locks the row and returns the result
So I've just created 'a locking problem' that occurred in the past but has been resolved now. At this point, there are a few different things we could look at. In last week's class I dived straight into the contents of V$ACTIVE_SESSION_HISTORY at the command line but I'll take the GUI approach first here. Here is how the Top Activity screen in DB/Grid Control looks. (Click on the thumbnails to see the full-sized images)

The period of the test is highlighted and I can see the spike for one active session, which is reflected in the Top Sessions panel (Session 137 is responsible for most of the activity) and I can see from the Top SQL panel that a single SQL statement is responsible for most of the activity. The majority of samples are waiting on "Application" events, with some "User I/O" as well. Looking at the chart, I can see that the User I/O activity followed the Application waits.
If I drill down into the Application events they are all "enq: TX - row lock contention", all associated with the same Session and SQL statement. (Note the more descriptive 10g event description, rather than "enqueue" in previous versions.)

Next I'll drill down into the SQL statement.

As well as the lock waits followed by db file scattered reads, I can see the SQL statement is the one I ran in session 2.
Next I'll drill down into the Session details for the blocked session and we'll see the first sign of limitations.

On the Blocking Tree tab, I find 'No sessions found to be currently blocking other sessions". That's true now because Session 1 released the lock and Session 2 (SID 137) was able to acquire it, but for the period of time the ASH data covers, there was a session blocking this one.
Note that if I had run this demo *before* Session 1 commited, this screen would have highlighted the problem, like this.

This was taken from another run of the test, but it should be obvious that the idle Session 1 (SID 150) is blocking Session 2 (SID 147) and that there's a handly "Kill Session" button! (Actually, that "Idle" is a giveaway to what's coming next.)
I can look at the ASH data for the blocked session in more detail by selecting the "Activity" tab and then changing the view to "Show Raw Data". In reverse time order (the way ASH data is returned if you don't order it), it's clear that there were db file scattered reads against TESTUSER.TEST_TAB1, preceded by those lock waits.

But that's only showing the ASH data for one session and doesn't include the useful BLOCKING_SESSION% columns that help diagnose locking problems, so in the next part I'll switch to looking at the raw data in V$ACTIVE_SESSION_HISTORY.
*Why am I still seeing locking problems? I think it's because I often have to support Third-Party "Database-independent" (yuck!) applications, some of which insist on implementing their own locking mechanisms. Sigh.
There are few better illustrations of the utility of ASH data than the ability to detect and diagnose locking problems*, even after they've occurred. Because ASH is sampling all active sessions all of the time, you should have information both for the blocked and blocking sessions, even when problems are first reported after they have disappeared and the sessions have disconnected. This has proved very useful to me in solving several production performance problems in the past, so I usually try to demonstrate it during the course.
Something cropped up when one of the demos went wrong last week and it took me a minute to work out what the problem was. I made a simple mistake in the way I approached the demo, but I decided to blog about it because I think it reinforces the nature and inherent limitations of ASH data.
This is a recreation of the demo, which captures the gist of what went 'wrong'.
Session 1 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where rownum=1 for update; PK_ID ---------- 4051
Session 1 holds a lock on that row and I'll use the PK_ID that was returned from the query to make sure I try to lock the same row in the next session.
Session 2 - logged in as TESTUSER
TESTUSER@TEST1020> select pk_id from test_tab1 where pk_id=4051 for update;
Session 2 hangs, waiting for session 1 to release the lock
Session 1 - commits, to release the lock
TESTUSER@TEST1020> commit; Commit complete.
Session 2 - Is now able to acquire the lock, so locks the row and returns the result
PK_ID ---------- 4051
So I've just created 'a locking problem' that occurred in the past but has been resolved now. At this point, there are a few different things we could look at. In last week's class I dived straight into the contents of V$ACTIVE_SESSION_HISTORY at the command line but I'll take the GUI approach first here. Here is how the Top Activity screen in DB/Grid Control looks. (Click on the thumbnails to see the full-sized images)
The period of the test is highlighted and I can see the spike for one active session, which is reflected in the Top Sessions panel (Session 137 is responsible for most of the activity) and I can see from the Top SQL panel that a single SQL statement is responsible for most of the activity. The majority of samples are waiting on "Application" events, with some "User I/O" as well. Looking at the chart, I can see that the User I/O activity followed the Application waits.
If I drill down into the Application events they are all "enq: TX - row lock contention", all associated with the same Session and SQL statement. (Note the more descriptive 10g event description, rather than "enqueue" in previous versions.)
Next I'll drill down into the SQL statement.
As well as the lock waits followed by db file scattered reads, I can see the SQL statement is the one I ran in session 2.
Next I'll drill down into the Session details for the blocked session and we'll see the first sign of limitations.
On the Blocking Tree tab, I find 'No sessions found to be currently blocking other sessions". That's true now because Session 1 released the lock and Session 2 (SID 137) was able to acquire it, but for the period of time the ASH data covers, there was a session blocking this one.
Note that if I had run this demo *before* Session 1 commited, this screen would have highlighted the problem, like this.
This was taken from another run of the test, but it should be obvious that the idle Session 1 (SID 150) is blocking Session 2 (SID 147) and that there's a handly "Kill Session" button! (Actually, that "Idle" is a giveaway to what's coming next.)
I can look at the ASH data for the blocked session in more detail by selecting the "Activity" tab and then changing the view to "Show Raw Data". In reverse time order (the way ASH data is returned if you don't order it), it's clear that there were db file scattered reads against TESTUSER.TEST_TAB1, preceded by those lock waits.
But that's only showing the ASH data for one session and doesn't include the useful BLOCKING_SESSION% columns that help diagnose locking problems, so in the next part I'll switch to looking at the raw data in V$ACTIVE_SESSION_HISTORY.
*Why am I still seeing locking problems? I think it's because I often have to support Third-Party "Database-independent" (yuck!) applications, some of which insist on implementing their own locking mechanisms. Sigh.
Posted by Doug Burns
Comments: (0)
Trackbacks: (3)
Related entries by tags:
Network Events in ASH
Alternative Pictures Demo
That Pictures demo in full
Diagnosing Locking Problems using ASH/LogMiner – The End
Diagnosing Locking Problems using ASH/LogMiner – Part 9
Diagnosing Locking Problems using ASH/LogMiner – Part 8
Diagnosing Locking Problems using ASH/LogMiner – Part 7
Diagnosing Locking Problems using ASH – Part 6
Diagnosing Locking Problems using ASH – Part 5
Diagnosing Locking Problems using ASH – Part 4
Network Events in ASH
Alternative Pictures Demo
That Pictures demo in full
Diagnosing Locking Problems using ASH/LogMiner – The End
Diagnosing Locking Problems using ASH/LogMiner – Part 9
Diagnosing Locking Problems using ASH/LogMiner – Part 8
Diagnosing Locking Problems using ASH/LogMiner – Part 7
Diagnosing Locking Problems using ASH – Part 6
Diagnosing Locking Problems using ASH – Part 5
Diagnosing Locking Problems using ASH – Part 4


I tried the graphical approach to tracking down the root cause of a locking problem in the last part. Now let's look at the underlying ASH data. I ran the same test after bouncing my test instance to start from a clean slate. So, here it is again. Ses
Tracked: Mar 30, 22:07
"Toto, I've a feeling we're not in Kansas any more."What started as a simple write-up of a course demo gone wrong (or right, depending the way you look at these things) has grown arms and legs and staggered away from the original intention to ta
Tracked: Apr 20, 21:20
(With my tongue causing a minor gash on the inside of my cheek ....)I'm working on an application that seems to suffer constant long waits on log file parallel write and, consequently, log file sync. I thought that the 54 ms log file parallel writes where
Tracked: Apr 22, 02:36
Tracked: May 15, 11:24