Entries tagged as ash
Apr 23: Diagnosing Locking Problems using ASH/LogMiner – Part 8
So what about those SELECT FOR UPDATEs?
I know that they’ll generate redo entries and so something should appear in both log file dumps and the LogMiner output, but what exactly will appear? (This is all on Oracle 10.2.0.4)
For this post I’ll go back to the example from Part 4, where Session 1 performs three different SELECT FOR UPDATE statements against the same table, TEST_TAB1, and rolls the first two back before leaving the third as the statement that’s blocking Session 2. i.e. Three possible guilty parties in very quick succession, which makes the exact source harder to find. This time, I granted select privileges on V$TRANSACTION to TESTUSER, so that we could take a quick peek at the contents after each SELECT FOR UPDATE. I've also set up LogMiner access in the SYS session, as in the last couple of posts.
Session 1 – Connected as TESTUSER
So Session 1 has executed three different queries, all of which lock one or more rows including the row with PK_ID=313, has rolled back the first two (releasing the locks) and has just PK_ID=313 locked now.
Session 2 – Connected as TESTUSER
Session 2 hangs, waiting for the lock
Session 1 - Connected as TESTUSER
The lock is released and Session 2 acquires the lock and then releases it.
Session 2 - Connected as TESTUSER
Taken as a whole, the time-line looks like this
Transaction ID Session 1 Activity Transaction ID Session 2 Activity
0003001100008615 Whole Table Locked
Locks Released
0002000100008DAA Two Rows Locked
Locks Released
000900110000891B PK_ID=313 Locked Waiting to lock PK_ID=313
Lock Released 00080005000081D3 PK_ID=313 Locked
OK, let’s see what LogMiner makes of this. First, let’s look for any entries associated with the specific transaction that was the blocker.
Now that's a bit interesting because I think we'll find that 'Unsupported' operation is the third SELECT FOR UPDATE (which blocks Session 2) but the Transaction ID looks wrong. Oh, and why is the SQL_REDO 'Unsupported'? Well, would you really want to redo a SELECT FOR UPDATE that merely locks rows?
Next I’ll try displaying all operations against TEST_TAB1 in the past 5 minutes. I’ll group the results so we only see the discrete actions and how many entries there are for each.
Good - that's starting to look more like it. I can see all 4 discrete SELECT FOR UPDATE transactions from the test (the rollback operation returned by the previous query isn't necessarily specific to TEST_TAB1 so I'm not surprised it doesn't appear). The XIDs still look a little screwy and you're trusting me at this stage that these are SELECT FOR UPDATEs. Notice the number of entries for the different transactions - two single row updates, one 2 row update and a multiple row update.
However, the transaction IDs for three of the transactions have the wrong sequence number of FFFFFFFF and even this output is the best I've seen. I've run this several times and sometimes it's captured by LogMiner, sometime it isn't. I appreciate that's a little vague, but I have very little confidence in some of the results I've seen on different tests.
I dumped the log file for this example, so I'll look at that in the next part.
Believe me when I say I'm aware of how far I've strayed from identifying the blocking SQL statement (you won't be getting that from redo entries) but I suppose I might as well carry on for one more post, maybe two.
I know that they’ll generate redo entries and so something should appear in both log file dumps and the LogMiner output, but what exactly will appear? (This is all on Oracle 10.2.0.4)
For this post I’ll go back to the example from Part 4, where Session 1 performs three different SELECT FOR UPDATE statements against the same table, TEST_TAB1, and rolls the first two back before leaving the third as the statement that’s blocking Session 2. i.e. Three possible guilty parties in very quick succession, which makes the exact source harder to find. This time, I granted select privileges on V$TRANSACTION to TESTUSER, so that we could take a quick peek at the contents after each SELECT FOR UPDATE. I've also set up LogMiner access in the SYS session, as in the last couple of posts.
Session 1 – Connected as TESTUSER
SQL> select pk_id, object_name from test_tab1 order by pk_id desc for update; Trimmed ... 319 STMT_AUDIT_OPTION_MAP 317 STMT_AUDIT_OPTION_MAP 316 TABLE_PRIVILEGE_MAP 314 TABLE_PRIVILEGE_MAP 313 SYSTEM_PRIVILEGE_MAP 311 SYSTEM_PRIVILEGE_MAP 259 DUAL 258 DUAL 4477 rows selected. SQL> select start_time, xid, xidusn, xidslot, 2 xidsqn, start_scn, to_char(start_scn, 'XXXXXXXXXX') 3 from v$transaction 4 order by start_time; START_TIME XID XIDUSN XIDSLOT XIDSQN START_SCN TO_CHAR(STA -------------------- ---------------- ------ ------- ------- ---------- ----------- 04/23/09 09:40:16 0003001100008615 3 17 34325 0 0 SQL> rollback Rollback complete. SQL> select pk_id from test_tab1 where object_name='SYSTEM_PRIVILEGE_MAP' for update; PK_ID ---------- 311 313 SQL> select start_time, xid, xidusn, xidslot, 2 xidsqn, start_scn, to_char(start_scn, 'XXXXXXXXXX') 3 from v$transaction 4 order by start_time; START_TIME XID XIDUSN XIDSLOT XIDSQN START_SCN TO_CHAR(STA -------------------- ---------------- ------ ------- ------- ---------- ----------- 04/23/09 09:40:20 0002000100008DAA 2 1 36266 84788253 50DC41D SQL> rollback; Rollback complete. SQL> select pk_id from test_tab1 where pk_id=313 for update; PK_ID ---------- 313 SQL> select start_time, xid, xidusn, xidslot, 2 xidsqn, start_scn, to_char(start_scn, 'XXXXXXXXXX') 3 from v$transaction 4 order by start_time; START_TIME XID XIDUSN XIDSLOT XIDSQN START_SCN TO_CHAR(STA -------------------- ---------------- ------ ------- ------- ---------- ----------- 04/23/09 09:40:20 000900110000891B 9 17 35099 84788256 50DC420
So Session 1 has executed three different queries, all of which lock one or more rows including the row with PK_ID=313, has rolled back the first two (releasing the locks) and has just PK_ID=313 locked now.
Session 2 – Connected as TESTUSER
SQL> select pk_id from test_tab1 where pk_id=313 for update;
Session 2 hangs, waiting for the lock
Session 1 - Connected as TESTUSER
SQL> rollback; Rollback complete.
The lock is released and Session 2 acquires the lock and then releases it.
Session 2 - Connected as TESTUSER
PK_ID ---------- 313 SQL> select start_time, xid, xidusn, xidslot, 2 xidsqn, start_scn, to_char(start_scn, 'XXXXXXXXXX') 3 from v$transaction 4 order by start_time; START_TIME XID XIDUSN XIDSLOT XIDSQN START_SCN TO_CHAR(STA -------------------- ---------------- ------ ------- ------- ---------- ----------- 04/23/09 09:40:22 00080005000081D3 8 5 33235 84788259 50DC423 SQL> rollback; Rollback complete.
Taken as a whole, the time-line looks like this
Transaction ID Session 1 Activity Transaction ID Session 2 Activity
0003001100008615 Whole Table Locked
Locks Released
0002000100008DAA Two Rows Locked
Locks Released
000900110000891B PK_ID=313 Locked Waiting to lock PK_ID=313
Lock Released 00080005000081D3 PK_ID=313 Locked
OK, let’s see what LogMiner makes of this. First, let’s look for any entries associated with the specific transaction that was the blocker.
SQL> select username,session# sid,serial#,sql_redo from v$logmnr_contents
where XID = '&&blocking_xid';
Enter value for blocking_xid: 000900110000891B
old 1: select username,session# sid,serial#,sql_redo from v$logmnr_contents
where XID = '&&blocking_xid'
new 1: select username,session# sid,serial#,sql_redo from v$logmnr_contents
where XID = '000900110000891B'
USERNAME SID SERIAL#
------------------------------ ---------- ----------
SQL_REDO
-----------------------------------------------------------------------------------------
0 0
rollback;
Mmmm … ‘rollback’. Not too helpful, is it? Maybe if I look at the undo segment and slot from another of Kyle's queries?
SQL> select distinct xid , xidusn, xidslt, xidsqn,
username, session# sid, serial# , sql_redo
2 from v$logmnr_contents
3 where timestamp > sysdate- &minutes/(60*24)
4 and xidusn=&my_usn
5 and xidslt=&my_slot;
Enter value for minutes: 5
old 3: where timestamp > sysdate- &minutes/(60*24)
new 3: where timestamp > sysdate- 5/(60*24)
Enter value for my_usn: 9
old 4: and xidusn=&my_usn
new 4: and xidusn=9
Enter value for my_slot: 17
old 5: and xidslt=&my_slot
new 5: and xidslt=17XID XIDUSN XIDSLT XIDSQN USERNAME SID SERIAL#
---------------- ---------- ---------- ---------- ------------------- ---------- ----------
SQL_REDO
---------------------------------------------------------------------------------
000900110000891B 9 17 35099 0 0
rollback;
00090011FFFFFFFF 9 17 4294967295 0 0
Unsupported
Now that's a bit interesting because I think we'll find that 'Unsupported' operation is the third SELECT FOR UPDATE (which blocks Session 2) but the Transaction ID looks wrong. Oh, and why is the SQL_REDO 'Unsupported'? Well, would you really want to redo a SELECT FOR UPDATE that merely locks rows?
Next I’ll try displaying all operations against TEST_TAB1 in the past 5 minutes. I’ll group the results so we only see the discrete actions and how many entries there are for each.
SQL> select xid, xidusn, xidslt, xidsqn, session#, serial#, sql_redo, count(*) 2 from v$logmnr_contents 3 where timestamp > sysdate- &minutes/(60*24) 4 and table_name='TEST_TAB1' 5* group by xid, xidusn, xidslt, xidsqn, session#, serial#, sql_redo Enter value for minutes: 60 old 3: where timestamp > sysdate- &minutes/(60*24) new 3: where timestamp > sysdate- 60/(60*24) XID XIDUSN XIDSLT XIDSQN SESSION# SERIAL# SQL_REDO COUNT(*) ---------------- -------- -------- ---------- ---------- ---------- ----------- ---------- 00080005FFFFFFFF 8 5 4294967295 0 0 Unsupported 1 0003001100008615 3 17 34325 0 0 Unsupported 8858 00020001FFFFFFFF 2 1 4294967295 0 0 Unsupported 2 00090011FFFFFFFF 9 17 4294967295 0 0 Unsupported 1
Good - that's starting to look more like it. I can see all 4 discrete SELECT FOR UPDATE transactions from the test (the rollback operation returned by the previous query isn't necessarily specific to TEST_TAB1 so I'm not surprised it doesn't appear). The XIDs still look a little screwy and you're trusting me at this stage that these are SELECT FOR UPDATEs. Notice the number of entries for the different transactions - two single row updates, one 2 row update and a multiple row update.
However, the transaction IDs for three of the transactions have the wrong sequence number of FFFFFFFF and even this output is the best I've seen. I've run this several times and sometimes it's captured by LogMiner, sometime it isn't. I appreciate that's a little vague, but I have very little confidence in some of the results I've seen on different tests.
I dumped the log file for this example, so I'll look at that in the next part.
Believe me when I say I'm aware of how far I've strayed from identifying the blocking SQL statement (you won't be getting that from redo entries) but I suppose I might as well carry on for one more post, maybe two.
Apr 22: Diagnosing Locking Problems using ASH/LogMiner – Part 7
Picking up from the end of the last example, I immediately generated a log file dump as follows. (You'll need to look back at Part 5 to see that the ROWID, log file name etc. match up or just trust me that these steps came from a continuation of the same SYS session used in that test.)
First I'm going to work out the file and block number for the block containing the locked row in the test. That's why I selected the ROWID before I locked it.
Now that I have a file and block number, I'll dump the redo log file but restrict the dump to entries related to that block.
The trace file is created in user_dump_dest and I've uploaded it here. Near the start of the file, I can check that block range.
The first redo entry appears soon after
CLS: 1 - Data Class Block
OP:11.19 - Operation Update Multiple Rows.
Verify OBJ:56717 is TEST_TAB1 -
Verify DBA:
Transaction ID 0x000a.024.00001864 matches XID 0A00240064180000 from V$TRANSACTION during the test, albeit in the rearranged format.
So this is definitely the blocking transaction and I can see the new value for the OBJECT_NAME column below ...
Which is the hexadecimal version of the ASCII string ''SESSION 1'. Further on still, we'll see it being updated to 'SESSION 2'.
I could wade through that dump all day (and it might be a subject for future posts) but a couple of things should be clear.
1) The redo log file contains plenty of information about the row updates, so if Log Miner is returning a single COMMIT action, either it's not working properly or I'm doing something wrong.
2) The contents of the redo log file might be an indicator of the SQL statement that caused the blocking problem, but it won't tell me what that statement was.
At this stage, I would say that ASH is starting to look more effective (in those cases where it can be used) than the alternatives being proposed. Remember, the whole point with ASH is that I don't need to predict when these problems might occur, so any solution that needs me to switch on traces is limited.
If you're interested in learning more about log file dumps, I recommend you read Riyaj Shamsudeen's Redo Internals paper and Julian Dyke's Redo Internals presentation.
First I'm going to work out the file and block number for the block containing the locked row in the test. That's why I selected the ROWID before I locked it.
SYS@TEST1020> select file_id, file_name, dbms_rowid.rowid_block_number('&&my_rowid')
2 from dba_data_files
3 where file_id = dbms_rowid.rowid_to_absolute_fno('&&my_rowid','TESTUSER','TEST_TAB1');
Enter value for my_rowid: AAAN2NAAKAAAaBLAEX
old 1: select file_id, file_name, dbms_rowid.rowid_block_number('&&my_rowid')
new 1: select file_id, file_name, dbms_rowid.rowid_block_number('AAAN2NAAKAAAaBLAEX')
old 3: where file_id = dbms_rowid.rowid_to_absolute_fno('&&my_rowid', 'TESTUSER',
'TEST_TAB1')
new 3: where file_id = dbms_rowid.rowid_to_absolute_fno('AAAN2NAAKAAAaBLAEX','TESTUSER',
'TEST_TAB1')
FILE_ID
----------
FILE_NAME
---------------------------------------------------
DBMS_ROWID.ROWID_BLOCK_NUMBER('AAAN2NAAKAAAABLAEX')
---------------------------------------------------
10
C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\TEST_DATA01.DBF
106571Now that I have a file and block number, I'll dump the redo log file but restrict the dump to entries related to that block.
SYS@TEST1020> alter session set max_dump_file_size=unlimited; Session altered. SYS@TEST1020> alter system dump logfile '&&my_member' 2 dba min 10 106571 dba max 10 106571; old 1: alter system dump logfile '&&my_member' new 1: alter system dump logfile 'C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO01.LOG' System altered.
The trace file is created in user_dump_dest and I've uploaded it here. Near the start of the file, I can check that block range.
DUMP OF REDO FROM FILE 'C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO01.LOG' Opcodes *.* DBAs: (file # 10, block # 106571) thru (file # 10, block # 106571) RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
The first redo entry appears soon after
REDO RECORD - Thread:1 RBA: 0x0000b3.00000009.0010 LEN: 0x0210 VLD: 0x0d SCN: 0x0000.008a42e6 SUBSCN: 1 04/20/2009 20:46:57 CHANGE #1 TYP:2 CLS: 1 AFN:10 DBA:0x0281a04b OBJ:56717 SCN:0x0000.008a4239 SEQ: 2 OP:11.19 KTB Redo op: 0x11 ver: 0x01 op: F xid: 0x000a.024.00001864 uba: 0x0080009d.031a.27
CLS: 1 - Data Class Block
OP:11.19 - Operation Update Multiple Rows.
Verify OBJ:56717 is TEST_TAB1 -
TESTUSER@TEST1020> select object_name, object_type 2 from user_objects 3 where object_id=56717; OBJECT_NAME ------------------- OBJECT_TYPE ------------------- TEST_TAB1 TABLE
Verify DBA:
SYS@TEST1020> select to_char(dbms_utility.make_data_block_address(10,106571), 2 'XXXXXXXX') from dual; TO_CHAR(D --------- 281A04B
Transaction ID 0x000a.024.00001864 matches XID 0A00240064180000 from V$TRANSACTION during the test, albeit in the rearranged format.
So this is definitely the blocking transaction and I can see the new value for the OBJECT_NAME column below ...
col 1: [ 9] 53 45 53 53 49 4f 4e 20 31
Which is the hexadecimal version of the ASCII string ''SESSION 1'. Further on still, we'll see it being updated to 'SESSION 2'.
I could wade through that dump all day (and it might be a subject for future posts) but a couple of things should be clear.
1) The redo log file contains plenty of information about the row updates, so if Log Miner is returning a single COMMIT action, either it's not working properly or I'm doing something wrong.
2) The contents of the redo log file might be an indicator of the SQL statement that caused the blocking problem, but it won't tell me what that statement was.
At this stage, I would say that ASH is starting to look more effective (in those cases where it can be used) than the alternatives being proposed. Remember, the whole point with ASH is that I don't need to predict when these problems might occur, so any solution that needs me to switch on traces is limited.
If you're interested in learning more about log file dumps, I recommend you read Riyaj Shamsudeen's Redo Internals paper and Julian Dyke's Redo Internals presentation.
Apr 20: Diagnosing Locking Problems using ASH – Part 6
"Toto, I've a feeling we're not in Kansas any more."
What started as a simple write-up of a course demo gone wrong (or right, depending the way you look at these things) has grown arms and legs and staggered away from the original intention to talk about ASH. so I thought it might be worth taking a breather and see where we’ve been so far, before pressing on with another couple of posts.
Part 1 – Introduced a blocking lock problem and showed how it would look in the OEM DB/Grid Control GUI.
Part 2 – Demonstrated that you might not see any ASH samples for the blocking session if it doesn’t generate enough activity to be captured.
Part 3 – Demonstrated that even if the blocking session *has* been active enough to be sampled, you might not have the specific activity that caused the problem.
Part 4 – Reinforced part 3 to show that you might *think* you’re looking at the SQL statement which was holding the blocking lock when you actually aren’t and highlighted JB’s point that there is no way of knowing the SQL statement that was holding a particular lock. I see that Jonathan Lewis has posted an illustration of this although let’s wait to see if Tanel Poder comes up with something
I decided at this point to draw some conclusions as to the limitations of ASH as a tool in diagnosing locking problems after they’ve cleared. In amongst the meandering detail, it's important to reiterate that objective. There are lots of ways of diagnosing locking problems that are happening right now or that you expect to happen in future.
Part 5 – The same subject cropped up on the Oak Table list and in amongst the suggestions were a couple to use LogMiner. One was from Kyle Hailey and I posted his scripts and output verbatim, but there were a few cut and paste errors and so I was faffing around with the post for a while and eventually added a comment to say I’d fix it later. I’ve now done so, re-running the tests with the correct code and removed the comment from the post. Sadly, none of the corrections fixed the two central problems with this approach
a) Kyle and I have both found LogMiner behaviour to be very flaky indeed and it doesn’t seem to return the results I’d expect, over multiple tests by two different ‘testers’. (See the last post for an example.)
b) I had to change the examples to use UPDATE statements rather than SELECT FOR UPDATEs.
I’ll be looking at these two issues in Parts 7 and 8 (I think) but who knows what turn these posts will take next! Here’s where I think I might be heading
Part 6 – You’re reading it.
Part 7 – Redo log dump confirmation that the LogMiner queries in Part 5 aren't showing what I'd expect.
Part 8 – Redo log dump confirmation that redo entries won’t help with the original SELECT FOR UPDATE example anyway.
Part 9 – Can I stop now, or should I attempt to reach a double-digit part number for the first (and hopefully last) time?
What started as a simple write-up of a course demo gone wrong (or right, depending the way you look at these things) has grown arms and legs and staggered away from the original intention to talk about ASH. so I thought it might be worth taking a breather and see where we’ve been so far, before pressing on with another couple of posts.
Part 1 – Introduced a blocking lock problem and showed how it would look in the OEM DB/Grid Control GUI.
Part 2 – Demonstrated that you might not see any ASH samples for the blocking session if it doesn’t generate enough activity to be captured.
Part 3 – Demonstrated that even if the blocking session *has* been active enough to be sampled, you might not have the specific activity that caused the problem.
Part 4 – Reinforced part 3 to show that you might *think* you’re looking at the SQL statement which was holding the blocking lock when you actually aren’t and highlighted JB’s point that there is no way of knowing the SQL statement that was holding a particular lock. I see that Jonathan Lewis has posted an illustration of this although let’s wait to see if Tanel Poder comes up with something
I decided at this point to draw some conclusions as to the limitations of ASH as a tool in diagnosing locking problems after they’ve cleared. In amongst the meandering detail, it's important to reiterate that objective. There are lots of ways of diagnosing locking problems that are happening right now or that you expect to happen in future.
Part 5 – The same subject cropped up on the Oak Table list and in amongst the suggestions were a couple to use LogMiner. One was from Kyle Hailey and I posted his scripts and output verbatim, but there were a few cut and paste errors and so I was faffing around with the post for a while and eventually added a comment to say I’d fix it later. I’ve now done so, re-running the tests with the correct code and removed the comment from the post. Sadly, none of the corrections fixed the two central problems with this approach
a) Kyle and I have both found LogMiner behaviour to be very flaky indeed and it doesn’t seem to return the results I’d expect, over multiple tests by two different ‘testers’. (See the last post for an example.)
b) I had to change the examples to use UPDATE statements rather than SELECT FOR UPDATEs.
I’ll be looking at these two issues in Parts 7 and 8 (I think) but who knows what turn these posts will take next! Here’s where I think I might be heading
Part 6 – You’re reading it.
Part 7 – Redo log dump confirmation that the LogMiner queries in Part 5 aren't showing what I'd expect.
Part 8 – Redo log dump confirmation that redo entries won’t help with the original SELECT FOR UPDATE example anyway.
Part 9 – Can I stop now, or should I attempt to reach a double-digit part number for the first (and hopefully last) time?
Apr 17: Diagnosing Locking Problems using ASH – Part 5
... and so it continues.
Coincidentally, the subject of tracking down locking problems after they occurred cropped up on the Oak Table mailing list just after the last post. Several suggestions were offered but I think the nearest and most detailed suggestion was that proposed by Kyle Hailey, ex-Oracle and now working Embarcadero. There were one or two problems with the original email which I've fixed up here ...
For the wait event class I was teaching I wanted to show an example of taking ASH data and finding the blocking statement in the log miner data, but found this was less than reliable. Sometimes it would work sometimes it didn't. Not sure what the issue was. I think dumping the redo log itself would be more dependable. I think logminer might not report everything - ie I think there is some filtering going on.
Anyway here are some steps that might be of use to people. I get XID but you probably don't need it, but I recall the XID being useful in doing the redo log dumps
Setup
Test Case
Session 1
Session 2
So Session 1 is now blocking session 2 ... have a look in V$TRANSACTION
Commit Session 1
... so that Session 2 is released.
and take a look in V$TRANSACTION at the new transaction details
So the blocking lock problem has now cleared. Let's start gathering information to see if we can identify the problem.
Analysis
Parameters for 'enq: TX - Row Lock Contention'
P2=usn<<16||slot
P3=Sequence
=> XID
Let's find our WAITER and the BLOCKING XID
Now let's use logminer to find out what the BLOCKING SQL was
Oh
Well, how about trying a different approach?
Which isn't too helpful - I can't see any information about Session 1's row update. If anyone has any ideas why, I'm all ears.
In the next post I'll look at a redo log file dump to see if that has any more information.
Coincidentally, the subject of tracking down locking problems after they occurred cropped up on the Oak Table mailing list just after the last post. Several suggestions were offered but I think the nearest and most detailed suggestion was that proposed by Kyle Hailey, ex-Oracle and now working Embarcadero. There were one or two problems with the original email which I've fixed up here ...
For the wait event class I was teaching I wanted to show an example of taking ASH data and finding the blocking statement in the log miner data, but found this was less than reliable. Sometimes it would work sometimes it didn't. Not sure what the issue was. I think dumping the redo log itself would be more dependable. I think logminer might not report everything - ie I think there is some filtering going on.
Anyway here are some steps that might be of use to people. I get XID but you probably don't need it, but I recall the XID being useful in doing the redo log dumps
Setup
SYS@TEST1020> set lines 120
SYS@TEST1020> column member format a60
SYS@TEST1020>
SYS@TEST1020> alter system switch logfile;
System altered.
SYS@TEST1020> select group#, thread#, status from v$log;
GROUP# THREAD# STATUS
---------- ---------- ----------------
1 1 CURRENT
2 1 ACTIVE
3 1 ACTIVE
SYS@TEST1020> select group#, member from v$logfile;
GROUP# MEMBER
---------- ------------------------------------------------------------
1 C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO01.LOG
2 C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO02.LOG
3 C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO03.LOG
SYS@TEST1020> exec dbms_logmnr.add_logfile('&&my_member');
Enter value for my_member: C:\ORACLE\PRODUCT\10.2.0\ORADATA\TEST1020\REDO01.LOG
PL/SQL procedure successfully completed.
SYS@TEST1020> exec dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog);
PL/SQL procedure successfully completed.Test Case
Session 1
TESTUSER@TEST1020> create table test_tab1
as select object_id pk_id, object_name from all_objects;
Table created.
TESTUSER@TEST1020> select rowid, object_name from test_tab1 where pk_id=313;
ROWID OBJECT_NAME
------------------ ------------------------------
AAAN2NAAKAAAaBLAEX SYSTEM_PRIVILEGE_MAP
TESTUSER@TEST1020> update test_tab1 set object_name='SESSION 1' where pk_id=313;
1 row updated.Session 2
TESTUSER@TEST1020> update test_tab1 set object_name='SESSION 2' where pk_id=313;
So Session 1 is now blocking session 2 ... have a look in V$TRANSACTION
SYS@TEST1020> select start_time, xid, xidusn, xidslot, xidsqn, start_scn,
to_char(start_scn, 'XXXXXXXXXX')
from v$transaction order by start_time;
START_TIME XID XIDUSN XIDSLOT XIDSQN START_SCN TO_CHAR(STA
-------------------- ---------------- ---------- ---------- ---------- ---------- -----------
04/20/09 20:46:39 0A00240064180000 10 36 6244 0 0Commit Session 1
TESTUSER@TEST1020> commit; Commit complete.
... so that Session 2 is released.
1 row updated.
and take a look in V$TRANSACTION at the new transaction details
SYS@TEST1020> select start_time, xid, xidusn, xidslot, xidsqn, start_scn, to_cha r(start_scn, 'XXXXXXXXXX') from v$transaction order by start_time; START_TIME XID XIDUSN XIDSLOT XIDSQN START_SCN TO_CHAR(STA -------------------- ---------------- ---------- ---------- ---------- ---------- ----------- 04/20/09 20:46:54 08002000681F0000 8 32 8040 9061094 8A42E6
So the blocking lock problem has now cleared. Let's start gathering information to see if we can identify the problem.
Analysis
Parameters for 'enq: TX - Row Lock Contention'
P2=usn<<16||slot
P3=Sequence
=> XID
Let's find our WAITER and the BLOCKING XID
SYS@TEST1020> select * from 2 (select lpad(ltrim(to_char(p2,'XXXXXX')),6,'0')|| 3 '00'|| 4 ltrim(to_char(mod(p3,256),'XX'))|| 5 ltrim(to_char(trunc(p3/256),'XX'))|| 6 '0000' block_xid, 7 to_char(p2,'XXXXXXXX') p2hex, 8 to_char(p3,'XXXXXXXX') p3hex, 9 trunc(p2/65536) usn, 10 mod(p2,65536) slot, 11 p3 sqn, xid wait_xid 12 from v$active_session_history 13 where event like 'enq: T%' 14 order by sample_time desc) where rownum < 2; BLOCK_XID P2HEX P3HEX USN SLOT SQN WAIT_XID ------------------ --------- --------- ---------- ---------- ---------- -------- 0A00240064180000 A0024 1864 10 36 6244
Now let's use logminer to find out what the BLOCKING SQL was
SYS@TEST1020> select username, session# sid, serial# , sql_redo from v$logmnr_co ntents where XID = '&&blocking_xid' 2 / Enter value for blocking_xid: 0A00240064180000 old 1: select username, session# sid, serial# , sql_redo from v$logmnr_content s where XID = '&&blocking_xid' new 1: select username, session# sid, serial# , sql_redo from v$logmnr_content s where XID = '0A00240064180000' USERNAME SID SERIAL# ------------------------------ ---------- ---------- SQL_REDO -------------------------------------------------------------------------------- UNKNOWN 0 0 commit;
Oh
Well, how about trying a different approach?
SYS@TEST1020> select distinct xid , xidusn, xidslt, xidsqn, username, session# s
id, serial# , sql_redo
2 from v$logmnr_contents
3 where timestamp > sysdate- &minutes/(60*24)
4 and xidusn=10
5 and xidslt=36;
Enter value for minutes: 10
old 3: where timestamp > sysdate- &minutes/(60*24)
new 3: where timestamp > sysdate- 10/(60*24)
XID XIDUSN XIDSLT XIDSQN USERNAME
SID SERIAL#
---------------- ---------- ---------- ---------- ------------------------------
---------- ----------
SQL_REDO
--------------------------------------------------------------------------------
----------------------------------------
0A00240064180000 10 36 6244 UNKNOWN
0 0
commit;Which isn't too helpful - I can't see any information about Session 1's row update. If anyone has any ideas why, I'm all ears.
In the next post I'll look at a redo log file dump to see if that has any more information.
Apr 9: Diagnosing Locking Problems using ASH – Part 4
Some features in this post require a Diagnostics Pack license.
No sooner had I finished part 3 with some conclusions than I thought of another example I should have included and then someone else made a comment in an email which suggested another. (Thanks, JB!)
I’ll try to deal with that point and some other thoughts I had here in a further example. For this example, I decided to suggest how to create a quick table so that you can test this out for yourself.
As easy as that.
In the last part I showed that the SQL_ID for the last sample of a blocking session immediately before it went idle might not be the SQL statement holding the lock. In that case, it was fairly obvious that it wasn’t because it was a simple query that couldn’t have been responsible. However, it’s likely to be even more confusing if the SQL_ID captured by the ASH sample *looks* like it’s responsible, but isn’t, and the previous example didn't illustrate that clearly.
Session 1 – Connected as TESTUSER
So session 1 has executed three different queries, all of which lock one or more rows including the row with PK_ID=313, has rolled back the first two (releasing the locks) and has just that row locked now.
Session 2 – Connected as TESTUSER
Session 2 hangs, waiting for the lock.
Session 1 – Connected as TESTUSER
So the lock is released and session 2 acquires the lock and then releases it
Session 2 – Connected as Testuser
OK, time to look at the ASH data for the last two minutes for TESTUSER sessions. (Note that I have added each session’s serial# to the query. Graham Wood corrected a statement I made in the last post …
… by pointing out that session_serial# would allow me to differentiate between different sessions where SID=148. I’ve updated the previous post to that effect.)
So what statement is SQL_ID ‘95k0s60jznz76’?
Let’s face it, if you were to look at this data, how likely is it that you would think that that statement was responsible for blocking session 2, when it wasn't?
No sooner had I finished part 3 with some conclusions than I thought of another example I should have included and then someone else made a comment in an email which suggested another. (Thanks, JB!)
"It's probably worth pointing out that NO TOOL can claim to identify the blocking SID->SQL in such a case as there is no relationship kept in Oracle between locks acquired by users and the SQL that acquired the locks"
I’ll try to deal with that point and some other thoughts I had here in a further example. For this example, I decided to suggest how to create a quick table so that you can test this out for yourself.
SQL> create user testuser identified by testuser 2 default tablespace USERS 3 temporary tablespace TEMP; User created. SQL> grant create session, create table to testuser; Grant succeeded. SQL> connect testuser/testuser Connected. SQL> create table test_tab1 as select object_id pk_id, object_name from all_objects; Table created.
As easy as that.
In the last part I showed that the SQL_ID for the last sample of a blocking session immediately before it went idle might not be the SQL statement holding the lock. In that case, it was fairly obvious that it wasn’t because it was a simple query that couldn’t have been responsible. However, it’s likely to be even more confusing if the SQL_ID captured by the ASH sample *looks* like it’s responsible, but isn’t, and the previous example didn't illustrate that clearly.
Session 1 – Connected as TESTUSER
SQL> select pk_id, object_name from test_tab1 order by pk_id desc for update; <Trimmed> 890 DBMS_STANDARD 889 DBMS_STANDARD PK_ID OBJECT_NAME ---------- ------------------------------ 887 STANDARD 886 RE$NV_LIST 605 MAP_OBJECT 319 STMT_AUDIT_OPTION_MAP 317 STMT_AUDIT_OPTION_MAP 316 TABLE_PRIVILEGE_MAP 314 TABLE_PRIVILEGE_MAP 313 SYSTEM_PRIVILEGE_MAP 311 SYSTEM_PRIVILEGE_MAP 259 DUAL 258 DUAL 4477 rows selected. SQL> rollback; Rollback complete. SQL> select pk_id from test_tab1 where object_name='SYSTEM_PRIVILEGE_MAP' for update; PK_ID ---------- 311 313 SQL> rollback; Rollback complete. SQL> select pk_id from test_tab1 where pk_id=313 for update; PK_ID ---------- 313 SQL>
So session 1 has executed three different queries, all of which lock one or more rows including the row with PK_ID=313, has rolled back the first two (releasing the locks) and has just that row locked now.
Session 2 – Connected as TESTUSER
SQL> select pk_id from test_tab1 where pk_id=313 for update;
Session 2 hangs, waiting for the lock.
Session 1 – Connected as TESTUSER
SQL> rollback; Rollback complete.
So the lock is released and session 2 acquires the lock and then releases it
Session 2 – Connected as Testuser
PK_ID ---------- 313 SQL> rollback; Rollback complete.
OK, time to look at the ASH data for the last two minutes for TESTUSER sessions. (Note that I have added each session’s serial# to the query. Graham Wood corrected a statement I made in the last post …
“who is to say that that session 148 is the same session as the blocking session 148 and not an earlier session that has disconnected?”
… by pointing out that session_serial# would allow me to differentiate between different sessions where SID=148. I’ve updated the previous post to that effect.)
break on MY_TIME
column my_sid format 999
column my_ser format 99999
column my_state format a30
column my_blkr format 999
select to_char(a.sample_time, 'HH24:MI:SS') MY_TIME,a.session_id MY_SID,a.session_serial# MY_SER,
DECODE(a.session_state, 'WAITING' ,a.event, a.session_state) MY_STATE,a.xid, a.sql_id,
a.blocking_session MY_BLKR
from v$active_session_history a, dba_users u
where u.user_id = a.user_id
and u.username = 'TESTUSER'
and a.sample_time > SYSTIMESTAMP-(2/1440);
MY_TIME MY_SID MY_SER MY_STATE XID SQL_ID MY_BLKR
-------- ------ ------ ------------------------------ ---------------- ------------- -------
12:53:01 300 2739 enq: TX - row lock contention 2pa7rnqf65gj3 304
12:53:00 300 2739 enq: TX - row lock contention 2pa7rnqf65gj3 304
12:52:59 300 2739 enq: TX - row lock contention 2pa7rnqf65gj3 304
12:52:50 304 1445 ON CPU 0006000E0000884D 95k0s60jznz76So what statement is SQL_ID ‘95k0s60jznz76’?
SQL> select sql_text from v$sql where sql_id='95k0s60jznz76'; SQL_TEXT ----------------------------------------------------------------------------- select pk_id, object_name from test_tab1 order by pk_id desc for update
Let’s face it, if you were to look at this data, how likely is it that you would think that that statement was responsible for blocking session 2, when it wasn't?
« previous page
(Page 2 of 4, totaling 17 entries)
next page »


Comments