Not all Deadlocks are created the same

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Mar 11: Not all Deadlocks are created the same

I've blogged about deadlocks in Oracle at least once before. I said then that although the following message in deadlock trace files is usually true, it isn't always.

The following deadlock is not an Oracle error. Deadlocks of  
this type can be expected if certain SQL statements are      
issued. The following information may aid in determining the 
cause of the deadlock.

So when I came across another example recently, it seemed worth a quick blog post. Not least for the benefit of other souls who hit the same issue (and probably hit Google moments later).

But while it's easy to say - "Hey! Look! I found an exception! Aren't I clever?" - it occurred to me that actually Oracle's capabilities in this area might be underrated by raising the occasional anomaly. Because the truth is

1) In most cases, deadlock errors are down to the way you've written your application or some documented restriction in Oracle. The type of restrictions that you're more likely to hit if you're handling high degrees of concurrency with lots of DDL, parallel query, partition management and the like. Such activities often have unusually restrictive locking requirements and most locking issues can be turned into deadlock issues quite easily if you have a few sessions running concurrently.


2) It's still the case that Oracle will handle the deadlock situation, at least to the extent of rolling back one of the statements causing the issue. (Although, whilst writing this post, I noticed that Jonathan Lewis raised the question of what exactly people mean when they suggest that Oracle resolves deadlock issues.

3) Deadlock trace files are typically very useful and not the most difficult to read. Yes, they tend to use Oracle kernel terminology (not surprising) but I'd wager that most people could have a rough idea of the root cause with some initial analysis and could have a very detailed idea, given more time. Even if you can't decipher the things yourself, it gives Oracle Support detailed information to help root cause analysis.

So, to the particular issue we hit. Towards the end of a data loading process that loads around a billion rows in a short period of time (30/60 minutes that also includes a bunch of surrounding activities), we would hit the occasional deadlock error. Fortunately, the site I'm working at just now has a very enlightened policy towards developer access to the alert log and trace files, so I can do my own initial investigation. On digging out the relevant deadlock trace file, it looked like this (some details changed)

Trace file /app/ora/local/admin/PRD/diag/rdbms/PRD_prod_server/PRD/trace/PRD_dia0_1627682.trc 

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production 
With the Partitioning, Automatic Storage Management, OLAP, Data Mining 
and Real Application Testing options 
ORACLE_HOME = /app/ora/local/product/11.2.0.3/db_1 
System name:    Linux 
Node name:      prod_server.ldn.orcldoug.com 
Release:        2.6.32-220.13.1.el6.x86_64 
Version:        #1 SMP Thu Mar 29 11:46:40 EDT 2012 
Machine:        x86_64 
Instance name: PRD 
Redo thread mounted by this instance: 1 
Oracle process number: 8 
Unix process pid: 1627682, image: oracle@prod_server.ldn.orcldoug.com (DIA0)
*** 2013-01-16 09:09:10.925 
*** SESSION ID:(201.1) 2013-01-16 09:09:10.925 
*** CLIENT ID:() 2013-01-16 09:09:10.925 
*** SERVICE NAME:(SYS$BACKGROUND) 2013-01-16 09:09:10.925 
*** MODULE NAME:() 2013-01-16 09:09:10.925 
*** ACTION NAME:() 2013-01-16 09:09:10.925 
  
------------------------------------------------------------------------------- 
  
DEADLOCK DETECTED (id=0xd0102292) 
  
Chain Signature: 'library cache lock'<='row cache lock' (cycle) 
Chain Signature Hash: 0x52a8007d 
  
The following deadlock is not an Oracle error. Deadlocks of  
this type can be expected if certain SQL statements are      
issued. The following information may aid in determining the 
cause of the deadlock.                                       
  
Resolving deadlock by signaling ORA-00060 to 'instance: 1, os id: 3443329, session id: 161' 
  dump location: /app/ora/local/admin/PRD/diag/rdbms/PRD_prod_server/PRD/trace/PRD_ora_3443329.trc 
  
Performing diagnostic dump on 'instance: 1, os id: 3443222, session id: 779' 
  dump location: /app/ora/local/admin/PRD/diag/rdbms/PRD_prod_server/PRD/trace/PRD_ora_3443222.trc 
  
------------------------------------------------------------------------------- 
    Oracle session identified by: 
    { 
                instance: 1 (PRD_prod_server.PRD) 
                   os id: 3443222 
              process id: 127, oracle@prod_server.ldn.orcldoug.com 
              session id: 779 
        session serial #: 605 
    } 
    is waiting for 'row cache lock' with wait info: 
    { 
                      p1: 'cache id'=0x8 
                      p2: 'mode'=0x0 
                      p3: 'request'=0x5 
            time in wait: 1 min 58 sec 
           timeout after: never 
                 wait id: 1655 
                blocking: 1 session 
             current sql: Begin run_manager_pkg.finalize_all_values_prc(:v0); End; 
            wait history: 
              * time between current wait and wait #1: 0.002176 sec 
              1.       event: 'enq: PS - contention' 
                 time waited: 0.000082 sec 
                     wait id: 1654            p1: 'name|mode'=0x50530006 
                                              p2: 'instance'=0x1 
                                              p3: 'slave ID'=0x2f 
              * time between wait #1 and #2: 0.000013 sec 
              2.       event: 'PX Deq: Slave Session Stats' 
                 time waited: 0.000001 sec 
                     wait id: 1653            p1: 'sleeptime/senderid'=0x0 
                                              p2: 'passes'=0x0 
              * time between wait #2 and #3: 0.000001 sec 
              3.       event: 'PX Deq: Slave Session Stats' 
                 time waited: 0.000002 sec 
                     wait id: 1652            p1: 'sleeptime/senderid'=0x0 
                                              p2: 'passes'=0x0 
    } 
    and is blocked by 
=> Oracle session identified by: 
    { 
                instance: 1 (PRD_prod_server.PRD) 
                   os id: 3443329 
              process id: 134, oracle@prod_server.ldn.orcldoug.com 
              session id: 161 
        session serial #: 247 
    } 
    which is waiting for 'library cache lock' with wait info: 
    { 
                      p1: 'handle address'=0x101f8eac98 
                      p2: 'lock address'=0xfdef83738 
                      p3: '100*mode+namespace'=0x10f2000010003 
            time in wait: 1.739719 sec 
           timeout after: never 
                 wait id: 508 
                blocking: 1 session 
             current sql: ALTER INDEX "DOUG"."VALUE_PK" REBUILD PARTITION "SYS_P4089"

            wait history:               * time between current wait and wait #1: 0.000973 sec               1.       event: 'enq: CR - block range reuse ckpt'                  time waited: 0.003220 sec                      wait id: 507             p1: 'name|mode'=0x43520006                                               p2: '2'=0x10086                                               p3: '0'=0x1               * time between wait #1 and #2: 0.000008 sec               2.       event: 'reliable message'                  time waited: 0.000107 sec                      wait id: 506             p1: 'channel context'=0x101c5afa98                                               p2: 'channel handle'=0x101c0f2260                                               p3: 'broadcast message'=0x101b5cfd58               * time between wait #2 and #3: 0.003791 sec               3.       event: 'db file sequential read'                  time waited: 0.000321 sec                      wait id: 505             p1: 'file#'=0x5e                                               p2: 'block#'=0x118784                                               p3: 'blocks'=0x1     }     and is blocked by the session at the start of the chain.


I would hope that a few things would be immediately obvious, particularly if it's 'your' application that generated this issue

1) The two sessions are running the following parts of the application.

Session 1: Begin run_manager_pkg.finalize_all_values_prc(:v0); End;
Session 2: ALTER INDEX "DOUG"."VALUE_PK" REBUILD PARTITION "SYS_P4089"

Which happened to fit in with what we were seeing. We have two concurrent runs which perform similar actions using different input files that load into different partitions.

2) The first session is using Parallel Query (note the enq: PS - contention and different PX Deq wait events)

3) The deadlock is a little unusual because it's not between two transactions trying to lock database objects or rows being locked by the other session but between in-memory structures. One session is waiting on 'row cache lock' and the other is waiting on 'library cache lock', as opposed to waiting for specific row or table-level locks. This is also visible from the chain signature at the start of the trace file.

Chain Signature: 'library cache lock'<='row cache lock' (cycle)

Armed with 2) and 3) in particular, my next step was to go to My Oracle Support, as usual. I find that Google isn't too great with issues like this because some of them are quite specific and might not be affecting too many others. Sure enough, a search turned up :-

Bug 14356507  Deadlock between partition maintenance and parallel query operations

Which is confirmed as affecting versions 11.2.0.2 and 11.2.0.3. The fix is in Bundle Patch 12 for Exadata, in Oracle 12.1 and is also available as one-off patch that we're in the process of applying to different environments.

The issue is that "When a parallel query is hard parsed, first QC hard parses the query and then all the slaves.  When a partition maintenance operation (DDL) comes in between the hard parses of QC and Slaves.", then you can hit the deadlock. There's more detail in the bug notes, but it's worth noting this phrase "This is basically a timing issue, in high concurrency environments.", which means it only affects us very intermittently and is a nightmare to prove we've eliminated without a lot of testing.

What I find a little disconcerting is that there seem to be quite a few of these library cache deadlock issues kicking around in recent versions that I haven't been used to seeing in prior versions. Given some of the library cache madness I've seen in my few years with 11g, I do wonder what on earth they've done to it!

Posted by Doug Burns Comments: (0) Trackbacks: (0)

Trackbacks
Trackback specific URI for this entry

No Trackbacks

Comments
Display comments as (Linear | Threaded)

No comments


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

A couple of posts about Incremental Stats confusion

Part 1
Part 2

Comments

personal blog about Moving Sideways
Wed, 01.06.2016 18:34
That is a good tip particularl y to those fresh to the blogos phere. Short [...]
odziezprestige.pl about Moving Sideways
Wed, 01.06.2016 17:07
Please let me know if you're l ooking for a article writer fo r your site. [...]
Doug Burns about Moving Sideways
Tue, 10.05.2016 22:43
Oh, I won't give it that long unless I enjoy it ;-)

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml dbms_stats
xml Direct Path Reads
xml Fun
xml Grid Control
xml hotsos 2010
xml listener
xml locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml partitions
xml Patching
xml Swingbench
xml The Reality Gap
xml Time Matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl