Hanging Audit Vault Warehouse Refresh Job

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Aug 5: Hanging Audit Vault Warehouse Refresh Job

I've been working with Audit Vault 10.2.3 on AIX recently and ran into a problem that someone else might one day. I decided to capture the output as best I can and blog about it later. (As I don't have the same system available any more, the formatting might leave a bit to be desired and there are one or two little steps missing that I would liked to have shown but I don't think they're crucial.)

It was just a Pre-Production system that we could play around with but I noticed that there were no recent audit trail entries when I tried some tests. On closer inspection, the most recent audit record was a couple of days old, so it looked like the job which inserts new audit records into the Audit Vault Data Warehouse wasn't running properly. The first step was to connect to that database instance and check the status of the job.

SQL> select owner, job_name, elapsed_time from all_scheduler_running_jobs; 

OWNER                          JOB_NAME 
------------------------------ ------------------------------ 
ELAPSED_TIME 
--------------------------------------------------------------------------- 
AVSYS                          REFRESH_WAREHOUSE_DATA 
+002 21:06:37.19

OK, so the Warehouse Refresh job has been running for a few days (2 days, 21 hours and 6 minutes). When I looked at the session for that job, it was waiting on "enq: PS - contention" which is (trumpet salute) a parallel execution wait event.

  1  select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, 
            blocking_session, blocking_session_status 
  2  from v$session 
  3* where sid=201 
SQL> / 

       SID USERNAME                       LAST_CALL_ET 
---------- ------------------------------ ------------ 
EVENT                                                             WAIT_TIME 
---------------------------------------------------------------- ---------- 
SECONDS_IN_WAIT STATE               BLOCKING_SESSION BLOCKING_SE 
--------------- ------------------- ---------------- ----------- 
       201 AVSYS                                249575 
enq: PS - contention                                                      0 
          48361 WAITING                          169 VALID

It's a bit of a red herring because it's really waiting on the PX slave, so the real event should be in that session 169.

SQL> c/201/169 
  3* where sid=169 
SQL> / 

       SID USERNAME                       LAST_CALL_ET 
---------- ------------------------------ ------------ 
EVENT                                                             WAIT_TIME 
---------------------------------------------------------------- ---------- 
SECONDS_IN_WAIT STATE               BLOCKING_SESSION BLOCKING_SE 
--------------- ------------------- ---------------- ----------- 
       169 AVSYS                                249250 
cursor: pin X                                                             0 
              0 WAITING                              UNKNOWN

So, looking at that made me Google around and come up with this. Which implies that I should look at bug number 5908030 which is stated as a duplicate of 5476091. Both RDBMS bugs and both will be fixed in (gulp) 11.2.

The next job was to kill the currently running refresh job and restart it, particularly as this was just Pre-Prod and the lack of audit records in the Warehouse was holding up testing. I tried various approaches to stopping the job, running from the highest level more elegant approaches, down to brute force.

SQL> exec dbms_scheduler.stop_job('AVSYS.REFRESH_WAREHOUSE_DATA'); 
BEGIN dbms_scheduler.stop_job('AVSYS.REFRESH_WAREHOUSE_DATA'); END; 

* 
ERROR at line 1: 
ORA-27365: job has been notified to stop, but failed to do so immediately 
ORA-06512: at "SYS.DBMS_ISCHED", line 164 
ORA-06512: at "SYS.DBMS_SCHEDULER", line 483 
ORA-06512: at line 1 

Tried that a few times and decided it was probably well and truly stuck, particularly when the bugs were referring to systemstate dumps, so I tried killing the stuck slave, first

 1  select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, 
           blocking_session, blocking_session_status 
 2  from v$session 
 3* where sid=201 
SQL> / 

       SID USERNAME                       LAST_CALL_ET 
---------- ------------------------------ ------------ 
EVENT                                                             WAIT_TIME 
---------------------------------------------------------------- ---------- 
SECONDS_IN_WAIT STATE               BLOCKING_SESSION BLOCKING_SE 
--------------- ------------------- ---------------- ----------- 
       201 AVSYS                                251777 
enq: PS - contention                                                      0 
          50563 WAITING                          169 VALID 
SQL> 3 
  3* where sid=201 
SQL> c/201/169 
  3* where sid=169 
SQL> / 

       SID USERNAME                       LAST_CALL_ET 
---------- ------------------------------ ------------ 
EVENT                                                             WAIT_TIME 
---------------------------------------------------------------- ---------- 
SECONDS_IN_WAIT STATE               BLOCKING_SESSION BLOCKING_SE 
--------------- ------------------- ---------------- ----------- 
       169 AVSYS                                251446 
cursor: pin X                                                             0 
              0 WAITING                              UNKNOWN 

SQL> select sid, serial# from v$session where sid=169; 
       SID    SERIAL# 
---------- ---------- 
       169       1585 

SQL> alter system kill session '169,1585'; 
alter system kill session '169,1585' 
* 
ERROR at line 1: 
ORA-00031: session marked for kill 


I waited and tried a few more times, but that slave really didn't want to die ...

SQL> l 
  1  select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, 
            blocking_session, blocking_session_status 
  2  from v$session 
  3* where sid=169 
SQL> / 

       SID USERNAME                       LAST_CALL_ET 
---------- ------------------------------ ------------ 
EVENT                                                             WAIT_TIME 
---------------------------------------------------------------- ---------- 
SECONDS_IN_WAIT STATE               BLOCKING_SESSION BLOCKING_SE 
--------------- ------------------- ---------------- ----------- 
       169 AVSYS                                251725 
cursor: pin X                                                             0 
              0 WAITING                              UNKNOWN 

So I tried try killing the process

SQL> select sid, serial#, paddr, osuser, process, status 
  2  from v$session where sid=169; 

       SID    SERIAL# PADDR            OSUSER 
---------- ---------- ---------------- ------------------------------ 
PROCESS      STATUS 
------------ -------- 
       169       1585 070000005A383348 oracle 
962750       KILLED 

SQL> !ps -ef|grep 962750 
  oracle  962750       1   1   Jul 13      - 11:49 ora_p006_AV01TEST 
  oracle 1519634 1757348   2 12:11:27  pts/1  0:00 grep 962750 
SQL> !kill -9 962750 
SQL> !ps -ef|grep 962750 
  oracle 1519638 1757348   1 12:11:38  pts/1  0:00 grep 962750 

Now it's gone, see if parent session for refresh job is still there

  1  select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, 
            blocking_session, blocking_session_status 
  2  from v$session 
  3* where sid=201 
SQL> / 

no rows selected 

No, so the job should be gone too

SQL> select * from all_scheduler_running_jobs; 

OWNER                          JOB_NAME 
------------------------------ ------------------------------ 
JOB_SUBNAME                    SESSION_ID SLAVE_PROCESS_ID SLAVE_OS_PRO 
------------------------------ ---------- ---------------- ------------ 
RUNNING_INSTANCE RESOURCE_CONSUMER_GROUP 
---------------- -------------------------------- 
ELAPSED_TIME 
--------------------------------------------------------------------------- 
CPU_USED 
--------------------------------------------------------------------------- 
AVSYS                          REFRESH_WAREHOUSE_DATA 
                                      122               19 692392 
               1 
+000 00:01:25.48 
+000 00:00:18.38 


Well, it had restarted and been running for a minute or so. The audit trail entries started to appear in the warehouse, anyway. As I won't be working with this again for a while, I'm counting on colleagues to keep me posted about whether this keeps happening!
Posted by Doug Burns Comments: (6) Trackbacks: (0)
Defined tags for this entry: Audit Vault
Related entries by tags:
Adding a new Oracle host to Audit Vault

Trackbacks
Trackback specific URI for this entry

No Trackbacks

Comments
Display comments as (Linear | Threaded)

#1 - illiyaz 2008-11-06 19:05 - (Reply)

Doug, I just faced the problem. I am running a OWB mapping (which is again a pl/sql package internally) and i get this same wait event hanging on for ever. The beauty of this problem is that there is no BLOCKING session as shown above, the blocking session shows NULL.This mapping basically does insert-else-update(Merge).It so happens that only one environment has this problem which makes it more difficult to NAIL DOWN the problem.Any pointers from folks would be great..

Illiyaz

#1.1 - Doug Burns said:
2008-11-08 20:21 - (Reply)

I haven't come across this myself, but thanks for another example of the problem.

#1.2 - Krishnan Venugopal 2008-12-10 06:55 - (Reply)

We also encountered the same issue with OWB(10.2.0.1). The mapping (internally a merge statement) that used to take 500 seconds normally was terribly slow and the time_remaining in v$session_longops was showing 177455 secs.We had to restart the DB to resolve this issue. Did you arrive at any reason/solution for this...?

#1.2.1 - Doug Burns said:
2008-12-10 18:06 - (Reply)

All I have is the detail on Metalink. I stopped working with this soon after so I don't know when a suitable patch is likely to appear.

#1.2.1.1 - Tirumal Rao 2009-04-14 10:59 - (Reply)

Hi Doug,

I am new to this Audit Vault and I was assigned to resolve an issue with "REFRESH_WAREHOUSE_DATA", when I below query shows the job running.

select STATE,repeat_interval,last_start_date,next_run_date from dba_scheduler_jobs where job_name='REFRESH_WAREHOUSE_DATA';

STATE
---------------
REPEAT_INTERVAL
---------------------------------
LAST_START_DATE
---------------------------------
NEXT_RUN_DATE
---------------------------------
RUNNING

02-MAR-09 10.24.51.851836 PM +00:00
12-MAR-09 08.00.00.000000 PM +00:00

When trying to stop it gives

SQL> exec dbms_scheduler.stop_job ('AVSYS.REFRESH_WAREHOUSE_DATA');
BEGIN dbms_scheduler.stop_job ('AVSYS.REFRESH_WAREHOUSE_DATA'); END;

*
ERROR at line 1:
ORA-27366: job "AVSYS.REFRESH_WAREHOUSE_DATA" is not running
ORA-06512: at "SYS.DBMS_ISCHED", line 164
ORA-06512: at "SYS.DBMS_SCHEDULER", line 483
ORA-06512: at line 1

SQL> select owner, job_name, elapsed_time from all_scheduler_running_jobs;

The jobs was runing for a while but suddenly stopped.

OWNER JOB_NAME ELAPSED_TIME
------------------------------ ------------------------------ ---------------------------------------------------------------------------
AVSYS REFRESH_WAREHOUSE_DATA +041
18:22:08.63

I found a note 764058.1 which is related to ownership issue.

Any help to resolve this is highly regarded.
Regards

#1.2.1.1.1 - Doug Burns said:
2009-04-14 18:30 - (Reply)

I found a note 764058.1 which is related to ownership issue.

Did you check the ownership issue? I can't see the query from the Metalink note in your comment here.

Also, did you try killing the process to clear the job, as shown in the post above? I think the post contains what you're looking for if you read it completely.

If killing the process and letting the job restart doesn't work, then that's another matter ....


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

Comments

Doug Burns about 10053 Trace Files - Different Plan in Different Environments
Tue, 02.04.2013 08:57
You're welcome. Now I just nee d to pull my finger out and ac tually come up [...]
Howard Rogers about 10053 Trace Files - Different Plan in Different Environments
Mon, 01.04.2013 23:08
Makes a big difference, so tha nks for that! With two brow ser windows, o [...]
stelioscharalambides.com about 10053 Trace Files
Sat, 30.03.2013 16:28

Upcoming Presentations

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml DBMS_STATS
xml Direct Path Reads
xml Fun
xml grid control
xml hotsos 2010
xml listener
xml Locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml Partitions
xml Patching
xml swingbench
xml The Reality Gap
xml time matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl