Aug 5: Hanging Audit Vault Warehouse Refresh Job
It was just a Pre-Production system that we could play around with but I noticed that there were no recent audit trail entries when I tried some tests. On closer inspection, the most recent audit record was a couple of days old, so it looked like the job which inserts new audit records into the Audit Vault Data Warehouse wasn't running properly. The first step was to connect to that database instance and check the status of the job.
SQL> select owner, job_name, elapsed_time from all_scheduler_running_jobs; OWNER JOB_NAME ------------------------------ ------------------------------ ELAPSED_TIME --------------------------------------------------------------------------- AVSYS REFRESH_WAREHOUSE_DATA +002 21:06:37.19
OK, so the Warehouse Refresh job has been running for a few days (2 days, 21 hours and 6 minutes). When I looked at the session for that job, it was waiting on "enq: PS - contention" which is (trumpet salute) a parallel execution wait event.
1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state,
blocking_session, blocking_session_status
2 from v$session
3* where sid=201
SQL> /
SID USERNAME LAST_CALL_ET
---------- ------------------------------ ------------
EVENT WAIT_TIME
---------------------------------------------------------------- ----------
SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE
--------------- ------------------- ---------------- -----------
201 AVSYS 249575
enq: PS - contention 0
48361 WAITING 169 VALIDIt's a bit of a red herring because it's really waiting on the PX slave, so the real event should be in that session 169.
SQL> c/201/169 3* where sid=169 SQL> / SID USERNAME LAST_CALL_ET ---------- ------------------------------ ------------ EVENT WAIT_TIME ---------------------------------------------------------------- ---------- SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE --------------- ------------------- ---------------- ----------- 169 AVSYS 249250 cursor: pin X 0 0 WAITING UNKNOWN
So, looking at that made me Google around and come up with this. Which implies that I should look at bug number 5908030 which is stated as a duplicate of 5476091. Both RDBMS bugs and both will be fixed in (gulp) 11.2.
The next job was to kill the currently running refresh job and restart it, particularly as this was just Pre-Prod and the lack of audit records in the Warehouse was holding up testing. I tried various approaches to stopping the job, running from the highest level more elegant approaches, down to brute force.
SQL> exec dbms_scheduler.stop_job('AVSYS.REFRESH_WAREHOUSE_DATA');
BEGIN dbms_scheduler.stop_job('AVSYS.REFRESH_WAREHOUSE_DATA'); END;
*
ERROR at line 1:
ORA-27365: job has been notified to stop, but failed to do so immediately
ORA-06512: at "SYS.DBMS_ISCHED", line 164
ORA-06512: at "SYS.DBMS_SCHEDULER", line 483
ORA-06512: at line 1 Tried that a few times and decided it was probably well and truly stuck, particularly when the bugs were referring to systemstate dumps, so I tried killing the stuck slave, first
1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state,
blocking_session, blocking_session_status
2 from v$session
3* where sid=201
SQL> /
SID USERNAME LAST_CALL_ET
---------- ------------------------------ ------------
EVENT WAIT_TIME
---------------------------------------------------------------- ----------
SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE
--------------- ------------------- ---------------- -----------
201 AVSYS 251777
enq: PS - contention 0
50563 WAITING 169 VALID
SQL> 3
3* where sid=201
SQL> c/201/169
3* where sid=169
SQL> /
SID USERNAME LAST_CALL_ET
---------- ------------------------------ ------------
EVENT WAIT_TIME
---------------------------------------------------------------- ----------
SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE
--------------- ------------------- ---------------- -----------
169 AVSYS 251446
cursor: pin X 0
0 WAITING UNKNOWN
SQL> select sid, serial# from v$session where sid=169;
SID SERIAL#
---------- ----------
169 1585
SQL> alter system kill session '169,1585';
alter system kill session '169,1585'
*
ERROR at line 1:
ORA-00031: session marked for kill
I waited and tried a few more times, but that slave really didn't want to die ...
SQL> l
1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state,
blocking_session, blocking_session_status
2 from v$session
3* where sid=169
SQL> /
SID USERNAME LAST_CALL_ET
---------- ------------------------------ ------------
EVENT WAIT_TIME
---------------------------------------------------------------- ----------
SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE
--------------- ------------------- ---------------- -----------
169 AVSYS 251725
cursor: pin X 0
0 WAITING UNKNOWN So I tried try killing the process
SQL> select sid, serial#, paddr, osuser, process, status 2 from v$session where sid=169; SID SERIAL# PADDR OSUSER ---------- ---------- ---------------- ------------------------------ PROCESS STATUS ------------ -------- 169 1585 070000005A383348 oracle 962750 KILLED SQL> !ps -ef|grep 962750 oracle 962750 1 1 Jul 13 - 11:49 ora_p006_AV01TEST oracle 1519634 1757348 2 12:11:27 pts/1 0:00 grep 962750 SQL> !kill -9 962750 SQL> !ps -ef|grep 962750 oracle 1519638 1757348 1 12:11:38 pts/1 0:00 grep 962750
Now it's gone, see if parent session for refresh job is still there
1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state,
blocking_session, blocking_session_status
2 from v$session
3* where sid=201
SQL> /
no rows selected No, so the job should be gone too
SQL> select * from all_scheduler_running_jobs; OWNER JOB_NAME ------------------------------ ------------------------------ JOB_SUBNAME SESSION_ID SLAVE_PROCESS_ID SLAVE_OS_PRO ------------------------------ ---------- ---------------- ------------ RUNNING_INSTANCE RESOURCE_CONSUMER_GROUP ---------------- -------------------------------- ELAPSED_TIME --------------------------------------------------------------------------- CPU_USED --------------------------------------------------------------------------- AVSYS REFRESH_WAREHOUSE_DATA 122 19 692392 1 +000 00:01:25.48 +000 00:00:18.38
Well, it had restarted and been running for a minute or so. The audit trail entries started to appear in the warehouse, anyway. As I won't be working with this again for a while, I'm counting on colleagues to keep me posted about whether this keeps happening!
#1 - illiyaz 2008-11-06 19:05 - (Reply)
Doug, I just faced the problem. I am running a OWB mapping (which is again a pl/sql package internally) and i get this same wait event hanging on for ever. The beauty of this problem is that there is no BLOCKING session as shown above, the blocking session shows NULL.This mapping basically does insert-else-update(Merge).It so happens that only one environment has this problem which makes it more difficult to NAIL DOWN the problem.Any pointers from folks would be great..
Illiyaz
#1.1 - Doug Burns said:
2008-11-08 20:21 - (Reply)
I haven't come across this myself, but thanks for another example of the problem.
#1.2 - Krishnan Venugopal 2008-12-10 06:55 - (Reply)
We also encountered the same issue with OWB(10.2.0.1). The mapping (internally a merge statement) that used to take 500 seconds normally was terribly slow and the time_remaining in v$session_longops was showing 177455 secs.We had to restart the DB to resolve this issue. Did you arrive at any reason/solution for this...?
#1.2.1 - Doug Burns said:
2008-12-10 18:06 - (Reply)
All I have is the detail on Metalink. I stopped working with this soon after so I don't know when a suitable patch is likely to appear.
#1.2.1.1 - Tirumal Rao 2009-04-14 10:59 - (Reply)
Hi Doug,
I am new to this Audit Vault and I was assigned to resolve an issue with "REFRESH_WAREHOUSE_DATA", when I below query shows the job running.
select STATE,repeat_interval,last_start_date,next_run_date from dba_scheduler_jobs where job_name='REFRESH_WAREHOUSE_DATA';
STATE
---------------
REPEAT_INTERVAL
---------------------------------
LAST_START_DATE
---------------------------------
NEXT_RUN_DATE
---------------------------------
RUNNING
02-MAR-09 10.24.51.851836 PM +00:00
12-MAR-09 08.00.00.000000 PM +00:00
When trying to stop it gives
SQL> exec dbms_scheduler.stop_job ('AVSYS.REFRESH_WAREHOUSE_DATA');
BEGIN dbms_scheduler.stop_job ('AVSYS.REFRESH_WAREHOUSE_DATA'); END;
*
ERROR at line 1:
ORA-27366: job "AVSYS.REFRESH_WAREHOUSE_DATA" is not running
ORA-06512: at "SYS.DBMS_ISCHED", line 164
ORA-06512: at "SYS.DBMS_SCHEDULER", line 483
ORA-06512: at line 1
SQL> select owner, job_name, elapsed_time from all_scheduler_running_jobs;
The jobs was runing for a while but suddenly stopped.
OWNER JOB_NAME ELAPSED_TIME
------------------------------ ------------------------------ ---------------------------------------------------------------------------
AVSYS REFRESH_WAREHOUSE_DATA +041
18:22:08.63
I found a note 764058.1 which is related to ownership issue.
Any help to resolve this is highly regarded.
Regards
#1.2.1.1.1 - Doug Burns said:
2009-04-14 18:30 - (Reply)
I found a note 764058.1 which is related to ownership issue.
Did you check the ownership issue? I can't see the query from the Metalink note in your comment here.
Also, did you try killing the process to clear the job, as shown in the post above? I think the post contains what you're looking for if you read it completely.
If killing the process and letting the job restart doesn't work, then that's another matter ....

