Entries tagged as 11g
May 17: Resource Manager and 11g
I will get back to the stats stuff at some point, but I'm quite busy at the moment working on something that I can't talk too much about, but which is throwing up enough generic issues to talk about. This is one that I meant to blog about ages ago when I first noticed it but when it caused us some problems last week, it was a useful reminder.
In summary you need to be careful when you upgrade to 11g because Resource Manager is enabled by default!
I don't want to blog about the ins and outs of Resource Manager and whether it's a good thing or not, but I do think this is a pretty extreme change to implement without a lot of surrounding publicity. It's a bit like the auto stats gather job that appeared in 10g that caused so many problems for Oracle users. It seems like it might be a good idea, but would you really want to introduce it on to a stable system that you're upgrading to 11g?
But rather than just talk about the change, I wanted to highlight how I first realised it was going on ...

Yes, a pretty picture (albeit not too legible at that resolution, even when you click the thumbnail to see the bigger version). This is the sort of change that could have completely passed me by (and I am utterly convinced it has others) but, because I have a habit of looking at the Top Activity page for any system I'm working on, this activity leaps out at me as the new pale green wait class - Scheduler - which sits just above CPU + Wait for CPU (the addition of + Wait for CPU is one of the tiny details I really like in 11g). I should say that the specific event these sessions are waiting on is "resmgr:cpu quantum".
As soon as I upgraded my first laptop db to 11g (quite a while ago now) I noticed the additional activity and though, mmmm, that's interesting and when I checked the resource_manager_plan parameter, it was set to DEFAULT_PLAN, to my surprise.
Low and behold, whilst working on an 11gR2 system last week that we couldn't get enough parallelism out of, I checked and the DEFAULT_PLAN was enabled. Once it was disabled, like so ....
Everything started running as expected.
So I went home to do a little background research and the first obvious place to look was the documentation. But that's pretty misleading because it says there is no default value for resource_manager_plan. I checked v$PARAMETER and, sure enough, the default is not DEFAULT_PLAN. So I started to doubt myself. Maybe it was just because I'd created Mickey Mouse databases on my laptop, using the GUI. However, that wouldn't apply to the database I'm working on and, after some discussion with the DBA who completed the upgrade, it became clear that the Upgrade Assistant (dbua) sets this parameter.
But that's not all. In total, 11gR2 creates 10 plans as far as I can tell and DEFAULT_PLAN is not the only one used, but maintenance plans are also used during maintenance windows. Here is my laptop database right now, while it's in a maintenance window.
I can imagine this leading to a lot of confusion, so thanks again to Jari Kuhanen for pointing out this Metalink Note to me that covers it.
Definitely something to look out for but, in fairness to Oracle, if I had re-read the relevant documentation it is all clear in there. I wonder how many people do that before every upgrade though?
In summary you need to be careful when you upgrade to 11g because Resource Manager is enabled by default!
I don't want to blog about the ins and outs of Resource Manager and whether it's a good thing or not, but I do think this is a pretty extreme change to implement without a lot of surrounding publicity. It's a bit like the auto stats gather job that appeared in 10g that caused so many problems for Oracle users. It seems like it might be a good idea, but would you really want to introduce it on to a stable system that you're upgrading to 11g?
But rather than just talk about the change, I wanted to highlight how I first realised it was going on ...
Yes, a pretty picture (albeit not too legible at that resolution, even when you click the thumbnail to see the bigger version). This is the sort of change that could have completely passed me by (and I am utterly convinced it has others) but, because I have a habit of looking at the Top Activity page for any system I'm working on, this activity leaps out at me as the new pale green wait class - Scheduler - which sits just above CPU + Wait for CPU (the addition of + Wait for CPU is one of the tiny details I really like in 11g). I should say that the specific event these sessions are waiting on is "resmgr:cpu quantum".
As soon as I upgraded my first laptop db to 11g (quite a while ago now) I noticed the additional activity and though, mmmm, that's interesting and when I checked the resource_manager_plan parameter, it was set to DEFAULT_PLAN, to my surprise.
Low and behold, whilst working on an 11gR2 system last week that we couldn't get enough parallelism out of, I checked and the DEFAULT_PLAN was enabled. Once it was disabled, like so ....
alter system set resource_manager_plan='';
Everything started running as expected.
So I went home to do a little background research and the first obvious place to look was the documentation. But that's pretty misleading because it says there is no default value for resource_manager_plan. I checked v$PARAMETER and, sure enough, the default is not DEFAULT_PLAN. So I started to doubt myself. Maybe it was just because I'd created Mickey Mouse databases on my laptop, using the GUI. However, that wouldn't apply to the database I'm working on and, after some discussion with the DBA who completed the upgrade, it became clear that the Upgrade Assistant (dbua) sets this parameter.
But that's not all. In total, 11gR2 creates 10 plans as far as I can tell and DEFAULT_PLAN is not the only one used, but maintenance plans are also used during maintenance windows. Here is my laptop database right now, while it's in a maintenance window.
SQL> show parameter resource_manager_plan NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ resource_manager_plan string SCHEDULER[0x3003]:DEFAULT_MAIN TENANCE_PLAN
I can imagine this leading to a lot of confusion, so thanks again to Jari Kuhanen for pointing out this Metalink Note to me that covers it.
Definitely something to look out for but, in fairness to Oracle, if I had re-read the relevant documentation it is all clear in there. I wonder how many people do that before every upgrade though?
Posted by Doug Burns
Comments: (5)
Trackback: (1)
Defined tags for this entry: 11g, resource manager
Sep 23: Parallel Query and 11g - Part 2
Now, that's weird.
A little surprising might be more accurate and maybe I'm missing something.
During the various tests with and without parallel hints and different parallel_io_cap_enabled settings, I expected the runs that didn't use parallelism to show up "db file scattered read" events in the trace files. For example, here's the explain plan and wait event summary generated by tkprof from a test on version 10.2.0.2
and here's the same section from an 11.1.0.6 trace file. The test isn't quite identical - I don't have the 10g database around any more - but I suppose I could try it later.
It looks to me like Oracle is using Direct Path Reads where it would previously have used Multiblock Reads. Let me try another simple test to see.
(Note the new *.trm Trace Mapping file.)
When I look at the contents of the trace file, sure enough, all of the i/o wait events are direct path reads.
I had a quick look at this page in the documentation and it implies that direct path reads would be associated with parallelism.
but these statements aren't using PX. I probably am missing something.
Any ideas?
A little surprising might be more accurate and maybe I'm missing something.
During the various tests with and without parallel hints and different parallel_io_cap_enabled settings, I expected the runs that didn't use parallelism to show up "db file scattered read" events in the trace files. For example, here's the explain plan and wait event summary generated by tkprof from a test on version 10.2.0.2
Rows Row Source Operation
------- ---------------------------------------------------
0 SORT GROUP BY (cr=0 pr=0 pw=0 time=64 us)
0 HASH JOIN (cr=0 pr=0 pw=0 time=15 us)
65536000 TABLE ACCESS FULL TEST_TAB2 (cr=1118253 pr=1052301 pw=0 time=1441796425 us)
47535880 TABLE ACCESS FULL TEST_TAB1 (cr=832521 pr=832491 pw=0 time=1378575273 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
db file scattered read 135042 0.16 983.67
direct path write temp 5961 6.05 833.35
db file sequential read 15370 0.04 17.56
local write wait 4482 0.39 479.14
SQL*Net break/reset to client 2 0.01 0.01and here's the same section from an 11.1.0.6 trace file. The test isn't quite identical - I don't have the 10g database around any more - but I suppose I could try it later.
Rows Row Source Operation
------- ---------------------------------------------------
113 SORT GROUP BY (cr=2265134 pr=2479358 pw=2479358 time=0 us cost=945776
size=787633200 card=65636100)
65529137 HASH JOIN (cr=2265134 pr=2479358 pw=2479358 time=2198939 us cost=647314
size=787633200 card=65636100)
65536000 TABLE ACCESS FULL TEST_TAB2 (cr=1130519 pr=1130495 pw=1130495 time=8362747 us
cost=267173 size=393816600 card=65636100)
65536000 TABLE ACCESS FULL TEST_TAB1 (cr=1134615 pr=1134591 pw=1134591 time=9715954 us
cost=268141 size=393880800 card=65646800)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 9 0.00 0.00
direct path read 27807 0.90 260.40
direct path write temp 6912 0.71 38.93
direct path read temp 6912 0.04 18.31
SQL*Net message from client 9 0.00 0.00It looks to me like Oracle is using Direct Path Reads where it would previously have used Multiblock Reads. Let me try another simple test to see.
SQL> exec dbms_session.set_identifier('FTS');
PL/SQL procedure successfully completed.
SQL> alter session set tracefile_identifier='FTS';
Session altered.
SQL> exec dbms_monitor.client_id_trace_enable(client_id => 'FTS');
PL/SQL procedure successfully completed.
SQL> set autotrace on
SQL> select count(*) from testuser.test_tab1;
COUNT(*)
----------
65536000
Execution Plan
----------------------------------------------------------
Plan hash value: 1745491793
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1407K (1)| 01:57:17 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST_TAB1 | 65M| 1407K (1)| 01:57:17 |
------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1134615 consistent gets
1134591 physical reads
0 redo size
420 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> exec dbms_monitor.client_id_trace_disable(client_id => 'FTS');
PL/SQL procedure successfully completed.
So I expect to see a single trace file in the user_dump_dest directory
[oracle@ISP4400 PX]$ cd /u01/app/oracle/diag/rdbms/test11/TEST11/trace [oracle@ISP4400 trace]$ ls -ltra total 9128 drwxr-x--- 13 oracle oracle 4096 Sep 16 11:33 .. -rw-rw-r-- 1 oracle oracle 6725 Sep 19 20:49 ccf.sql -rw-rw-r-- 1 oracle oracle 1650 Sep 19 20:50 ccf2.sql drwxrwxr-x 2 oracle oracle 4096 Sep 22 00:32 cdmp_20070922003230 -rw-rw---- 1 oracle oracle 298602 Sep 23 12:52 TEST11_ora_9977_default.trm -rw-rw---- 1 oracle oracle 5085019 Sep 23 12:52 TEST11_ora_9977_default.trc -rw-rw---- 1 oracle oracle 59 Sep 23 13:01 TEST11_m000_10050.trm -rw-rw---- 1 oracle oracle 882 Sep 23 13:01 TEST11_m000_10050.trc -rw-rw---- 1 oracle oracle 113602 Sep 23 13:23 TEST11_ora_10100_FTS.trm -rw-r----- 1 oracle oracle 196158 Sep 23 13:24 alert_TEST11.log -rw-rw---- 1 oracle oracle 1696437 Sep 23 13:24 TEST11_ora_10100_FTS.trc
(Note the new *.trm Trace Mapping file.)
When I look at the contents of the trace file, sure enough, all of the i/o wait events are direct path reads.
PARSING IN CURSOR #1 len=39 dep=0 uid=0 oct=3 lid=0 tim=1190549917314881
hv=1002667060 ad='6d964890' sqlid='0quzqwnxw6z1n'
select count(*) from testuser.test_tab1
END OF STMT
PARSE #1:c=2999,e=2606,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1190549917314863
EXEC #1:c=0,e=131,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1190549917315423
WAIT #1: nam='SQL*Net message to client' ela= 20 driver id=1650815232
#bytes=1 p3=0 obj#=-1 tim=1190549917315531
WAIT #1: nam='direct path read' ela= 12764 file number=6 first dba=10
block cnt=15 obj#=70940 tim=1190549917337012
WAIT #1: nam='direct path read' ela= 18221 file number=6 first dba=25
block cnt=83 obj#=70940 tim=1190549917359346I had a quick look at this page in the documentation and it implies that direct path reads would be associated with parallelism.
On a healthy system, physical read waits should be the biggest waits after the idle waits. However, also consider whether there are direct read waits (signifying full table scans with parallel query) or db file scattered read waits on an operational (OLTP) system that should be doing small indexed accesses.
but these statements aren't using PX. I probably am missing something.
Any ideas?
Sep 23: Parallel Query and 11g
I've been playing around with 11g on and off this week, re-running some of the tests in this document (PDF file).
The main edits required to the scripts were simple ones to correct the locations of trace files, for example (where $ORACLE_BASE is /ora on the 10g system)
They were always hard-coded because I never planned to re-run the tests much but, even if I had used $ORACLE_BASE/admin/$ORACLE_SID/udump (and bdump), which would have been the correct approach, the scripts still wouldn't have worked because of Oracle's decision to play around with OFA directory structures yet again! I can't help feeling this is going to cause more problems than expected because there are a lot of scripts on systems out there that depend on the existing locations.
Anyway, on to the tests themselves. I was keen to try out the PARALLEL_IO_CAP_ENABLED parameter. The first step was to use the new I/O calibration tool. I simply lifted the example straight out of the documentation and amended the number of available disks.
Well, it's certainly recognised how pathetic my i/o infrastructure is, as I've mentioned in the past. Oh, and I should mention that this is a 5-disk stripe set with a stripe width of 256KB. That's too small, but was the one the Linux installation decided on and matched the one I used in the original paper, so I've left it as it is for now.
Now to see how this impacts the decision to use PX for a query. The problem is that the existing scripts use hints to request a specific degree of parallelism (DOP), e.g.
So I threw together a slightly modified script that requests PX but doesn't specify the DOP in the hint, leaving it to Oracle to decide :-
Next I'll run the test and then check the contents of v$pq_tqstat to see if parallelism was used (note - this is only enough of the output to illustrate my point).
So the DOP is cpu_count (4) * parallel_threads_per_cpu (2) = 8 px slaves per slave set, with two slave sets used. That's what I expected.
With parallel_io_cap_enabled
Now I'll look at the effect of the new parameter.
This time, the contents of v$pq_tqstat were
So, because of the limited i/o resources, Oracle's decided not to use PX for the same query, with the same hints. What I found interesting was that the execution plan still looks like PX is used. (Just a snippet of output again.)
There'll be more to say later, no doubt, but that's all for now.
The main edits required to the scripts were simple ones to correct the locations of trace files, for example (where $ORACLE_BASE is /ora on the 10g system)
rm /ora/admin/TEST1020/udump/*.trc
and $ORACLE_BASE is /u01/app/oracle on the 11g system
rm /u01/app/oracle/diag/rdbms/test11/TEST11/trace/*.trc
They were always hard-coded because I never planned to re-run the tests much but, even if I had used $ORACLE_BASE/admin/$ORACLE_SID/udump (and bdump), which would have been the correct approach, the scripts still wouldn't have worked because of Oracle's decision to play around with OFA directory structures yet again! I can't help feeling this is going to cause more problems than expected because there are a lot of scripts on systems out there that depend on the existing locations.
Anyway, on to the tests themselves. I was keen to try out the PARALLEL_IO_CAP_ENABLED parameter. The first step was to use the new I/O calibration tool. I simply lifted the example straight out of the documentation and amended the number of available disks.
SQL> @io
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 lat INTEGER;
3 iops INTEGER;
4 mbps INTEGER;
5 BEGIN
6 -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
7 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (5, 10, iops, mbps, lat);
8
9 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
10 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
11 dbms_output.put_line('max_mbps = ' || mbps);
12 end;
13 /
max_iops = 112
latency = 8
max_mbps = 32
PL/SQL procedure successfully completed.Well, it's certainly recognised how pathetic my i/o infrastructure is, as I've mentioned in the past. Oh, and I should mention that this is a 5-disk stripe set with a stripe width of 256KB. That's too small, but was the one the Linux installation decided on and matched the one I used in the original paper, so I've left it as it is for now.
Now to see how this impacts the decision to use PX for a query. The problem is that the existing scripts use hints to request a specific degree of parallelism (DOP), e.g.
SELECT /*+ parallel(tt1,$DOP) parallel(tt2, $DOP) */ MOD(tt1.pk_id + tt2.pk_id, 113), COUNT(*)
FROM test_tab1 tt1, test_tab2 tt2
WHERE tt1.pk_id = tt2.pk_id
GROUP BY MOD(tt1.pk_id + tt2.pk_id ,113)
ORDER BY MOD(tt1.pk_id + tt2.pk_id ,113); So I threw together a slightly modified script that requests PX but doesn't specify the DOP in the hint, leaving it to Oracle to decide :-
SELECT /*+ parallel(tt1) parallel(tt2) */ MOD(tt1.pk_id + tt2.pk_id, 113), COUNT(*)
Without parallel_io_cap_enabled
SQL> show parameter parallel_io NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ parallel_io_cap_enabled boolean FALSE
Next I'll run the test and then check the contents of v$pq_tqstat to see if parallelism was used (note - this is only enough of the output to illustrate my point).
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
2 FROM v$pq_tqstat
3 ORDER BY dfo_number DESC, tq_id, server_type DESC , process;
DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS BYTES
---------- ---------- ---------- ---------- ---------- ----------
1 0 Producer P000 8220476 58056397
Producer P001 8104871 57233905
Producer P002 8075353 57029517
Producer P003 7816421 55186002
Producer P004 8088829 57061516
Producer P005 8008482 56470221
Producer P006 8491092 59649083
Producer P007 8730476 61174367
Consumer P008 8189988 57718569
Consumer P009 8196864 57766896
Consumer P010 8192339 57734621
Consumer P011 8189512 57715446
Consumer P012 8192294 57734372
Consumer P013 8192216 57734131
Consumer P014 8193654 57745093
Consumer P015 8189133 57711880So the DOP is cpu_count (4) * parallel_threads_per_cpu (2) = 8 px slaves per slave set, with two slave sets used. That's what I expected.
With parallel_io_cap_enabled
Now I'll look at the effect of the new parameter.
SQL> alter system set PARALLEL_IO_CAP_ENABLED=true; System altered.
This time, the contents of v$pq_tqstat were
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes 2 FROM v$pq_tqstat 3 ORDER BY dfo_number DESC, tq_id, server_type DESC , process; no rows selected
So, because of the limited i/o resources, Oracle's decided not to use PX for the same query, with the same hints. What I found interesting was that the execution plan still looks like PX is used. (Just a snippet of output again.)
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%
CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------
---------------------------------------------
| 0 | SELECT STATEMENT | | 65M| 751M| | 585K
(13)| 03:22:10 | | | |
| 1 | PX COORDINATOR | | | | |
| | | | |
| 2 | PX SEND QC (ORDER) | :TQ10002 | 65M| 751M| | 585K
(13)| 03:22:10 | Q1,02 | P->S | QC (ORDER) |
| 3 | SORT GROUP BY | | 65M| 751M| 2513M| 585K
(13)| 03:22:10 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 65M| 751M| | 285K
(11)| 01:38:33 | Q1,02 | PCWP | |
| 5 | PX SEND RANGE | :TQ10001 | 65M| 751M| | 285K
(11)| 01:38:33 | Q1,01 | P->P | RANGE |
|* 6 | HASH JOIN | | 65M| 751M| 1126M| 285K
(11)| 01:38:33 | Q1,01 | PCWP | |
| 7 | PX RECEIVE | | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,01 | PCWP | |
| 8 | PX SEND BROADCAST | :TQ10000 | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,00 | P->P | BROADCAST |
| 9 | PX BLOCK ITERATOR | | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,00 | PCWC | |
| 10 | TABLE ACCESS FULL| TEST_TAB2 | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,00 | PCWP | |
| 11 | PX BLOCK ITERATOR | | 65M| 375M| | 112K
(9)| 00:39:00 | Q1,01 | PCWC | |
| 12 | TABLE ACCESS FULL | TEST_TAB1 | 65M| 375M| | 112K
(9)| 00:39:00 | Q1,01 | PCWP | |There'll be more to say later, no doubt, but that's all for now.
Jul 12: Oracle 11g - Total Recall
I suppose most readers will be aware that Oracle held a big launch for 11g yesterday, including the release of various technical docs at OTN. I've only managed a quick scan because I'm up to my eyeballs in some internal ASH/AWR training I'm giving later today, but what particularly interested me was this paragraph in the New Features Overview.
I'm not sure how much attention that might get because it sounds like just another minor Flashback feature, but I suspect it's really Rob Squire's stuff that I talked about here. Definitely one for further investigation.
“Time Travel” with Oracle Flashback Data Archive
Oracle Database 11g also features Total Recall with Oracle Flashback Data Archive, which enables you to query data in selected tables “as of” earlier times in the past, thereby providing an easy, practical way to add a time dimension to your data for change tracking, ILM, auditing, and compliance. Oracle Flashback Data Archive provides automatic and efficient storage of “change” data to enable fast query access to old versions of the data.
The DBA can set retention policies to automatically purge data when your system reaches specified age thresholds
I'm not sure how much attention that might get because it sounds like just another minor Flashback feature, but I suspect it's really Rob Squire's stuff that I talked about here. Definitely one for further investigation.
« previous page
(Page 1 of 1, totaling 4 entries)
next page »


Comments