Sep 24: 11g and direct path reads
Curioser and curioser ....
I've been thinking about this on and off today and I really can't understand the behaviour I'm seeing. In fact, it's a good job it was a busy day or it might have sent me round the twist
I've been playing around a little tonight, trying to find the obvious mistake I'm making and I can't see it. I tried various different statements on some smaller test tables with similar results.
One of the comments Jonathan Lewis left in the last blog gave me food for thought, though. He commented on the lack of a segment checkpoint at the start of the full table scan.
"One interesting omission is the absence of the segment checkpoint that normally precedes a direct path tablescan."
Maybe the FTS is using direct path reads because it's a straightforward read of the table and no blocks for that segment exist in the buffer cache yet? I decided to try updating one of the rows, which would perform a full table scan (there are no indexes on this table) which I expected to use db file scattered reads.
First I performed another FTS, just to make sure that it was using direct path reads. It was, as confirmed by both an ASH report and a trace file (I'm starting to get paranoid!
)
So I picked out a row to update and updated it. This would require a FTS because I'm using an unindexed column to identify the row.
I monitored this while it was running and finally, from an ASH report, I could see the wait event I was looking for.
I committed the update. So now there should be multiple blocks of this table in the buffer cache, including one that's been updated. In another session I performed the same simple select count against test_tab2 and this time, there was a segment checkpoint in there.
It's still using direct path reads, though, and not a suggestion of parallelism anywhere!
I've been thinking about this on and off today and I really can't understand the behaviour I'm seeing. In fact, it's a good job it was a busy day or it might have sent me round the twist
I've been playing around a little tonight, trying to find the obvious mistake I'm making and I can't see it. I tried various different statements on some smaller test tables with similar results.
One of the comments Jonathan Lewis left in the last blog gave me food for thought, though. He commented on the lack of a segment checkpoint at the start of the full table scan.
"One interesting omission is the absence of the segment checkpoint that normally precedes a direct path tablescan."
Maybe the FTS is using direct path reads because it's a straightforward read of the table and no blocks for that segment exist in the buffer cache yet? I decided to try updating one of the rows, which would perform a full table scan (there are no indexes on this table) which I expected to use db file scattered reads.
First I performed another FTS, just to make sure that it was using direct path reads. It was, as confirmed by both an ASH report and a trace file (I'm starting to get paranoid!
SQL> select count(*) from test_tab2; COUNT(*) ---------- 65536000
So I picked out a row to update and updated it. This would require a FTS because I'm using an unindexed column to identify the row.
SQL> select * from test_tab2 where rownum=1;
PK_ID NUM_CODE
---------- ----------
STR_PAD
--------------------------------------------------------------------------------
1 1
tReEjSRqBLQHiFYNCokRvNANVmohDvdLuuLbtQHwlgwIjNhXezdYxjeOESlmOvBvFbicWgKoUgZfbhkz
mRFAfIZyMxOHLJrqIfkF
SQL> update test_tab2 set str_pad='DOUG_TEST' where pk_id=1;
1 row updated.I monitored this while it was running and finally, from an ASH report, I could see the wait event I was looking for.
Event Event Class % Event Sessions ----------------------------------- --------------- ---------- ---------- db file scattered read User I/O 59.15 0.34 CPU + Wait for CPU CPU 30.99 0.18
I committed the update. So now there should be multiple blocks of this table in the buffer cache, including one that's been updated. In another session I performed the same simple select count against test_tab2 and this time, there was a segment checkpoint in there.
PARSING IN CURSOR #3 len=30 dep=0 uid=88 oct=3 lid=88 tim=1190662307795682 hv=3408193527
ad='7a0e6e68' sqlid='5f6zrrv5k9tzr'
select count(*) from test_tab2
END OF STMT
PARSE #3:c=5000,e=4564,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1190662307795665
EXEC #3:c=0,e=231,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1190662307796185
WAIT #3: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0
obj#=5808 tim=1190662307796296
WAIT #3: nam='reliable message' ela= 25 channel context=2042343360 channel handle=2042172560
broadcast message=2043267192 obj#=5808 tim=1190662307801012
WAIT #3: nam='reliable message' ela= 340 channel context=2042343360 channel handle=2042172560
broadcast message=2043267192 obj#=5808 tim=1190662307801451
WAIT #3: nam='enq: KO - fast object checkpoint' ela= 1994 name|mode=1263468550 2=65556 0=1
obj#=5808 tim=1190662307803549
WAIT #3: nam='direct path read' ela= 32596 file number=6 first dba=4106 block cnt=22 obj#=70941
tim=1190662307841063
WAIT #3: nam='direct path read' ela= 58 file number=6 first dba=4128 block cnt=96 obj#=70941
tim=1190662307844828It's still using direct path reads, though, and not a suggestion of parallelism anywhere!
Sep 23: Parallel Query and 11g - Part 2
Now, that's weird.
A little surprising might be more accurate and maybe I'm missing something.
During the various tests with and without parallel hints and different parallel_io_cap_enabled settings, I expected the runs that didn't use parallelism to show up "db file scattered read" events in the trace files. For example, here's the explain plan and wait event summary generated by tkprof from a test on version 10.2.0.2
and here's the same section from an 11.1.0.6 trace file. The test isn't quite identical - I don't have the 10g database around any more - but I suppose I could try it later.
It looks to me like Oracle is using Direct Path Reads where it would previously have used Multiblock Reads. Let me try another simple test to see.
(Note the new *.trm Trace Mapping file.)
When I look at the contents of the trace file, sure enough, all of the i/o wait events are direct path reads.
I had a quick look at this page in the documentation and it implies that direct path reads would be associated with parallelism.
but these statements aren't using PX. I probably am missing something.
Any ideas?
A little surprising might be more accurate and maybe I'm missing something.
During the various tests with and without parallel hints and different parallel_io_cap_enabled settings, I expected the runs that didn't use parallelism to show up "db file scattered read" events in the trace files. For example, here's the explain plan and wait event summary generated by tkprof from a test on version 10.2.0.2
Rows Row Source Operation
------- ---------------------------------------------------
0 SORT GROUP BY (cr=0 pr=0 pw=0 time=64 us)
0 HASH JOIN (cr=0 pr=0 pw=0 time=15 us)
65536000 TABLE ACCESS FULL TEST_TAB2 (cr=1118253 pr=1052301 pw=0 time=1441796425 us)
47535880 TABLE ACCESS FULL TEST_TAB1 (cr=832521 pr=832491 pw=0 time=1378575273 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
db file scattered read 135042 0.16 983.67
direct path write temp 5961 6.05 833.35
db file sequential read 15370 0.04 17.56
local write wait 4482 0.39 479.14
SQL*Net break/reset to client 2 0.01 0.01and here's the same section from an 11.1.0.6 trace file. The test isn't quite identical - I don't have the 10g database around any more - but I suppose I could try it later.
Rows Row Source Operation
------- ---------------------------------------------------
113 SORT GROUP BY (cr=2265134 pr=2479358 pw=2479358 time=0 us cost=945776
size=787633200 card=65636100)
65529137 HASH JOIN (cr=2265134 pr=2479358 pw=2479358 time=2198939 us cost=647314
size=787633200 card=65636100)
65536000 TABLE ACCESS FULL TEST_TAB2 (cr=1130519 pr=1130495 pw=1130495 time=8362747 us
cost=267173 size=393816600 card=65636100)
65536000 TABLE ACCESS FULL TEST_TAB1 (cr=1134615 pr=1134591 pw=1134591 time=9715954 us
cost=268141 size=393880800 card=65646800)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 9 0.00 0.00
direct path read 27807 0.90 260.40
direct path write temp 6912 0.71 38.93
direct path read temp 6912 0.04 18.31
SQL*Net message from client 9 0.00 0.00It looks to me like Oracle is using Direct Path Reads where it would previously have used Multiblock Reads. Let me try another simple test to see.
SQL> exec dbms_session.set_identifier('FTS');
PL/SQL procedure successfully completed.
SQL> alter session set tracefile_identifier='FTS';
Session altered.
SQL> exec dbms_monitor.client_id_trace_enable(client_id => 'FTS');
PL/SQL procedure successfully completed.
SQL> set autotrace on
SQL> select count(*) from testuser.test_tab1;
COUNT(*)
----------
65536000
Execution Plan
----------------------------------------------------------
Plan hash value: 1745491793
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1407K (1)| 01:57:17 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| TEST_TAB1 | 65M| 1407K (1)| 01:57:17 |
------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1134615 consistent gets
1134591 physical reads
0 redo size
420 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> exec dbms_monitor.client_id_trace_disable(client_id => 'FTS');
PL/SQL procedure successfully completed.
So I expect to see a single trace file in the user_dump_dest directory
[oracle@ISP4400 PX]$ cd /u01/app/oracle/diag/rdbms/test11/TEST11/trace [oracle@ISP4400 trace]$ ls -ltra total 9128 drwxr-x--- 13 oracle oracle 4096 Sep 16 11:33 .. -rw-rw-r-- 1 oracle oracle 6725 Sep 19 20:49 ccf.sql -rw-rw-r-- 1 oracle oracle 1650 Sep 19 20:50 ccf2.sql drwxrwxr-x 2 oracle oracle 4096 Sep 22 00:32 cdmp_20070922003230 -rw-rw---- 1 oracle oracle 298602 Sep 23 12:52 TEST11_ora_9977_default.trm -rw-rw---- 1 oracle oracle 5085019 Sep 23 12:52 TEST11_ora_9977_default.trc -rw-rw---- 1 oracle oracle 59 Sep 23 13:01 TEST11_m000_10050.trm -rw-rw---- 1 oracle oracle 882 Sep 23 13:01 TEST11_m000_10050.trc -rw-rw---- 1 oracle oracle 113602 Sep 23 13:23 TEST11_ora_10100_FTS.trm -rw-r----- 1 oracle oracle 196158 Sep 23 13:24 alert_TEST11.log -rw-rw---- 1 oracle oracle 1696437 Sep 23 13:24 TEST11_ora_10100_FTS.trc
(Note the new *.trm Trace Mapping file.)
When I look at the contents of the trace file, sure enough, all of the i/o wait events are direct path reads.
PARSING IN CURSOR #1 len=39 dep=0 uid=0 oct=3 lid=0 tim=1190549917314881
hv=1002667060 ad='6d964890' sqlid='0quzqwnxw6z1n'
select count(*) from testuser.test_tab1
END OF STMT
PARSE #1:c=2999,e=2606,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1190549917314863
EXEC #1:c=0,e=131,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1190549917315423
WAIT #1: nam='SQL*Net message to client' ela= 20 driver id=1650815232
#bytes=1 p3=0 obj#=-1 tim=1190549917315531
WAIT #1: nam='direct path read' ela= 12764 file number=6 first dba=10
block cnt=15 obj#=70940 tim=1190549917337012
WAIT #1: nam='direct path read' ela= 18221 file number=6 first dba=25
block cnt=83 obj#=70940 tim=1190549917359346I had a quick look at this page in the documentation and it implies that direct path reads would be associated with parallelism.
On a healthy system, physical read waits should be the biggest waits after the idle waits. However, also consider whether there are direct read waits (signifying full table scans with parallel query) or db file scattered read waits on an operational (OLTP) system that should be doing small indexed accesses.
but these statements aren't using PX. I probably am missing something.
Any ideas?
Sep 23: Parallel Query and 11g
I've been playing around with 11g on and off this week, re-running some of the tests in this document (PDF file).
The main edits required to the scripts were simple ones to correct the locations of trace files, for example (where $ORACLE_BASE is /ora on the 10g system)
They were always hard-coded because I never planned to re-run the tests much but, even if I had used $ORACLE_BASE/admin/$ORACLE_SID/udump (and bdump), which would have been the correct approach, the scripts still wouldn't have worked because of Oracle's decision to play around with OFA directory structures yet again! I can't help feeling this is going to cause more problems than expected because there are a lot of scripts on systems out there that depend on the existing locations.
Anyway, on to the tests themselves. I was keen to try out the PARALLEL_IO_CAP_ENABLED parameter. The first step was to use the new I/O calibration tool. I simply lifted the example straight out of the documentation and amended the number of available disks.
Well, it's certainly recognised how pathetic my i/o infrastructure is, as I've mentioned in the past. Oh, and I should mention that this is a 5-disk stripe set with a stripe width of 256KB. That's too small, but was the one the Linux installation decided on and matched the one I used in the original paper, so I've left it as it is for now.
Now to see how this impacts the decision to use PX for a query. The problem is that the existing scripts use hints to request a specific degree of parallelism (DOP), e.g.
So I threw together a slightly modified script that requests PX but doesn't specify the DOP in the hint, leaving it to Oracle to decide :-
Next I'll run the test and then check the contents of v$pq_tqstat to see if parallelism was used (note - this is only enough of the output to illustrate my point).
So the DOP is cpu_count (4) * parallel_threads_per_cpu (2) = 8 px slaves per slave set, with two slave sets used. That's what I expected.
With parallel_io_cap_enabled
Now I'll look at the effect of the new parameter.
This time, the contents of v$pq_tqstat were
So, because of the limited i/o resources, Oracle's decided not to use PX for the same query, with the same hints. What I found interesting was that the execution plan still looks like PX is used. (Just a snippet of output again.)
There'll be more to say later, no doubt, but that's all for now.
The main edits required to the scripts were simple ones to correct the locations of trace files, for example (where $ORACLE_BASE is /ora on the 10g system)
rm /ora/admin/TEST1020/udump/*.trc
and $ORACLE_BASE is /u01/app/oracle on the 11g system
rm /u01/app/oracle/diag/rdbms/test11/TEST11/trace/*.trc
They were always hard-coded because I never planned to re-run the tests much but, even if I had used $ORACLE_BASE/admin/$ORACLE_SID/udump (and bdump), which would have been the correct approach, the scripts still wouldn't have worked because of Oracle's decision to play around with OFA directory structures yet again! I can't help feeling this is going to cause more problems than expected because there are a lot of scripts on systems out there that depend on the existing locations.
Anyway, on to the tests themselves. I was keen to try out the PARALLEL_IO_CAP_ENABLED parameter. The first step was to use the new I/O calibration tool. I simply lifted the example straight out of the documentation and amended the number of available disks.
SQL> @io
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 lat INTEGER;
3 iops INTEGER;
4 mbps INTEGER;
5 BEGIN
6 -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
7 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (5, 10, iops, mbps, lat);
8
9 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
10 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
11 dbms_output.put_line('max_mbps = ' || mbps);
12 end;
13 /
max_iops = 112
latency = 8
max_mbps = 32
PL/SQL procedure successfully completed.Well, it's certainly recognised how pathetic my i/o infrastructure is, as I've mentioned in the past. Oh, and I should mention that this is a 5-disk stripe set with a stripe width of 256KB. That's too small, but was the one the Linux installation decided on and matched the one I used in the original paper, so I've left it as it is for now.
Now to see how this impacts the decision to use PX for a query. The problem is that the existing scripts use hints to request a specific degree of parallelism (DOP), e.g.
SELECT /*+ parallel(tt1,$DOP) parallel(tt2, $DOP) */ MOD(tt1.pk_id + tt2.pk_id, 113), COUNT(*)
FROM test_tab1 tt1, test_tab2 tt2
WHERE tt1.pk_id = tt2.pk_id
GROUP BY MOD(tt1.pk_id + tt2.pk_id ,113)
ORDER BY MOD(tt1.pk_id + tt2.pk_id ,113); So I threw together a slightly modified script that requests PX but doesn't specify the DOP in the hint, leaving it to Oracle to decide :-
SELECT /*+ parallel(tt1) parallel(tt2) */ MOD(tt1.pk_id + tt2.pk_id, 113), COUNT(*)
Without parallel_io_cap_enabled
SQL> show parameter parallel_io NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ parallel_io_cap_enabled boolean FALSE
Next I'll run the test and then check the contents of v$pq_tqstat to see if parallelism was used (note - this is only enough of the output to illustrate my point).
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
2 FROM v$pq_tqstat
3 ORDER BY dfo_number DESC, tq_id, server_type DESC , process;
DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS BYTES
---------- ---------- ---------- ---------- ---------- ----------
1 0 Producer P000 8220476 58056397
Producer P001 8104871 57233905
Producer P002 8075353 57029517
Producer P003 7816421 55186002
Producer P004 8088829 57061516
Producer P005 8008482 56470221
Producer P006 8491092 59649083
Producer P007 8730476 61174367
Consumer P008 8189988 57718569
Consumer P009 8196864 57766896
Consumer P010 8192339 57734621
Consumer P011 8189512 57715446
Consumer P012 8192294 57734372
Consumer P013 8192216 57734131
Consumer P014 8193654 57745093
Consumer P015 8189133 57711880So the DOP is cpu_count (4) * parallel_threads_per_cpu (2) = 8 px slaves per slave set, with two slave sets used. That's what I expected.
With parallel_io_cap_enabled
Now I'll look at the effect of the new parameter.
SQL> alter system set PARALLEL_IO_CAP_ENABLED=true; System altered.
This time, the contents of v$pq_tqstat were
SQL> SELECT dfo_number, tq_id, server_type, process, num_rows, bytes 2 FROM v$pq_tqstat 3 ORDER BY dfo_number DESC, tq_id, server_type DESC , process; no rows selected
So, because of the limited i/o resources, Oracle's decided not to use PX for the same query, with the same hints. What I found interesting was that the execution plan still looks like PX is used. (Just a snippet of output again.)
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%
CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------
---------------------------------------------
| 0 | SELECT STATEMENT | | 65M| 751M| | 585K
(13)| 03:22:10 | | | |
| 1 | PX COORDINATOR | | | | |
| | | | |
| 2 | PX SEND QC (ORDER) | :TQ10002 | 65M| 751M| | 585K
(13)| 03:22:10 | Q1,02 | P->S | QC (ORDER) |
| 3 | SORT GROUP BY | | 65M| 751M| 2513M| 585K
(13)| 03:22:10 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 65M| 751M| | 285K
(11)| 01:38:33 | Q1,02 | PCWP | |
| 5 | PX SEND RANGE | :TQ10001 | 65M| 751M| | 285K
(11)| 01:38:33 | Q1,01 | P->P | RANGE |
|* 6 | HASH JOIN | | 65M| 751M| 1126M| 285K
(11)| 01:38:33 | Q1,01 | PCWP | |
| 7 | PX RECEIVE | | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,01 | PCWP | |
| 8 | PX SEND BROADCAST | :TQ10000 | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,00 | P->P | BROADCAST |
| 9 | PX BLOCK ITERATOR | | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,00 | PCWC | |
| 10 | TABLE ACCESS FULL| TEST_TAB2 | 65M| 375M| | 112K
(9)| 00:38:52 | Q1,00 | PCWP | |
| 11 | PX BLOCK ITERATOR | | 65M| 375M| | 112K
(9)| 00:39:00 | Q1,01 | PCWC | |
| 12 | TABLE ACCESS FULL | TEST_TAB1 | 65M| 375M| | 112K
(9)| 00:39:00 | Q1,01 | PCWP | |There'll be more to say later, no doubt, but that's all for now.
« previous page
(Page 1 of 1, totaling 3 entries)
next page »


Comments