Sep 24: 11g and direct path reads
I've been thinking about this on and off today and I really can't understand the behaviour I'm seeing. In fact, it's a good job it was a busy day or it might have sent me round the twist.

I've been playing around a little tonight, trying to find the obvious mistake I'm making and I can't see it. I tried various different statements on some smaller test tables with similar results.
One of the comments Jonathan Lewis left in the last blog gave me food for thought, though. He commented on the lack of a segment checkpoint at the start of the full table scan.
"One interesting omission is the absence of the segment checkpoint that normally precedes a direct path tablescan."
Maybe the FTS is using direct path reads because it's a straightforward read of the table and no blocks for that segment exist in the buffer cache yet? I decided to try updating one of the rows, which would require a full table scan (there are no indexes on this table) that I expected to use db file scattered reads.
First I performed another FTS, just to make sure that it was using direct path reads. It was, as confirmed by both an ASH report and a trace file (I'm starting to get paranoid!).

SQL> select count(*) from test_tab2;

  COUNT(*)
----------
  65536000
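For anyone repeating the test, the simplest way to confirm the read type is to enable extended SQL trace with wait events before the scan (standard 10046 tracing; test_tab2 is the table from my earlier posts). A sketch:

```sql
-- Enable extended SQL trace including wait events for this session
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

SELECT COUNT(*) FROM test_tab2;

-- Stop tracing; the trace file will show 'direct path read'
-- or 'db file scattered read' waits against the table's obj#
ALTER SESSION SET EVENTS '10046 trace name context off';
```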
So I picked out a row to update and updated it. This would require a FTS because I'm using an unindexed column to identify the row.
SQL> select * from test_tab2 where rownum=1;

     PK_ID   NUM_CODE
---------- ----------
STR_PAD
--------------------------------------------------------------------------------
         1          1
tReEjSRqBLQHiFYNCokRvNANVmohDvdLuuLbtQHwlgwIjNhXezdYxjeOESlmOvBvFbicWgKoUgZfbhkz
mRFAfIZyMxOHLJrqIfkF

SQL> update test_tab2 set str_pad='DOUG_TEST' where pk_id=1;

1 row updated.
I monitored this while it was running and finally, from an ASH report, I could see the wait event I was looking for.
Event                               Event Class        % Event   Sessions
----------------------------------- --------------- ---------- ----------
db file scattered read              User I/O              59.15       0.34
CPU + Wait for CPU                  CPU                   30.99       0.18
I committed the update. So now there should be multiple blocks of this table in the buffer cache, including one that's been updated. In another session I performed the same simple select count against test_tab2 and this time, there was a segment checkpoint in there.
PARSING IN CURSOR #3 len=30 dep=0 uid=88 oct=3 lid=88 tim=1190662307795682 hv=3408193527 ad='7a0e6e68' sqlid='5f6zrrv5k9tzr'
select count(*) from test_tab2
END OF STMT
PARSE #3:c=5000,e=4564,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1190662307795665
EXEC #3:c=0,e=231,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1190662307796185
WAIT #3: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=5808 tim=1190662307796296
WAIT #3: nam='reliable message' ela= 25 channel context=2042343360 channel handle=2042172560 broadcast message=2043267192 obj#=5808 tim=1190662307801012
WAIT #3: nam='reliable message' ela= 340 channel context=2042343360 channel handle=2042172560 broadcast message=2043267192 obj#=5808 tim=1190662307801451
WAIT #3: nam='enq: KO - fast object checkpoint' ela= 1994 name|mode=1263468550 2=65556 0=1 obj#=5808 tim=1190662307803549
WAIT #3: nam='direct path read' ela= 32596 file number=6 first dba=4106 block cnt=22 obj#=70941 tim=1190662307841063
WAIT #3: nam='direct path read' ela= 58 file number=6 first dba=4128 block cnt=96 obj#=70941 tim=1190662307844828
It's still using direct path reads, though, and not a suggestion of parallelism anywhere!
#1 - Jonathan Lewis said:
2007-09-24 20:24 - (Reply)
Doug,
You beat me to it - I was going to suggest the same experiment.
Direct path reads ARE going to operate faster than scattered reads, and have less impact on other processes because they avoid latches - so they are a good thing.
BUT - to do this, Oracle must either be counting how many dirty blocks there are in memory for each segment (which wouldn't be hard as it's already got v$segstat helping in that type of thing) or it walks the checkpoint queue counting the buffers from the objects before operating.
The next test is to find out how many blocks have to be dirty before Oracle decides that scattered reads would be quicker than doing the writes and using direct path reads.
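That test could be sketched along these lines (my sketch, not from Jonathan's comment, and untested - it assumes the test_tab2 setup from the post and uses DBMS_ROWID to dirty roughly one row per block):

```sql
-- Dirty a controlled number of blocks (1000 here) by updating
-- one row in each of 1000 distinct blocks of the table
UPDATE test_tab2
SET    str_pad = 'DIRTY'
WHERE  rowid IN (
         SELECT rid FROM (
           SELECT MIN(rowid) AS rid
           FROM   test_tab2
           GROUP  BY dbms_rowid.rowid_relative_fno(rowid),
                     dbms_rowid.rowid_block_number(rowid)
         )
         WHERE ROWNUM <= 1000
       );
-- Don't commit yet; re-run the tablescan from another session
-- under 10046 trace and see whether it still uses direct path reads
```

Varying the 1000 up or down and re-checking the trace each time would show where (or whether) the switch to scattered reads happens.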
#1.1 - Doug Burns said:
2007-09-24 20:28 - (Reply)
Direct path reads ARE going to operate faster than scattered reads, and have less impact on other processes because they avoid latches - so they are a good thing.
Oh, absolutely! That's what I picked up from the serial direct reads exercise from last year. I was just surprised to see it and, dare I say it, quite excited
Then again, I decided to run a couple of sessions scanning the same table because I wondered about each process running its own reads without the benefit of the buffer cache (but I suppose that's limited anyway for FTS). Both sessions used direct path reads.
Still a lot to play around with though, as you say.
#1.1.1 - Marco Gralike said:
2007-09-24 22:34 - (Reply)
Just to tickle your paranoia...
A long time ago (when I was young) when parallelism first came to light, I once encountered a "feature" where, if one table was created in parallel, all statements were handled by the cost-based optimizer, despite the instance's rule-based optimizer setting. Could it be some "strange" behaviour like that...
or are you just missing out on a new hidden parameter / feature? (http://julian.dyke.users.btopenworld.com/com/Internals/Parameters/Parameters.html)
#1.1.1.1 - Doug Burns said:
2007-09-24 22:45 - (Reply)
I went to look for that link today and then got distracted by other (real) work
This one certainly looks promising ...
http://julian.dyke.users.btopenworld.com/com/Internals/Parameters/adaptive_direct_read.html
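For reference, hidden parameters like that one can be listed from the underlying x$ tables (run as SYS; this is the usual query floating around, and the exact parameter names vary by version):

```sql
-- List hidden parameters whose names mention 'direct' (run as SYS)
SELECT i.ksppinm AS name,
       v.ksppstvl AS value
FROM   x$ksppi  i,
       x$ksppcv v
WHERE  i.indx = v.indx
AND    i.ksppinm LIKE '\_%direct%' ESCAPE '\'
ORDER  BY i.ksppinm;
```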
#1.1.1.1.1 - Marco Gralike said:
2007-09-25 00:10 - (Reply)
Problem here is that, I guess, this is "old" stuff, so you would have noticed this behaviour before...
#1.1.1.1.1.1 - Doug Burns said:
2007-09-25 05:48 - (Reply)
But maybe some default values have changed?
#1.1.1.2 - Doug Burns said:
2007-09-24 22:50 - (Reply)
There are certainly some interesting possibilities to try here.
#1.2 - Lakshmi 2008-10-23 06:18 - (Reply)
Hi ,
Do cached blocks in the SGA, or any sorting operations in the SQL after the FTS, influence the decision between direct path reads and scattered reads?
#1.2.1 - Doug Burns said:
2008-10-23 21:03 - (Reply)
I really have no idea, but maybe I'll play around with it more one day.
#1.3 - Taral Desai 2009-08-02 14:14 - (Reply)
Sir,
Please guide me on this. We are using oracle 10g and we do parallel on huge table joins.
So, the point is: if direct path read is beneficial for full table scans and for 4-5 huge table joins in parallel, how do we achieve it? Sometimes we see scattered or sequential reads instead. We also see direct path read temp against the temp tablespace.
#1.3.1 - Doug Burns said:
2009-08-04 16:19 - (Reply)
I'm sorry, there just isn't enough information here for me to give you a sensible reply. However, I showed one reason you might see db file sequential read in comment 2.1.
#2 - Naresh 2007-09-26 02:52 - (Reply)
Hello Doug,
I have a question about direct path reads.
My understanding about how it works is
- a checkpoint is done for the segment to make all the blocks of the segment up to date
- SGA is bypassed and the blocks are read directly into PGA
What happens if, during the course of the read operation, one of the blocks is changed on the segment and actually written to the data file, and it is read as part of the direct path read? It will need to be processed with the undo to make it consistent - so the SGA cannot be bypassed. What is the mechanism to handle this?
Thanks,
Naresh
#2.1 - Doug Burns said:
2007-09-26 04:46 - (Reply)
Good question, Naresh, and it cropped up when I was testing the update session alongside the select session.
If you look at the trace file for the select session while a large uncommitted update is running in another session, you will see a combination of events like this.
WAIT #5: nam='direct path read' ela= 18651 file number=6 first dba=36864 block cnt=96 obj#=70941 tim=1190781659239010
WAIT #5: nam='db file sequential read' ela= 13362 file#=3 block#=557 blocks=1 obj#=0 tim=1190781659253240
WAIT #5: nam='db file sequential read' ela= 494 file#=3 block#=558 blocks=1 obj#=0 tim=1190781659254793
WAIT #5: nam='db file sequential read' ela= 495 file#=3 block#=559 blocks=1 obj#=0 tim=1190781659255855
WAIT #5: nam='db file sequential read' ela= 496 file#=3 block#=560 blocks=1 obj#=0 tim=1190781659257364
WAIT #5: nam='direct path read' ela= 20673 file number=6 first dba=36960 block cnt=96 obj#=70941 tim=1190781659280734
WAIT #5: nam='direct path read' ela= 4736 file number=6 first dba=37056 block cnt=96 obj#=70941 tim=1190781659287898
File number 3 is -
/u01/app/oracle/oradata/TEST11/undotbs01.dbf
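(The file#-to-name mapping comes straight from v$datafile - a quick check, assuming the same database:)

```sql
-- Map the file# seen in the trace's sequential read waits
-- to a datafile name; file 3 turns out to be the undo tablespace
SELECT file#, name
FROM   v$datafile
WHERE  file# = 3;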
#3 - Gregory said:
2008-09-25 18:15 - (Reply)
Doug,
I guess with the ExaData Storage, those Direct Read get a complete different perspective, doesn't it?
#3.1 - Doug Burns said:
2008-09-27 00:12 - (Reply)
Good point, Gregory and I've just posted a related comment on Kevin Closson's blog here ....
http://kevinclosson.wordpress.com/2008/09/26/oracle-exadata-storage-server-part-iii-a-faq-is-born/#comment-33012
(Kevin moderates his comments, so it might take a while to appear)
#3.1.1 - Gregory said:
2008-09-27 08:31 - (Reply)
It didn't take long!
Btw, there is still a lot of space for improvement. I'm looking forward to see a TPC-H, they start to win deals against Teradata and It pre-aggregates results. I guess that last change will require, they rearchitect a lot in the rdbms but we can dream.
#3.1.1.1 - Doug Burns said:
2008-09-27 16:07 - (Reply)
It pre-aggregates results. I guess that last change will require them to rearchitect a lot in the RDBMS, but we can dream.
I'm not sure it does. I think one of the best things about this solution in the DW space will be that Oracle is already very functionally rich and this is in addition to MVs, OLAP, compression etc.
But I really don't know enough about it yet to comment with any certainty.
#4 - Adrian Angelov 2008-10-09 11:18 - (Reply)
We've done some tests 10g/11g and were surprised because of this direct path read/db file scattered reads thing. Our tests are:
Two databases - one 11g and one 10g with the same sga_target and sga_max_size (500MB) and pga_aggregate_target (200MB); everything else is as set during install by DBCA. These two databases run on one host.
The storage is a NAS(6 discs) managed by 11g ASM (one normal redundancy diskgroup ) which is used by the databases.
I imported a 3GB table into each database and gathered stats. Without going into detail for now, the results are:
three consecutive runs on 11g.
average execution time - 50 seconds
three consecutive runs on 10g.
average execution time - 125 seconds
The statement is:
select /*+ full(t) nocache(t) */ count(1) from mb.trn t;
and there is no other activity on the host and the storage.
Same number of logical and physical reads during 11g and 10g runs.
During the above runs:
- 11g, iostat reports ~14MB/s per disk
- 10g, iostat reports ~5MB/s per disk
After tracing these sessions we figured out that on 11g direct path read is used, while 10g uses db file scattered read.
Do you know, or have any clues, what causes this?
#4.1 - Doug Burns said:
2008-10-09 21:18 - (Reply)
Adrian,
First, sorry that your comment needed to be approved. One of Serendipity's more successful spam prevention measures is to require approval for comments to old blog posts.
Second, thanks for sharing. I'm very interested in this subject too.
Third, unfortunately I still don't have any information to add about why or how 11g uses direct path reads where previous versions wouldn't, other than what's included in the posts here. Having said that, I've started to look at this again, so there may be some more relevant posts soon, and it helps to see another example like the one you've provided here.
Cheers,
Doug
#4.1.1 - Oracle Developer said:
2008-10-14 18:13 - (Reply)
There have been changes in 11g in the heuristics to choose between direct path reads or reads through buffer cache for serial table scans.
In 10g, serial table scans for "large" tables used to go through cache (by default) which is not the case anymore in 11g (where this decision to read via direct path or through cache is based on various stats).
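One stat usually mentioned in this context is the table's block count relative to the buffer cache size. A rough comparison query (my sketch, untested; it assumes an 8k block size and the table name from the post):

```sql
-- Compare the table's block count (from optimizer stats) against
-- the default buffer cache size, expressed in 8k blocks
SELECT t.blocks AS table_blocks,
       (SELECT current_size / 8192
        FROM   v$sga_dynamic_components
        WHERE  component = 'DEFAULT buffer cache') AS cache_blocks
FROM   user_tables t
WHERE  t.table_name = 'TEST_TAB2';
```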
#4.1.1.1 - Doug Burns said:
2008-10-15 06:55 - (Reply)
Thanks for the information. I don't suppose there's any possibility of elaborating on the stats used? I suppose it's open to further change anyway.
#4.2 - Doug Burns said:
2009-02-01 23:57 - (Reply)
Alex Fatkulin blogs about this more here.
#5 - Hans-Peter Sloot 2008-11-06 08:14 - (Reply)
Hi Doug,
In a recent test on 11.1.0.6 on Linux I saw that db file scattered reads were done with 128 blocks (8k block size, no explicitly set db_file_multiblock_read_count) but the direct path reads (temp) were done with 31 blocks and 1 block per read.
I was surprised by the strange number of 31.
Do you know why?
#5.1 - Doug Burns said:
2008-11-09 08:41 - (Reply)
Hans-Peter,
I'm afraid I don't, but I'm sure I've seen something similar during other tests.
#6 - Doug Burns said:
2009-07-21 07:52 - (Reply)
Dion Cho blogs about this more here
#7 - Kyle Hailey said:
2012-11-09 20:10 - (Reply)
Been googling around for what controls the size of the direct path reads. The parameter above, _db_file_direct_io_count, looks promising, but although it's set to 1M on my 11.2.0.3 system, the dpr size is 128K. I then changed
db_file_multiblock_read_count
from 16 (i.e. 128K) to 128 (i.e. 1M) and the dpr size went to 1M, so it looks like dpr size can depend on db_file_multiblock_read_count.
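Kyle's check can be repeated like this (a sketch, assuming an 8k block size so that 128 blocks = 1M, and the test_tab2 table from the post):

```sql
-- Raise the multiblock read count for this session only
ALTER SESSION SET db_file_multiblock_read_count = 128;

-- Re-run the tablescan under 10046 trace and compare the
-- 'block cnt' on each 'direct path read' wait line before/after
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
SELECT /*+ full(t) */ COUNT(*) FROM test_tab2 t;
ALTER SESSION SET EVENTS '10046 trace name context off';
```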
- Kyle
#7.1 - Doug Burns said:
2012-11-11 22:06 - (Reply)
Interesting stuff, Kyle - thanks!