Statistics on Partitioned Tables - Part 4

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Feb 28: Statistics on Partitioned Tables - Part 4

In the last post I illustrated the problems you can run into when you rely on Oracle to aggregate statistics on partitions or subpartitions to generate estimated Global Statistics at higher levels of the table. Until there are statistics for all of the relevant structures then aggregation won't take place so, for example, if you have statistics for three out of four subpartitions, there won't be any aggregated global statistics on the related partition until you gather statistics on the fourth subpartition. Randolf Geist left a comment describing how you might avoid problems with this.

"In order to solve the issue of adding partitions with initially missing statistics screwing up the aggregated statistics it was taken care that newly added subpartitions got their statistics immediately updated (with 0 rows in that case) - which didn't take a lot of time since the subpartitions were empty and it solved the issue with the aggregated statistics."

That's what our system does, but we introduced a change in the last release that caused the problems that inspired this series of posts ...

First let's start with an empty table (definition hasn't changed since the first post). Now, because we are so paranoid about partitions without stats, we'll gather statistics at the PARTITION level even though the table is empty at the moment. I'm not going to specify a partition name here to cut the text back a bit, but on the real system we would have. Regardless, we'll still see the same problematic end result.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'PARTITION');

PL/SQL procedure successfully completed.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:04:24          0

OK, so the table statistics aren't true Global Statistics but that's ok, we know about that. We also know that there's no data in the table at this stage so the stats reflect that. When we look at the Partition level stats :-

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

They are true global statistics, albeit on no data at this stage, but at least we have some statistics to reflect that. Looking at the Subpartition stats :-

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                NO
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

No subpartition stats at all at this stage which is expected behaviour and we'll be gathering them later after we load the data. I'm going to skip the column statistics at this stage because I don't need them to illustrate the problem. So let's imagine that on the live system we've just created the partitions above and are about to load data into the P_20100206_GROT subpartition.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 100000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 3000000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 200000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 110000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 240000, 'U');

1 row created.

SQL> COMMIT;

Commit complete.

Next our normal stats gathering approach is invoked and we gather stats on the subpartition just loaded.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION', 
                                         PARTNAME => 'P_20100206_GROT');

PL/SQL procedure successfully completed.

N.B. It's probably worth pointing out at this stage that I put a short pause in the test script between the original stats gathering on the empty table and the INSERTs and gather on the newly-loaded subpartition so you might want to pay attention to the LAST_ANALYZED values here.

So how do the stats look?

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

Mmmmmm .... I can see that the LAST_ANALYZED time has been updated, but NUM_ROWS is still 0 at the table level. How about the partitions?

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, as far as the Table and Partition Statistics look, this table is still empty! That's not good and I can imagine a near future of execution plans with CARDINALITY=1 and MERGE JOIN CARTESIAN. Looking at the LAST_ANALYSED values on the Partitions, I can see that the timestamp hasn't changed, which is another sign that something is wrong.

I'll check that the subpartition stats were gathered correctly.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

Ah, perhaps that's what the problem is. Only one of the P_20100206 subpartitions has valid stats so Oracle can not generate aggregated Global Stats at the higher levels  of the table. So I'll try to fix that by gathering statistic on all of the subpartitions in the table. (In fact, I only really need to gather stats on the remaining P_20100206 subpartitions but I'll use this approach for brevity)

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

Let's check that all of the subpartitions have valid statistics now.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_OTHERS              YES 28-FEB-2010 08:06:25          0

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_OTHERS              YES 28-FEB-2010 08:06:25          0

32 rows selected.


OK, so Oracle should have aggregated the subpartition stats to generate global stats on the partitions and table.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

SQL> 
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, according to the table and the partition stats, the table is still empty and those partition statistics still haven't been updated!

The problem here is that Oracle won't overwrite true global stats with aggregated global stats. When you think about it, that's a sensible approach because if I have a strategy of collecting Table and Partition stats (i.e. the Oracle-recommended strategy covered in the first post) then the last thing I want is those global stats constantly being overwritten by aggregated stats (with incorrect NDVs) when stats are gathered on subpartitions!

Our mistake here could be viewed as a combination of a) not following Oracle recommendations (because if we did, we'd also be gathering global stats on the Table and Partitions using a seperate task and b) once we depart from that strategy, gathering stats at the incorrect level. Those Partition stats that we gathered can never be over-written except by gathering stats again on the Partitions, which would then be aggregated up to the table level.

Allowing for the fact we want to (have to?), use our current approach, we should only ever gather stats at the SUBPARTITION level which will then be aggregated up to the Table and the Partition level.

As for the fix, we deleted the existing stats, to rid the partitions of their global stats and then regathered at the SUBPARTITION level as a one-off exercise.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

The important change is that we now have aggregated stats at both the Table and Partition levels which can then be updated by the aggregation process as we gather stats on new SUBPARTITIONS. Checking the statistics on the Table and Partitions ...

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:26          5

SQL> 
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100201                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100202                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100203                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100204                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100205                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100206                     NO  28-FEB-2010 08:06:26          5
TEST_TAB1                      P_20100207                     NO  28-FEB-2010 08:06:26          0

8 rows selected

All of the partition stats have been updated and are now aggregated rather than true global stats. A modification to the metadata that our stats process uses to change the granularity from PARTITION to SUBPARTITION will ensure stats are always gathered at the subpartition level and stop the problem from re-occuring.

You could argue that we could have avoided all of this by just using the default stats gathering strategy and not try to be too clever, but we would really struggle to support the required additional workload. Oh, and this example makes the problem obvious because the stats were gathered on empty partitions, we knew we'd done so and it was relatively easy to spot zero-row partitions, but imagine if someone gathered statistics on your partitions manually for some reason (it wouldn't be difficult to decide that seemed sensible) and the row counts for partitions are several million or so, frozen and stuck that way forever until someone decides to repeat the process? Would you really notice the aggregation process wasn't working for some reason?

Regardless of whether the problem is self-inflicted, as soon as we spotted this mistake, I could imagine others making the same mistake if they don't understand the aggregation process fully.

In the next few posts I'll look at some of the new approaches Oracle has introduced which we've investigated, to see if they can help us to gather better global statistics and/or reduce our stats-gathering workload.
Posted by Doug Burns Comments: (3) Trackbacks: (0)
Defined tags for this entry: DBMS_STATS, Optimiser, Partitions
Related entries by tags:
Statistics on Partitioned Tables - Part 5
Statistics on Partitioned Tables - Part 3
Statistics on Partitioned Tables - Part 2
Statistics on Partitioned Tables - Part 1

Trackbacks
Trackback specific URI for this entry

PingBack
Weblog: jonathanlewis.wordpress.com
Tracked: Mar 17, 07:44
PingBack
Weblog: coskan.wordpress.com
Tracked: Mar 25, 13:49

Comments
Display comments as (Linear | Threaded)

#1 - Pete Scott said:
2010-02-28 15:14 - (Reply)

I could write "been there, done it, got the t-shirt!" but that is not too helpful :-)

One thing I have done in the past is to use the SET_TABLE_STATISTICS component of the DBMS_STATS package to put in temporary values into partition or table level stats until I have opportunity to collect the real thing - I know the number of rows and blocks populated and can often make the assumption that the NDV and ranges of values are not going to be far enough wrong to cause a bigger problem than not having the information in place.

Using aggregated statistics can be viable - but it forces discipline - the discipline to gather statistics after partition creation, the discipline not to touch global level stats, the discipline to regather stats after loading the partition - now that can cause problems too!
One challenge I see is with the new interval partitions in 11g - when we gather stats we will need to revise our stats gathering code to find the name of the partition - the PARTITION FOR syntax is not available to DBMS_STATS... still I like a challenge

#2 - Doug Burns said:
2010-02-28 17:03 - (Reply)

I could write "been there, done it, got the t-shirt!" but that is not too helpful :-)

Or alternatively, write about it on the Rittman Mead blog so everyone can understand it, which *would* be helpful.

One thing I have done in the past is to use the SET_TABLE_STATISTICS component of the DBMS_STATS package to put in temporary values ...

As I commented here, that's something I've done in the past too, but it's not *quite* what the point of this post was ;-) Although there will be a copy_table_stats post coming up, I doubt I'll cover setting stats. Can't even attend Wolfgang Breitlings presentation at Hotsos which looks like it might be on this subject, given that I'm presenting at the same time :-(

the discipline to regather stats after loading the partition - now that can cause problems too!

Probably needs to be automated, eh? I doubt something that requires discipline is likely to happen these days ;-)

#3 - Doug Burns said:
2010-02-28 17:09 - (Reply)

Forgot to say, I've already mentioned you in the next post based on one of your comments on an earlier post ... a teaser for you ....

"no doubt he'll leave another comment here expanding on his reasons"

But I think I'll make you wait for the full text! LOL


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

Comments

Doug Burns about 10053 Trace Files - Different Plan in Different Environments
Tue, 02.04.2013 08:57
You're welcome. Now I just nee d to pull my finger out and ac tually come up [...]
Howard Rogers about 10053 Trace Files - Different Plan in Different Environments
Mon, 01.04.2013 23:08
Makes a big difference, so tha nks for that! With two brow ser windows, o [...]
stelioscharalambides.com about 10053 Trace Files
Sat, 30.03.2013 16:28

Upcoming Presentations

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml DBMS_STATS
xml Direct Path Reads
xml Fun
xml grid control
xml hotsos 2010
xml listener
xml Locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml Partitions
xml Patching
xml swingbench
xml The Reality Gap
xml time matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl