Feb 23: Statistics on Partitioned Tables - Part 3
To summarise where we are so far, though, and highlight a couple of key points ...
1) As I said in the first post, all of the examples to date are on Oracle 10.2.0.4 but I think you would see similar behaviour on earlier releases that I don't have to hand right now. The default parameters would be different, but the aggregation process goes back a long way. i.e. This is not about any 11g features, at least not yet. That will come later. (I still wish I'd put 10g in the post titles, though, like I did with the Adaptive Thresholds posts.)
2) Partitioned objects have two different kinds of stats - Global Statistics describe a Table or Partition as a whole, including all of it's child structures and Partition Statistics describe individual partitions and subpartitions. In addition, Oracle has the capability to take Partition stats and aggregate them up to generate Aggregated Global Statistics. Hopefully the last post illustrated that some elements of aggregated stats seem reliable but some not, particularly Number of Distinct Values (NDV).
3) So when looking at optimiser stats, it's essential that you look at columns like GLOBAL_STATS and at the HIGH_VALUE, LOW_VALUE and NUM_DISTINCT columns or you might kid yourself into thinking that your stats are better than they really are. I suspect that's what had happened at my current site. Be honest with yourself. Excluding those experts who might read this, how many of you have taken a quick glance at NUM_ROWS and LAST_ANALYZED columns to reassure yourself your stats are ok? You need to be careful with this stuff and Greg's post can help you check.
4) One of the more confusing aspects of the first two posts is that they showed completely different strategies to collecting stats on our tables. The first post covered the Oracle-recommended 10.2 default behaviour of gathering GLOBAL AND PARTITION stats down to the Partition level. The second post showed a completely different strategy we use on many tables of gathering no stats at all at the Table and Partition level, but gathering Subpartition statistics and having Oracle aggregate them up to the higher levels in an attempt to reduce stats gathering activity. That was deliberate, as in the next post I'm going to show you how these two strategies combined in the wrong way can cause trouble.
5) If I posted all of the examples each time, it would become a pretty long post, so I'm going to ask you to refer back to earlier posts if you want to check table definitions and the like. This is a series after all
Hopefully that little summary will help us move on to the specific problem that we faced at work and some of the options we're looking at (because, yes folks, I have read other posts and do know some of the options but I'm trying to work my way through them here. Give it time
What went wrong on our current system? Remember that we have traditionally gathered purely at the SUBPARTITION level and allowed Oracle to aggregate those to generate the TABLE and PARTITION stats (i.e. the approach shown in post 2). I'll be honest that it wasn't a strategy I'd seen used almost exclusively across a system before. I'd been lucky enough to find some way of gathering PARTITION stats at the very least. First I'll show you a simulation of what would happen when we start loading data for a new day. First we add a new subpartition (and because we have a subpartition template, the subpartitions are created too) then I'll insert some rows into the different new subpartitions. (Note that we actually use both inserts and partition exchange, depending on the table, but I'll deal with partition exchange later.)
SQL> ALTER TABLE TEST_TAB1 2 ADD PARTITION P_20100208 VALUES LESS THAN (20100209); Table altered. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 1000, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 30000, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2000, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 10000, 'Z'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2400, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 500, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 700, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'JUNE', 1200, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 400, 'P'); 1 row created. SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 600, 'P'); 1 row created. SQL> COMMIT; Commit complete.
At this stage there are no stats on the new partition or subpartitions and all of the previous stats look the same. (I'll make things more succinct by avoiding the column stats for now.)
SQL> select table_name, global_stats, last_analyzed, num_rows 2 from dba_tables 3 where table_name='TEST_TAB1' 4 and owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ --- -------------------- ---------- TEST_TAB1 NO 23-FEB-2010 06:09:55 27 SQL> SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_partitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131 NO 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100201 NO 23-FEB-2010 06:09:55 8 TEST_TAB1 P_20100202 NO 23-FEB-2010 06:09:55 4 TEST_TAB1 P_20100203 NO 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100204 NO 23-FEB-2010 06:09:55 4 TEST_TAB1 P_20100205 NO 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100206 NO 23-FEB-2010 06:09:55 7 TEST_TAB1 P_20100207 NO 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100208 NO 9 rows selected. SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_subpartitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131_GROT YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100131_HALO YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100131_JUNE YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100131_OTHERS YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100201_GROT YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100201_HALO YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100201_JUNE YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100201_OTHERS YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100202_GROT YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100202_HALO YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100202_JUNE YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100202_OTHERS YES 23-FEB-2010 06:09:55 0 <<output snipped>> TEST_TAB1 P_20100207_GROT YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100207_HALO YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100207_JUNE YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100207_OTHERS YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100208_GROT NO TEST_TAB1 P_20100208_HALO NO TEST_TAB1 P_20100208_JUNE NO TEST_TAB1 P_20100208_OTHERS NO 36 rows selected.
Which is probably what you expected. Now I'm going to simulate what would happen when one of the source data feeds complete and we'll gather stats on that subpartition now that the data is loaded.
SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION',
partname => 'P_20100208_GROT');
PL/SQL procedure successfully completed.
SQL>
SQL> select table_name, global_stats, last_analyzed, num_rows
2 from dba_tables
3 where table_name='TEST_TAB1'
4 and owner='TESTUSER'
5 order by 1, 2, 4 desc nulls last;
TABLE_NAME GLO LAST_ANALYZED NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1 NO
SQL>
SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows
2 from dba_tab_partitions
3 where table_name='TEST_TAB1'
4 and table_owner='TESTUSER'
5 order by 1, 2, 4 desc nulls last;
TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1 P_20100131 NO 23-FEB-2010 06:11:39 0
TEST_TAB1 P_20100201 NO 23-FEB-2010 06:11:39 8
TEST_TAB1 P_20100202 NO 23-FEB-2010 06:11:39 4
TEST_TAB1 P_20100203 NO 23-FEB-2010 06:11:39 2
TEST_TAB1 P_20100204 NO 23-FEB-2010 06:11:39 4
TEST_TAB1 P_20100205 NO 23-FEB-2010 06:11:39 2
TEST_TAB1 P_20100206 NO 23-FEB-2010 06:11:39 7
TEST_TAB1 P_20100207 NO 23-FEB-2010 06:11:39 0
TEST_TAB1 P_20100208 NO
9 rows selected.
Woah! What happened to our Aggregated Global Stats on the TABLE? It looks like it's never had statistics at all! Oh, and why are there no Aggregated Stats on the new partition either, given that I just gathered stats for one of it's subpartitions? Well the problem is that Oracle will only aggregate statistics when all of the components have stats that can be aggregated. The problem here is that, at this stage, P_20100208_GROT is the only subpartition of P_20100208 that has stats. The others haven't been gathered yet.
SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_subpartitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131_GROT YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100131_HALO YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100131_JUNE YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100131_OTHERS YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100201_GROT YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100201_HALO YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100201_JUNE YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100201_OTHERS YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100202_GROT YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100202_HALO YES 23-FEB-2010 06:09:55 2 TEST_TAB1 P_20100202_JUNE YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100202_OTHERS YES 23-FEB-2010 06:09:55 0 <<output snipped>> TEST_TAB1 P_20100207_GROT YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100207_HALO YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100207_JUNE YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100207_OTHERS YES 23-FEB-2010 06:09:55 0 TEST_TAB1 P_20100208_GROT YES 23-FEB-2010 06:11:39 5 TEST_TAB1 P_20100208_HALO NO TEST_TAB1 P_20100208_JUNE NO TEST_TAB1 P_20100208_OTHERS NO 36 rows selected.
Of course, this is all behaving exactly as designed, Oracle keep emphasising that people should gather using the default granularity of 'AUTO' and so anyone who does this is asking for trouble, but the reality is that people are trying this. Look no further than Peter Scott's comment to see that someone else has come across this before now!
Let's look at the stats once the other data feeds complete and we gather the rest of the subpartition stats ....
SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION',
partname => 'P_20100208_JUNE');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION',
partname => 'P_20100208_HALO');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION',
partname => 'P_20100208_OTHERS');
PL/SQL procedure successfully completed.
SQL>
SQL> select table_name, global_stats, last_analyzed, num_rows
2 from dba_tables
3 where table_name='TEST_TAB1'
4 and owner='TESTUSER'
5 order by 1, 2, 4 desc nulls last;
TABLE_NAME GLO LAST_ANALYZED NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1 NO 23-FEB-2010 06:31:24 37
SQL>
SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows
2 from dba_tab_partitions
3 where table_name='TEST_TAB1'
4 and table_owner='TESTUSER'
5 order by 1, 2, 4 desc nulls last;
TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1 P_20100131 NO 23-FEB-2010 06:31:24 0
TEST_TAB1 P_20100201 NO 23-FEB-2010 06:31:24 8
TEST_TAB1 P_20100202 NO 23-FEB-2010 06:31:24 4
TEST_TAB1 P_20100203 NO 23-FEB-2010 06:31:24 2
TEST_TAB1 P_20100204 NO 23-FEB-2010 06:31:24 4
TEST_TAB1 P_20100205 NO 23-FEB-2010 06:31:24 2
TEST_TAB1 P_20100206 NO 23-FEB-2010 06:31:24 7
TEST_TAB1 P_20100207 NO 23-FEB-2010 06:31:24 0
TEST_TAB1 P_20100208 NO 23-FEB-2010 06:31:24 10
9 rows selected.
That looks much better. So, if you are going to use this approach (and I hope this series of blogs helps you decide it's questionable) you should only gather stats on the subpartitions when you have all of the subpartitions populated and gather them all at the same time. In fact, in that case, why not just gather at the PARTITION level and get proper Global Statistics on your partitions?
Believe me, there are more horrors to come ....
#1 - Pete Scott said:
2010-02-23 15:49 - (Reply)
I first came across the missing stats problem in 8i - so long ago that I can't remember if actually invalidated stats on tables with existing stats at table level.
At the risk of a mate flaming me for pre-empting the next part of the story. The reasons why people neglect global stats are many fold (and not just not not reading the manual). Two common ones are not having TIME to regather them and the use of partition exchange where the stats have been collected on the table to be exchanged in before the exchange happens - we then get stats exchanged but only at the partition or subpartion level (depending on what we exchange).
With the time problem - I have been known to make my own stats up and use SET_TABLE_STATS to insert what I think will tide me over until I can do it properly - row counts and the like are easy - but setting column stats either resorts to 'copy the last partition' (well not for the partition key column) or not doing it
#1.1 - Doug Burns said:
2010-02-23 16:03 - (Reply)
You're welcome to comment this time, Pete, as I had the feeling you might be first ![]()
Hopefully I've mentioned the TIME thing several times now. Gathering Global Stats can be an extremely expensive option prior to 11g and, yes, I've worked on a system where we set the stats manually in the past.
But I suspect that people might be tempted to use a variety of GRANULARITY options to meet the challenges (because they *can*) so hopefully this helps them understand some of the problems.
For example, our architect said something along the lines of 'why doesn't Oracle just work?' But, hey, you know what 'architects' are like, right? ![]()
Hopefully future posts will cover some of the other issues too - I have quite a long list in front of me.
Cheers,
Doug
#2 - Randolf Geist said:
2010-02-23 16:25 - (Reply)
I've also come across such systems where statistics were only gathered on the lowest (in that particular case subpartition) level using the minimal statistics "aggregation" approach for higher level statistics (partition / global).
In order to solve the issue of adding partitions with initially missing statistics screwing up the aggregated statistics it was taken care that newly added subpartitions got their statistics immediately updated (with 0 rows in that case) - which didn't take a lot of time since the subpartitions were empty and it solved the issue with the aggregated statistics.
@Pete: When gathering statistics on the table to be exchanged before the exchange happens, you're effectively disabling (or misleading) the GATHER AUTO default option for column histogram generation, since the column workload monitoring of the exchange table will be used in that case instead of the actual partitioned table where the queries take place normally. Not that I generally recommend to rely on GATHER AUTO for histogram maintenance, but since it is the default option in 10g and later it is worth to mention.
Randolf
#2.1 - Doug Burns said:
2010-02-23 16:33 - (Reply)
Hi Randolf,
it was taken care that newly added subpartitions got their statistics immediately updated (with 0 rows in that case) - which didn't take a lot of time since the subpartitions were empty and it solved the issue with the aggregated statistics.
Just to be clear, that's what we do already. Just hadn't got to that part yet
So we don't actually face the problem described in this post, but I could still imagine people hitting it!
Cheers
#2.2 - Pete Scott said:
2010-02-23 17:55 - (Reply)
@Randolf - thanks for that; I did not think about the histogram issue, working (as Doug says) on the dark side (or data warehousing) often causes me to shy away from hisotgorams ![]()
As for partition exchange, I rarely exchange statistics -
#2.3 - Doug Burns said:
2010-02-24 08:07 - (Reply)
Randolf,
Re-reading the post, I can see how it's a bit misleading. I said I was going to explain the problem we'd run into at work but that it might take two posts. This first one doesn't really describe our problem, because we do gather zero stats on the fly but, in a way, out solution lead to our real problem, which I'll show in the next post.
Cheers,
Doug
#3 - Flatcat and The Boys (and Girls) 2010-02-23 20:33 - (Reply)
Hi Douglas,
Thank you so much for giving us your nice shiny red vaio .... we like it very much, and we miss you. When you coming home?
Lots and lots and lots and lots of love
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
#3.1 - Doug Burns said:
2010-02-24 06:06 - (Reply)
Sigh. The old laptop is *not* yours. It will have to be sold to help pay for the new one.
Now, can we get back to Oracle statistics please?
#4 - Andy Campbell said:
2010-02-25 09:41 - (Reply)
Interesting series, I always assumed the aggregation happened when the CBO created the plan - which thinking about it would be a big overhead.
Thanks
Andy
#4.1 - Doug Burns said:
2010-02-25 09:48 - (Reply)
It's about to get a whole lot more interesting. I've opened up a real can of worms here ![]()
#5 - Abhishek Tiwari said:
2010-05-28 06:49 - (Reply)
nice articles! great for someone like me who didn't even knew how to analyze his table partitions in the first place ![]()


Tracked: Feb 26, 18:04
In the last post I illustrated the problems you can run into when you rely on Oracle to aggregate statistics on partitions or subpartitions to generate estimated Global Statistics at higher levels of the table. Until there are statistics for all of the re
Tracked: Feb 28, 08:38
In the last post I illustrated the problems you can run into when you rely on Oracle to aggregate statistics on partitions or subpartitions to generate estimated Global Statistics at higher levels of the table. Until there are statistics for all of the re
Tracked: Feb 28, 08:38
Actually, before looking at any recent features, let me introduce one more aspect of the existing aggregation approach used by Oracle. The examples used to date have been based on INSERTing new rows into subpartitions and, although that's the approach use
Tracked: Feb 28, 22:23
Tracked: Mar 17, 07:44
Tracked: Mar 21, 03:30
I think the best approach here is to focus on the technical details of the mistake first and then follow with any whining, self-justification or philosophy. The mistake I made in my presentation was to suggest at least twice that simply adding a new parti
Tracked: Mar 12, 08:58