Doug's Oracle Blog

Entries tagged as dbms_stats

Feb 28: Statistics on Partitioned Tables - Part 5

Actually, before looking at any recent features, let me introduce one more aspect of the existing aggregation approach used by Oracle. The examples so far have been based on INSERTing new rows into subpartitions. That's the approach we use for some of our tables, and it will suit some systems, but in a near-real-time DW you are likely to be using partition exchange at some point. That means we need to understand how the stats might be gathered and then aggregated up to the partition and table levels.

There might be other approaches, but I'd say there are two distinct ones you are likely to use.

1) Create a temporary load table, load it with data, gather statistics on it and then exchange it with the relevant subpartition in the real table.

2) Create a temporary load table, load it with data, exchange it with the relevant subpartition and then gather stats on the subpartition.

Pete Scott left a comment on a previous post saying that he rarely uses approach 1, so no doubt he'll leave another comment here expanding on his reasons ;-) What I want to show you is what happens if you do use approach 1, and to introduce the _minimal_stats_aggregation hidden parameter that's been kicking around since Oracle 8i. The parameter defaults to TRUE, which means that Oracle minimises automatic stats aggregation activity. Let's see that in action.

First of all I'll recreate TEST_TAB1 as it was at the start of the series, add a new partition (and, by implication, the related subpartitions) and create a separate table to load the data into.

SQL> ALTER TABLE TEST_TAB1
  2  ADD  PARTITION P_20100209 VALUES LESS THAN (20100210);

Table altered.

SQL> DROP TABLE LOAD_TAB1;

Table dropped.

SQL> CREATE TABLE LOAD_TAB1
  2  AS SELECT * FROM TEST_TAB1 WHERE 1=0;

Table created.

SQL> CREATE UNIQUE INDEX LOAD_TAB1_IX1 ON LOAD_TAB1
  2  (REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)
  3  NOPARALLEL COMPRESS 1;

Index created.

Now I'll use LOAD_TAB1 to repeat the same process for each of the four subpartitions: INSERT data into LOAD_TAB1, gather stats on it and then exchange it with the relevant subpartition of TEST_TAB1.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 600, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 900, 'Z');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_GROT WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_GROT REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 400, 'U');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 600, 'U');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 900, 'U');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_JUNE WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_JUNE REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 400, 'N');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 600, 'N');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 900, 'N');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_HALO WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_HALO REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 600, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 900, 'Z');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_OTHERS WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION  P_20100209_OTHERS REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

All of the P_20100209 subpartitions now have stats that were swapped in as part of the exchange operations, so hopefully there'll be some aggregated global statistics.

SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS                                              
------------------------------ --- -------------------- ----------                                              
TEST_TAB1                      NO                                                                               

SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS               
------------------------------ ------------------------------ --- -------------------- ----------               
TEST_TAB1                      P_20100131                     NO                                                
TEST_TAB1                      P_20100201                     NO                                                
TEST_TAB1                      P_20100202                     NO                                                
TEST_TAB1                      P_20100203                     NO                                                
TEST_TAB1                      P_20100204                     NO                                                
TEST_TAB1                      P_20100205                     NO                                                
TEST_TAB1                      P_20100206                     NO                                              
TEST_TAB1                      P_20100207                     NO                                                  
TEST_TAB1                      P_20100209                     NO                                                   

9 rows selected.

Oh, well, that doesn't seem to have worked. Maybe the LOAD_TAB1 stats weren't gathered correctly or didn't appear as part of the subpartition exchange operation?

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS                                   
------------------------------ ------------------------------ --- -------------------- ----------                                    
TEST_TAB1                      P_20100131_GROT                NO                                                         
TEST_TAB1                      P_20100131_HALO                NO                                                                    
TEST_TAB1                      P_20100131_JUNE                NO                                                        
TEST_TAB1                      P_20100131_OTHERS              NO                                                

<<output snipped>>

TEST_TAB1                      P_20100209_GROT                NO  28-FEB-2010 21:41:47          3                
TEST_TAB1                      P_20100209_HALO                NO  28-FEB-2010 21:41:49          3                
TEST_TAB1                      P_20100209_JUNE                NO  28-FEB-2010 21:41:49          3                
TEST_TAB1                      P_20100209_OTHERS              NO  28-FEB-2010 21:41:50          3                

36 rows selected.


The subpartition stats are fine, then, but the aggregation process hasn't happened. That's because _minimal_stats_aggregation is set to TRUE (the default), which instructs Oracle to minimise aggregation operations. One of the ways it does so is by not aggregating statistics as a result of a partition exchange operation, leaving you to do that manually by gathering stats on the table partition. If we were to change the parameter to a non-default value (and, being an underscore parameter, that's your own choice at your own risk ...), we would see different behaviour. I ran the same script again, with this small addition that changes the parameter setting at the session level.

SQL> alter session set "_minimal_stats_aggregation"=FALSE;

Session altered.

That changes the end result to this ...

SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS                                             
------------------------------ --- -------------------- ----------                                             
TEST_TAB1                      NO                                                                              

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS              
------------------------------ ------------------------------ --- -------------------- ----------              
TEST_TAB1                      P_20100131                     NO                                               
TEST_TAB1                      P_20100201                     NO                                               
TEST_TAB1                      P_20100202                     NO                                               
TEST_TAB1                      P_20100203                     NO                                               
TEST_TAB1                      P_20100204                     NO                                               
TEST_TAB1                      P_20100205                     NO                                               
TEST_TAB1                      P_20100206                     NO                                               
TEST_TAB1                      P_20100207                     NO                                               
TEST_TAB1                      P_20100209                     NO  28-FEB-2010 21:41:53         12              

9 rows selected.

Note that there are still no statistics at the table level. Not all of the partitions have stats yet, so aggregation can't take place at that level. However, there are now aggregated statistics on the P_20100209 partition, because all of its subpartitions do have stats.
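To make the rule concrete, here's a toy model in Python of what the output above shows. It is not Oracle's code: the `Partition` class, the `exchange_subpartition`, `gather_subpartition` and `load_day` functions and the `minimal_stats_aggregation` flag are all hypothetical stand-ins, and only NUM_ROWS is modelled. The assumptions, taken from this post and the last one, are that an exchange only triggers aggregation when the parameter is FALSE, that gathering stats on a subpartition does trigger it, and that aggregation only happens once every subpartition has stats.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical subpartition names, matching the P_20100209 example.
SUBS = ("GROT", "JUNE", "HALO", "OTHERS")


@dataclass
class Partition:
    """Toy partition: subpartition name -> NUM_ROWS (None = no stats yet)."""
    subpartitions: Dict[str, Optional[int]]
    num_rows: Optional[int] = None  # aggregated partition-level NUM_ROWS


def try_aggregate(part: Partition) -> None:
    # Aggregation only succeeds once every subpartition has stats.
    counts = list(part.subpartitions.values())
    if all(c is not None for c in counts):
        part.num_rows = sum(counts)


def exchange_subpartition(part: Partition, name: str, load_rows: int,
                          minimal_stats_aggregation: bool = True) -> None:
    # Approach 1: stats gathered on the load table arrive with the exchange.
    part.subpartitions[name] = load_rows
    # Assumption based on the output above: with the default (TRUE),
    # an exchange does not trigger aggregation.
    if not minimal_stats_aggregation:
        try_aggregate(part)


def gather_subpartition(part: Partition, name: str, rows: int) -> None:
    # Gathering stats on the subpartition itself does trigger aggregation.
    part.subpartitions[name] = rows
    try_aggregate(part)


def load_day(minimal_stats_aggregation: bool) -> Partition:
    # Exchange three rows into each of the four subpartitions, as in the script.
    part = Partition({name: None for name in SUBS})
    for name in SUBS:
        exchange_subpartition(part, name, 3, minimal_stats_aggregation)
    return part


print(load_day(True).num_rows)   # None
print(load_day(False).num_rows)  # 12
```

In the default case, a later `gather_subpartition(part, "GROT", 3)` is what finally fills in the 12 rows in this model, which is its version of explicitly gathering stats after the exchange.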

All you need to remember is that the default setting of _minimal_stats_aggregation means that, unless you explicitly gather statistics on the partitions you've just exchanged, aggregation will not take place! Actually, copying stats will invoke the aggregation process too, but I'll deal with that in the next post. (Updated later: that last sentence might not be true. I've just tried something at home and I'm seeing different results from those at work, so more investigation is needed.)

Oh, and there's much more on this subject over on Randolf Geist's blog post.
Posted by Doug Burns Comments: (18) Trackbacks: (4)
Defined tags for this entry: DBMS_STATS, Optimiser, Partitions

Feb 28: Statistics on Partitioned Tables - Part 4

In the last post I illustrated the problems you can run into when you rely on Oracle to aggregate statistics on partitions or subpartitions to generate estimated Global Statistics at higher levels of the table. Until there are statistics for all of the relevant structures, aggregation won't take place. For example, if you have statistics for three out of four subpartitions, there won't be any aggregated global statistics on the related partition until you gather statistics on the fourth subpartition. Randolf Geist left a comment describing how you might avoid problems with this.

"In order to solve the issue of adding partitions with initially missing statistics screwing up the aggregated statistics it was taken care that newly added subpartitions got their statistics immediately updated (with 0 rows in that case) - which didn't take a lot of time since the subpartitions were empty and it solved the issue with the aggregated statistics."

That's what our system does, but we introduced a change in the last release that caused the problems that inspired this series of posts ...
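Here's a minimal sketch of both the rule and Randolf's workaround, again as a toy Python model rather than anything Oracle exposes. `aggregate` and `add_partition` are hypothetical names, NUM_ROWS stands in for all of the statistics, and `None` means a subpartition has never been analysed.

```python
from typing import Dict, List, Optional

# Toy model: subpartition name -> NUM_ROWS, where None means "never analysed".
Stats = Dict[str, Optional[int]]


def aggregate(children: Stats) -> Optional[int]:
    """Aggregated NUM_ROWS for the parent, or None if any child lacks stats."""
    if any(rows is None for rows in children.values()):
        return None
    return sum(rows for rows in children.values() if rows is not None)


def add_partition(names: List[str], seed_empty_stats: bool) -> Stats:
    """Create the subpartition stats for a newly added partition.

    seed_empty_stats=True models the workaround quoted above: gather stats on
    the new, empty subpartitions immediately, so each records 0 rows.
    """
    return {name: (0 if seed_empty_stats else None) for name in names}


names = ["GROT", "HALO", "JUNE", "OTHERS"]

# Without the workaround, loading one subpartition isn't enough to aggregate.
plain = add_partition(names, seed_empty_stats=False)
plain["GROT"] = 5
print(aggregate(plain))   # None

# With zero-row stats seeded when the partition is added, it works straight away.
seeded = add_partition(names, seed_empty_stats=True)
seeded["GROT"] = 5
print(aggregate(seeded))  # 5
```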

First let's start with an empty table (the definition hasn't changed since the first post). Now, because we are so paranoid about partitions without stats, we'll gather statistics at the PARTITION level even though the table is empty at the moment. I'm not going to specify a partition name here, to keep the output shorter, but on the real system we would have. Either way, we'll see the same problematic end result.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'PARTITION');

PL/SQL procedure successfully completed.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:04:24          0

OK, so the table statistics aren't true Global Statistics, but that's ok; we know about that. We also know that there's no data in the table at this stage, so the stats reflect that. When we look at the Partition-level stats :-

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

They are true global statistics, albeit on no data at this stage, but at least we have some statistics to reflect that. Looking at the Subpartition stats :-

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                NO
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

There are no subpartition stats at all at this stage, which is expected behaviour; we'll gather them later, after we load the data. I'm going to skip the column statistics because I don't need them to illustrate the problem. So let's imagine that on the live system we've just created the partitions above and are about to load data into the P_20100206_GROT subpartition.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 100000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 3000000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 200000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 110000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 240000, 'U');

1 row created.

SQL> COMMIT;

Commit complete.

Next, our normal stats-gathering process is invoked and gathers stats on the subpartition we've just loaded.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION', -
                                         PARTNAME => 'P_20100206_GROT');

PL/SQL procedure successfully completed.

N.B. It's worth pointing out that I put a short pause in the test script between the original stats gathering on the empty table and the INSERTs and stats gathering on the newly loaded subpartition, so pay attention to the LAST_ANALYZED values here.

So how do the stats look?

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

Mmmmmm .... I can see that the LAST_ANALYZED time has been updated, but NUM_ROWS is still 0 at the table level. How about the partitions?

SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, as far as the Table and Partition statistics are concerned, this table is still empty! That's not good, and I can imagine a near future of execution plans with CARDINALITY=1 and MERGE JOIN CARTESIAN. Looking at the LAST_ANALYZED values on the Partitions, I can see that the timestamps haven't changed, which is another sign that something is wrong.

I'll check that the subpartition stats were gathered correctly.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO
TEST_TAB1                      P_20100131_HALO                NO
TEST_TAB1                      P_20100131_JUNE                NO
TEST_TAB1                      P_20100131_OTHERS              NO
TEST_TAB1                      P_20100201_GROT                NO
TEST_TAB1                      P_20100201_HALO                NO
TEST_TAB1                      P_20100201_JUNE                NO
TEST_TAB1                      P_20100201_OTHERS              NO

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                NO
TEST_TAB1                      P_20100206_JUNE                NO
TEST_TAB1                      P_20100206_OTHERS              NO
TEST_TAB1                      P_20100207_GROT                NO
TEST_TAB1                      P_20100207_HALO                NO
TEST_TAB1                      P_20100207_JUNE                NO
TEST_TAB1                      P_20100207_OTHERS              NO

32 rows selected.

Ah, perhaps that's the problem. Only one of the P_20100206 subpartitions has valid stats, so Oracle cannot generate aggregated Global Stats at the higher levels of the table. So I'll try to fix that by gathering statistics on all of the subpartitions in the table. (In fact, I only really need to gather stats on the remaining P_20100206 subpartitions, but I'll use this approach for brevity.)

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

Let's check that all of the subpartitions have valid statistics now.

SQL> select     table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100131_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100201_OTHERS              YES 28-FEB-2010 08:06:25          0

<<output snipped>>

TEST_TAB1                      P_20100206_GROT                YES 28-FEB-2010 08:06:25          5
TEST_TAB1                      P_20100206_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100206_OTHERS              YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_GROT                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_HALO                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_JUNE                YES 28-FEB-2010 08:06:25          0
TEST_TAB1                      P_20100207_OTHERS              YES 28-FEB-2010 08:06:25          0

32 rows selected.


OK, so Oracle should have aggregated the subpartition stats to generate global stats on the partitions and table.

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:25          0

SQL> 
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100201                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100202                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100203                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100204                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100205                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100206                     YES 28-FEB-2010 08:04:24          0
TEST_TAB1                      P_20100207                     YES 28-FEB-2010 08:04:24          0

8 rows selected.

So, according to the table and the partition stats, the table is still empty and those partition statistics still haven't been updated!

The problem here is that Oracle won't overwrite true global stats with aggregated global stats. When you think about it, that's a sensible approach because if I have a strategy of collecting Table and Partition stats (i.e. the Oracle-recommended strategy covered in the first post) then the last thing I want is those global stats constantly being overwritten by aggregated stats (with incorrect NDVs) when stats are gathered on subpartitions!
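Here's the same idea as a toy Python model, assuming (as the output above suggests) that aggregation never replaces stats gathered directly on a partition. `PartitionStats`, `gather_partition`, `aggregate_from` and `delete_stats` are hypothetical names, only NUM_ROWS is tracked, and the subpartition values are the P_20100206 numbers from this post.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PartitionStats:
    num_rows: Optional[int] = None
    global_stats: bool = False  # True models GLOBAL_STATS = 'YES'


def gather_partition(part: PartitionStats, rows: int) -> None:
    # GRANULARITY => 'PARTITION': true global stats gathered on the partition.
    part.num_rows, part.global_stats = rows, True


def aggregate_from(part: PartitionStats, subs: Dict[str, Optional[int]]) -> None:
    # Skip if any subpartition lacks stats and (the key assumption here)
    # never overwrite stats that were gathered directly on the partition.
    if part.global_stats or any(r is None for r in subs.values()):
        return
    part.num_rows = sum(r for r in subs.values() if r is not None)


def delete_stats(part: PartitionStats) -> None:
    # Stands in for DBMS_STATS.DELETE_TABLE_STATS clearing the partition stats.
    part.num_rows, part.global_stats = None, False


p = PartitionStats()
gather_partition(p, 0)  # gathered at PARTITION level while the table was empty

# Later, SUBPARTITION-level gathers record the 5 loaded rows.
subs = {"GROT": 5, "HALO": 0, "JUNE": 0, "OTHERS": 0}
aggregate_from(p, subs)
print(p.num_rows, p.global_stats)  # 0 True: frozen at the empty-table value

# The fix: delete the stats, then regather at SUBPARTITION level (subs unchanged).
delete_stats(p)
aggregate_from(p, subs)
print(p.num_rows, p.global_stats)  # 5 False
```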

Our mistake here could be viewed as a combination of a) not following Oracle recommendations (if we did, we'd also be gathering global stats on the Table and Partitions using a separate task) and b) having departed from that strategy, gathering stats at the wrong level. Those Partition stats that we gathered can never be overwritten except by gathering stats on the Partitions again, which would then be aggregated up to the table level.

Given that we want to (have to?) use our current approach, we should only ever gather stats at the SUBPARTITION level, which will then be aggregated up to the Partition and Table levels.

As for the fix, we deleted the existing stats to rid the partitions of their global stats, and then regathered at the SUBPARTITION level as a one-off exercise.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION');

PL/SQL procedure successfully completed.

The important change is that we now have aggregated stats at both the Table and Partition levels, which can then be updated by the aggregation process as we gather stats on new SUBPARTITIONS. Checking the statistics on the Table and Partitions ...

SQL> select     table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  28-FEB-2010 08:06:26          5

SQL> 
SQL> select     table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100201                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100202                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100203                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100204                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100205                     NO  28-FEB-2010 08:06:26          0
TEST_TAB1                      P_20100206                     NO  28-FEB-2010 08:06:26          5
TEST_TAB1                      P_20100207                     NO  28-FEB-2010 08:06:26          0

8 rows selected.

All of the partition stats have been updated and are now aggregated rather than true global stats. A change to the metadata our stats process uses, switching the granularity from PARTITION to SUBPARTITION, will ensure that stats are always gathered at the subpartition level and stop the problem from recurring.

You could argue that we could have avoided all of this by just using the default stats-gathering strategy and not trying to be too clever, but we would really struggle to support the additional workload that requires. Oh, and this example makes the problem obvious: the stats were gathered on empty partitions, we knew we'd done so and it was relatively easy to spot zero-row partitions. But imagine if someone gathered statistics on your partitions manually for some reason (it wouldn't be difficult to decide that seemed sensible) and the partition row counts were several million or so, frozen and stuck that way forever until someone decides to repeat the process. Would you really notice that the aggregation process wasn't working?
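One way to answer that question is to compare each partition with its subpartitions. This is a rough heuristic of my own, not an Oracle feature: flag any partition with GLOBAL_STATS = 'YES' whose subpartitions were analysed more recently and whose NUM_ROWS no longer matches their total. It's sketched in Python over rows as they might come back from DBA_TAB_PARTITIONS and DBA_TAB_SUBPARTITIONS; the query that fetches them isn't shown, and the sample values are copied from the P_20100206 and P_20100207 output above.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# (PARTITION_NAME, GLOBAL_STATS, LAST_ANALYZED, NUM_ROWS)
PartRow = Tuple[str, str, Optional[datetime], Optional[int]]
# (PARTITION_NAME, LAST_ANALYZED, NUM_ROWS) for each subpartition
SubRow = Tuple[str, Optional[datetime], Optional[int]]


def suspect_partitions(parts: List[PartRow], subs: List[SubRow]) -> List[str]:
    """Flag partitions whose directly gathered stats look stale next to their subpartitions."""
    newest: Dict[str, datetime] = {}
    total: Dict[str, int] = {}
    for name, analysed, rows in subs:
        if analysed is None or rows is None:
            continue  # never analysed: nothing to compare against
        newest[name] = max(newest.get(name, analysed), analysed)
        total[name] = total.get(name, 0) + rows
    flagged = []
    for name, global_stats, analysed, rows in parts:
        if global_stats != "YES" or analysed is None or name not in newest:
            continue
        # Subpartitions analysed after the partition, yet the row counts disagree.
        if newest[name] > analysed and total[name] != rows:
            flagged.append(name)
    return flagged


part_time = datetime(2010, 2, 28, 8, 4, 24)
sub_time = datetime(2010, 2, 28, 8, 6, 25)
partitions = [
    ("P_20100206", "YES", part_time, 0),
    ("P_20100207", "YES", part_time, 0),
]
subpartitions = (
    [("P_20100206", sub_time, n) for n in (5, 0, 0, 0)]
    + [("P_20100207", sub_time, 0) for _ in range(4)]
)

print(suspect_partitions(partitions, subpartitions))  # ['P_20100206']
```

Note that P_20100207 is just as frozen as P_20100206 but isn't flagged, because its row counts happen to agree. A check like this only catches the cases where the data has moved on.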

Self-inflicted or not, as soon as we spotted this mistake I could imagine others making it too if they don't fully understand the aggregation process.

In the next few posts I'll look at some of the newer approaches Oracle has introduced, which we've investigated to see whether they can help us gather better global statistics and/or reduce our stats-gathering workload.
Posted by Doug Burns Comments: (3) Trackbacks: (0)
Defined tags for this entry: DBMS_STATS, Optimiser, Partitions

Feb 23: Statistics on Partitioned Tables - Part 3

As soon as I'd committed my last post, I knew it wasn't what I'd hoped for and said as much to a couple of people before they'd read it. I knew it would probably just add to any confusion people already had about this subject (something I'm particularly keen to avoid) but I am awash with examples at the moment and trying to pick out the right points to illustrate, in the right order, to the right depth. This is probably more of a White Paper subject, in retrospect, but I'll press on anyway.

To summarise where we are so far, though, and highlight a couple of key points ...

1) As I said in the first post, all of the examples to date are on Oracle 10.2.0.4 but I think you would see similar behaviour on earlier releases that I don't have to hand right now. The default parameters would be different, but the aggregation process goes back a long way. i.e. This is not about any 11g features, at least not yet. That will come later. (I still wish I'd put 10g in the post titles, though, like I did with the Adaptive Thresholds posts.)

2) Partitioned objects have two different kinds of stats: Global Statistics describe a Table or Partition as a whole, including all of its child structures, and Partition Statistics describe individual partitions and subpartitions. In addition, Oracle can take Partition stats and aggregate them up to generate Aggregated Global Statistics. Hopefully the last post illustrated that some elements of aggregated stats seem reliable but others are not, particularly the Number of Distinct Values (NDV).

3) So when looking at optimiser stats, it's essential that you look at columns like GLOBAL_STATS and at the HIGH_VALUE, LOW_VALUE and NUM_DISTINCT columns or you might kid yourself into thinking that your stats are better than they really are. I suspect that's what had happened at my current site. Be honest with yourself. Excluding those experts who might read this, how many of you have taken a quick glance at NUM_ROWS and LAST_ANALYZED columns to reassure yourself your stats are ok? You need to be careful with this stuff and Greg's post can help you check.

4) One of the more confusing aspects of the first two posts is that they showed completely different strategies for collecting stats on our tables. The first post covered the Oracle-recommended 10.2 default behaviour of gathering GLOBAL AND PARTITION stats down to the Partition level. The second post showed a completely different strategy we use on many tables: gathering no stats at all at the Table and Partition level, but gathering Subpartition statistics and having Oracle aggregate them up to the higher levels in an attempt to reduce stats gathering activity. That was deliberate, as in the next post I'm going to show you how these two strategies combined in the wrong way can cause trouble.

5) If I posted all of the examples each time, it would become a pretty long post, so I'm going to ask you to refer back to earlier posts if you want to check table definitions and the like. This is a series after all ;-) At the end, I might try to tidy everything up and post it all in one script, showing the various examples. Should that ever happen, it will be after the Hotsos Symposium. i.e. Don't hold your breath.

Hopefully that little summary will help us move on to the specific problem that we faced at work and some of the options we're looking at (because, yes folks, I have read other posts and do know some of the options but I'm trying to work my way through them here. Give it time ;-)). It might take two posts though.



What went wrong on our current system? Remember that we have traditionally gathered purely at the SUBPARTITION level and allowed Oracle to aggregate those stats to generate the TABLE and PARTITION stats (i.e. the approach shown in post 2). I'll be honest: I hadn't seen that strategy used almost exclusively across a system before. I'd been lucky enough to find some way of gathering PARTITION stats at the very least. First I'll show you a simulation of what would happen when we start loading data for a new day. We add a new partition (and because we have a subpartition template, the subpartitions are created too), then I'll insert some rows into the different new subpartitions. (Note that we actually use both inserts and partition exchange, depending on the table, but I'll deal with partition exchange later.)

SQL> ALTER TABLE TEST_TAB1
  2  ADD  PARTITION P_20100208 VALUES LESS THAN (20100209);

Table altered.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 30000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 10000, 'Z');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 500, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 700, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'JUNE', 1200, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 600, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

At this stage there are no stats on the new partition or subpartitions and all of the previous stats look the same. (I'll make things more succinct by avoiding the column stats for now.)

SQL> select  table_name, global_stats, last_analyzed, num_rows
from dba_tables
where table_name='TEST_TAB1'
and owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  23-FEB-2010 06:09:55         27

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
from dba_tab_partitions
where table_name='TEST_TAB1'
and table_owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:09:55          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:09:55          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:09:55          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:09:55          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208                     NO

9 rows selected.

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
from dba_tab_subpartitions
where table_name='TEST_TAB1'
and table_owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_JUNE                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_OTHERS              YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100202_OTHERS              YES 23-FEB-2010 06:09:55          0

<<output snipped>>

TEST_TAB1                      P_20100207_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208_GROT                NO
TEST_TAB1                      P_20100208_HALO                NO
TEST_TAB1                      P_20100208_JUNE                NO
TEST_TAB1                      P_20100208_OTHERS              NO

36 rows selected.

Which is probably what you expected. Now I'm going to simulate what would happen when one of the source data feeds completes and we gather stats on that subpartition now that its data is loaded.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', -
                                        partname => 'P_20100208_GROT');

PL/SQL procedure successfully completed.

SQL> 
SQL> select  table_name, global_stats, last_analyzed, num_rows
from dba_tables
where table_name='TEST_TAB1'
and owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO

SQL> 
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
from dba_tab_partitions
where table_name='TEST_TAB1'
and table_owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:11:39          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:11:39          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:11:39          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:11:39          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:11:39          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:11:39          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:11:39          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:11:39          0
TEST_TAB1                      P_20100208                     NO

9 rows selected.

Woah! What happened to our Aggregated Global Stats on the TABLE? It looks as though it's never had statistics at all! Oh, and why are there no Aggregated Stats on the new partition either, given that I just gathered stats for one of its subpartitions? The answer is that Oracle will only aggregate statistics when all of the components have stats that can be aggregated. At this stage, P_20100208_GROT is the only subpartition of P_20100208 that has stats; the others haven't been gathered yet. With no aggregated stats on P_20100208, the table-level stats can't be aggregated either.

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
from dba_tab_subpartitions
where table_name='TEST_TAB1'
and table_owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_JUNE                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_OTHERS              YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100202_OTHERS              YES 23-FEB-2010 06:09:55          0

<<output snipped>>

TEST_TAB1                      P_20100207_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208_GROT                YES 23-FEB-2010 06:11:39          5
TEST_TAB1                      P_20100208_HALO                NO
TEST_TAB1                      P_20100208_JUNE                NO
TEST_TAB1                      P_20100208_OTHERS              NO

36 rows selected.
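The rule at work here, that Oracle only aggregates when every child has stats that can be aggregated, can be illustrated with a small Python toy model, since the SQL above needs a live database. It is a sketch under simplifying assumptions, not Oracle's implementation: it tracks NUM_ROWS only, uses None to mean "never gathered", and the function name aggregate_num_rows is hypothetical. The row counts are the ones shown in this post.

```python
from typing import Dict, Optional


def aggregate_num_rows(children: Dict[str, Optional[int]]) -> Optional[int]:
    """Toy model (not Oracle's code): a parent gets an aggregated NUM_ROWS
    only if every child has stats; None means "no stats gathered yet"."""
    if any(n is None for n in children.values()):
        return None  # a single child without stats blocks the roll-up
    return sum(children.values())


# Partition-level NUM_ROWS for the existing partitions, as shown earlier.
older = {"P_20100131": 0, "P_20100201": 8, "P_20100202": 4, "P_20100203": 2,
         "P_20100204": 4, "P_20100205": 2, "P_20100206": 7, "P_20100207": 0}

# Only P_20100208_GROT has been gathered so far.
p_20100208 = {"GROT": 5, "HALO": None, "JUNE": None, "OTHERS": None}
table = {**older, "P_20100208": aggregate_num_rows(p_20100208)}
print(aggregate_num_rows(p_20100208), aggregate_num_rows(table))  # None None

# Once the remaining feeds are gathered (rows per subpartition as inserted;
# the two WINE rows land in OTHERS).
p_20100208 = {"GROT": 5, "HALO": 2, "JUNE": 1, "OTHERS": 2}
table = {**older, "P_20100208": aggregate_num_rows(p_20100208)}
print(aggregate_num_rows(p_20100208), aggregate_num_rows(table))  # 10 37
```

The second pair of figures matches the partition and table NUM_ROWS that appear once all four subpartitions have stats.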

Of course, this is all behaving exactly as designed. Oracle keep emphasising that people should gather using the default granularity of 'AUTO', so anyone who does this is asking for trouble, but the reality is that people are doing it. Look no further than Peter Scott's comment to see that someone else has come across this before now!

Let's look at the stats once the other data feeds complete and we gather the rest of the subpartition stats ....

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', -
                                        partname => 'P_20100208_JUNE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', -
                                        partname => 'P_20100208_HALO');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', -
                                        partname => 'P_20100208_OTHERS');

PL/SQL procedure successfully completed.

SQL>
SQL> select  table_name, global_stats, last_analyzed, num_rows
from dba_tables
where table_name='TEST_TAB1'
and owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  23-FEB-2010 06:31:24         37

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
from dba_tab_partitions
where table_name='TEST_TAB1'
and table_owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:31:24          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:31:24          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:31:24          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:31:24          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:31:24          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:31:24          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:31:24          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:31:24          0
TEST_TAB1                      P_20100208                     NO  23-FEB-2010 06:31:24         10

9 rows selected.

That looks much better. So, if you are going to use this approach (and I hope this series of blogs helps you decide it's questionable) you should only gather stats on the subpartitions when you have all of the subpartitions populated and gather them all at the same time. In fact, in that case, why not just gather at the PARTITION level and get proper Global Statistics on your partitions?
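If you do stick with subpartition-level gathering, one way to follow that advice is to have the load process hold back stats gathering for a partition until every feed for it has landed. The Python sketch below is hypothetical: it assumes your load process knows the list of expected feeds (subpartitions) per partition, and it only builds the same dbms_stats calls used in this post as text; it does not connect to a database.

```python
from typing import List


def gather_commands(owner: str, table: str, partition: str,
                    expected: List[str], loaded: List[str]) -> List[str]:
    """Hypothetical helper: return one dbms_stats call per subpartition of
    `partition`, but only once every expected feed has loaded. Until then
    return an empty list, so no partial set of subpartition stats is gathered."""
    if set(expected) - set(loaded):
        return []  # at least one feed still outstanding: gather nothing yet
    return [
        f"exec dbms_stats.gather_table_stats('{owner}', '{table}', "
        f"granularity => 'SUBPARTITION', partname => '{partition}_{sub}');"
        for sub in expected
    ]


subs = ["GROT", "HALO", "JUNE", "OTHERS"]
print(gather_commands("TESTUSER", "TEST_TAB1", "P_20100208", subs, ["GROT"]))  # []
for cmd in gather_commands("TESTUSER", "TEST_TAB1", "P_20100208", subs, subs):
    print(cmd)
```

Following the question at the end of the paragraph above, the same trigger point could instead issue a single call with granularity => 'PARTITION' and partname => 'P_20100208', giving proper Global Statistics on that partition.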

Believe me, there are more horrors to come ....
Posted by Doug Burns Comments: (11) Trackbacks: (7)
Defined tags for this entry: DBMS_STATS, Optimiser, Partitions

Feb 22: Statistics on Partitioned Tables - Part 2

In the last part, I asked you to trust me that true Global Stats are a good thing so in this post I hope to show you why they are, to make sure you don't kid yourself that you can avoid them. (Updated later - this is all on 10.2.0.4)

Why would you even want to avoid them? Global stats can take a lot of work to gather if you're working with very large objects because Oracle has to visit all partitions. As an alternative, Oracle has the capability to aggregate lower level statistics to generate simulated global statistics at higher levels of the same object. In our case, as we INSERT data into new subpartitions or use partition exchange operations, we gather statistics at the SUBPARTITION level and allow the statistics to aggregate up to the PARTITION and TABLE level. Here's how it looks ....

I'll delete the existing table stats and regather at the SUBPARTITION level.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1') 
 

PL/SQL procedure successfully completed. 

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION') 

PL/SQL procedure successfully completed. 

Note that, because I haven't specified a subpartition name, all of the subpartitions will be visited to gather these statistics, which results in the following Table, Partition and Subpartition stats.

SQL> select  table_name, global_stats, last_analyzed, num_rows 
from dba_tables 
where table_name='TEST_TAB1' 
and owner='TESTUSER' 
order by 1, 3 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED            NUM_ROWS 
------------------------------ --- ---------------------- ---------- 
TEST_TAB1                      NO  16-FEB-2010 16:23:32           11 

1 row selected. 

I can see that the table statistics are not global stats, but the number of rows looks right. These are actually aggregated statistics that Oracle has derived from the subpartition statistics. Let's look at the partitions.

SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows 
from dba_tab_partitions 
where table_name='TEST_TAB1' 
and table_owner='TESTUSER' 
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS 
------------------------------ ------------------------------ --- ---------------------- -------- 
TEST_TAB1                      P_20100131                     NO  16-FEB-2010 16:23:32          0 
TEST_TAB1                      P_20100201                     NO  16-FEB-2010 16:23:32          4 
TEST_TAB1                      P_20100202                     NO  16-FEB-2010 16:23:32          2 
TEST_TAB1                      P_20100203                     NO  16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100204                     NO  16-FEB-2010 16:23:32          2 
TEST_TAB1                      P_20100205                     NO  16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100206                     NO  16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100207                     NO  16-FEB-2010 16:23:32          0 

8 rows selected. 

Again, the aggregated stats appear to be an accurate reflection of the data. How do the subpartition stats look?

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows 
from dba_tab_subpartitions 
where table_name='TEST_TAB1' 
and table_owner='TESTUSER' 
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS 
------------------------------ ------------------------------ --- ---------------------- -------- 
TEST_TAB1                      P_20100131_GROT                YES 16-FEB-2010 16:23:32          0 
TEST_TAB1                      P_20100131_HALO                YES 16-FEB-2010 16:23:32          0 
TEST_TAB1                      P_20100131_JUNE                YES 16-FEB-2010 16:23:32          0 
TEST_TAB1                      P_20100131_OTHERS              YES 16-FEB-2010 16:23:32          0 
TEST_TAB1                      P_20100201_GROT                YES 16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100201_HALO                YES 16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100201_JUNE                YES 16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100201_OTHERS              YES 16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100202_GROT                YES 16-FEB-2010 16:23:32          1 
TEST_TAB1                      P_20100202_HALO                YES 16-FEB-2010 16:23:32          1 

<<output_snipped>>

32 rows selected. 

So this looks pretty good, doesn't it? We've gathered 'Global' statistics at the subpartition level and yet the stats at the table and partition level look accurate too. Why would we want to use this approach? Well, to use my current system as an example, it's a near-real-time data warehouse which creates tens of thousands of subpartitions per day, most of them over a period of a few hours. If we were to re-gather global statistics at the table and partition levels, there would be a substantial associated stats-gathering workload, and the system is under enough strain as it is. Therefore, if we can just gather stats at the subpartition level for the new subpartitions and have Oracle aggregate them to generate derived Table and Partition stats at the same time, so much the better. To simulate that, I'll insert some more data and see whether the stats still look accurate after adding data and regathering at the SUBPARTITION level.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 100000, 'P');

1 row created. 

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 3000000, 'P'); 

1 row created. 

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 200000, 'P');

1 row created. 

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 110000, 'P');

1 row created. 

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'GROT', 240000, 'U');

1 row created. 

SQL> COMMIT;

Commit complete. 

I'll gather stats at the SUBPARTITION level for the only subpartition that has changed data.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION', -
                           PARTNAME => 'P_20100206_GROT');

PL/SQL procedure successfully completed. 

Time to look at the stats ...

SQL> select  table_name, global_stats, last_analyzed, num_rows
from dba_tables 
where table_name='TEST_TAB1' 
and owner='TESTUSER' 
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED            NUM_ROWS 
------------------------------ --- ---------------------- ---------- 
TEST_TAB1                      NO  16-FEB-2010 16:23:34           16 

1 row selected. 

SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows 
from dba_tab_partitions 
where table_name='TEST_TAB1' 
and table_owner='TESTUSER' 
order by 1, 2, 4 desc nulls last; 

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS 
------------------------------ ------------------------------ --- ---------------------- -------- 
TEST_TAB1                      P_20100131                     NO  16-FEB-2010 16:23:34          0 
TEST_TAB1                      P_20100201                     NO  16-FEB-2010 16:23:34          4 
TEST_TAB1                      P_20100202                     NO  16-FEB-2010 16:23:34          2 
TEST_TAB1                      P_20100203                     NO  16-FEB-2010 16:23:34          1 
TEST_TAB1                      P_20100204                     NO  16-FEB-2010 16:23:34          2 
TEST_TAB1                      P_20100205                     NO  16-FEB-2010 16:23:34          1 
TEST_TAB1                      P_20100206                     NO  16-FEB-2010 16:23:34          6 
TEST_TAB1                      P_20100207                     NO  16-FEB-2010 16:23:34          0 

8 rows selected. 

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows 
from dba_tab_subpartitions 
where table_name='TEST_TAB1' 
and table_owner='TESTUSER' 
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS 
------------------------------ ------------------------------ --- ---------------------- -------- 
TEST_TAB1                      P_20100131_GROT                YES 16-FEB-2010 16:23:34          0 
TEST_TAB1                      P_20100131_HALO                YES 16-FEB-2010 16:23:34          0 
TEST_TAB1                      P_20100131_JUNE                YES 16-FEB-2010 16:23:34          0 
TEST_TAB1                      P_20100131_OTHERS              YES 16-FEB-2010 16:23:34          0 
TEST_TAB1                      P_20100201_GROT                YES 16-FEB-2010 16:23:34          1 
TEST_TAB1                      P_20100201_HALO                YES 16-FEB-2010 16:23:34          1 
TEST_TAB1                      P_20100201_JUNE                YES 16-FEB-2010 16:23:34          1 
TEST_TAB1                      P_20100201_OTHERS              YES 16-FEB-2010 16:23:34          1 

<<output_snipped>>

TEST_TAB1                      P_20100206_GROT                YES 16-FEB-2010 16:23:34          5
TEST_TAB1                      P_20100206_HALO                YES 16-FEB-2010 16:23:34          0
TEST_TAB1                      P_20100206_JUNE                YES 16-FEB-2010 16:23:34          0
TEST_TAB1                      P_20100206_OTHERS              YES 16-FEB-2010 16:23:34          1
TEST_TAB1                      P_20100207_GROT                YES 16-FEB-2010 16:23:34          0
TEST_TAB1                      P_20100207_HALO                YES 16-FEB-2010 16:23:34          0
TEST_TAB1                      P_20100207_JUNE                YES 16-FEB-2010 16:23:34          0
TEST_TAB1                      P_20100207_OTHERS              YES 16-FEB-2010 16:23:34          0

32 rows selected. 
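So far the derived figures behave like a straightforward roll-up of the subpartition stats: row counts add up, and column low/high values are the minimum and maximum of the children's. The Python toy model below (hypothetical names, not Oracle's code) shows why those figures can be derived exactly without revisiting any data, using the STATUS column in P_20100206's four subpartitions. It assumes the single row in P_20100206_OTHERS has STATUS 'P', which follows from the pre-insert table-level STATUS stats shown later in this post (NDV 1, low and high both 'P').

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegStats:
    """Toy stats for one column in one segment (None = empty segment)."""
    num_rows: int
    low_value: Optional[str]
    high_value: Optional[str]


def roll_up(children: List[SegStats]) -> SegStats:
    """Derive parent stats from child stats alone: rows are summed and
    low/high are the min/max over the non-empty children."""
    lows = [c.low_value for c in children if c.low_value is not None]
    highs = [c.high_value for c in children if c.high_value is not None]
    return SegStats(
        num_rows=sum(c.num_rows for c in children),
        low_value=min(lows) if lows else None,
        high_value=max(highs) if highs else None,
    )


# STATUS column in P_20100206's subpartitions after the insert.
grot = SegStats(5, "P", "U")
halo = SegStats(0, None, None)
june = SegStats(0, None, None)
others = SegStats(1, "P", "P")
print(roll_up([grot, halo, june, others]))
# SegStats(num_rows=6, low_value='P', high_value='U')
```

Oracle stores LOW_VALUE and HIGH_VALUE as raw bytes, so comparing Python strings is a simplification. NDV is deliberately left out: unlike these three figures, it cannot be derived from the children's stats alone.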

Everything's still looking very good in this case and so it looks like a great strategy: low collection overhead and accurate statistics. That is, until you start drilling down to the column-level statistics using Greg Rahn's query (which relies on a display_raw helper function, not an Oracle built-in, to decode the raw LOW_VALUE and HIGH_VALUE columns) and identify some horrible problems. These are the statistics at the table level.

SQL> select 
   a.column_name, 
   a.num_distinct, 
   display_raw(a.low_value,b.data_type) as low_val, 
   display_raw(a.high_value,b.data_type) as high_val, 
   b.data_type 
from 
   dba_tab_col_statistics a, dba_tab_cols b 
where 
   a.owner='TESTUSER' and 
   a.table_name='TEST_TAB1' and 
   a.owner=b.owner and 
   a.table_name=b.table_name and 
   a.column_name=b.column_name 
order by 1;

COLUMN_NAME                NUM_DISTINCT LOW_VAL              HIGH_VAL             DATA_TYPE           
-------------------------- ------------ -------------------- -------------------- ----------- 
REPORTING_DATE                        6 20100201             20100206             NUMBER     
SEQ_ID                               14 400                  3000000              NUMBER     
SOURCE_SYSTEM                         7 GROT                 WINE                 VARCHAR2  
STATUS                                4 P                    U                    VARCHAR2  

4 rows selected. 

At first glance they look pretty good, too. To give you a specific example, the STATUS column does have the correct High Value of 'U', which has just appeared in the last set of rows that were inserted. Based on what I've seen to date, Oracle does accurately update the High/Low column values and row counts when generating aggregated stats, but there's a problem here. According to the column statistics, there are 4 distinct STATUSes in the table, but that's not true: there are only 2.

SQL> select distinct STATUS from test_tab1;

S
-
U
P

SQL>  

Given that the aggregated Number of Distinct Values is already wrong with such a small number of rows and only two distinct values, the chances of the NDV calculated during stats aggregation being accurate look pretty slim and, when you consider what a key input to cost-based calculations those values are ....

Why aren't the values accurate? Well, let's compare how the High/Low values and the Number of Distinct Values (NDV) are derived.

When we gathered statistics on the new subpartition, Oracle had access to the previous High/Low values at the table level. Here are the column statistics before stats were gathered on the new subpartition.

SQL> select
    a.column_name,
    a.num_distinct,
    display_raw(a.low_value,b.data_type) as low_val,
    display_raw(a.high_value,b.data_type) as high_val,
    b.data_type
from
    dba_tab_col_statistics a, dba_tab_cols b
where
    a.owner='TESTUSER' and
    a.table_name='TEST_TAB1' and
    a.owner=b.owner and
    a.table_name=b.table_name and
    a.column_name=b.column_name
order by 1
/

COLUMN_NAME                NUM_DISTINCT LOW_VAL              HIGH_VAL             DATA_TYPE 
-------------------------- ------------ -------------------- -------------------- ----------- 
REPORTING_DATE                        6 20100201             20100206             NUMBER  
SEQ_ID                                9 400                  30000                NUMBER   
SOURCE_SYSTEM                         8 GROT                 WINE                 VARCHAR2  
STATUS                                1 P                    P                    VARCHAR2  

So at this stage there is one distinct value of STATUS, which is P. When we gathered stats on the new subpartition, Oracle could see all of the STATUS values for the rows in that subpartition, noticed STATUS='U' on one of the rows and could easily work out that it's higher than 'P', so it updated the High Value accordingly, as in the example shown earlier.

COLUMN_NAME                NUM_DISTINCT LOW_VAL              HIGH_VAL             DATA_TYPE           
-------------------------- ------------ -------------------- -------------------- ------------
REPORTING_DATE                        6 20100201             20100206             NUMBER     
SEQ_ID                               14 400                  3000000              NUMBER       
SOURCE_SYSTEM                         7 GROT                 WINE                 VARCHAR2 
STATUS                                4 P                    U                    VARCHAR2 
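Updating the Low/High values is the easy part because min and max compose: the new table-level Low is just the smaller of the stored Low and the new subpartition's Low, and likewise for the High, so the stored values plus the one subpartition being analysed are enough for an exact answer. A minimal Python sketch, using the STATUS values from this example and plain Python ordering as a stand-in for Oracle's comparison rules (an assumption; the function name is hypothetical):

```python
def merge_low_high(stored_low, stored_high, new_values):
    """Fold the values seen in one freshly analysed segment into stored low/high.

    min and max compose, so the stored pair plus the new values are enough to
    get an exact answer. None means no stats were stored yet. Python ordering
    stands in for Oracle's comparison rules (an assumption for this sketch).
    """
    new_low, new_high = min(new_values), max(new_values)
    low = new_low if stored_low is None else min(stored_low, new_low)
    high = new_high if stored_high is None else max(stored_high, new_high)
    return low, high


# STATUS before: Low 'P', High 'P'. The new subpartition holds only 'P' and 'U'.
print(merge_low_high("P", "P", ["P", "U"]))  # ('P', 'U')
```

Nothing equivalent works for NDV, which is where things start to go wrong.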

Now, what to do about the NDV? Remember, Oracle can't look at any of the data in Partitions or Subpartitions other than the one we're gathering stats on (that's the point: to reduce overhead). So it has to decide what the new NDV should be based on several inputs:

1) The actual values in STATUS for the rows in the subpartition we can look at.
2) The previously-gathered and stored NDV for the other subpartitions.
3) The previously-aggregated NDV stored at the table and partition levels.

The problem is that Oracle knows the number of distinct values in the other subpartitions (we looked at the data previously to calculate them) but not the values themselves and, without that information, how can it say whether the 2 distinct values (P and U) in this subpartition are distinct when compared to the values in the other subpartitions? Actually, in this case, we might expect Oracle to do something clever and realise that, as there was only one distinct value (P) prior to our new subpartition and the only values in the current subpartition are P and U, there can only be 2 distinct values. Sadly, it just doesn't work that way!
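Here's a minimal Python sketch of the dilemma under simplified assumptions; the function names and the hypothetical counts are mine, and it doesn't attempt to reproduce what Oracle actually does. With only per-segment NDV counts, all you can say is that the combined NDV lies somewhere between the largest count (every value overlaps) and the sum of the counts (no value overlaps). With the actual values it's just the size of the union, and when a segment has at most two distinct values its Low and High values between them are the values, which is the clever deduction described above.

```python
def ndv_bounds(segment_ndvs):
    """Possible combined NDV when only each segment's NDV count is known:
    at least the largest count (full overlap), at most the sum (no overlap)."""
    return max(segment_ndvs), sum(segment_ndvs)


def exact_ndv(segment_values):
    """With the actual values available, the combined NDV is the size of the union."""
    union = set()
    for values in segment_values:
        union |= set(values)
    return len(union)


def values_from_low_high(ndv, low, high):
    """With at most two distinct values, Low and High between them are the values.
    Returns None when the values can't be recovered this way."""
    if ndv == 1 and low == high:
        return {low}
    if ndv == 2 and low != high:
        return {low, high}
    return None


# Before: STATUS NDV 1, Low 'P', High 'P'. New subpartition: values 'P' and 'U'.
before = values_from_low_high(1, "P", "P")
print(ndv_bounds([1, 2]))               # (2, 3)
print(exact_ndv([before, {"P", "U"}]))  # 2
# Hypothetical: four subpartitions with NDVs 1, 1, 1 and 2, counts only.
print(ndv_bounds([1, 1, 1, 2]))         # (2, 5)
```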

Although relying on lower-level statistics being aggregated up to higher levels might initially seem like a neat trick, it's going to lead to some pretty strange statistics, at least in 10g, which is why Oracle recommend you gather Global Stats at the TABLE and PARTITION levels.

I wish I could say that was the only problem with this aggregation process, but there's more to come in the next post ...

Posted by Doug Burns Comments: (6) Trackbacks: (5)
Defined tags for this entry: DBMS_STATS, Optimiser, Partitions

Feb 17: Statistics on Partitioned Tables - Part 1

If you've ever worked on large databases that use partitioned and subpartitioned tables, you'll be aware that there are significant challenges in maintaining up-to-date, appropriate statistics. We've encountered a few problems at work recently and I decided it would be a good idea to put together a series of posts covering the basics of what can become quite an involved topic. It's not difficult to find yourself going round in circles reading the documentation, Oracle Support Notes, blog posts, forum threads and the rest until you don't know whether you're coming or going!

I'll steer clear of any remotely advanced angle and try to take some time to show simple, practical examples that might be useful to the great unwashed masses (like me). I'm pretty certain that everything I'm going to post has already been written about by the likes of Jonathan Lewis, Christian Antognini, Randolf Geist, Martin Widlake and others, but I want to write it in my own way, so that I can understand it ;-) When I write certain blog posts, I sometimes have a feeling that I'm going to be discussing things which are apparently obvious to experienced people but which I'm not convinced most people quite understand. I've no idea how many parts there might be because there's no plan here, but I know it's going to end up being too much for one post.

Added later: whilst digging out a link to Martin's blog, I noticed that he's planning a whole DBMS_STATS series soon. Sigh. Keep an eye out for that, because it will be as in-depth as always. I'll stick to the simple stuff here!

This is all on Oracle 10.2.0.4 running on Linux, although we have several stats-related patches applied (probably more on those later). I'll probably run the same tests on my own 11.2.0.1 installation later to identify any differences.

All of the examples will be based on the following table definition:

SQL> CREATE TABLE TEST_TAB1
(
  REPORTING_DATE            NUMBER              NOT NULL,
  SOURCE_SYSTEM             VARCHAR2(30 CHAR)   NOT NULL,
  SEQ_ID                    NUMBER              NOT NULL,
  STATUS                    VARCHAR2(1 CHAR)    NOT NULL
)
PARTITION BY RANGE (REPORTING_DATE)
SUBPARTITION BY LIST (SOURCE_SYSTEM)
SUBPARTITION TEMPLATE
  (SUBPARTITION GROT VALUES ('GROT') TABLESPACE TEST_DAT01,
   SUBPARTITION JUNE VALUES ('JUNE') TABLESPACE TEST_DAT01,
   SUBPARTITION HALO VALUES ('HALO')  TABLESPACE TEST_DAT01,
   SUBPARTITION OTHERS  VALUES (DEFAULT)   TABLESPACE TEST_DAT01)
(  
  PARTITION P_20100131 VALUES LESS THAN (20100201) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100201 VALUES LESS THAN (20100202) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100202 VALUES LESS THAN (20100203) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100203 VALUES LESS THAN (20100204) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100204 VALUES LESS THAN (20100205) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100205 VALUES LESS THAN (20100206) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100206 VALUES LESS THAN (20100207) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100207 VALUES LESS THAN (20100208) NOLOGGING NOCOMPRESS  
)
NOCOMPRESS 
NOCACHE
NOPARALLEL
MONITORING;

Table created.

SQL> CREATE UNIQUE INDEX TEST_TAB1_IX1 ON TEST_TAB1
(REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)
  LOCAL NOPARALLEL COMPRESS 1;

Index created.

So there is a partition per REPORTING_DATE, which is subpartitioned depending on the SOURCE_SYSTEM that sent the data. It's probably worth pointing out at this stage that the table definition and test data do not match those used in the system I'm working on, but they are similar enough to illustrate the issues and are pretty similar to several other systems I've seen or worked on in the past. Speaking of test data, I'd better insert some.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'GROT', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100202, 'GROT', 30000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100203, 'GROT', 2000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100204, 'GROT', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100205, 'GROT', 2400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'JUNE', 500, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'HALO', 700, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100202, 'HALO', 1200, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'WINE', 400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'WINE', 600, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100204, 'WINE', 700, 'P');

1 row created.

SQL> COMMIT;

Commit complete.
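As a sanity check on where those eleven rows should land, here's a simplified Python model of the range-list layout defined above. It assumes each row goes to the first partition whose VALUES LESS THAN bound exceeds its REPORTING_DATE, and that any SOURCE_SYSTEM not named in the template (WINE, here) falls into the DEFAULT subpartition, OTHERS. It's a model of the DDL, not anything Oracle exposes, and the function name is hypothetical.

```python
from collections import Counter

# (partition name, VALUES LESS THAN bound), in ascending order, as in the DDL.
PARTITIONS = [
    ("P_20100131", 20100201),
    ("P_20100201", 20100202),
    ("P_20100202", 20100203),
    ("P_20100203", 20100204),
    ("P_20100204", 20100205),
    ("P_20100205", 20100206),
    ("P_20100206", 20100207),
    ("P_20100207", 20100208),
]
# Explicit list subpartitions from the template; anything else goes to OTHERS (DEFAULT).
LIST_VALUES = {"GROT", "JUNE", "HALO"}

# (REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID, STATUS) exactly as inserted above.
ROWS = [
    (20100201, "GROT", 1000, "P"),
    (20100202, "GROT", 30000, "P"),
    (20100203, "GROT", 2000, "P"),
    (20100204, "GROT", 1000, "P"),
    (20100205, "GROT", 2400, "P"),
    (20100201, "JUNE", 500, "P"),
    (20100201, "HALO", 700, "P"),
    (20100202, "HALO", 1200, "P"),
    (20100201, "WINE", 400, "P"),
    (20100206, "WINE", 600, "P"),
    (20100204, "WINE", 700, "P"),
]


def route(reporting_date, source_system):
    """Return the (partition, subpartition) a row would land in (simplified model)."""
    for name, upper_bound in PARTITIONS:
        if reporting_date < upper_bound:
            sub = source_system if source_system in LIST_VALUES else "OTHERS"
            return name, f"{name}_{sub}"
    # Oracle would reject such a row with ORA-14400.
    raise ValueError(f"no partition for REPORTING_DATE={reporting_date}")


partition_counts = Counter(route(date, system)[0] for date, system, _, _ in ROWS)
subpartition_counts = Counter(route(date, system)[1] for date, system, _, _ in ROWS)

for name, _ in PARTITIONS:
    print(f"{name:<12} {partition_counts[name]:>3}")
print(f"{'TEST_TAB1':<12} {sum(partition_counts.values()):>3}")
```

The per-partition counts it prints match the partition-level NUM_ROWS figures that appear once stats are gathered.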

With the table and data created, I'll gather some statistics using the default options. It's worth pointing out that everyone I've spoken to at Oracle is very keen that people should start off with the default options, for reasons that will hopefully become apparent.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'DEFAULT');

PL/SQL procedure successfully completed.

So let's see what statistics have been gathered, focusing on the simple NUM_ROWS for now.

SQL> select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      YES 10-FEB-2010 16:31:17         11

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 10-FEB-2010 16:31:17          0
TEST_TAB1                      P_20100201                     YES 10-FEB-2010 16:31:17          4
TEST_TAB1                      P_20100202                     YES 10-FEB-2010 16:31:17          2
TEST_TAB1                      P_20100203                     YES 10-FEB-2010 16:31:17          1
TEST_TAB1                      P_20100204                     YES 10-FEB-2010 16:31:17          2
TEST_TAB1                      P_20100205                     YES 10-FEB-2010 16:31:17          1
TEST_TAB1                      P_20100206                     YES 10-FEB-2010 16:31:17          1
TEST_TAB1                      P_20100207                     YES 10-FEB-2010 16:31:17          0  

8 rows selected.

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO   
TEST_TAB1                      P_20100131_HALO                NO   
TEST_TAB1                      P_20100131_JUNE                NO   
TEST_TAB1                      P_20100131_OTHERS              NO   
TEST_TAB1                      P_20100201_GROT                NO   
TEST_TAB1                      P_20100201_HALO                NO   

<output snipped ....there are a lot of subpartitions, all missing stats!>

So at the moment the row counts look spot-on, there are Global Statistics on both the Table and its Partitions, and there are no statistics at all on the Subpartitions.

First, let's talk about global statistics. There are several good resources kicking around that describe global stats, so I'll list just a couple here. I always like a documentation reference and, although this is the 11.2 documentation and so some of it isn't correct for 10g, I like the very simple description of global stats given in the first two paragraphs of section 13.3.1.3: global stats are statistics on the table that describe the table as a whole, in addition to the stats on the underlying partitions. The important point is that sometimes the optimiser will use the global stats, sometimes the partition stats and sometimes both, depending on the query. For those of you with Support access, Note 236935.1 goes into more detail.

However, our example is complicated by the fact that we have subpartitions too. So we have global stats that describe the table as a whole (including all of the underlying partitions) and global stats on each partition that describe that partition (including all of its underlying subpartitions). For now, let's just assume that having global stats is 'a good thing', which is why Oracle's default option is to gather them at the Table and Partition levels. In the next post I'll look at why they're important.
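One way to see why global stats can matter is that some statistics can be rebuilt exactly from the level below and some can't. NUM_ROWS is additive, so summing partition row counts always gives the table row count, but the number of distinct values isn't. This minimal Python sketch uses the eleven test rows and assumes each partition holds a single REPORTING_DATE (true for this data). It only compares partition-level and table-level figures and doesn't attempt to model how Oracle aggregates.

```python
from collections import defaultdict

# (REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID, STATUS) as inserted in the test data.
ROWS = [
    (20100201, "GROT", 1000, "P"),
    (20100202, "GROT", 30000, "P"),
    (20100203, "GROT", 2000, "P"),
    (20100204, "GROT", 1000, "P"),
    (20100205, "GROT", 2400, "P"),
    (20100201, "JUNE", 500, "P"),
    (20100201, "HALO", 700, "P"),
    (20100202, "HALO", 1200, "P"),
    (20100201, "WINE", 400, "P"),
    (20100206, "WINE", 600, "P"),
    (20100204, "WINE", 700, "P"),
]
SOURCE_SYSTEM, STATUS = 1, 3  # column positions within each row tuple

# Assumption: in this data each partition holds a single REPORTING_DATE,
# so grouping by date is the same as grouping by partition.
by_partition = defaultdict(list)
for row in ROWS:
    by_partition[row[0]].append(row)


def ndv(rows, column):
    """Number of distinct values of one column across some rows."""
    return len({row[column] for row in rows})


partition_num_rows = [len(rows) for rows in by_partition.values()]
print("NUM_ROWS: sum over partitions =", sum(partition_num_rows), "table =", len(ROWS))

summary = {}
for label, column in (("SOURCE_SYSTEM", SOURCE_SYSTEM), ("STATUS", STATUS)):
    per_partition = [ndv(rows, column) for rows in by_partition.values()]
    summary[label] = (sum(per_partition), max(per_partition), ndv(ROWS, column))
    print(f"{label}: sum of partition NDVs = {summary[label][0]}, "
          f"max = {summary[label][1]}, table NDV = {summary[label][2]}")
```

Summing partition NDVs gives 11 for SOURCE_SYSTEM and 6 for STATUS against true values of 4 and 1. The maximum happens to be right for this data but needn't be in general, for example when partitions hold completely different values.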

Why no Subpartition stats, then? Well, the optimiser is only going to use stats on subpartitions when it can guarantee that a query will access a single subpartition. As that's probably less likely than you think, Oracle doesn't collect those stats by default, but can use higher-level partition stats to estimate what's going on at the subpartition level. However, if you think your queries will be able to drill down to a specific subpartition effectively, you can choose to gather subpartition statistics too. Beware, though, that as far as I'm aware the optimiser doesn't use subpartition stats at all prior to 10.2.0.4, so there's no benefit to the additional overhead if you're running an earlier version.
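To illustrate how rarely a query is guaranteed a single subpartition, here's a simplified Python model of partition pruning on this table for equality predicates with literal values only. The function names are hypothetical, real pruning also deals with ranges, bind variables and more, and the "single" flag just applies the single-subpartition rule described above.

```python
# (partition name, VALUES LESS THAN bound), as in the TEST_TAB1 DDL.
BOUNDS = [
    ("P_20100131", 20100201),
    ("P_20100201", 20100202),
    ("P_20100202", 20100203),
    ("P_20100203", 20100204),
    ("P_20100204", 20100205),
    ("P_20100205", 20100206),
    ("P_20100206", 20100207),
    ("P_20100207", 20100208),
]
SUBPARTITIONS = ["GROT", "JUNE", "HALO", "OTHERS"]
LIST_VALUES = {"GROT", "JUNE", "HALO"}  # anything else maps to OTHERS (DEFAULT)


def partition_for(reporting_date):
    """First range partition whose upper bound exceeds the date."""
    for name, upper_bound in BOUNDS:
        if reporting_date < upper_bound:
            return name
    raise ValueError(f"no partition for REPORTING_DATE={reporting_date}")


def candidate_subpartitions(reporting_date=None, source_system=None):
    """Subpartitions an equality predicate could touch (simplified pruning model).

    None means the query has no predicate on that column.
    """
    if reporting_date is None:
        partitions = [name for name, _ in BOUNDS]
    else:
        partitions = [partition_for(reporting_date)]
    if source_system is None:
        subs = SUBPARTITIONS
    else:
        subs = [source_system if source_system in LIST_VALUES else "OTHERS"]
    return [f"{p}_{s}" for p in partitions for s in subs]


for predicate in ({"reporting_date": 20100201},
                  {"source_system": "WINE"},
                  {"reporting_date": 20100201, "source_system": "WINE"}):
    hits = candidate_subpartitions(**predicate)
    print(predicate, "->", len(hits), "subpartition(s); single:", len(hits) == 1)
```

Only the query that pins down both REPORTING_DATE and SOURCE_SYSTEM narrows things to one subpartition.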

In the next post I'll look at why global stats are both a good and a bad thing ....
Posted by Doug Burns Comments: (11) Trackbacks: (9)
Defined tags for this entry: dbms_stats, optimiser, partitions

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597


