Pivot Tables and Multi-part Logic

A common use of CASE is to generate pivot tables or cross-matrix reports (although the new MODEL clause in 10g is more powerful). We might want to modify our previous report to just display sales information for Photo Products but to have one column per month for the last quarter of 2001.

To achieve this, we'll select the usual product and sales data, group on the product category and sub-category and apply the 10% mark-up to Photo products. However, we'll also check which month the sale occurred in before adding the amount sold to a given column, one for each month. It's important to remember here that if there is no ELSE clause and none of the conditions is TRUE, the default value of NULL will be returned, so the running SUM for that month will be unaffected. That's the approach I've used here and the behaviour has been consistent over multiple versions of Oracle, but if you're cautious, you can simply add ELSE 0 or ELSE NULL to each CASE expression.

Example 3a 

(Note that this is the first example to use a Searched CASE expression where we have a number of WHEN clauses containing discrete logical tests, rather than comparing each to the same initial test expression. This can be clearly identified because the first WHEN keyword appears immediately after the CASE keyword.)

SET LINES 80
COLUMN prod_category FORMAT A16
COLUMN prod_subcategory FORMAT A20
COLUMN oct_dollars format 999,990.90
COLUMN nov_dollars format 999,990.90
COLUMN dec_dollars format 999,990.90
SELECT   p.prod_category, p.prod_subcategory,
   SUM(CASE WHEN p.prod_category = 'Photo' AND t.calendar_month_number = 10
                                AND t.calendar_year = 2001
                THEN amount_sold *1.1
        WHEN p.prod_category != 'Photo' AND t.calendar_month_number = 10
                                AND t.calendar_year = 2001
                THEN amount_sold
   END) AS OCT_dollars,
   SUM(CASE WHEN p.prod_category = 'Photo' AND t.calendar_month_number = 11
                                AND t.calendar_year = 2001
                THEN amount_sold *1.1
        WHEN p.prod_category != 'Photo' AND t.calendar_month_number = 11
                                AND t.calendar_year = 2001
                THEN amount_sold
   END) AS NOV_dollars,
   SUM(CASE WHEN p.prod_category = 'Photo' AND t.calendar_month_number = 12
                                AND t.calendar_year = 2001
                THEN amount_sold *1.1
        WHEN p.prod_category != 'Photo' AND t.calendar_month_number = 12
                                AND t.calendar_year = 2001
                THEN amount_sold
   END) AS DEC_dollars
FROM     sales s, times t, products p
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
GROUP BY p.prod_category, p.prod_subcategory
ORDER BY 1, 2
/

PROD_CATEGORY    PROD_SUBCATEGORY     OCT_DOLLARS NOV_DOLLARS DEC_DOLLARS
---------------- -------------------- ----------- ----------- -----------
Electronics      Game Consoles         105,876.22   88,989.00  193,546.85
                 Home Audio            220,133.02  246,483.07  252,605.84
                 Y Box Accessories      14,769.55   15,101.27   12,499.64
                 Y Box Games            52,945.94   55,197.39   45,690.73
Hardware         Desktop PCs           153,857.16  239,435.84  180,910.18
                 Portable PCs          192,513.25  201,927.44  205,868.81
Peripherals and  Accessories            65,193.15   53,754.82   69,484.05
Accessories
                 CD-ROM                 63,934.53   65,239.50   37,439.03
                 Memory                126,492.82  116,220.35  120,243.47
                 Modems/Fax             85,488.22   75,540.22   82,163.87
                 Monitors              338,795.95  309,261.00  318,457.99
                 Printer Supplies      126,890.11  103,908.88   98,610.61
Photo            Camcorders            276,726.14  302,757.06  309,163.10
                 Camera Batteries       87,368.25   77,983.22   64,453.74
                 Camera Media           55,979.88   60,589.43   46,312.73
                 Cameras               222,135.47  217,513.90  250,361.61
Software/Other   Accessories            60,090.11   46,264.68   50,513.14
                 Bulk Pack Diskettes     8,816.24    6,497.00    7,066.86
                 Documentation          63,202.96   67,330.54   78,742.43
                 Operating Systems      89,284.76  106,532.85   96,790.97
                 Recordable CDs         22,849.65   20,342.21   26,734.34
                 Recordable DVD Discs   74,207.23   57,712.70   60,317.69

One of the problems with DECODE is that writing multi-part logical expressions using the AND operator can be a little cumbersome as each AND operation would require an additional nested DECODE so, although example 3a might look long-winded and slightly difficult to follow (imagine if the report covered 18 months, rather than 3), the DECODE version would be worse! You're likely to find CASE expressions much easier to work with.

However, I've just fallen into a common trap when using CASE. The logic it allows us to implement is so flexible that it can encourage us to produce logically consistent but inefficient code if we're not careful. The thing to keep in mind is that when you use CASE in the SELECT list, it is a post-retrieval function. What this example will do is trawl through all of the sales figures, then apply a DECODE function to it in such a way as to exclude most sales from the result, because they didn't occur in the last quarter of 2001.

Another way of looking at this is that we are using CASE expressions in the SELECT clause to eliminate results that should have been eliminated much earlier, in the WHERE clause. Why retrieve data that we know we are going to discard subsequently! After all, that's the whole point of these techniques, to reduce the workload required to produce the reports

A better way of achieving the same result is shown in Example 3b.

Example 3b 

SELECT   p.prod_category, p.prod_subcategory,
   SUM(CASE WHEN p.prod_category = 'Photo' AND t.calendar_month_number = 10
                THEN amount_sold *1.1
        WHEN p.prod_category != 'Photo' AND t.calendar_month_number = 10
                THEN amount_sold
   END) AS OCT_dollars,
   SUM(CASE WHEN p.prod_category = 'Photo' AND t.calendar_month_number = 11
                THEN amount_sold *1.1
        WHEN p.prod_category != 'Photo' AND t.calendar_month_number = 11
                THEN amount_sold
   END) AS NOV_dollars,
   SUM(CASE WHEN p.prod_category = 'Photo' AND t.calendar_month_number = 12
                THEN amount_sold *1.1
        WHEN p.prod_category != 'Photo' AND t.calendar_month_number = 12
                THEN amount_sold
   END) AS DEC_dollars
FROM     sales s, times t, products p
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND t.calendar_year = 2001
AND t.calendar_month_number BETWEEN 10 AND 12
GROUP BY p.prod_category, p.prod_subcategory
ORDER BY 1, 2
/

The additional lines in the WHERE clause, shown in bold text will ensure that we reduce the volume of data that we're processing to the minimum first. We're not interested in any sales data that doesn't occur in the last quarter of 2001, so let's not even bother selecting it and, given that we've just eliminated the data that we're not interested in, there's no need to check the year in the CASE expressions any more.

The interesting thing is that the optimiser will choose the same execution plan for both of these queries and so any performance gain is minimal. However, it's a useful principle when writing SQL statements to eliminate as much data as possible as early as possible <96> with the most selective WHERE clause. This gives the optimiser the best chance of choosing an efficient access path and reduces the resource requirements.

In many cases the difference between having logic in the WHERE clause instead of the SELECT clause will be dramatic because Oracle will be able to use an index to retrieve a smaller amount of data more quickly. The golden rule is

Use the WHERE clause to eliminate all unnecessary data first and then use CASE in the SELECT list for additional processing.

Next

Improving SQL efficiency using CASE