## Logic-dependent Aggregation (contd)

The only reason that we require two scans of SALES in example 2b is to return all the non-Photo products and their amount_sold in one data set, using SUM(amount_sold); and to return another data set containing the Photo products, using SUM(amount_sold) * 1.1 to calculate the total amount_sold. The two sets are then UNIONed.

We can optimise this query by retrieving all of the amount_sold values in one scan of the SALES table and then using CASE to selectively apply a calculation to the results for the Photo products in the SELECT list, as shown in example 2c.

__Example 2c__

SELECT p.prod_category, p.prod_subcategory, SUM(CASE p.prod_category WHEN 'Photo' THEN amount_sold *1.1 ELSE amount_sold END) AS dollars FROM sales s, times t, products p WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND t.calendar_year = 2001 GROUP BY p.prod_category, p.prod_subcategory ORDER BY 1, 2; PROD_CATEGORY PROD_SUBCATEGORY DOLLARS ------------------------------ -------------------------- --------------- Electronics Game Consoles 1,205,027.35 Home Audio 2,779,398.57 Y Box Accessories 161,004.00 Y Box Games 559,421.03 ****************************** --------------- sum 4,704,850.95 Hardware Desktop PCs 2,230,713.39 Portable PCs 3,453,656.62 ****************************** --------------- sum 5,684,370.01 Peripherals and Accessories Accessories 663,034.82 CD-ROM 669,134.90 Memory 1,228,555.41 Modems/Fax 874,702.07 Monitors 3,191,525.93 Printer Supplies 1,232,754.58 ****************************** --------------- sum 7,859,707.71 Photo Camcorders 3,100,982.48 Camera Batteries 833,389.59 Camera Media 606,199.41 Cameras 2,426,420.33 ****************************** --------------- sum 6,966,991.80 Software/Other Accessories 521,342.80 Bulk Pack Diskettes 88,216.04 Documentation 827,932.29 Operating Systems 1,020,370.87 Recordable CDs 367,478.04 Recordable DVD Discs 728,564.36 ****************************** --------------- sum 3,553,904.40

Although the results are identical and the two statements are functionally equivalent, it is clear from the execution plan that this will require only one full scan of the sales table, which represents a useful improvement.

Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=448 Card=75 Bytes=4800) 1 0 SORT (GROUP BY) (Cost=448 Card=75 Bytes=4800) 2 1 HASH JOIN (Cost=430 Card=229837 Bytes=14709568) 3 2 TABLE ACCESS (FULL) OF 'PRODUCTS' (TABLE) (Cost=3 Card=72 Bytes=2520) 4 2 HASH JOIN (Cost=424 Card=229837 Bytes=6665273) 5 4 TABLE ACCESS (FULL) OF 'TIMES' (TABLE) (Cost=15 Card=365 Bytes=4380) 6 4 PARTITION RANGE (ITERATOR) (Cost=400 Card=918843 Bytes=15620331) 7 6 TABLE ACCESS (FULL) OF 'SALES' (TABLE) (Cost=400 Card=918843 Bytes=15620331) Statistics ---------------------------------------------------------- 8 recursive calls 0 db block gets 587 consistent gets 0 physical reads 0 redo size 1306 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 22 rows processed

Example 2c requires half the number of consistent gets and less than half the number of sorts that example 2b does. However, you'll probably notice that if you run this in a single user test environment that, because we're operating on fairly small volumes of data, the time to complete the requests and return the data is very similar - around 1 second in my tests. Which is why generating the execution plans and resource usage statistics is important. If you had many users running this report against larger data sets, the difference would become more noticeable.

So let's look at what we've changed. We'll leave the SELECT clause until last (because that is where the most significant changes are) and exclude the more straightforward parts of the statement first.

We still need to select FROM the same three tables and to GROUP BY product_category and product_subcategory, so no change in those two parts of the statement. We know we're interested in all products so let's eliminate the product_category check from the two different WHERE clauses in example 2b which leaves us with two identical WHERE clauses which facilitate the joins between the sales, times and product tables and limit the data to the calendar year 2001. Now that the two WHERE clauses are identical they return the same rows so we can reduce everything to one data set, with no need for the UNION any more. In fact, the query is starting to look like example 2a.

FROM sales s, times t, products p WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND t.calendar_year = 2001 GROUP BY p.prod_category, p.prod_subcategory ORDER BY p.prod_category, p.prod_subcategory;

This leaves us with our new SELECT clause to look at. The first two grouping columns remain the same - product_category and product_subcategory from the products table. The third column specification contains some of the logic which we've moved from the WHERE clauses of the UNION. It uses the SUM() function to generate a total amount_sold for all the products in the product_category and product_subcategory but uses different values for amount_sold, depending on whether the category is <91>Photo' or not. So here is a high-level procedural view of how example 2c works.

FOR EACH product subcategory (GROUP BY prod_category, prod_subcategory) Generate the total amount_sold for that product subcategory (SUM) IF the related product_category is 'Photo', THEN Use amount_sold * 1.1 ELSE (by default) Use amount_sold END IF