페이지

2014년 12월 29일 월요일

DB2 Pivot Query Example

  • PIVOT
    The meaning of pivoting a table is to turn n-rows into n-columns.
    For example given a ales table with a composite primary key of year and quarter you may want to get a result that shows only one row for each year, but a separate column for each quarter.
CREATE TABLE Sales(Year    INTEGER NOT NULL,
                   Quarter INTEGER NOT NULL,
                   Results INTEGER);

CREATE UNIQUE INDEX PK_IDX ON Sales(Year, Quarter) INCLUDE(Results);

ALTER TABLE Sales ADD PRIMARY KEY (Year, Quarter);
 
INSERT INTO Sales VALUES
(2004, 1, 20),
(2004, 2, 30),
(2004, 3, 15),
(2004, 4, 10),
(2005, 1, 18),
(2005, 2, 40),
(2005, 3, 12),
(2005, 4, 27);
The desired result is:
Year Q1 Q2 Q3 Q4
---- -- -- -- --
2004 20 30 15 10
2005 18 40 12 27
There a many ways to achieve pivoting, but there is clearly one that is optimal which I will explain here.
Pivoting involves two events: First the payload (Results in this case) needs to be dispatch into the Q1 through Q4 columns.
This can be done with a CASE expression or DECODE (which is another syntax for CASE):


SELECT Year,
       DECODE(Quarter, 1, Results) AS Q1,
       DECODE(Quarter, 2, Results) AS Q2,
       DECODE(Quarter, 3, Results) AS Q3,
       DECODE(Quarter, 4, Results) AS Q4
  FROM Sales;

 
YEAR                 Q1          Q2          Q3          Q4
----------- ----------- ----------- ----------- -----------
       2004          20           -           -           -
       2004           -          30           -           -
       2004           -           -          15           -
       2004           -           -           -          10
       2005          18           -           -           -
       2005           -          40           -           -
       2005           -           -          12           -
       2005           -           -           -          27

  8 record(s) selected.
Note how DECODE injects NULL as an implicit ELSE.
The second step is to collapse the rows to get one row per Year.
The technique of choice is a GROUP BY here.
When grouping on Year we need to use some sort of column function for the columns Q1 through Q4.
MAX or MIN() work for most types and they are very cheap. Especially since all but one value will be NULL per group:


SELECT Year,
       MAX(DECODE(Quarter, 1, Results)) AS Q1,
       MAX(DECODE(Quarter, 2, Results)) AS Q2,           
       MAX(DECODE(Quarter, 3, Results)) AS Q3,
       MAX(DECODE(Quarter, 4, Results)) AS Q4
  FROM Sales
  GROUP BY Year;


YEAR                 Q1          Q2          Q3          Q4
----------- ----------- ----------- ----------- -----------
       2004          20          30          15          10
       2005          18          40          12          27

  2 record(s) selected.
I noted above that this approach is the best. How do I know. Well let's look at the access plan:
Access Plan:
-----------
     Total Cost:         0.0134932
    Query Degree:        1

        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
         1.6
       GRPBY
       (   2)
       0.01322
          0
         |
          8
       IXSCAN
       (   3)
      0.0126886
          0
         |
          8
 INDEX:    SERGE  
       PK_IDX
         Q1
Pretty hard to beat!
  • UNPIVOT
    Unpivot is the inverse operation of PIVOT. Here we have several similar columns which are to be combined into one column, but different rows.
    Let's reuse the same example from above and run it backwards so to speak
CREATE TABLE Sales(Year INTEGER NOT NULL PRIMARY KEY,
                   Q1   INTEGER,
                   Q2   INTEGER,
                   Q3   INTEGER,
                   Q4   INTEGER);

INSERT INTO Sales VALUES
(2004, 20, 30, 15, 10),
(2005, 18, 40, 12, 27);
The most efficient way to unpivot is to do a cross join between the Sales table and a correlated VALUES of as many rows as columns that need to be unpivoted.


SELECT Year, Quarter, Results
  FROM Sales AS S,
             LATERAL(VALUES (1, S.Q1),
                            (2, S.Q2),
                            (3, S.Q3),
                            (4, S.Q4)) AS T(Quarter, Results);



YEAR            QUARTER     RESULTS
----------- ----------- -----------
       2004           1          20
       2004           2          30
       2004           3          15
       2004           4          10
       2005           1          18
       2005           2          40
       2005           3          12
       2005           4          27

  8 record(s) selected.
Again let's look at the explain:
Access Plan:
-----------
     Total Cost:         7.57432
    Query Degree:        1

                  Rows
                 RETURN
                 (   1)
                  Cost
                   I/O
                   |
                    8
                 NLJOIN
                 (   2)
                 7.57432
                    1
           /-------+-------\
          2                   4
       TBSCAN              TBSCAN
       (   3)              (   4)
       7.57317          4.95961e-005
          1                   0
         |                   |
          2                   4
 TABLE:    SERGE     TABFNC:   SYSIBM 
        SALES              GENROW
         Q3                  Q1

Once more this is the most efficient plan. Since it only contains a single scan.

댓글 없음:

댓글 쓰기