Some intermediate DB2 SQL techniques


Set up tables, logicals, labels and add some data
Use SQL Join to combine Headers and Details
Types of Joins
Example of a full OUTER JOIN
Using Nested Subquery
The Case Statement
OVER and PARTITION BY functions
Update using a subselect in the WHERE clause
Using HAVING to filter on GROUP BY values
Use SQL Hierarchical Query Operators PRIOR and CONNECT_BY_ROOT



Set up tables, logicals, labels and add some data

Here we will take a look at some interesting SQL techniques. Note that some syntax may be specific to DB2 or DB2 for iSeries..., but first let's create some tables and set up some data.

Create an Customer Master table

CREATE TABLE MYLIB/CUSTOMER (
  CUST_CODE CHAR (10) NOT NULL WITH DEFAULT,   
  CUST_NAME CHAR (25) NOT NULL WITH DEFAULT,   
  CUST_ADDRESS CHAR (25) NOT NULL WITH DEFAULT,   
  CUST_YR_SALES INT NOT NULL WITH DEFAULT 
) 

Create an Order Header table

CREATE TABLE MYLIB/ORDER_HED (
  ORDER_NUM INT NOT NULL WITH DEFAULT, 
  ORDER_CUST CHAR (10) NOT NULL WITH DEFAULT,   
  ORDER_DATE DATE NOT NULL WITH DEFAULT, 
  ORDER_TIME TIME NOT NULL WITH DEFAULT
) 

Create an Order Detail table

CREATE TABLE MYLIB/ORDER_DET (
     DETAIL_NUM INT NOT NULL WITH DEFAULT, 
	 DETAIL_LINE INT NOT NULL WITH DEFAULT, 
	 DETAIL_SKU CHAR (10) NOT NULL WITH DEFAULT, 
	 DETAIL_QTY INT NOT NULL WITH DEFAULT, 
	 DETAIL_PRICE DEC ( 10, 2) NOT NULL WITH DEFAULT
 )  

Add an index

 CREATE INDEX MYLIB/ORD_DETL1  
    ON ORDER_DET(DETAIL_NUM, DETAIL_LINE)      

Give descriptions to table and index

LABEL ON TABLE MYLIB/ORDER_DET 
IS 'Order Header') 

LABEL ON INDEX MYLIB/ORD_DETL1
IS 'Order Header -logical') 

Add some column labels

LABEL ON COLUMN ORDER_HED                            
(  ORDER_NUM   TEXT IS 'Order_Number', 
   ORDER_CUST  TEXT IS 'Order_Customer', 
   ORDER_DATE  TEXT IS 'Order_Date', 
   ORDER_TIME  TEXT IS 'Order_Time')                 
   
LABEL ON COLUMN ORDER_DET
(  DETAIL_NUM    TEXT IS 'Order_Number', 
   DETAIL_LINE   TEXT IS 'Line_Number', 
   DETAIL_SKU    TEXT IS 'SKU', 
   DETAIL_QTY    TEXT IS 'Quantity', 
   DETAIL_PRICE  TEXT IS 'Price'
)   
      

Add add some data

INSERT INTO TESTMC3/CUSTOMER VALUES('A10001', 'Reliable Rug Cleaning', '124 Main Street',0)
INSERT INTO TESTMC3/CUSTOMER VALUES('A10003', 'Quicky Mart', '125 Main Street',0)                                       
INSERT INTO TESTMC3/CUSTOMER VALUES('A10005', 'BOB''S BAR-B-Q', '126 Main Street',0)
                                                          
INSERT INTO ORDER_HED VALUES(100, 'A10005', '01/13/2016', '12:00:01')
INSERT INTO ORDER_HED VALUES(101, 'A10003', '01/14/2016', '12:00:02')
INSERT INTO ORDER_HED VALUES(102, 'A10001', '01/15/2016', '12:00:03')

INSERT INTO ORDER_DET VALUES(100, 1, 'ABCD' ,1, 10.50)     
INSERT INTO ORDER_DET VALUES(100, 2, 'SHES' ,2, 13.99)     
INSERT INTO ORDER_DET VALUES(101, 1, 'SEGN' ,1, 14.99)     
INSERT INTO ORDER_DET VALUES(101, 2, 'SWTG' ,3, 14.99)     
INSERT INTO ORDER_DET VALUES(101, 3, 'FGST' ,1, 12.00)     
INSERT INTO ORDER_DET VALUES(102, 1, 'SEQY' ,5, 11.99)     
INSERT INTO ORDER_DET VALUES(102, 2, 'ABCD' ,1, 10.50)     


Use SQL Join to combine Headers and Details

A JOIN clause is used to combine rows from two or more tables, based on a related column between them.

SELECT ORDER_HED.ORDER_NUM, ORDER_DET.DETAIL_LINE,
    ORDER_DET.DETAIL_SKU, ORDER_DET.DETAIL_QTY, ORDER_DET.DETAIL_PRICE, 
	ORDER_DET.DETAIL_QTY * ORDER_DET.DETAIL_PRICE as TOTAL
FROM ORDER_HED JOIN ORDER_DET ON (ORDER_HED.ORDER_NUM = ORDER_DET.DETAIL_NUM )            
                                                           

Note that we can also use the WHERE clause instead of JOIN ON clause.

SELECT ORDER_HED.ORDER_NUM, ORDER_DET.DETAIL_LINE,
    ORDER_DET.DETAIL_SKU, ORDER_DET.DETAIL_QTY, ORDER_DET.DETAIL_PRICE, 
	ORDER_DET.DETAIL_QTY * ORDER_DET.DETAIL_PRICE 
FROM ORDER_HED , ORDER_DET                                      
WHERE ORDER_HED.ORDER_NUM = ORDER_DET.DETAIL_NUM  as TOTAL

Output:

	   
  ORDER_NUM     DETAIL_LINE   DETAIL_SKU     DETAIL_QTY    DETAIL_PRICE                  TOTAL
       100               1   ABCD                    1           10.50                          10.50           
       100               2   SHES                    2           13.99                          27.98           
       101               1   SEGN                    1           14.99                          14.99           
       101               2   SWTG                    3           14.99                          44.97           
       101               3   FGST                    1           12.00                          12.00           
       102               1   SEQY                    5           11.99                          59.95           
       102               2   ABCD                    1           10.50                          10.50           
 ********  End of data  ********                                 


Types of Joins and Union

JOIN or INNER JOIN - An Inner join is a method of combining two tables that discards rows of either table that do not match any row of the other table. The matching is based on the join condition.

LEFT JOIN or LEFT OUTER JOIN - A left outer join is a method of combining tables. The result includes unmatched rows from only the table that is specified before the LEFT OUTER JOIN clause.

RIGHT JOIN or RIGHT OUTER JOIN - A right outer join is a method of combining tables. The result includes unmatched rows from only the table that is specified after the RIGHT OUTER JOIN clause.

FULL OUTER JOIN - A full outer join is a method of combining tables so that the result includes unmatched rows of both tables.

UNION - allows you to combine results of two or more SELECT statements into a single result set which includes all the rows that belongs to the SELECT statements in the union. Each SELECT statement within UNION must have the same number of columns. The columns must also have similar data types. The columns in each SELECT statement must also be in the same order.




Example of a FULL OUTER JOIN

The FULL OUTER JOIN is often overlooked as it was added to many flavours of SQL years after initial releases and is often left out of textbooks. To test the FULL OUTER JOIN, let's insert "orphan" header and footer records into the database, then use the FULL OUTER JOIN to catch all matches and non-matched records.


INSERT INTO ORDER_HED VALUES(777, 'A10005', '01/13/2016',       

INSERT INTO ORDER_DET VALUES(888, 2, 'SHES' ,2, 13.99)          
SELECT ORDER_HED.ORDER_NUM, ORDER_DET.DETAIL_LINE,                  
   ORDER_DET.DETAIL_SKU, ORDER_DET.DETAIL_QTY, ORDER_DET.DETAIL_PRICE, 
   ORDER_DET.DETAIL_QTY * ORDER_DET.DETAIL_PRICE as TOTAL 
FROM ORDER_HED FULL OUTER JOIN ORDER_DET 
   ON (ORDER_HED.ORDER_NUM = ORDER_DET.DETAIL_NUM )                                              

Note the dashes representing null values in the results below, where the SQL query does not find matches using a FULL OUTER JOIN.

     ORDER_NUM     DETAIL_LINE   DETAIL_SKU     DETAIL_QTY    DETAIL_PRICE                          TOTAL       
           100               1   ABCD                    1           10.50                          10.50       
           100               2   SHES                    2           13.99                          27.98       
           101               1   SEGN                    1           14.99                          14.99       
           101               2   SWTG                    3           14.99                          44.97       
           101               3   FGST                    1           12.00                          12.00       
           102               1   SEQY                    5           11.99                          59.95       
           102               2   ABCD                    1           10.50                          10.50       
           777                -  -                        -               -                              -      
              -              2   SHES                    2           13.99                          27.98       
 ********  End of data  ********                                                                                



Using Subqueries

A subquery is a query that is nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery. A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved. A subquery can be used anywhere an expression is allowed.

A subquery may perform the following tasks:

  • Compare an expression to the result of the query.
  • Determine if an expression is included in the results of the query.
  • Check whether the query selects any rows.

A Subquery that returns a single value

In the following example we display the total number of items orders and uses a subquery to get total number of orders.

SELECT SUM(DETAIL_QTY) AS TOTAL_ORDER_LINE_QTY,
(
	SELECT COUNT(*) 
	FROM ORDER_HED
) AS TOTAL_ORDERS
FROM ORDER_DET

 TOTAL_ORDER_LINE_QTY   TOTAL_ORDERS    
                14                 3    
 ********  End of data  ********       

Multiple Subqueries in a single SQL statement

Show orders and how many units in the order and the average number of units for all orders.

SELECT DETAIL_NUM AS ORDER_NO,  
SUM( DETAIL_QTY ) AS UNITS_IN_ORDER,  
(  
	SELECT SUM(DETAIL_QTY) 
	FROM ORDER_DET  
	)                                                           
	/  
	(SELECT CAST(COUNT(*) AS DECIMAL(9,2)) 
	FROM ORDER_HED                                   
) AS AVG_UNITS_PER_ORDER
FROM ORDER_DET  
GROUP BY DETAIL_NUM 
ORDER BY DETAIL_NUM 
     ORDER_NO   UNITS_IN_ORDER                    AVG_UNITS_PER_ORDER       
          100                3                   4.666666666666666666       
          101                5                   4.666666666666666666       
          102                6                   4.666666666666666666       
********  End of data  ********                                             




The Case Statement

The CASE statement goes through conditions and returns a value when the first condition is met (like an IF-THEN-ELSE statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause. If there is no ELSE part and no conditions are true, it returns NULL. We can use a Case statement in select queries along with Where, Order By and Group By clause. It can be used in Insert statement as well.


Here we want to do tests on each order and add a comment for each based on order quantity.

SELECT ORDER_HED.ORDER_NUM,                                             
    SUM(ORDER_DET.DETAIL_QTY) AS ORDER_QTY,                        
    CASE 
       WHEN SUM(ORDER_DET.DETAIL_QTY) <= 3 THEN 'Small Order (3 units or less)'  
       WHEN SUM(ORDER_DET.DETAIL_QTY) BETWEEN 4 and 5 THEN 'Medium Order (4 or 5 units)'  
       ELSE 'Large Order (6 or more units)'         
    END AS ORDER_COMMENT
 FROM ORDER_HED JOIN ORDER_DET                                           
 ON(ORDER_HED.ORDER_NUM = ORDER_DET.DETAIL_NUM)                          
 GROUP BY ORDER_HED.ORDER_NUM         
    ORDER_NUM       ORDER_QTY   ORDER_COMMENT                        
          100               3   Small Order (3 units or less)        
          101               5   Medium Order (4 or 5 units)          
          102               6   Large Order (6 or more units)        
********  End of data  ********                                      



OVER and PARTITION BY functions

The OVER and PARTITION BY functions are both functions used to portion a results set according to specified criteria. This syntax is useful to add line number, averages or running totals,


The ROW_NUMBER() provides a line sequence. However, this sequence will be continually incremented. By using the PARTITION BY, we can reset the count on specified field(s), in this case on each change on order number in detail file.

SELECT    
   ORDER_HED.ORDER_NUM,        
   ROW_NUMBER() 
   OVER (PARTITION BY DETAIL_NUM ORDER BY DETAIL_NUM ASC) as 
       ROW_NUMBER,  
       ORDER_DET.DETAIL_SKU,  
       ORDER_DET.DETAIL_QTY, ORDER_DET.DETAIL_PRICE 
FROM ORDER_HED JOIN ORDER_DET  
ON(ORDER_HED.ORDER_NUM = ORDER_DET.DETAIL_NUM) 

Output:

 ....+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....8....+.                 
     ORDER_NUM                  ROW_NUMBER   DETAIL_SKU     DETAIL_QTY    DETAIL_PRICE                  
           100                           1   ABCD                    1           10.50                  
           100                           2   SHES                    2           13.99                  
           101                           1   SEGN                    1           14.99                  
           101                           2   SWTG                    3           14.99                  
           101                           3   FGST                    1           12.00                  
           102                           1   SEQY                    5           11.99                  
           102                           2   ABCD                    1           10.50                  
 ********  End of data  ********



Update using a subselect in the WHERE clause

In this query we use a subquery to give 20% discount to customers who ordered before January 14th.

UPDATE  ORDER_DET SET DETAIL_PRICE = DETAIL_PRICE * .8   
WHERE   ORDER_DET.DETAIL_NUM IN ( 
        SELECT   ORDER_NUM 
        FROM     ORDER_HED   
        WHERE    ORDER_DATE > '01/14/2016' 
		)                                 



Using HAVING to filter on GROUP BY values

The HAVING clause enables you to specify conditions that filter which group results appear in the final results. The WHERE clause places conditions on the selected columns, whereas the HAVING clause places conditions on groups created by the GROUP BY clause. In this example, we use HAVING and GROUP BY to show orders whose total is greater than $50

SELECT    ORDER_HED.ORDER_NUM, SUM( ORDER_DET.DETAIL_QTY*              
          ORDER_DET.DETAIL_PRICE ) AS ORDER_VALUE 
FROM      ORDER_HED JOIN ORDER_DET ON ( ORDER_HED.ORDER_NUM = ORDER_DET.DETAIL_NUM) GROUP BY ORDER_HED.ORDER_NUM 
HAVING    SUM (ORDER_DET.DETAIL_QTY * ORDER_DET.DETAIL_PRICE ) > 50 
ORDER BY  ORDER_HED.ORDER_NUM 

ORDER_NUM        ORDER_VALUE  
	 101              57.96  
	 102              69.95  



Using SQL Hierarchical Query Operators PRIOR and CONNECT_BY_ROOT, to help us generate comprehensive list of hierarchical relationships.


The Challenge

A membership organization wants to purge old data relating to inactive customers, however due to data retention policy, they don't want to delete an inactive customer if he/she has a spouse who is an active customer or ex-spouse who is an active customer etc... Given a table of values that represent current and past spousal relationships, generate list of all extended relationships, including ex-wives and their new wives/husbands etc..


Solution

Use DB2 SQL Hierarchical Query Operators CONNECT_BY_ROOT and PRIOR, to help us generate comprehensive list of relationships.

First we want to create a file and add sample data.

CREATE TABLE CONJ (SPOUSE1 NUMERIC ( 9, 0) NOT NULL WITH DEFAULT, 
                   SPOUSE2 NUMERIC ( 9, 0) NOT NULL WITH DEFAULT)              
                                                                                                                                                                                                                                                 
INSERT INTO CONJ VALUES(1, 2)      
INSERT INTO CONJ VALUES(1, 3)      
INSERT INTO CONJ VALUES(3, 4)      
INSERT INTO CONJ VALUES(5, 4)      
INSERT INTO CONJ VALUES(6, 7)

To generate list of all relationships:

WITH CONJ2 AS(                                             
  SELECT DISTINCT SPOUSE1, SPOUSE2 FROM conj                 
  WHERE SPOUSE1 <> SPOUSE2  
UNION                            
  SELECT SPOUSE2, SPOUSE1 FROM conj                          
  WHERE SPOUSE1 <> SPOUSE2                                   
  ORDER BY SPOUSE1 
)                                         
SELECT DISTINCT CONNECT_BY_ROOT SPOUSE1 AS PARENT ,        
  SPOUSE2 AS CHILD, LEVEL LEVEL                              
FROM CONJ2 START WITH SPOUSE1 <> 0                         
CONNECT BY NOCYCLE PRIOR SPOUSE2 = SPOUSE1 UNION           
SELECT DISTINCT CONNECT_BY_ROOT SPOUSE2 AS PARENT ,        
  SPOUSE1 AS CHILD, LEVEL LEVEL                              
FROM CONJ2 START WITH SPOUSE2 <> 0                         
CONNECT BY NOCYCLE PRIOR SPOUSE2 = SPOUSE1                 
ORDER BY PARENT , CHILD                                    

Output

Note that we pick up the 1-4 relationship (via 1-3, 3-4) and the 3-5 (via 1-3, 4-3, 5-4) relationship.

  PARENT         CHILD    
      1             1    
      1             2    
      1             3    
      1             4    
      1             5    
      2             1    
      2             2    
      2             3    
      2             4    
      2             5    
      3             1    
      3             2    
      3             3    
      3             4    
      3             5    
      4             1    
      4             2    
      4             3    
      4             4             
      4             5           
      5             1           
      5             2           
      5             3           
      5             4           
      5             5           
      6             6           
      6             7           
      7             6           
      7             7                     
                

Vocabulary

CONNECT_BY_ROOT is a unary operator that is valid only in hierarchical queries. When you qualify a column with this operator, SQL will return the column value using data from the root row. This operator extends the functionality of the CONNECT BY [PRIOR] condition of hierarchical queries.

PRIOR In a hierarchical query, one expression in the CONNECT BY condition must be qualified by the PRIOR operator. If the CONNECT BY condition is compound, then only one condition requires the PRIOR operator, although you can have multiple PRIOR conditions. PRIOR evaluates the immediately following expression for the parent row of the current row in a hierarchical query. PRIOR is most commonly used when comparing column values with the equality operator. (The PRIOR keyword can be on either side of the operator.) PRIOR causes SQL to use the value of the parent row in the column.

The NOCYCLE clause eliminates cyclic redundance in the queries.