Monday, October 7, 2013

Analytic Functions in SQL 2012 - Part 2 - LEAD and LAG

In the Part 1 of this series we looked at how to get the extreme (Maximum, Minimum, Least, Most, etc) records from an ordered list using two fnctions First_Value and Last_Value. However, there could be situation when you want to access a record that is at a position relative to a current record.

For example, if you want to compare the value of a record with the previous or the next record. There could be a situation when you have a monthly data for multiple years and you want to comnpare values of current month with the same month previous year, assuming that all the 12 months appear in the list whether or not they have values. For these kind of situations, we have next 2 functions - LEAD() and LAG().

Just like other Analytic Functions, these functions too wotk with OVER clause only.


LEAD and LAG


LEAD function provides access to a row at a given offset following the current row. Similarly, LAG function proveds access to a row at a given offset preceding the current row. Before we go to the examples, lets have a close look at the syntax of these functions because they expect two extra input parameters.

LEAD ( return value found expression, [offset expression], [default return value expression])
LAG ( return value found expression, [offset expression], [default return value expression])

The second parameter "offset expression" would be an expression returning positive BIGINT value. This is optional and the default value is 1. In case of LEAD, the offset is moved forward and in case of LAG the offset is moved backwards. However, it can not be negative!! The offset value could be dynamically evaluated using the expression. This really makes this function very powerful and flexible.

The Third parameter "default return value expression" would be an expression which will be returned if no record is found at the given offset position for the current record. This could happen for Last records.


Examples:


To compare the next Quarter Quota Amount with the Current Quarter Quota Amount for a specific SalesRep and Year 


SELECT


SalesQuarter = CalendarQuarter,
CurrentQuarterQuota = SalesAmountQuota,
NextQuarterQuota = LEAD(SalesAmountQuota, 1,0) OVER (ORDER BY CalendarQuarter)
FROM dbo.FactSalesQuota
WHERE EmployeeKey = 272
AND CalendarYear = 2007
 

The OVER Clause first orders the list by CalendarQuarter in ascending order and then the LEAD function is applied to get the next Quarter (offset = 1) in the list. In this ORDER BY clause within the OVER clause does affect the output sequence.




To compare the next Aggregated Quarter Quota Amount with the Aggregated Current Quarter Quota Amount for Year = 2007. This example shows the use of expression instead of just a column name in the LEAD function


SELECT


SalesQuarter = CalendarQuarter,
CurrentQuarterQuota = SUM(SalesAmountQuota),
NextQuarterQuota = LEAD(SUM(SalesAmountQuota), 1,0) OVER (ORDER BY CalendarQuarter)
FROM dbo.FactSalesQuota
WHERE CalendarYear = 2007
GROUP BY CalendarQuarter


Now lets say we want to do this for EACH Year:

SELECT
CalendarYear,


SalesQuarter = CalendarQuarter,
CurrentQuarterQuota = SUM(SalesAmountQuota),
NextQuarterQuota = LEAD(SUM(SalesAmountQuota), 1,0) OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)
FROM dbo.FactSalesQuota
GROUP BY CalendarYear, CalendarQuarter
 
 

In this case the Partition By clause will first create small partitions on CalendarYear and then Order these Partitions by CalendarQuarter. Finally, the LEAD function will get the Next Record (second parameter, offset = 1) for the records with Quarter 1, 2 or 3. For the 4th Quarter, the function will return 0 (third parameter, default = 0) as there won't be any next record in the CalendarYear partition for Quarter 4.



What if we want the comparison of the Quarter over Quarter across the years. That means, For Quarter 4 of a current year, the Next Quarter should be Quarter 1 of the Next year. This will be the SQL:

SELECT CalendarYear,
 SalesQuarter = CalendarQuarter,
 CurrentQuarterQuota = SUM(SalesAmountQuota), 
 NextQuarterQuota = LEAD(SUM(SalesAmountQuota), 1,0) OVER (ORDER BY CalendarYear ,  CalendarQuarter)
FROM dbo.FactSalesQuota
GROUP BY CalendarYear, CalendarQuarter

Now instead of creating Partition on CalendarYear, we will work on the WHOLE dataset as one big partition, but order the parition by CalendarYear and CalendarQuarter.



Finally, let's look at an example to use an expression for setting the value of OFFSET parameter instead of a constant. In this case, we need to compare the current quarter SalesQuota with the SalesQuota of First Quarter of the same Year.

SELECT
CalendarYear,
SalesQuarter = CalendarQuarter,
CurrentQuarterQuota = SUM(SalesAmountQuota),
FirstQuarterQuota = LAG(SUM(SalesAmountQuota),
CASE
WHEN CalendarQuarter = 2 THEN 1
WHEN CalendarQuarter = 3 THEN 2
WHEN CalendarQuarter = 4 THEN 3
ELSE 0
END ,0)
OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)
FROM dbo.FactSalesQuota
GROUP BY CalendarYear, CalendarQuarter


The CASE Statement can be replaced with a simple expression as below:

SELECT CalendarYear,
SalesQuarter = CalendarQuarter,
CurrentQuarterQuota = SUM(SalesAmountQuota),
FirstQuarterQuota = LAG(SUM(SalesAmountQuota), CalendarQuarter -1, 0)
OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)
FROM dbo.FactSalesQuota
GROUP BY CalendarYear, CalendarQuarter




Ofcourse, if you recollect from previous part, the same statement can be written using FIRST_VALUE function as below. However, this statement will not return correct result for years where you do not have data for First Quarter, because FIRST_VALUE function will return the very first record in the year partition. So, if you suspect the table may not have the First Quarter data, it's better to go with LAG function.


SELECT CalendarYear,
SalesQuarter = CalendarQuarter,
CurrentQuarterQuota = SUM(SalesAmountQuota),
FirstQuarterQuota = FIRST_VALUE(SUM(SalesAmountQuota)) OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)
FROM dbo.FactSalesQuota
GROUP BY CalendarYear, CalendarQuarter




That's the end of this part.

In the next part of the series, I am going to cover CUME_DIST and PERCENT_RANK. Stay tuned!