In the Part 1 of this series we looked at how to get the extreme (Maximum, Minimum, Least, Most, etc) records from an ordered list using two fnctions First_Value and Last_Value. However, there could be situation when you want to access a record that is at a position relative to a current record.

For example, if you want to compare the value of a record with the previous or the next record. There could be a situation when you have a monthly data for multiple years and you want to comnpare values of current month with the same month previous year, assuming that all the 12 months appear in the list whether or not they have values. For these kind of situations, we have next 2 functions - LEAD() and LAG().

Just like other Analytic Functions, these functions too wotk with OVER clause only.

LEAD function provides access to a row at a given offset following the current row. Similarly, LAG function proveds access to a row at a given offset preceding the current row. Before we go to the examples, lets have a close look at the syntax of these functions because they expect two extra input parameters.

LEAD ( return value found expression, [offset expression], [default return value expression])

LAG ( return value found expression, [offset expression], [default return value expression])

The second parameter "offset expression" would be an expression returning positive BIGINT value. This is optional and the default value is 1. In case of LEAD, the offset is moved forward and in case of LAG the offset is moved backwards. However, it can not be negative!! The offset value could be dynamically evaluated using the expression. This really makes this function very powerful and flexible.

The Third parameter "default return value expression" would be an expression which will be returned if no record is found at the given offset position for the current record. This could happen for Last records.

To compare the next Quarter Quota Amount with the Current Quarter Quota Amount for a specific SalesRep and Year

The OVER Clause first orders the list by CalendarQuarter in ascending order and then the LEAD function is applied to get the next Quarter (offset = 1) in the list. In this ORDER BY clause within the OVER clause does affect the output sequence.

To compare the next Aggregated Quarter Quota Amount with the Aggregated Current Quarter Quota Amount for Year = 2007. This example shows the use of expression instead of just a column name in the LEAD function

Now lets say we want to do this for EACH Year:

In this case the Partition By clause will first create small partitions on CalendarYear and then Order these Partitions by CalendarQuarter. Finally, the LEAD function will get the Next Record (second parameter, offset = 1) for the records with Quarter 1, 2 or 3. For the 4th Quarter, the function will return 0 (third parameter, default = 0) as there won't be any next record in the CalendarYear partition for Quarter 4.

What if we want the comparison of the Quarter over Quarter across the years. That means, For Quarter 4 of a current year, the Next Quarter should be Quarter 1 of the Next year. This will be the SQL:

SELECT CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

NextQuarterQuota = LEAD(SUM(SalesAmountQuota), 1,0) OVER (ORDER BY CalendarYear , CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

Now instead of creating Partition on CalendarYear, we will work on the WHOLE dataset as one big partition, but order the parition by CalendarYear and CalendarQuarter.

Finally, let's look at an example to use an expression for setting the value of OFFSET parameter instead of a constant. In this case, we need to compare the current quarter SalesQuota with the SalesQuota of First Quarter of the same Year.

SELECT

CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

FirstQuarterQuota = LAG(SUM(SalesAmountQuota),

CASE

WHEN CalendarQuarter = 2 THEN 1

WHEN CalendarQuarter = 3 THEN 2

WHEN CalendarQuarter = 4 THEN 3

ELSE 0

END ,0)

OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

The CASE Statement can be replaced with a simple expression as below:

SELECT CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

FirstQuarterQuota = LAG(SUM(SalesAmountQuota), CalendarQuarter -1, 0)

OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

Ofcourse, if you recollect from previous part, the same statement can be written using FIRST_VALUE function as below. However, this statement will not return correct result for years where you do not have data for First Quarter, because FIRST_VALUE function will return the very first record in the year partition. So, if you suspect the table may not have the First Quarter data, it's better to go with LAG function.

SELECT CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

FirstQuarterQuota = FIRST_VALUE(SUM(SalesAmountQuota)) OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

That's the end of this part.

In the next part of the series, I am going to cover CUME_DIST and PERCENT_RANK. Stay tuned!

For example, if you want to compare the value of a record with the previous or the next record. There could be a situation when you have a monthly data for multiple years and you want to comnpare values of current month with the same month previous year, assuming that all the 12 months appear in the list whether or not they have values. For these kind of situations, we have next 2 functions - LEAD() and LAG().

Just like other Analytic Functions, these functions too wotk with OVER clause only.

### LEAD and LAG

LEAD function provides access to a row at a given offset following the current row. Similarly, LAG function proveds access to a row at a given offset preceding the current row. Before we go to the examples, lets have a close look at the syntax of these functions because they expect two extra input parameters.

LEAD ( return value found expression, [offset expression], [default return value expression])

LAG ( return value found expression, [offset expression], [default return value expression])

The second parameter "offset expression" would be an expression returning positive BIGINT value. This is optional and the default value is 1. In case of LEAD, the offset is moved forward and in case of LAG the offset is moved backwards. However, it can not be negative!! The offset value could be dynamically evaluated using the expression. This really makes this function very powerful and flexible.

The Third parameter "default return value expression" would be an expression which will be returned if no record is found at the given offset position for the current record. This could happen for Last records.

#### Examples:

To compare the next Quarter Quota Amount with the Current Quarter Quota Amount for a specific SalesRep and Year

SELECT

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SalesAmountQuota,

NextQuarterQuota = LEAD(SalesAmountQuota, 1,0) OVER (ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

WHERE EmployeeKey = 272

AND CalendarYear = 2007

The OVER Clause first orders the list by CalendarQuarter in ascending order and then the LEAD function is applied to get the next Quarter (offset = 1) in the list. In this ORDER BY clause within the OVER clause does affect the output sequence.

To compare the next Aggregated Quarter Quota Amount with the Aggregated Current Quarter Quota Amount for Year = 2007. This example shows the use of expression instead of just a column name in the LEAD function

SELECT

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

NextQuarterQuota = LEAD(SUM(SalesAmountQuota), 1,0) OVER (ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

WHERE CalendarYear = 2007

GROUP BY CalendarQuarter

Now lets say we want to do this for EACH Year:

SELECT

CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

NextQuarterQuota = LEAD(SUM(SalesAmountQuota), 1,0) OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

In this case the Partition By clause will first create small partitions on CalendarYear and then Order these Partitions by CalendarQuarter. Finally, the LEAD function will get the Next Record (second parameter, offset = 1) for the records with Quarter 1, 2 or 3. For the 4th Quarter, the function will return 0 (third parameter, default = 0) as there won't be any next record in the CalendarYear partition for Quarter 4.

What if we want the comparison of the Quarter over Quarter across the years. That means, For Quarter 4 of a current year, the Next Quarter should be Quarter 1 of the Next year. This will be the SQL:

SELECT CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

NextQuarterQuota = LEAD(SUM(SalesAmountQuota), 1,0) OVER (ORDER BY CalendarYear , CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

Now instead of creating Partition on CalendarYear, we will work on the WHOLE dataset as one big partition, but order the parition by CalendarYear and CalendarQuarter.

Finally, let's look at an example to use an expression for setting the value of OFFSET parameter instead of a constant. In this case, we need to compare the current quarter SalesQuota with the SalesQuota of First Quarter of the same Year.

SELECT

CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

FirstQuarterQuota = LAG(SUM(SalesAmountQuota),

CASE

WHEN CalendarQuarter = 2 THEN 1

WHEN CalendarQuarter = 3 THEN 2

WHEN CalendarQuarter = 4 THEN 3

ELSE 0

END ,0)

OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

The CASE Statement can be replaced with a simple expression as below:

SELECT CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

FirstQuarterQuota = LAG(SUM(SalesAmountQuota), CalendarQuarter -1, 0)

OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

Ofcourse, if you recollect from previous part, the same statement can be written using FIRST_VALUE function as below. However, this statement will not return correct result for years where you do not have data for First Quarter, because FIRST_VALUE function will return the very first record in the year partition. So, if you suspect the table may not have the First Quarter data, it's better to go with LAG function.

SELECT CalendarYear,

SalesQuarter = CalendarQuarter,

CurrentQuarterQuota = SUM(SalesAmountQuota),

FirstQuarterQuota = FIRST_VALUE(SUM(SalesAmountQuota)) OVER (PARTITION BY CalendarYear ORDER BY CalendarQuarter)

FROM dbo.FactSalesQuota

GROUP BY CalendarYear, CalendarQuarter

That's the end of this part.

In the next part of the series, I am going to cover CUME_DIST and PERCENT_RANK. Stay tuned!

## No comments:

## Post a Comment