Wednesday, October 2, 2013

Analytic Functions in SQL 2012 - Part 1 - FIRST_VALUE and LAST_VALUE

SQL 2012 comes with lot of new features and capabilities. One of the interesting add-ins are the new set of Analytic Functions. I am sure most of you (if not everyone) like me used to struggle a lot writing stored procedures for these standard statistic functions. Well, without wasting more time lets have a quick look and then in later posts, I will try to get into more depths of these functions and where possibly we could use them. Your inputs and feedbacks would be on icing on cake - goes without saying.


What are Analytic Functions?

According to technet "Analytic functions compute an aggregate value based on a group of rows. However, unlike aggregate functions, they can return multiple rows for each group."

The second sentence is very important and what makes these functions more interesting. What it means is with aggregate functions like SUM, AVG, MAX, MIN - we do not have luxury of displaying the individual records. Moreover, these aggregate functions return only 1 result and hence one record per group. Whereas, the analytic functions can return multiple rows. Therefore, GROUPING works differently in case of analytic functions.

Due to this, some practical scenarios that were more difficult to implement using aggregate functions have become simple using Analytic functions. Now lets have a look at these functions.

FIRST_VALUE and LAST_VALUE


As the name suggests, these function return the First Value or the Last Value from an ORDERED list. This can be useful in situations where you want to get lets say: Name of Employee with Least Salary, Name of a Product with Maximum Sales, Date on which Least Sales was done and so on..

Alternatives:


In the absence of this function, we would first need to get the Minimum value and then do a Lookup for the minimum value to get the corresponding record.

A second alternative would be to do a TOP 1 statement.


Example:


To get the name of the Least expensive Product.

SELECT
Product = P.EnglishProductName,
P.ListPrice,
LeastExpensive =
FIRST_VALUE(P.EnglishProductName) OVER (ORDER BY P.ListPrice ASC) FROM dbo.DimProduct P

The OVER Clause first orders the list by ListPrice in ascending order and then the FIRST VALUE fnction is applied to get the First Value of EnglishProductName in the list. However, this ORDER BY clause within the OVER clause does not affect the output sequence.



Another example: To get the Least expensive product in each ProductSubCateogry instead of whole list. This can be done by adding "PARTITION" clause in the OVER clause. The Partition clause divides the whole list into multiple categories and then FIRST_VALUE is computed for each of these small partitions. Let's have a look at the SQL for this.

SELECT
SubCategory = S.EnglishProductSubcategoryName,
Product = P.EnglishProductName,
P.ListPrice,
LeastExpensive  = FIRST_VALUE(P.EnglishProductName)
     OVER (PARTITION BY S.EnglishProductSubcategoryName ORDER BY P.ListPrice ASC)
FROM dbo.DimProduct P
JOIN dbo.DimProductSubcategory S ON P.ProductSubcategoryKey = S.ProductSubcategoryKey





To find the difference between the Price of Current Product as compared to the Price of Least Expensive and Most Expensive Product in the Same SubCategory

SELECT SubCategory = S.EnglishProductSubcategoryName,
Product = P.EnglishProductName,
P.ListPrice,

 
PriceDifference_LeastExpensive  = P.ListPrice - FIRST_VALUE(P.ListPrice)
OVER (PARTITION BY S.EnglishProductSubcategoryName ORDER BY P.ListPrice ASC)


PriceDifference_MostExpensive = FIRST_VALUE(P.ListPrice)
OVER (PARTITION BY S.EnglishProductSubcategoryName ORDER BY P.ListPrice DESC) - P.ListPrice

FROM dbo.DimProduct P
JOIN dbo.DimProductSubcategory S ON P.ProductSubcategoryKey = S.ProductSubcategoryKey




This can be rewritten using LAST_VALUE function as


SELECT SubCategory = S.EnglishProductSubcategoryName,
Product = P.EnglishProductName,
P.ListPrice,


PriceDifference_LeastExpensive = P.ListPrice - FIRST_VALUE(P.ListPrice)
OVER (PARTITION BY S.EnglishProductSubcategoryName ORDER BY P.ListPrice ASC)


PriceDifference_MostExpensive = LAST_VALUE(P.ListPrice)
OVER (PARTITION BY S.EnglishProductSubcategoryName ORDER BY P.ListPrice ASC) - P.ListPrice

FROM dbo.DimProduct P
JOIN dbo.DimProductSubcategory S ON P.ProductSubcategoryKey = S.ProductSubcategoryKey




Please do contribute where else you see these functions could be used in practical scenarios.

Part 2 will cover LEAD and LAG functions.

2 comments:

  1. Thanks Shalinder, it was a great insight for me to have a feel of Business Intelligence and use of SQL queries. It appears that the syntax has been changed, as I have used SQL Server 7.x version. But the way you have explained the queries, almost everything is clear to me.

    We can use these functions in social domain also, like in public health. Using these functions we can determine performance of any geographical area, for example States India against any indicator. Like which state is the best performing and which one is the worst performing against a public health indicator.

    The same logic can be applied to various indicators of opinion polls/surveys.

    ReplyDelete
    Replies
    1. Thanks Sandeep for your feedback and suggestions. Yes, SQL Server has come long way since the days of SQL 7.x. Let me know if you want me to update the blogs with some real time scenarios you might be familiar with which could be of help to others as well.

      Cheers

      Delete