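In SQL, LAG and LEAD are window functions that return a value from a previous or following row of the result set, relative to the current row, without requiring a self-join. They are often combined with the PARTITION BY clause so that "previous" and "next" are evaluated within specific subsets of the data, such as per employee or per customer.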
Syntax
- LAG:
LAG(value_column, offset, default_value) OVER (PARTITION BY partition_column ORDER BY order_column)
- LEAD:
LEAD(value_column, offset, default_value) OVER (PARTITION BY partition_column ORDER BY order_column)
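- Parameters:
  - value_column: The column (or expression) whose value is read from the previous or next row.
  - offset: (Optional) The number of rows back (for LAG) or forward (for LEAD) from the current row. Default is 1.
  - default_value: (Optional) The value returned when the row at that offset does not exist within the partition (for example, when LAG looks before the first row). If omitted, NULL is returned.
  - PARTITION BY partition_column: (Optional) Divides the result set into partitions; the function is evaluated separately within each one. If omitted, the whole result set is treated as a single partition.
  - ORDER BY order_column: Defines the row order within each partition, which determines which row counts as "previous" and "next". If several rows share the same order value, their relative order, and therefore the LAG/LEAD result, is not guaranteed.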
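To see the offset parameter in isolation, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the SQLite library linked into Python is version 3.25 or newer (the first release with window functions); the readings table and its values are hypothetical. With an offset of 2 and no default_value, rows that have no row two positions back or ahead get NULL, which Python shows as None.

```python
import sqlite3

# In-memory database; assumes the linked SQLite is 3.25+ (window function support).
conn = sqlite3.connect(":memory:")
# Hypothetical table used only to illustrate the offset parameter.
conn.execute("CREATE TABLE readings (day INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

rows = conn.execute(
    """
    SELECT day,
           value,
           LAG(value, 2)  OVER (ORDER BY day) AS two_back,   -- offset 2, no default
           LEAD(value, 2) OVER (ORDER BY day) AS two_ahead  -- offset 2, no default
    FROM readings
    ORDER BY day
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# (1, 10, None, 30)
# (2, 20, None, 40)
# (3, 30, 10, None)
# (4, 40, 20, None)
```

No PARTITION BY is used here, so all four rows form a single partition.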
Example Usage
Let's assume we have a table named Sales with the following columns:
- EmployeeID
- SalesDate
- Amount
For each sale, we want to show its amount together with the same employee's previous and next sale amounts.
SELECT
    EmployeeID,
    SalesDate,
    Amount,
    LAG(Amount, 1) OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS Previous_Sale,
    LEAD(Amount, 1) OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS Next_Sale
FROM
    Sales
ORDER BY
    EmployeeID, SalesDate;
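To try the query end to end, the sketch below loads a few made-up rows into an in-memory SQLite database and runs the same query. It assumes SQLite 3.25 or newer; the column types and sample values are illustrative assumptions, not part of the original example.

```python
import sqlite3

# In-memory database; assumes the linked SQLite is 3.25+ (window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (EmployeeID INTEGER, SalesDate TEXT, Amount INTEGER)")
# Hypothetical sample rows; ISO-8601 date strings sort chronologically as text.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 100),
        (1, "2024-01-12", 150),
        (1, "2024-01-20", 120),
        (2, "2024-01-07", 200),
        (2, "2024-01-15", 180),
    ],
)

# The query from the text (trailing semicolon dropped).
rows = conn.execute(
    """
    SELECT
        EmployeeID,
        SalesDate,
        Amount,
        LAG(Amount, 1) OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS Previous_Sale,
        LEAD(Amount, 1) OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS Next_Sale
    FROM
        Sales
    ORDER BY
        EmployeeID, SalesDate
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# (1, '2024-01-05', 100, None, 150)
# (1, '2024-01-12', 150, 100, 120)
# (1, '2024-01-20', 120, 150, None)
# (2, '2024-01-07', 200, None, 180)
# (2, '2024-01-15', 180, 200, None)
```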
Explanation
1. PARTITION BY EmployeeID: This means that the LAG and LEAD functions will operate independently for each employee.
2. ORDER BY SalesDate: This specifies that the sales records will be ordered chronologically within each employee's data.
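3. LAG(Amount, 1): This returns the Amount of the same employee's previous sale, by SalesDate.
4. LEAD(Amount, 1): This returns the Amount of the same employee's next sale, by SalesDate.
5. The outer ORDER BY EmployeeID, SalesDate only sorts the final output; the ORDER BY inside OVER is what decides which row is previous or next. The result lists each sale alongside the employee's previous and next sale amounts.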
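The effect of PARTITION BY EmployeeID can be checked by comparing it with a window that has no partition. In this sketch (hypothetical data, SQLite 3.25 or newer assumed), prev_global orders all rows by EmployeeID and SalesDate as one partition, so employee 2's first sale picks up employee 1's last amount, while prev_in_partition returns NULL there. Both column names are made up for the illustration.

```python
import sqlite3

# In-memory database; assumes the linked SQLite is 3.25+ (window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (EmployeeID INTEGER, SalesDate TEXT, Amount INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 100),
        (1, "2024-01-12", 150),
        (1, "2024-01-20", 120),
        (2, "2024-01-07", 200),
        (2, "2024-01-15", 180),
    ],
)

rows = conn.execute(
    """
    SELECT EmployeeID,
           SalesDate,
           -- Restarts for each employee.
           LAG(Amount) OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS prev_in_partition,
           -- One partition for the whole table: crosses employee boundaries.
           LAG(Amount) OVER (ORDER BY EmployeeID, SalesDate) AS prev_global
    FROM Sales
    ORDER BY EmployeeID, SalesDate
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# (1, '2024-01-05', None, None)
# (1, '2024-01-12', 100, 100)
# (1, '2024-01-20', 150, 150)
# (2, '2024-01-07', None, 120)
# (2, '2024-01-15', 200, 200)
```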
Important Notes
- If you try to access a previous row where none exists (e.g., the first row in the partition), LAG will return NULL (or the specified default_value if provided).
- Similarly, if there is no next row for LEAD, it will also return NULL (or the specified default_value).
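The difference between the NULL default and an explicit default_value shows up only at partition boundaries. This sketch uses hypothetical data and assumes SQLite 3.25 or newer; the default of 0 is an arbitrary choice for illustration.

```python
import sqlite3

# In-memory database; assumes the linked SQLite is 3.25+ (window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (EmployeeID INTEGER, SalesDate TEXT, Amount INTEGER)")
# Hypothetical sample rows: employee 2 has a single sale.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "2024-01-05", 100), (1, "2024-01-12", 150), (2, "2024-01-07", 200)],
)

rows = conn.execute(
    """
    SELECT EmployeeID,
           SalesDate,
           LAG(Amount, 1)     OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS prev_null,
           LAG(Amount, 1, 0)  OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS prev_default,
           LEAD(Amount, 1, 0) OVER (PARTITION BY EmployeeID ORDER BY SalesDate) AS next_default
    FROM Sales
    ORDER BY EmployeeID, SalesDate
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# (1, '2024-01-05', None, 0, 150)
# (1, '2024-01-12', 100, 100, 0)
# (2, '2024-01-07', None, 0, 0)
```

Choosing 0 as a default can be misleading when 0 is also a valid amount; keeping NULL makes "no such row" distinguishable.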
Used with PARTITION BY, LAG and LEAD let you compare each row with its neighbors inside a group, for example to compute the change from an employee's previous sale, without writing a self-join.
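One common analysis of this kind is the change from the previous sale. The sketch below assumes SQLite 3.25 or newer, uses hypothetical data, and introduces a made-up column name, ChangeFromPrevious. Because LAG returns NULL for each employee's first sale, the subtraction yields NULL there as well.

```python
import sqlite3

# In-memory database; assumes the linked SQLite is 3.25+ (window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (EmployeeID INTEGER, SalesDate TEXT, Amount INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 100),
        (1, "2024-01-12", 150),
        (1, "2024-01-20", 120),
        (2, "2024-01-07", 200),
        (2, "2024-01-15", 180),
    ],
)

rows = conn.execute(
    """
    SELECT EmployeeID,
           SalesDate,
           Amount,
           -- Current amount minus the employee's previous amount (NULL for the first sale).
           Amount - LAG(Amount) OVER (PARTITION BY EmployeeID ORDER BY SalesDate)
               AS ChangeFromPrevious
    FROM Sales
    ORDER BY EmployeeID, SalesDate
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# (1, '2024-01-05', 100, None)
# (1, '2024-01-12', 150, 50)
# (1, '2024-01-20', 120, -30)
# (2, '2024-01-07', 200, None)
# (2, '2024-01-15', 180, -20)
```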