The FIRST_VALUE and LAST_VALUE functions in SQL are used to return the first and last values from an ordered set of rows in a result set, respectively. These functions are generally used in conjunction with the OVER clause, which allows you to define the window or the set of rows you want to operate on.
Here's a general explanation of how to use these functions, along with an example:
Syntax
1. FIRST_VALUE
FIRST_VALUE(column_name) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
[ROWS or RANGE specification]
)
2. LAST_VALUE
LAST_VALUE(column_name) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
[ROWS or RANGE specification]
)
Explanation of Syntax Components
- column_name: The column from which you want to get the first or last value.
- PARTITION BY: This clause divides the result set into partitions (groups of rows) to which the function is applied. This is optional.
- ORDER BY: This clause defines the order of the rows in each partition. This is mandatory for both functions.
- ROWS or RANGE specification: Optional clauses that can further refine the set of rows to consider.
Example
Assume we have a table named sales with the following columns:
- salesperson_id
- sale_date
- amount
We want to find the first and last sale amounts for each salesperson based on the sale date.
SELECT
salesperson_id,
sale_date,
amount,
FIRST_VALUE(amount) OVER (
PARTITION BY salesperson_id
ORDER BY sale_date
) AS first_sale_amount,
LAST_VALUE(amount) OVER (
PARTITION BY salesperson_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_sale_amount
FROM
sales
ORDER BY
salesperson_id, sale_date;
Breakdown of the Example:
- We partition the data by salesperson_id, so each salesperson's sales are considered separately.
- The ORDER BY sale_date clause ensures that the values are ordered chronologically.
- You must specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for LAST_VALUE() to fetch the last value correctly across all rows in the partition since, by default, it would only consider rows up to the current row unless this specification is provided.
Notes:
- If there are ties in the ordering, the functions will return the first or last value based on the specific criteria provided.
- The results will also include the original columns alongside the first and last values for clarity.
Using FIRST_VALUE and LAST_VALUE can be very powerful in analytical queries where you need to extract boundary values from various partitions of data.