The OVER clause in SQL is a powerful feature that allows you to perform calculations across a set of rows related to the current row. When used with aggregate functions, it can help you calculate running totals, commonly referred to as cumulative totals.
A running total, or cumulative sum, is a sum that grows as you move down the records in your result set. Here's how to use the OVER clause with aggregate functions to calculate running totals:
Typical Syntax
SELECT
column1,
column2,
SUM(column3) OVER (ORDER BY column1) AS running_total
FROM
table_name;
Explanation of the Components:
- SUM(column3): This is an aggregate function that calculates the sum of column3.
- OVER (ORDER BY column1): This specifies how to order the rows for the calculation. The running total will be calculated based on the order of column1.
- AS running_total: This is an alias for the new column that will hold the running total values.
Example
Assume you have a table called sales with the following structure:
| id | sale_date | amount |
|----|-----------|--------|
| 1 | 2023-01-01| 100 |
| 2 | 2023-01-02| 150 |
| 3 | 2023-01-03| 200 |
| 4 | 2023-01-04| 250 |
To calculate a running total of the amount based on sale_date, the query would look like this:
SELECT
id,
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM
sales
ORDER BY
sale_date;
Result
The result would look like this:
| id | sale_date | amount | running_total |
|----|-----------|--------|---------------|
| 1 | 2023-01-01| 100 | 100 |
| 2 | 2023-01-02| 150 | 250 |
| 3 | 2023-01-03| 200 | 450 |
| 4 | 2023-01-04| 250 | 700 |
Additional Options
- You can also include a PARTITION BY clause within the OVER clause if you want to calculate running totals over subsets of data. For example, if you have different categories and want a running total per category, you can use:
SUM(amount) OVER (PARTITION BY category ORDER BY sale_date)
This will reset the running total every time the category changes.
Summary
The OVER clause with aggregate functions is a versatile tool for calculating running totals, enabling you to analyze cumulative data over ordered sets of rows efficiently.