How do you use the CUME_DIST() function to calculate the cumulative distribution of values within a result set?
Posted by PaulAnd
Last Updated: June 10, 2024
The CUME_DIST() function is a window function in SQL that calculates the cumulative distribution of a value within a result set. It returns the relative position (or rank) of a specific value compared to all the values in the partitioned result set. The output ranges from 0 to 1, representing the proportion of rows that are less than or equal to the current row's value.
Syntax
The basic syntax of the CUME_DIST() function is:
CUME_DIST() OVER ( [PARTITION BY partition_expression] ORDER BY sort_expression )
Explanation of Components:
- PARTITION BY partition_expression: This is optional. It divides the result set into partitions to which the CUME_DIST() function is applied. If omitted, the function treats the entire result set as a single partition. - ORDER BY sort_expression: This is required. It defines the order of the rows in each partition, which is used to determine the cumulative distribution.
Example Use Case
Let's say you have a table named Sales with the following columns: Employee, SalesAmount. You want to calculate the cumulative distribution of sales amounts for each employee.
CREATE TABLE Sales (
    Employee VARCHAR(100),
    SalesAmount DECIMAL(10, 2)
);

INSERT INTO Sales (Employee, SalesAmount)
VALUES
('Alice', 500),
('Bob', 700),
('Charlie', 600),
('David', 800);
Here’s how to calculate the cumulative distribution of the SalesAmount:
SELECT 
    Employee,
    SalesAmount,
    CUME_DIST() OVER (ORDER BY SalesAmount) AS CumulativeDistribution
FROM 
    Sales
ORDER BY 
    SalesAmount;
Output Explanation
The output of the above query will provide the cumulative distribution for each row based on the SalesAmount. Here’s how it might look based on the data provided: | Employee | SalesAmount | CumulativeDistribution | |----------|-------------|-----------------------| | Alice | 500 | 0.25 | | Charlie | 600 | 0.50 | | Bob | 700 | 0.75 | | David | 800 | 1.00 |
Interpretation of Output
- Alice has a CumulativeDistribution of 0.25, meaning 25% of the employees have sales amounts less than or equal to Alice's (1 out of 4). - Charlie has a distribution of 0.50, indicating that 50% of the employees have sales amounts less than or equal to Charlie's (2 out of 4). - Bob and David follow the same logic, with Bob having 75% and David having 100% since his sales amount is the highest. This allows you to understand the relative performance of each employee's sales in the total distribution.
Related Content