How do you use the PERCENTILE_CONT and PERCENTILE_DISC functions for percentile calculations?
Posted by TinaGrn
Last Updated: June 13, 2024
The PERCENTILE_CONT and PERCENTILE_DISC functions are used in SQL to calculate percentiles, but they operate slightly differently. Both functions are often used within a SELECT statement and are typically part of the window functions in SQL.
PERCENTILE_CONT
PERCENTILE_CONT is a continuous percentile function that calculates the value below which a given percentage of observations fall. It interpolates between values if necessary, meaning if the desired percentile does not correspond to an exact value in your data set, it will calculate a value in between two points. Syntax:
PERCENTILE_CONT(fraction) WITHIN GROUP (ORDER BY column_name) OVER (PARTITION BY partition_column)
- fraction: A decimal between 0 and 1 (e.g., 0.25 for the 25th percentile). - column_name: The column of numeric values on which you are calculating the percentile. - PARTITION BY partition_column: (Optional) Allows you to calculate percentiles for subsets of data. Example:
SELECT department,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;
In this example, the median salary (50th percentile) is calculated for each department.
PERCENTILE_DISC
PERCENTILE_DISC is a discrete percentile function that returns the actual value from the data set at the specified percentile, rather than interpolating. This means that it will always return a value that exists in the data set. Syntax:
PERCENTILE_DISC(fraction) WITHIN GROUP (ORDER BY column_name) OVER (PARTITION BY partition_column)
- The parameters have the same meaning as in PERCENTILE_CONT. Example:
SELECT department,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;
In this example, using PERCENTILE_DISC, the query returns the median salary from actual salary values in each department, without interpolation.
Key Differences
- Interpolation: PERCENTILE_CONT may return a value that is not an actual value in the dataset, while PERCENTILE_DISC always returns an actual data point. - Use Case: Use PERCENTILE_CONT when you want a continuous estimate of percentile values, and PERCENTILE_DISC when you need to retrieve actual data points.
Conclusion
Both functions provide valuable methods for calculating percentiles from datasets. The choice between using PERCENTILE_CONT and PERCENTILE_DISC depends on whether you want continuous or discrete results.