The DISTINCT keyword in SQL removes duplicate rows from a query's result set. Two rows count as duplicates only when every column in the SELECT list matches, so the output contains one row for each unique combination of the selected values.
Here’s how you can use the DISTINCT keyword:
Basic Syntax
SELECT DISTINCT column1, column2, ...
FROM table_name;
Example
Assume you have a table named employees with the following data:
| employee_id | name | department |
|-------------|-----------|------------|
| 1 | Alice | HR |
| 2 | Bob | IT |
| 3 | Alice | HR |
| 4 | Charlie | Sales |
| 5 | Bob | IT |
If you want to get a list of unique employee names, you would write:
SELECT DISTINCT name FROM employees;
Result
The result set contains each name once (row order is not guaranteed without ORDER BY):
| name |
|---------|
| Alice |
| Bob |
| Charlie |
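
To try the query above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The choice of SQLite and the variable names are assumptions made for illustration; the table and rows mirror the example. ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database; the engine choice is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Alice", "HR"),
        (2, "Bob", "IT"),
        (3, "Alice", "HR"),
        (4, "Charlie", "Sales"),
        (5, "Bob", "IT"),
    ],
)

# DISTINCT collapses the five rows into three unique names.
unique_names = [
    row[0]
    for row in conn.execute("SELECT DISTINCT name FROM employees ORDER BY name")
]
print(unique_names)  # ['Alice', 'Bob', 'Charlie']
conn.close()
```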
Using DISTINCT with Multiple Columns
DISTINCT applies to the whole selected row, not just the first column. To retrieve unique combinations of several columns (for example, unique pairs of name and department), list all of them after DISTINCT:
SELECT DISTINCT name, department FROM employees;
Result
Because each name in the sample data belongs to a single department, the result still has three rows (in no guaranteed order):
| name | department |
|---------|------------|
| Alice | HR |
| Bob | IT |
| Charlie | Sales |
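
The sample data cannot show the difference between one-column and multi-column DISTINCT, because every name has exactly one department. The sketch below, again assuming SQLite through Python's sqlite3 module, adds one hypothetical row (Alice in IT, not part of the original table) to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Alice", "HR"),
        (2, "Bob", "IT"),
        (3, "Alice", "HR"),
        (4, "Charlie", "Sales"),
        (5, "Bob", "IT"),
        (6, "Alice", "IT"),  # hypothetical extra row, not in the original data
    ],
)

# Uniqueness is judged on the (name, department) pair as a whole.
pairs = conn.execute(
    "SELECT DISTINCT name, department FROM employees ORDER BY name, department"
).fetchall()
print(pairs)
# [('Alice', 'HR'), ('Alice', 'IT'), ('Bob', 'IT'), ('Charlie', 'Sales')]

# Selecting only the name still yields one row per name.
names = [
    row[0]
    for row in conn.execute("SELECT DISTINCT name FROM employees ORDER BY name")
]
print(names)  # ['Alice', 'Bob', 'Charlie']
conn.close()
```

Alice appears twice in the first result because her two department values form two different combinations.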
Important Notes
1. Performance: DISTINCT can slow queries on large datasets, because the database engine typically has to sort or hash the result rows to find duplicates. Use it only when duplicates are actually possible.
2. NULL Values: DISTINCT treats NULL values as equal to one another, even though an ordinary comparison such as NULL = NULL does not evaluate to true. When a single column is selected and several rows contain NULL in it, the result set contains just one NULL.
3. Combined with Other Clauses: DISTINCT works alongside clauses such as WHERE and ORDER BY. WHERE filters rows before duplicates are eliminated, and ORDER BY sorts the deduplicated result.
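
The NULL and clause-combination notes can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module and adds two hypothetical employees (Dana and Eve, not in the original table) whose department is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Alice", "HR"),
        (2, "Bob", "IT"),
        (3, "Alice", "HR"),
        (4, "Charlie", "Sales"),
        (5, "Bob", "IT"),
        (6, "Dana", None),  # hypothetical rows with a NULL department
        (7, "Eve", None),
    ],
)

# Both NULL departments collapse into a single NULL (None in Python).
departments = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT department FROM employees ORDER BY department"
    )
]
print(departments)  # [None, 'HR', 'IT', 'Sales'] (SQLite sorts NULLs first)

# WHERE filters rows first, DISTINCT removes duplicates, ORDER BY sorts last.
filtered = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT name FROM employees "
        "WHERE department IN ('HR', 'IT') ORDER BY name DESC"
    )
]
print(filtered)  # ['Bob', 'Alice']
conn.close()
```

Where NULLs land in sorted output differs between database engines, so the first result's order is specific to SQLite.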
In short, DISTINCT ensures that a query returns each unique combination of the selected columns exactly once.