How can you delete duplicate rows from a table?
Posted by QuinnLw
Last Updated: June 20, 2024
Deleting duplicate rows from a table can be accomplished using different methods depending on the database management system (DBMS) you are using. Here are a few common methods applicable in SQL:
Using Common Table Expressions (CTE) and the ROW_NUMBER() Function
One common method is to use a Common Table Expression (CTE) along with the ROW_NUMBER() function to identify and delete duplicates. This approach works in SQL Server, PostgreSQL, and Oracle. Here’s an example in SQL Server:
WITH CTE AS (
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY (SELECT NULL)) AS rn
    FROM 
        your_table
)
DELETE FROM CTE WHERE rn > 1;
In this example: - Replace your_table with the name of your table. - Replace column1, column2 with the columns you want to check for duplicates. - rn is the row number assigned within each partition of duplicates.
Using a Temporary Table
Another method is to use a temporary table to keep only distinct rows.
-- Create a temporary table with distinct rows
CREATE TABLE temp_table AS
SELECT DISTINCT * 
FROM your_table;

-- Delete all rows from the original table
DELETE FROM your_table;

-- Insert the distinct rows back into the original table
INSERT INTO your_table
SELECT * FROM temp_table;

-- Drop the temporary table
DROP TABLE temp_table;
Using GROUP BY with HAVING Clause
You can also use a GROUP BY statement along with a HAVING COUNT(*) > 1 condition to find duplicates and delete them:
DELETE FROM your_table
WHERE id NOT IN (
    SELECT MIN(id)
    FROM your_table
    GROUP BY column1, column2
);
In this example, id is the primary key of your table which uniquely identifies each row. Replace column1, column2 with the actual columns you want to check for duplicates.
Important Considerations
1. Backup Your Data: Always make sure to back up your data before performing delete operations to avoid unintended data loss. 2. Test First: It’s a good practice to run your select statements first to ensure you’re correctly identifying the duplicates before performing delete operations. 3. Transaction Control: If your DBMS supports transactions, consider wrapping your delete operations in a transaction, so you can roll back if needed. Choose the method that best fits your use case and the features provided by your DBMS.