Delete Duplicate Records in SQL

In relational databases, duplicate records can occur due to various reasons such as data entry errors, system glitches, or incorrect import processes. Duplicates can affect performance, produce misleading results, and lead to data integrity issues. This article explores how to identify and delete duplicate rows in SQL using different methods.

Delete Duplicate Records in SQL

Getting Started

In SQL, a duplicate row means that all (or a subset of) columns in two or more rows have the same values. In the below table, rows with ID 1, 3, and 4 are duplicates based on the Name and Email fields.

IDNameEmail
1Johnjohn@example.com
2Alicealice@example.com
3Johnjohn@example.com
4Johnjohn@example.com

Identifying Duplicates

You can use a GROUP BY clause along with HAVING COUNT(*) > 1 to find duplicates:

 SELECT Name, Email, COUNT(ID)  
 FROM Users  
 GROUP BY Name, Email  
 HAVING COUNT(ID) > 1;  

This query will return the duplicate combinations of Name and Email.

Deleting Duplicate Records

There are several ways to delete duplicates in SQL, depending on your SQL dialect (e.g., MySQL, PostgreSQL, SQL Server, Oracle). Below are common approaches.

Approach 1: Using ROW_NUMBER()

If your database supports window functions (like PostgreSQL, SQL Server, Oracle, or MySQL 8+), use the ROW_NUMBER() function to identify and delete duplicate records in SQL.

SQL Queries To Delete Duplicate Records In SQL
 WITH CTE AS (  
  SELECT *,  
      ROW_NUMBER() OVER (PARTITION BY Name, Email ORDER BY ID) AS RowNumber  
  FROM Users  
 )  
 DELETE FROM CTE WHERE RowNumber > 1;  

Explanation:
  • ROW_NUMBER() assigns a unique number to each row within a duplicate group.
  • Only the first occurrence (rn = 1) is kept.
  • All others (rn > 1) are deleted.
Approach 2: Delete with GROUP BY and MIN()

If you don’t have access to window functions, you can still remove duplicates using GROUP BY with a subquery.

SQL Queries To Delete Duplicates in SQL
 DELETE FROM Users  
 WHERE ID NOT IN (  
  SELECT MIN(ID) FROM Users GROUP BY Name, Email  
 );  

Explanation:
  • GROUP BY selects the smallest ID (or any rule) per duplicate group.
  • All other rows are deleted.
Approach 3: Using a Temporary Table

In systems like MySQL < 8.0 or SQLite, use a temporary table. This is slower for large datasets but works reliably when other methods aren't available.

 CREATE TABLE temp_users AS  
 SELECT MIN(ID) as ID, Name, Email  
 FROM Users  
 GROUP BY Name, Email;  
 DELETE FROM Users;  
 INSERT INTO Users (ID, Name, Email)  
 SELECT * FROM temp_users;  

Precautions

  1. Backup your data before making any deletions.
  2. Run a SELECT query first to verify which rows will be deleted.
  3. Use a transaction to ensure safe rollbacks:
     BEGIN TRANSACTION;  
     -- deletion logic here  
     ROLLBACK; -- or COMMIT;  
    
  4. Consider whether all columns should match or just a subset.

Best Practices

  • Always define a PRIMARY KEY or UNIQUE constraint to prevent future duplicates.
  • Use data validation at the application level before inserting data.
  • Regularly audit your database for anomalies and duplicates.

Summary

Delete duplicates in SQL is a common data maintenance task. Whether you use ROW_NUMBER(), GROUP BY, or temporary tables, the key is to identify the logic that defines a "duplicate" and delete accordingly. Always proceed with caution and validate the data before deleting.

Thanks

Kailash Chandra Behera

An IT professional with over 13 years of experience in the full software development life cycle for Windows, services, and web-based applications using Microsoft .NET technologies. Demonstrated expertise in delivering all phases of project development—from initiation to closure—while aligning with business objectives to drive process improvements, competitive advantage, and measurable bottom-line gains. Proven ability to work independently and manage multiple projects successfully. Committed to the efficient and effective development of projects in fast-paced, deadline-driven environments. Skills: Proficient in designing and developing applications using various Microsoft technologies. Total IT Experience: 13+ years

Previous Post Next Post

نموذج الاتصال