Remove duplicate rows from a CSV

The same row shows up three times: a bad join, a missing DISTINCT, or yesterday's script ran twice. We compare every row against every other and drop the exact matches, keeping the first occurrence.

Duplicates removed

Before

A,B
1,2
1,2
3,4

After

A,B
1,2
3,4

The "duplicate rows" fix will be auto-detected.

What is this and why does it matter?

Duplicate rows appear when you merge spreadsheets, re-export data, copy-paste entries, or combine files from overlapping sources. In large files they're easy to miss, and the damage adds up: row counts are inflated, totals are wrong, and you can't tell which records are real.

CSV First Aid compares every row against every other row and flags exact duplicates. The first occurrence is always kept; only the repeated copies are marked for removal.
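The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not CSV First Aid's actual code: the function name `drop_exact_duplicates` is hypothetical, the first row is assumed to be a header, and a set of already-seen rows stands in for a literal row-by-row comparison (the result is the same).

```python
import csv
import io


def drop_exact_duplicates(csv_text):
    """Return (kept_rows, dropped_count), keeping the first copy of each row.

    Hypothetical helper for illustration; the header row is always kept.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], 0
    header, body = rows[0], rows[1:]
    seen = set()
    kept = [header]
    for row in body:
        key = tuple(row)  # every cell must match for rows to count as duplicates
        if key in seen:
            continue  # a repeated copy: skip it
        seen.add(key)
        kept.append(row)
    return kept, len(body) - (len(kept) - 1)


kept, dropped = drop_exact_duplicates("A,B\n1,2\n1,2\n3,4\n")
print(kept)     # [['A', 'B'], ['1', '2'], ['3', '4']]
print(dropped)  # 1
```

Because rows are appended in their original order, the first occurrence keeps its position and only later copies disappear.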

Because this runs after other fixes (like trimming spaces and cleaning invisible characters), it also catches rows that only become duplicates after cleanup — for example, two rows that looked different because of hidden spaces.
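To see why the order of fixes matters, here is a small sketch that counts distinct rows before and after a cleanup pass. The helper `normalize_row` is hypothetical, and the choice of which characters count as "invisible" (a zero-width space and a byte-order mark) is an assumption made for the example, not a list taken from the tool.

```python
def normalize_row(row):
    """Trim surrounding whitespace and drop two invisible characters (assumed cleanup)."""
    invisible = "\u200b\ufeff"  # zero-width space, byte-order mark
    table = {ord(ch): None for ch in invisible}
    return tuple(cell.translate(table).strip() for cell in row)


rows = [["1", "2"], ["1 ", "2"], ["\u200b1", "2"], ["3", "4"]]

raw_unique = {tuple(r) for r in rows}
clean_unique = {normalize_row(r) for r in rows}

print(len(raw_unique))    # 4: no exact duplicates before cleanup
print(len(clean_unique))  # 2: three rows collapse into ("1", "2")
```

Running the comparison on the raw rows would report nothing to remove; running it on the cleaned rows finds two repeated copies.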

How it works

1. Drop your CSV. We compare every row to find exact duplicates.
2. The diagnosis card shows how many duplicate rows were found. This fix is not enabled by default; you turn it on manually, since duplicate rows are sometimes intentional.
3. Toggle it on and hit Apply, and the duplicates are removed. The report shows how many rows were dropped.
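The steps above separate diagnosis from removal, which can be sketched roughly as follows. The names `diagnose_duplicates` and `apply_fix` and the `enabled` flag are hypothetical stand-ins for the diagnosis card and the toggle; this shows the opt-in flow under those assumptions rather than the tool's real internals.

```python
def diagnose_duplicates(rows):
    """Return the indexes of rows that repeat an earlier row."""
    seen = set()
    duplicate_indexes = []
    for index, row in enumerate(rows):
        key = tuple(row)
        if key in seen:
            duplicate_indexes.append(index)
        else:
            seen.add(key)
    return duplicate_indexes


def apply_fix(rows, enabled=False):
    """Return (rows, dropped_count); rows are left unchanged unless enabled."""
    if not enabled:
        return list(rows), 0
    to_drop = set(diagnose_duplicates(rows))
    kept = [row for i, row in enumerate(rows) if i not in to_drop]
    return kept, len(to_drop)


data = [["1", "2"], ["1", "2"], ["3", "4"], ["1", "2"]]
print(diagnose_duplicates(data))      # [1, 3]
print(apply_fix(data))                # rows unchanged, 0 dropped
print(apply_fix(data, enabled=True))  # ([['1', '2'], ['3', '4']], 2)
```

Keeping the count separate from the removal means the duplicates can be reported without any data being touched until the fix is switched on.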

FAQ

Why is this fix opt-in instead of on by default?

Some datasets legitimately have identical rows (e.g., transaction logs, time-series data). Removing them by default could cause data loss. You need to consciously enable it.

Does it detect near-duplicates (fuzzy matching)?

Currently only exact duplicates (all cells identical) are detected. Fuzzy deduplication requires domain-specific rules and is planned for a future release.

Which occurrence is kept — first or last?

The first occurrence is always kept. All subsequent identical rows are removed.