Remove duplicate rows from a CSV
Same row showing up three times — bad join, a missing DISTINCT, yesterday's script ran twice. We compare every row against every other and drop the exact matches, keeping the first occurrence.
Duplicates removed
Before
A,B
1,2
1,2
3,4
After
A,B
1,2
3,4
When you upload a CSV, the "duplicate rows" fix is detected automatically.
What is this and why does it matter?
Duplicate rows appear when you merge spreadsheets, re-export data, copy-paste entries, or combine files from overlapping sources. In large files, they're easy to miss — your row counts look inflated, totals are wrong, and you can't tell which records are real.
CSV First Aid compares every row against every other row and flags exact duplicates. The first occurrence is always kept; only the repeated copies are marked for removal.
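As a rough illustration of "keep the first occurrence, drop exact copies", here is a minimal Python sketch. It is not CSV First Aid's actual code: the function name `drop_exact_duplicates` is hypothetical, it uses a set of already-seen rows instead of literally comparing every row against every other (the result is the same), and it assumes the first line is a header that should be left alone.

```python
import csv
import io

def drop_exact_duplicates(rows):
    """Keep the first occurrence of each row; drop later exact copies."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so use a tuple as the set key
        if key in seen:
            continue  # an identical row appeared earlier: skip this copy
        seen.add(key)
        kept.append(row)
    return kept

# Same data as the Before/After example on this page.
sample = "A,B\n1,2\n1,2\n3,4\n"
rows = list(csv.reader(io.StringIO(sample)))
header, body = rows[0], rows[1:]  # assumption: first line is a header
deduped = [header] + drop_exact_duplicates(body)
print(deduped)  # [['A', 'B'], ['1', '2'], ['3', '4']]
```

Two rows count as duplicates here only if every cell matches exactly, so `1,2` and `1, 2` would both be kept.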
Because this runs after other fixes (like trimming spaces and cleaning invisible characters), it also catches rows that only become duplicates after cleanup — for example, two rows that looked different because of hidden spaces.
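To see why the order of fixes matters, the sketch below cleans each cell before comparing rows. It is an assumption-laden illustration, not the tool's implementation: `clean_cell`, `clean_then_dedupe`, and the particular set of invisible characters are hypothetical choices made for this example.

```python
# Assumed set of invisible characters: zero-width space, non-joiner, joiner, BOM.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def clean_cell(cell):
    """Replace NBSP with a space, drop invisible characters, trim whitespace."""
    cell = cell.replace("\u00a0", " ")
    cell = "".join(ch for ch in cell if ch not in INVISIBLE)
    return cell.strip()

def clean_then_dedupe(rows):
    """Clean every cell first, then keep the first occurrence of each cleaned row."""
    seen = set()
    kept = []
    for row in rows:
        cleaned = tuple(clean_cell(c) for c in row)
        if cleaned not in seen:
            seen.add(cleaned)
            kept.append(list(cleaned))
    return kept

rows = [
    ["alice", "42"],
    ["alice ", "42"],       # trailing space
    ["alice\u200b", "42"],  # zero-width space
    ["bob", "7"],
]
print(clean_then_dedupe(rows))  # [['alice', '42'], ['bob', '7']]
```

Without the cleaning step, all three "alice" rows would be treated as different and none would be removed.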
How it works
1. Drop your CSV. We compare every row to find exact duplicates.
2. The diagnosis card shows how many duplicate rows were found. This fix is not enabled by default. You turn it on manually, since sometimes duplicate rows are intentional.
3. Toggle it on, Apply → duplicates are removed. The report shows how many rows were dropped.
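The diagnose-then-apply flow in the steps above could be sketched roughly like this. The function names and the `enabled` flag are hypothetical stand-ins for the diagnosis card and the toggle. They are not part of any real CSV First Aid API.

```python
def diagnose_duplicates(rows):
    """Count rows that repeat an earlier row, without modifying anything."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates

def apply_fix(rows, enabled=False):
    """Return (rows, dropped). With enabled=False the rows pass through unchanged."""
    if not enabled:
        return list(rows), 0
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept, len(rows) - len(kept)

rows = [["1", "2"], ["1", "2"], ["3", "4"], ["1", "2"]]
print(diagnose_duplicates(rows))       # 2
print(apply_fix(rows))                 # opt-in: nothing removed by default
print(apply_fix(rows, enabled=True))   # ([['1', '2'], ['3', '4']], 2)
```

Keeping diagnosis separate from removal means the count can be shown without touching the data, which matches the opt-in behavior described above.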
FAQ
Why is this fix opt-in instead of on by default?
Some datasets legitimately have identical rows (e.g., transaction logs, time-series data). Removing them by default could cause data loss. You need to consciously enable it.
Does it detect near-duplicates (fuzzy matching)?
Currently only exact duplicates (all cells identical) are detected. Fuzzy deduplication requires domain-specific rules and is planned for a future release.
Which occurrence is kept — first or last?
The first occurrence is always kept. All subsequent identical rows are removed.
Related tools
CSV whitespace trimmer
A single trailing space is why your VLOOKUP misses, why two rows look like duplicates but aren't, why the join silently loses half the records. One pass trims every cell — matches start working again.
Remove invisible characters from a CSV
NBSP, zero-width joiners, stray control bytes — you can't see them in Excel, but VLOOKUP can, and it won't match. We scan every cell and remove the ones that shouldn't be there.
CSV to Excel (XLSX)
CSV in, .xlsx out. Encoding is detected from the bytes, special characters survive the round-trip, column widths auto-fit. Opens the same in Excel, Google Sheets, and Numbers.