What is a CSV BOM and Why Does It Break Everything?
The BOM (Byte Order Mark) is 3 invisible bytes at the start of a file that cause an outsized amount of pain. This guide explains what it is, why it exists, and how to get rid of it.
What is a BOM?
BOM stands for Byte Order Mark. For UTF-8, it's the byte sequence EF BB BF, which is the UTF-8 encoding of the character U+FEFF. In UTF-8 it works as an encoding signature: a reader that sees these bytes at the start of a file can assume the file is UTF-8.
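To see those bytes for yourself, here is a minimal Python sketch. `codecs.BOM_UTF8` is a real standard-library constant; `has_utf8_bom` is a hypothetical helper name used only for illustration.

```python
import codecs

# U+FEFF is the code point used as the byte order mark.
BOM_CHAR = "\ufeff"

# Encoded as UTF-8, it becomes the three bytes EF BB BF.
utf8_bom = BOM_CHAR.encode("utf-8")
print(utf8_bom.hex(" ").upper())  # EF BB BF

# The standard library exposes the same bytes as a constant.
print(utf8_bom == codecs.BOM_UTF8)  # True


def has_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` starts with the UTF-8 BOM (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

Reading the file in binary mode matters here: a text-mode read would already have decoded (and possibly hidden) the BOM.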
For UTF-16 and UTF-32, the BOM is essential because those encodings can be big-endian or little-endian. For UTF-8, it's unnecessary because UTF-8 has no byte-order ambiguity. The Unicode standard explicitly says a BOM is 'neither required nor recommended' for UTF-8.
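A short Python sketch makes the byte-order point concrete: the same U+FEFF code point comes out as two different byte pairs in UTF-16 depending on endianness, while a UTF-8 character has exactly one byte sequence.

```python
# In UTF-16, U+FEFF takes two bytes, and their order depends on endianness.
le = "\ufeff".encode("utf-16-le")
be = "\ufeff".encode("utf-16-be")
print(le.hex(" "), "|", be.hex(" "))  # ff fe | fe ff

# The generic "utf-16" codec writes a BOM when encoding and consumes it
# when decoding, so the reader learns the byte order from the first two bytes.
payload = "id".encode("utf-16")
print(payload[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
print(payload.decode("utf-16"))                    # id

# UTF-8 code units are single bytes, so there is no byte order to announce:
# "é" is C3 A9 on every machine.
print("é".encode("utf-8").hex(" "))  # c3 a9
```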
Why does Excel add it?
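When you choose Save As → CSV UTF-8 (Comma delimited) in Excel, it adds a BOM so that Excel itself can detect the encoding when re-opening the file. Without it, Excel opens a CSV using the system's legacy code page (for example Windows-1252), and characters like é or ü come out garbled. This Microsoft-centric design choice works well within Excel and trips up many other tools.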
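The idea behind Excel's choice can be modeled in a few lines. This is a simplified sketch, not Excel's actual logic: `sniff_encoding` and `second_line` are hypothetical helpers, and the `cp1252` fallback is an assumption standing in for "the system's legacy code page".

```python
import codecs

# Hypothetical BOM table: leading bytes -> Python codec that handles them.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Pick a codec from the leading BOM, else assume a legacy code page."""
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return codec
    return fallback


def second_line(raw: bytes) -> str:
    """Decode with the sniffed codec and return the first data row."""
    return raw.decode(sniff_encoding(raw)).splitlines()[1]


with_bom = "name\nMüller\n".encode("utf-8-sig")
without_bom = "name\nMüller\n".encode("utf-8")

print(second_line(with_bom))     # Müller
print(second_line(without_bom))  # MÃ¼ller
```

The same UTF-8 payload decodes cleanly only when the BOM is there to steer the reader, which is exactly the situation Excel is guarding against.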
What breaks?
Python's csv module: when the file is decoded as plain 'utf-8', the first header becomes '\ufeffid' instead of 'id', so csv.DictReader lookups like row['id'] raise KeyError.
Pandas: the first column can end up named '\ufeffcolumn_name'.
SQL BULK INSERT: rejects the file or puts the BOM in the first row's data.
JSON parsers: fail if the BOM precedes JSON content.
Shell tools: grep, awk and cut see the BOM as part of the first field.
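The Python failures are easy to reproduce with the standard library alone. This sketch uses an in-memory byte string shaped like an Excel "CSV UTF-8" export; the sample data is made up for illustration.

```python
import csv
import io
import json

# BOM + header + one row, as a BOM-writing exporter would produce it.
raw = b"\xef\xbb\xbfid,name\n1,Ada\n"

# Decoding with plain "utf-8" keeps U+FEFF glued to the first header.
reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
row = next(reader)
print(reader.fieldnames)  # ['\ufeffid', 'name']
print("id" in row)        # False, so row["id"] would raise KeyError

# Python's json module refuses a str that starts with U+FEFF.
try:
    json.loads("\ufeff" + '{"id": 1}')
except json.JSONDecodeError as exc:
    print("json error:", exc.msg)
```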
How to remove it
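Python: open('file.csv', encoding='utf-8-sig'). The 'sig' suffix tells Python to strip the BOM if one is present.
Node.js: buffer.toString('utf8').replace(/^\uFEFF/, '').
Bash (GNU sed): sed -i '1s/^\xEF\xBB\xBF//' file.csv. The \x escapes are a GNU sed feature; the BSD sed shipped with macOS may not support them and expects -i ''.
Or just drop the file into CSV First Aid: it takes one click.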
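For a script that has to clean files on disk, here is a Python counterpart to the sed one-liner. `strip_utf8_bom` and `read_header` are hypothetical helper names; the sketch assumes the file fits comfortably in memory.

```python
import codecs
import csv
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()  # whole file in memory: fine for typical CSVs
    if not data.startswith(codecs.BOM_UTF8):
        return False
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def read_header(path: str) -> list[str]:
    """Read the header row; 'utf-8-sig' drops a BOM if present, else acts like 'utf-8'."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))
```

Both approaches are safe to run on files without a BOM: `strip_utf8_bom` returns False and leaves the file untouched, and `utf-8-sig` decodes BOM-less input exactly like `utf-8`.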
Should I ever add a BOM?
Only if your sole consumer is Excel. If the file will be processed by any other tool (Python, databases, APIs, shell scripts), omit the BOM. CSV First Aid lets you add a BOM on export for Excel compatibility — but only when you explicitly opt in.
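An opt-in BOM is a one-line switch in Python, because the `utf-8-sig` codec writes the BOM on the first write. The `write_csv` function and its `excel_bom` flag below are hypothetical, meant only to show the pattern of defaulting to no BOM.

```python
import csv


def write_csv(path: str, rows: list[list[str]], excel_bom: bool = False) -> None:
    """Write rows as UTF-8 CSV; prepend a BOM only when explicitly requested."""
    encoding = "utf-8-sig" if excel_bom else "utf-8"
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


# Default: clean UTF-8 for Python, databases, APIs and shell tools.
# write_csv("out.csv", [["id", "name"], ["1", "Ada"]])
# Excel-only consumer: opt in explicitly.
# write_csv("for_excel.csv", [["id", "name"], ["1", "Ada"]], excel_bom=True)
```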
Don't want to touch hex editors? CSV First Aid detects and removes the BOM for you.
Related tools
Strip the UTF-8 BOM from your CSV
First column shows up as 'ï»¿ID' instead of 'ID'? That's a UTF-8 BOM: three invisible bytes that many export tools leave behind. We strip them and the header reads clean again.
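The 'ï»¿' prefix is simply the three BOM bytes decoded with a single-byte codec such as Windows-1252. This Python sketch shows that, plus a hypothetical `clean_header` helper that drops the BOM in either its raw or its garbled form.

```python
# The BOM bytes, decoded with the wrong codec, turn into visible characters.
header_bytes = "ID".encode("utf-8-sig")  # b'\xef\xbb\xbfID'
print(header_bytes.decode("cp1252"))     # ï»¿ID
print(header_bytes.decode("utf-8-sig"))  # ID


def clean_header(name: str) -> str:
    """Drop a BOM that survived decoding, raw (U+FEFF) or as 'ï»¿' mojibake."""
    for junk in ("\ufeff", "ï»¿"):
        if name.startswith(junk):
            return name[len(junk):]
    return name
```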
Fix CSV encoding
Seeing Ã©, Ã¼, Ã¶ where you expected é, ü, ö? The file was saved in one encoding and read in another. We figure out which one, then convert to UTF-8 so MÃ¼ller looks like Müller again.
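The most common case, UTF-8 bytes misread as Windows-1252, can be undone by reversing the wrong step. This is a minimal sketch that assumes cp1252 was the wrong codec; `fix_mojibake` is a hypothetical name, and it does not attempt to detect other encoding mix-ups.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Windows-1252.

    Returns the text unchanged if the round trip fails, e.g. when the string
    was never mojibake or contains characters cp1252 cannot encode.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_mojibake("MÃ¼ller"))  # Müller
print(fix_mojibake("Ã©tÃ©"))    # été
print(fix_mojibake("Müller"))   # Müller (already clean, left as-is)
```

Clean text usually survives because a lone 'ü' re-encodes to byte FC, which is not valid UTF-8, so the function falls back to the input.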
Remove invisible characters from a CSV
NBSP, zero-width joiners, stray control bytes — you can't see them in Excel, but VLOOKUP can, and it won't match. We scan every cell and remove the ones that shouldn't be there.
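A per-cell cleanup can be sketched with the standard `unicodedata` module. The policy here is an assumption, not a description of what CSV First Aid does: NBSP becomes an ordinary space, a hand-picked set of zero-width characters is dropped, and control characters other than tab, newline and carriage return are removed. `clean_cell` and `ZERO_WIDTH` are hypothetical names.

```python
import unicodedata

# Hypothetical policy: zero-width characters treated as junk in CSV cells.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def clean_cell(value: str) -> str:
    """Normalize NBSP to a space and drop zero-width and control characters."""
    out = []
    for ch in value:
        if ch == "\u00a0":
            out.append(" ")  # NBSP becomes an ordinary space
        elif ch in ZERO_WIDTH:
            continue  # invisible, but breaks exact matches such as VLOOKUP
        elif unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            continue  # stray control bytes such as \x00 or \x1b
        else:
            out.append(ch)
    return "".join(out)


print(repr(clean_cell("SKU\u00a0123\u200b")))  # 'SKU 123'
```

Dropping U+200D (zero-width joiner) also splits emoji sequences such as family emoji, so a real tool may want to keep it in text columns.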