What is a CSV BOM and Why Does It Break Everything?

The UTF-8 BOM (Byte Order Mark) is three invisible bytes at the start of a file that cause an outsized amount of pain. This guide explains what it is, why it exists, and how to get rid of it.

What is a BOM?

BOM stands for Byte Order Mark. For UTF-8, it's the byte sequence EF BB BF, which is the UTF-8 encoding of the code point U+FEFF. It was designed to signal the encoding of a text file: a reader sees these bytes and knows the file is UTF-8.
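
As a quick illustration of the byte sequence described above, the following Python sketch encodes U+FEFF as UTF-8 and checks whether a byte string starts with the result. The helper name has_utf8_bom is made up for this example; codecs.BOM_UTF8 is the standard-library constant for the same three bytes.

```python
import codecs

# U+FEFF encoded as UTF-8 yields the three BOM bytes.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex(" ").upper())  # EF BB BF


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM.

    Hypothetical helper for illustration only.
    """
    return data.startswith(codecs.BOM_UTF8)
```

For a file on disk, reading the first three bytes in binary mode and passing them to the helper is enough to tell whether a BOM is present.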

For UTF-16 and UTF-32, the BOM is essential because those encodings can be big-endian or little-endian. For UTF-8, it's unnecessary because UTF-8 has no byte-order ambiguity. The Unicode standard explicitly says a BOM is 'neither required nor recommended' for UTF-8.
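
To make the byte-order point concrete, here is a small Python sketch showing that the same character produces different bytes in the two UTF-16 variants, and how the BOM disambiguates them. The function name detect_utf16_order is an assumption for this example, and it deliberately ignores UTF-32 (whose little-endian BOM begins with the same two bytes).

```python
import codecs

text = "A"  # U+0041

# Same character, two different byte layouts.
little = text.encode("utf-16-le")  # b'A\x00'
big = text.encode("utf-16-be")     # b'\x00A'


def detect_utf16_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM.

    Hypothetical helper: only distinguishes the two UTF-16 BOMs
    and does not attempt to recognise UTF-32.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    return "unknown"
```

UTF-8 has no equivalent of this problem: "A".encode("utf-8") is always the single byte 0x41, so there is nothing for a BOM to disambiguate.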

Why does Excel add it?

When you Save As → CSV UTF-8 in Excel, it adds a BOM so that Excel itself can detect the encoding when re-opening the file. This is a Microsoft-centric design choice that works great within the Excel ecosystem and breaks everything else.

What breaks?

Python's csv module: the first header becomes '\ufeffid' instead of 'id', so csv.DictReader keys don't match. Pandas: the first column can end up named '\ufeffcolumn_name', depending on the pandas version and on how the file was opened. SQL Server BULK INSERT: may reject the file or put the BOM in the first row's data. JSON parsers: many fail if a BOM precedes the JSON content. Shell tools: grep, awk, and cut see the BOM as part of the first field.
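
The csv module failure can be reproduced in a few lines of Python. This is a minimal sketch with made-up sample data (the id and name columns are assumptions for the example): the bytes are decoded as plain 'utf-8', so the BOM survives as the character U+FEFF and ends up glued to the first header.

```python
import csv
import io

# Sample bytes as Excel's "CSV UTF-8" export might produce them (illustrative data).
raw = b"\xef\xbb\xbfid,name\r\n1,Ada\r\n"

# Decoding with plain 'utf-8' keeps the BOM as a leading U+FEFF character.
text = raw.decode("utf-8")

reader = csv.DictReader(io.StringIO(text, newline=""))
row = next(reader)
keys = list(row.keys())

print(keys)          # ['\ufeffid', 'name']
print("id" in row)   # False
```

The lookup row["id"] raises KeyError here even though the header looks like "id" when printed in most terminals, which is why this bug is so hard to spot.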

How to remove it

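Python: open('file.csv', encoding='utf-8-sig'). The 'utf-8-sig' codec strips a leading BOM when reading and is harmless when none is present. Node.js: buffer.toString('utf8').replace(/^\uFEFF/, ''). Bash with GNU sed: sed -i '1s/^\xEF\xBB\xBF//' file.csv. The \x escapes and the bare -i flag are GNU extensions, so this command does not work as written with the BSD sed that ships with macOS. Or just drop the file into CSV First Aid: one click.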
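
If you would rather clean the file on disk than remember the codec on every read, a byte-level approach works too. The sketch below is one possible way to do it in Python; the function names strip_utf8_bom and strip_bom_in_place are hypothetical, and the in-place version reads the whole file into memory, which is an assumption that suits typical CSV sizes but not very large files.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path) -> bool:
    """Rewrite the file without its BOM. Returns True if a BOM was removed.

    Hypothetical helper: loads the entire file into memory.
    """
    p = Path(path)
    original = p.read_bytes()
    cleaned = strip_utf8_bom(original)
    if len(cleaned) == len(original):
        return False  # no BOM found, file left untouched
    p.write_bytes(cleaned)
    return True
```

Calling strip_bom_in_place('file.csv') a second time is a no-op, so it is safe to run on files that may or may not have a BOM.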

Should I ever add a BOM?

Only if your sole consumer is Excel. If the file will be processed by any other tool (Python, databases, APIs, shell scripts), omit the BOM. CSV First Aid lets you add a BOM on export for Excel compatibility — but only when you explicitly opt in.
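
For scripts that produce CSV files, the same opt-in idea can be sketched in Python: write plain UTF-8 by default and add the BOM only when the caller asks for it. The function write_csv and its excel_bom parameter are assumptions for this example and are not how CSV First Aid is implemented; the only library behavior relied on is that the 'utf-8-sig' codec writes a BOM at the start of the output.

```python
import csv


def write_csv(path, rows, excel_bom=False):
    """Write rows as CSV; prepend a UTF-8 BOM only when excel_bom is True.

    Hypothetical helper for illustration.
    """
    # 'utf-8-sig' emits EF BB BF before the first write; 'utf-8' does not.
    encoding = "utf-8-sig" if excel_bom else "utf-8"
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)
```

A call such as write_csv('report.csv', rows, excel_bom=True) would be reserved for files destined for Excel, while everything else gets the BOM-free default.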