CSV to JSON: Handling Headers, Quotes and Nested Data

CSV files look deceptively simple — rows of values separated by commas — until you hit a field that contains a comma, a newline, or a double-quote. JSON has its own gotchas: nested objects, arrays inside arrays, and inconsistent types that break downstream parsers. This guide covers every conversion problem you are likely to hit in the real world, with concrete rules and examples so you get clean output every time.

How RFC 4180 Actually Works

RFC 4180 is the closest thing CSV has to an official spec. Its rules are simple but frequently ignored by exporters:

Fields containing commas, double-quotes, or line breaks must be wrapped in double-quote characters.
A double-quote inside a quoted field is escaped by doubling it: "He said ""hello""" parses to He said "hello".
The first record may be a header row — the spec does not require it, but the convention is near-universal.
Line endings are CRLF (\r\n) per spec, though most parsers accept LF-only files.

Where exports go wrong: Excel on Windows writes CRLF; a Linux script may strip the \r and leave a stray character at the end of every field. If your JSON values end with \r, that is the cause.

A minimal but legally valid RFC 4180 file with a quoted field looks like this:

name,city,note
Alice,"New York","Works 9-5, Mon-Fri"
Bob,Austin,"Said ""hi"" once"

The third field on row two contains an embedded quote. A conformant parser returns the string Said "hi" once.

Delimiter Problems Beyond the Comma

Despite the name, CSV files routinely use other delimiters. The most common alternatives:

Delimiter	Typical source	File extension hint
Comma `,`	Excel, Google Sheets, most APIs	.csv
Tab `\t`	Database exports, TSV standard	.tsv or .txt
Semicolon `;`	Excel in European locales (comma = decimal separator)	.csv
Pipe `\|`	Legacy mainframe exports, EDI	.txt or .dat
Caret `^`	Some ERP systems	.csv or .txt

Auto-detection works by scanning the first few rows and counting candidate separator characters outside quoted regions. If every row has exactly four semicolons and zero commas, the file is almost certainly semicolon-delimited. A good converter lets you override the detected delimiter manually — European Excel exports are the most frequent case where auto-detection misfires.

Type Inference: Turning Strings into Real JSON Values

Every CSV field is a string. JSON has strings, numbers, booleans, and null. A quality converter infers the intended type rather than wrapping everything in quotes.

Standard inference rules, applied in order:

Empty field — emit null (not an empty string, unless you need strict string output).
Boolean — true, false, TRUE, FALSE, 1/0 in a column where every non-empty value is one of those tokens.
Integer — matches /^-?\d+$/ and fits in a safe integer range (up to 2⁵³ − 1 in JavaScript).
Float — matches /^-?\d*\.\d+([eE][+-]?\d+)?$/.
String — everything else, including values like 007 (leading zero signals intentional string) or 1,200 (thousands separator — do not silently coerce).

The leading-zero rule matters: product codes, ZIP codes, and phone numbers often start with zero. A converter that strips leading zeros corrupts your data silently. Treat any numeric-looking value with a leading zero as a string.

Dates are a deliberate omission from this list. "2024-01-15" could be an ISO date, a fiscal period identifier, or a version string. Converting it to a Date object without an explicit instruction loses information. Leave it as a string and let the application layer decide.

Working with Headers: Nested Keys and Dot Notation

Flat CSV maps naturally to a JSON array of objects when column names become keys. Nested JSON is harder. The standard approach is dot-notation headers: a column named address.city maps to { "address": { "city": "..." } }.

Example CSV:

id,name,address.city,address.zip,scores[0],scores[1]
1,Alice,Austin,78701,92,87
2,Bob,Denver,80201,78,95

Converted to JSON with dot-notation expansion:

[{ "id": 1, "name": "Alice", "address": { "city": "Austin", "zip": "78701" }, "scores": [92, 87] }, ...]

Array notation (scores[0], scores[1]) is less standardized — some tools use it, others require a different syntax. Check what your converter supports before building a pipeline around it.

If your header names contain dots for unrelated reasons (e.g., v1.2.3 as a version column), disable dot-notation expansion or the converter will misinterpret it as a nested path.

Flattening Nested JSON Back to CSV

The reverse operation — nested JSON to flat CSV — requires decisions that have no single correct answer:

Nested objects: flatten with dot-notation keys (address.city) or serialise the object as a JSON string in one cell. Dot-notation is more useful for spreadsheets; JSON-string is safer when the nested shape is irregular.
Arrays of primitives: either expand into indexed columns (tag[0], tag[1]...) or join into a single cell ("foo,bar,baz"). Expansion creates ragged columns when array lengths differ across rows.
Arrays of objects: the trickiest case. A user with three orders cannot be represented in one row without data loss. The correct solution is to emit multiple rows per user (one per order), repeating the user-level fields — the classic relational normalisation pattern.

Before flattening, decide: do you need the CSV to be reconstructable back to the original JSON? If yes, use dot-notation consistently and avoid joined arrays. If the CSV is purely for human review in a spreadsheet, readability beats round-trippability.

Common Errors and How to Fix Them

Most conversion failures come from a small set of root causes:

Symptom	Root cause	Fix
Fields shifted right by one column from row N onwards	Unquoted newline inside a field	Re-export with proper RFC 4180 quoting, or quote all multiline fields
JSON values end with `\r`	CRLF line endings not stripped	Strip `\r` before parsing, or open file in binary mode and handle explicitly
Numbers like `07890` become `7890`	Aggressive type coercion	Use a converter that respects leading zeros as string signals
Quoted field shows extra double-quotes	Parser not handling RFC 4180 double-quote escape	Use a spec-compliant parser; avoid naive `split(",")` approaches
Single-row JSON object instead of array	CSV has only a header + one data row	Expected — one-row CSVs produce single-element arrays; check array wrapping setting

Preguntas frecuentes

Why does Excel produce semicolon-delimited CSV instead of comma-delimited?+

In locales where the comma serves as the decimal separator (most of continental Europe), Excel uses the system list separator — typically a semicolon — to avoid ambiguity. If you open the file on an English-locale machine it looks broken. Re-save from Excel with 'CSV UTF-8 (Comma delimited)' or change the delimiter setting in the converter.

Should empty CSV fields become null or an empty string in JSON?+

It depends on your schema. Null signals a missing value; an empty string signals a value that is known to be empty. For most data pipelines null is the right choice because it lets downstream code do null checks cleanly. If your application explicitly distinguishes between the two, look for a converter that lets you configure this behaviour.

My CSV has duplicate column names. What happens during conversion?+

Duplicate keys in a JSON object are technically allowed by the spec but produce undefined behaviour in most parsers — one value silently overwrites the other. A well-designed converter will warn you about duplicate headers. The safest fix is to rename the columns before converting.

How do I convert a JSON array of objects back to CSV without losing nested data?+

Use dot-notation flattening: each nested key becomes a column named with its full path (e.g., address.city). For arrays of objects inside a record, you have to choose between emitting multiple rows per parent record (normalised) or serialising the sub-array as a JSON string in a single cell. There is no lossless single-row representation for variable-length nested arrays.

Is UTF-8 BOM in a CSV file a problem?+

A BOM (byte order mark, the three bytes EF BB BF at the start of a file) is added by Excel when saving as CSV UTF-8. Most converters handle it silently, but some naive parsers include the BOM characters in the first column name, producing a key like 'name' instead of 'name'. Always trim the BOM or use a parser that explicitly strips it.