JSON to CSV: Why Flattening Data is So Difficult
The Clash of Data Structures
CSV is a flat, two-dimensional format. It has rows and columns, much like a spreadsheet. JSON, on the other hand, is hierarchical and can nest to arbitrary depth. A single JSON object can contain lists within lists within objects. Forcing this nested structure into a 2D CSV grid is the classic "square peg, round hole" problem of data engineering.
The Flattening Process
To convert JSON to CSV, the hierarchy must be "flattened." If a user object has an address object nested inside it, flattening combines the keys using dot notation. For example, user.address.city becomes the column header, and the city name becomes the cell value.
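As a rough illustration of the dot-notation idea, here is a minimal Python sketch. The function name flatten, its parameters, and the sample record are made up for this example, and it only handles nested objects, not arrays.

```python
def flatten(obj, parent_key="", sep="."):
    """Collapse nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in obj.items():
        # Prefix the key with its parent's path, e.g. "user" + "." + "address"
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys
            flat.update(flatten(value, full_key, sep))
        else:
            # Leaf value: this becomes one CSV cell under the dotted header
            flat[full_key] = value
    return flat


# Hypothetical sample record, for illustration only
record = {"user": {"name": "Ada", "address": {"city": "London", "zip": "N1"}}}
flat_record = flatten(record)
# {'user.name': 'Ada', 'user.address.city': 'London', 'user.address.zip': 'N1'}
```

Each key of the flattened dictionary maps to one column header, and each value to one cell in that row.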
The Array Problem
The real difficulty arises when JSON contains arrays. If a user object contains an array of five recent purchases, how do you represent that in a CSV? There are two common choices:
- Explode the rows: Create five separate rows for the user, one for each purchase. This duplicates the user's base data in every row, but it usually suits database imports, where each row is expected to hold a single value per column.
- Stringify the array: Keep it as one row, but turn the purchases array into a single comma-separated text string inside one cell. Because the comma is also the CSV delimiter, that cell has to be quoted. This is convenient for quick human reading, but poor for analysis, since the values must be split apart again before they can be filtered or aggregated.
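The two strategies can be sketched in Python using only the standard library. The helper names (explode_rows, stringify_array, to_csv) and the sample user are assumptions for this example; they are not taken from any particular tool.

```python
import csv
import io

# Hypothetical sample record with an array of five purchases
user = {
    "id": 7,
    "name": "Ada",
    "purchases": ["book", "lamp", "pen", "mug", "desk"],
}


def explode_rows(record, array_key):
    """Return one row per array item, repeating the other fields."""
    base = {k: v for k, v in record.items() if k != array_key}
    # Note: an empty array yields no rows at all in this sketch
    return [{**base, array_key: item} for item in record[array_key]]


def stringify_array(record, array_key, sep=","):
    """Return a single row with the array joined into one string."""
    row = dict(record)
    row[array_key] = sep.join(str(item) for item in record[array_key])
    return row


def to_csv(rows):
    """Write a non-empty list of dicts to CSV text, headers from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


exploded_csv = to_csv(explode_rows(user, "purchases"))
stringified_csv = to_csv([stringify_array(user, "purchases")])
```

Here exploded_csv holds a header plus five data rows that each repeat the id and name, while stringified_csv holds a header plus one data row. In the second case the csv module wraps the joined purchases in double quotes, because the cell contains the delimiter.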
Automated Conversion
DataScrub's JSON ↔ CSV Converter handles the flattening process automatically, using heuristics to decide how nested objects and arrays are represented in a flat grid. It runs entirely in your browser, which makes it a good fit for quickly transforming API responses without writing custom Python scripts.