DataScrub
JSON to CSV: Why Flattening Data is So Difficult

The Clash of Data Structures

CSV is a flat, two-dimensional format: rows and columns, much like a spreadsheet. JSON, on the other hand, is hierarchical. A single JSON object can contain arrays within arrays within objects, nested to any depth. Forcing this tree-shaped structure into a 2D CSV grid is the classic "square peg, round hole" problem of data engineering.

The Flattening Process

To convert JSON to CSV, the hierarchy must be "flattened." If a user object has an address object nested inside it, flattening joins the keys along the path from the top-level object down to the value, using dot notation. For example, user.address.city becomes the column header, and the city name becomes the cell value.
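
The dot-notation idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular converter is implemented: the function name flatten, the sep parameter, and the sample record are all assumptions made for this example, and it handles nested objects only (arrays are left untouched).

```python
def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration. Non-dict values (including
    lists) are copied through unchanged; empty nested dicts produce
    no columns at all.
    """
    flat = {}
    for key, value in obj.items():
        # Build the dotted path, e.g. "user" + "." + "address"
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested object, carrying the path so far
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Sample record invented for this example
record = {"user": {"name": "Ada", "address": {"city": "London", "zip": "N1"}}}
flat = flatten(record)
# The keys of `flat` become the CSV column headers:
# user.name, user.address.city, user.address.zip
```

One design choice to be aware of: if a source key already contains a dot, the flattened header becomes ambiguous, so a different separator may be safer for such data.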

The Array Problem

The real difficulty arises when JSON contains arrays. If a user object contains an array of five recent purchases, how do you represent that in a CSV? You have two common choices:

  1. Explode the rows: Create five separate rows for the user, one for each purchase. This duplicates the user's base data on every row, but it is usually the better fit for database imports.
  2. Stringify the array: Keep it as one row, but turn the purchases array into a single comma-separated text string inside one cell. Because the string contains commas, the cell has to be quoted in the CSV. This is fine for quick human reading, but terrible for analysis, since the values must be parsed back out of the string.
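
The two options above can be compared side by side with Python's standard csv module. This is a rough sketch under stated assumptions: the helper names explode_rows, stringify_array, and to_csv, and the sample user record, are invented for illustration and do not describe any specific tool.

```python
import csv
import io

# Sample record invented for this example
user = {"id": 7, "name": "Ada", "purchases": ["book", "lamp", "pen"]}


def explode_rows(record, array_key):
    """Option 1: one row per array element, base fields repeated.

    Note: an empty array yields zero rows, so the record disappears.
    """
    base = {k: v for k, v in record.items() if k != array_key}
    return [{**base, array_key: item} for item in record[array_key]]


def stringify_array(record, array_key):
    """Option 2: one row, array joined into a single text cell."""
    row = dict(record)
    row[array_key] = ", ".join(str(item) for item in record[array_key])
    return row


def to_csv(rows):
    """Write a list of dicts to a CSV string (headers from the first row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


exploded_csv = to_csv(explode_rows(user, "purchases"))
# id,name,purchases
# 7,Ada,book
# 7,Ada,lamp
# 7,Ada,pen

stringified_csv = to_csv([stringify_array(user, "purchases")])
# id,name,purchases
# 7,Ada,"book, lamp, pen"
```

The csv writer quotes the stringified cell automatically because it contains commas. If individual items could themselves contain commas, the joined string becomes ambiguous, and serializing the array as JSON text inside the cell would be a safer variant.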

Automated Conversion

DataScrub's JSON ↔ CSV Converter handles the flattening process automatically, using heuristics to decide how nested objects and arrays should be represented in a flat grid. It runs entirely in your browser, which makes it a good fit for quickly transforming API responses without writing custom Python scripts.