🦀 Functional Rust

765: CSV Parsing Without External Crates

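Difficulty: 3
Level: Intermediate

A CSV parser built on an explicit state machine. It follows the RFC 4180 quoting rules for quoted fields, embedded commas, and escaped quotes. Line breaks inside quoted fields are not supported, because rows are split on newlines before fields are parsed.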

The Problem This Solves

CSV looks trivial: split on commas, right? Then you encounter `"Smith, John"` and realize commas inside quotes are valid. Then `"She said ""hello"""` and realize a quote inside a quoted field is written as a doubled quote. Then Windows line endings (`\r\n`). Then empty fields. A naive `split(',')` gets the quoting cases wrong.

In production, CSV appears everywhere: exports from databases, spreadsheets, billing systems, and analytics platforms. Getting the parsing wrong means silent data corruption. You read `Smith` from a field that should be `Smith, John`, and the downstream system gets garbage. RFC 4180 documents the common format, and a correct parser follows its quoting rules precisely.

Writing this by hand also teaches you state machines, a fundamental tool in systems programming. The CSV state machine has exactly three states (`Normal`, `Quoted`, `QuoteInQuoted`), and the transition logic fits in a single `match`. Understanding this pattern makes every other format parser easier to reason about.
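To see the failure concretely, here is a minimal sketch of what splitting on every comma does to a quoted field. The helper `naive_split` is hypothetical and exists only for illustration.

```rust
/// Hypothetical naive splitter: treats every comma as a separator
/// and ignores quoting entirely.
fn naive_split(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

fn main() {
    // One logical record with two fields: `Smith, John` and `42`.
    let line = r#""Smith, John",42"#;
    let fields = naive_split(line);
    // The quoted comma splits the name in two, and the quote
    // characters remain part of the data.
    println!("{:?}", fields);
}
```

The quoted comma produces an extra column, and the quote characters leak into the values.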

The Intuition

Think of Python's `csv.reader`: it handles all these edge cases internally. In JavaScript, `Papa Parse` does the same. In Rust, the `csv` crate is excellent for production. Writing the parser by hand shows you exactly what those libraries are doing. The state machine approach is cleaner than hand-tracking indices. You process one character at a time and move to the next state based on the current state and the character just read.
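The transitions can be written down as a pure function from (state, character) to (next state, action). This is an illustrative sketch. `step` and `Action` are hypothetical names that do not appear in the parser below, but the eight arms correspond one-to-one with the `match` in `parse_fields`.

```rust
/// The three parser states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State { Normal, Quoted, QuoteInQuoted }

/// Hypothetical: what the parser does with the current character.
#[derive(Debug, PartialEq)]
enum Action { Push(char), EndField, Skip }

/// One transition: (current state, input char) -> (next state, action).
fn step(state: State, ch: char) -> (State, Action) {
    match (state, ch) {
        (State::Normal, ',') => (State::Normal, Action::EndField),
        (State::Normal, '"') => (State::Quoted, Action::Skip),
        (State::Normal, c) => (State::Normal, Action::Push(c)),
        (State::Quoted, '"') => (State::QuoteInQuoted, Action::Skip),
        (State::Quoted, c) => (State::Quoted, Action::Push(c)),
        // `""` inside quotes is an escaped quote.
        (State::QuoteInQuoted, '"') => (State::Quoted, Action::Push('"')),
        (State::QuoteInQuoted, ',') => (State::Normal, Action::EndField),
        (State::QuoteInQuoted, c) => (State::Normal, Action::Push(c)),
    }
}

fn main() {
    // Trace the input `"a""b",c` one character at a time.
    let mut state = State::Normal;
    for ch in r#""a""b",c"#.chars() {
        let (next, action) = step(state, ch);
        println!("{:?} + {:?} -> {:?}, {:?}", state, ch, next, action);
        state = next;
    }
}
```

Keeping the transitions pure makes each rule testable in isolation. The real parser inlines the actions instead of returning them.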

How It Works in Rust

#[derive(Debug, PartialEq)]
enum State { Normal, Quoted, QuoteInQuoted }

pub fn parse_fields(line: &str) -> Vec<String> {
 let mut fields = Vec::new();
 let mut buf = String::new();
 let mut state = State::Normal;

 for ch in line.chars() {
     match (&state, ch) {
         // Normal: comma ends field, quote starts quoted field
         (State::Normal, ',') => { fields.push(buf.clone()); buf.clear(); }
         (State::Normal, '"') => { state = State::Quoted; }
         (State::Normal, c)   => { buf.push(c); }

         // Quoted: quote might end field or be escaped
         (State::Quoted, '"') => { state = State::QuoteInQuoted; }
         (State::Quoted, c)   => { buf.push(c); }

         // Just saw a quote inside a quoted field: escaped quote or end of field?
         (State::QuoteInQuoted, '"') => {
             buf.push('"');              // "" = escaped quote
             state = State::Quoted;
         }
         (State::QuoteInQuoted, ',') => {
             fields.push(buf.clone());  // field ended
             buf.clear();
             state = State::Normal;
         }
         (State::QuoteInQuoted, c)   => {
             buf.push(c);               // trailing content after closing quote
             state = State::Normal;
         }
     }
 }
 fields.push(buf);  // last field (no trailing comma)
 fields
}

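// Parse typed records from rows
#[derive(Debug, PartialEq)]
pub struct Person {
 pub name: String,
 pub age: u32,
 pub city: String,
}

impl Person {
 pub fn from_row(row: &[String]) -> Option<Self> {
     if row.len() < 3 { return None; }        // need name, age, city
     let age = row[1].trim().parse().ok()?;   // non-numeric age -> None
     Some(Person { name: row[0].clone(), age, city: row[2].clone() })
 }
}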

// Parse a whole CSV document
pub fn parse_csv(text: &str) -> Vec<Vec<String>> {
 text.lines()
      .map(|l| l.trim_end_matches('\r'))  // lines() already strips \r\n; this drops any stray trailing \r
     .filter(|l| !l.is_empty())
     .map(parse_fields)
     .collect()
}
Result for `"Bob, Jr.",25,"New York"`: `["Bob, Jr.", "25", "New York"]`

Result for `"a""b",c`: `["a\"b", "c"]`
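Putting the pieces together gives the following minimal runnable sketch. It repeats `parse_fields`, with two changes: `State` is made `Copy` so the match can take `(state, ch)` by value, and `std::mem::take` replaces `clone` + `clear`. It assumes `Person` has a `u32` age and parses a small document with a header row.

```rust
/// Parser states; `Copy` lets us match on `(state, ch)` by value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State { Normal, Quoted, QuoteInQuoted }

/// Split one CSV line into fields using the three-state machine.
pub fn parse_fields(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut buf = String::new();
    let mut state = State::Normal;
    for ch in line.chars() {
        match (state, ch) {
            (State::Normal, ',') => fields.push(std::mem::take(&mut buf)),
            (State::Normal, '"') => state = State::Quoted,
            (State::Normal, c) => buf.push(c),
            (State::Quoted, '"') => state = State::QuoteInQuoted,
            (State::Quoted, c) => buf.push(c),
            (State::QuoteInQuoted, '"') => {
                buf.push('"'); // "" is an escaped quote
                state = State::Quoted;
            }
            (State::QuoteInQuoted, ',') => {
                fields.push(std::mem::take(&mut buf)); // closing quote, then comma
                state = State::Normal;
            }
            (State::QuoteInQuoted, c) => {
                buf.push(c); // lenient: text after a closing quote is kept
                state = State::Normal;
            }
        }
    }
    fields.push(buf); // last field (no trailing comma)
    fields
}

#[derive(Debug, PartialEq)]
pub struct Person { pub name: String, pub age: u32, pub city: String }

impl Person {
    pub fn from_row(row: &[String]) -> Option<Self> {
        if row.len() < 3 { return None; }
        let age = row[1].trim().parse().ok()?;
        Some(Person { name: row[0].clone(), age, city: row[2].clone() })
    }
}

pub fn parse_csv(text: &str) -> Vec<Vec<String>> {
    text.lines()
        .map(|l| l.trim_end_matches('\r'))
        .filter(|l| !l.is_empty())
        .map(parse_fields)
        .collect()
}

fn main() {
    let text = "Name,Age,City\nAlice,30,Amsterdam\n\"Bob, Jr.\",25,\"New York\"\n";
    // Skip the header row, then convert each remaining row.
    for row in parse_csv(text).iter().skip(1) {
        match Person::from_row(row) {
            Some(p) => println!("{:?}", p),
            None => println!("could not parse: {:?}", row),
        }
    }
}
```

The header row itself fails `from_row`, because `"Age"` is not a number. This is why the loop skips it instead of relying on the `None` branch.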

What This Unlocks

Key Differences

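| Concept | OCaml | Rust |
|---|---|---|
| State machine | Implicit: nested `while` loops over a `ref` index | Explicit `enum State` + `match (&state, ch)` |
| Field accumulation | `Buffer.t` for quoted fields, `String.sub` for unquoted ones | `String` with `push` / `clear` |
| Escaped quote (`""`) | Look ahead one index for a second `"` | `QuoteInQuoted` state |
| Row parsing | Hand-written scanner (a naive `String.split_on_char ','` breaks on quotes) | State machine that handles quotes and escapes |
| Production library | `csv` (opam) | `csv` crate |
| Windows line endings | Manual stripping (`String.split_on_char '\n'` keeps the `\r`) | `str::lines()` already strips `\r\n`; `.trim_end_matches('\r')` catches a stray `\r` |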
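Rust's `str::lines` already treats `\r\n` as a line terminator, and a quick check confirms it. `crlf_safe_lines` is a hypothetical helper name used only for this sketch.

```rust
/// Hypothetical helper: collect lines, relying on `str::lines`
/// to strip both "\n" and "\r\n" terminators.
fn crlf_safe_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

fn main() {
    let text = "a,b\r\nc,d\r\n";
    println!("{:?}", crlf_safe_lines(text));
    // Splitting on '\n' alone (as `String.split_on_char '\n'` does in OCaml)
    // keeps each '\r' and adds an empty final piece.
    let naive: Vec<&str> = text.split('\n').collect();
    println!("{:?}", naive);
}
```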
Complete Rust Example

//! # CSV Parsing Pattern
//!
//! Simple CSV parser without external dependencies.

/// A parsed CSV row
pub type Row = Vec<String>;

/// CSV parse error
#[derive(Debug, PartialEq)]
pub enum CsvError {
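    /// A quoted field is still open at the end of the line (0-based line index).
    UnterminatedQuote(usize),
    /// A row's column count differs from the first row's (0-based line index).
    InconsistentColumns { expected: usize, got: usize, line: usize },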
}

/// Parse a CSV string into rows
pub fn parse_csv(input: &str) -> Result<Vec<Row>, CsvError> {
    let mut rows = Vec::new();
    let mut expected_cols = None;

    for (line_num, line) in input.lines().enumerate() {
        if line.trim().is_empty() {
            continue;
        }
        let row = parse_row(line, line_num)?;
        
        match expected_cols {
            None => expected_cols = Some(row.len()),
            Some(n) if row.len() != n => {
                return Err(CsvError::InconsistentColumns {
                    expected: n,
                    got: row.len(),
                    line: line_num,
                });
            }
            _ => {}
        }
        
        rows.push(row);
    }

    Ok(rows)
}

/// Parse a single CSV row
fn parse_row(line: &str, line_num: usize) -> Result<Row, CsvError> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();

    while let Some(ch) = chars.next() {
        if in_quotes {
            if ch == '"' {
                if chars.peek() == Some(&'"') {
                    chars.next();
                    current.push('"');
                } else {
                    in_quotes = false;
                }
            } else {
                current.push(ch);
            }
        } else {
            match ch {
                '"' => in_quotes = true,
                ',' => {
                    // Note: trim() also strips spaces inside quoted fields; RFC 4180 treats them as data.
                    fields.push(current.trim().to_string());
                    current = String::new();
                }
                _ => current.push(ch),
            }
        }
    }

    if in_quotes {
        return Err(CsvError::UnterminatedQuote(line_num));
    }

    fields.push(current.trim().to_string());
    Ok(fields)
}

/// Format rows as CSV
pub fn format_csv(rows: &[Row]) -> String {
    rows.iter()
        .map(|row| {
            row.iter()
                .map(|field| {
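                    // Quote fields containing a separator, a quote, or a line break.
                    // parse_csv splits on lines first, so fields containing '\n' do not round-trip.
                    if field.contains(',') || field.contains('"') || field.contains('\n') || field.contains('\r') {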
                        format!("\"{}\"", field.replace('"', "\"\""))
                    } else {
                        field.clone()
                    }
                })
                .collect::<Vec<_>>()
                .join(",")
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Parse CSV with headers, returning maps
pub fn parse_csv_with_headers(
    input: &str,
) -> Result<Vec<std::collections::HashMap<String, String>>, CsvError> {
    let rows = parse_csv(input)?;
    if rows.is_empty() {
        return Ok(Vec::new());
    }

    let headers = &rows[0];
    let mut result = Vec::new();

    for row in rows.iter().skip(1) {
        let mut map = std::collections::HashMap::new();
        for (i, value) in row.iter().enumerate() {
            if let Some(header) = headers.get(i) {
                map.insert(header.clone(), value.clone());
            }
        }
        result.push(map);
    }

    Ok(result)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_simple_csv() {
        let input = "a,b,c\n1,2,3\n4,5,6";
        let rows = parse_csv(input).unwrap();
        assert_eq!(rows.len(), 3);
        assert_eq!(rows[0], vec!["a", "b", "c"]);
        assert_eq!(rows[1], vec!["1", "2", "3"]);
    }

    #[test]
    fn test_quoted_field() {
        let input = r#"name,value
"hello, world",42"#;
        let rows = parse_csv(input).unwrap();
        assert_eq!(rows[1][0], "hello, world");
    }

    #[test]
    fn test_escaped_quote() {
        let input = "text\n\"say \"\"hello\"\"\"";
        let rows = parse_csv(input).unwrap();
        assert_eq!(rows[1][0], "say \"hello\"");
    }

    #[test]
    fn test_inconsistent_columns() {
        let input = "a,b,c\n1,2";
        let result = parse_csv(input);
        assert!(matches!(
            result,
            Err(CsvError::InconsistentColumns { .. })
        ));
    }

    #[test]
    fn test_format_csv() {
        let rows = vec![
            vec!["a".to_string(), "b".to_string()],
            vec!["1".to_string(), "2".to_string()],
        ];
        let output = format_csv(&rows);
        assert_eq!(output, "a,b\n1,2");
    }

    #[test]
    fn test_with_headers() {
        let input = "name,age\nAlice,30\nBob,25";
        let records = parse_csv_with_headers(input).unwrap();
        assert_eq!(records.len(), 2);
        assert_eq!(records[0].get("name").unwrap(), "Alice");
        assert_eq!(records[0].get("age").unwrap(), "30");
    }
}
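Both Rust parsers split the input into lines before parsing fields. As a result, a quoted field containing a line break, which RFC 4180 allows, is not handled: the line-based `parse_csv` in the complete example reports `UnterminatedQuote`. The following sketch is not part of the original example. It runs the same three-state machine over the whole document, so record boundaries are only recognized outside quotes. `parse_records` is a hypothetical name, and it returns a plain `String` error for brevity.

```rust
/// Parser states; `QuoteInQuoted` means "just saw a quote inside a quoted field".
#[derive(Clone, Copy, PartialEq)]
enum State { Normal, Quoted, QuoteInQuoted }

/// Hypothetical whole-document parser: keeps commas, quotes, and line
/// breaks inside quoted fields as data. An empty line yields one empty field.
pub fn parse_records(input: &str) -> Result<Vec<Vec<String>>, String> {
    let mut records = Vec::new();
    let mut record = Vec::new();
    let mut field = String::new();
    let mut state = State::Normal;
    let mut in_record = false; // true once the current record has any input
    let mut chars = input.chars().peekable();

    while let Some(ch) = chars.next() {
        in_record = true;
        match (state, ch) {
            // Inside quotes everything is data except the quote itself.
            (State::Quoted, '"') => state = State::QuoteInQuoted,
            (State::Quoted, c) => field.push(c),
            // `""` inside a quoted field is an escaped quote.
            (State::QuoteInQuoted, '"') => {
                field.push('"');
                state = State::Quoted;
            }
            // Outside quotes, a comma ends the field.
            (_, ',') => {
                record.push(std::mem::take(&mut field));
                state = State::Normal;
            }
            // Drop the '\r' of a "\r\n" pair; the '\n' arm ends the record.
            (_, '\r') if chars.peek() == Some(&'\n') => {}
            (_, '\n') => {
                record.push(std::mem::take(&mut field));
                records.push(std::mem::take(&mut record));
                state = State::Normal;
                in_record = false;
            }
            (State::Normal, '"') => state = State::Quoted,
            // Ordinary character, or text after a closing quote (kept leniently).
            (_, c) => {
                field.push(c);
                state = State::Normal;
            }
        }
    }

    if state == State::Quoted {
        return Err("input ended inside a quoted field".to_string());
    }
    // Flush the last record if the input did not end with a line break.
    if in_record {
        record.push(field);
        records.push(record);
    }
    Ok(records)
}

fn main() {
    let doc = "name,note\r\nAlice,\"line one\nline two\"\r\nBob,\"say \"\"hi\"\"\"\n";
    match parse_records(doc) {
        Ok(records) => {
            for r in &records {
                println!("{:?}", r);
            }
        }
        Err(e) => println!("error: {}", e),
    }
}
```

The trade-off is that a single unterminated quote now swallows the rest of the document, not just one line. For this reason, the error is reported only at end of input.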
OCaml Example

(* CSV parsing without external libraries in OCaml *)

(* CSV field parser following RFC 4180 quoting rules (one line at a time, so no line breaks inside quoted fields) *)
let parse_fields line =
  let len = String.length line in
  let fields = ref [] in
  let i = ref 0 in
  while !i <= len do
    if !i = len then begin
      fields := "" :: !fields;
      i := len + 1
    end else if line.[!i] = '"' then begin
      (* Quoted field *)
      incr i;
      let buf = Buffer.create 16 in
      let stop = ref false in
      while not !stop && !i < len do
        if line.[!i] = '"' then begin
          if !i + 1 < len && line.[!i + 1] = '"' then begin
            Buffer.add_char buf '"';
            i := !i + 2
          end else begin
            incr i;
            stop := true
          end
        end else begin
          Buffer.add_char buf line.[!i];
          incr i
        end
      done;
      fields := Buffer.contents buf :: !fields;
      if !i < len && line.[!i] = ',' then incr i
      else if !i >= len then i := len + 1
    end else begin
      (* Unquoted field *)
      let start = !i in
      while !i < len && line.[!i] <> ',' do incr i done;
      fields := String.sub line start (!i - start) :: !fields;
      if !i < len then incr i
      else i := len + 1
    end
  done;
  List.rev !fields

type person = { name: string; age: int; city: string }

let parse_person fields =
  match fields with
  | [name; age_s; city] ->
    (try Some { name; age = int_of_string (String.trim age_s); city }
     with Failure _ -> None)
  | _ -> None

let csv = {|Name,Age,City
Alice,30,Amsterdam
"Bob, Jr.",25,"New York"
Carol,35,Berlin|}

let () =
  let lines = String.split_on_char '\n' csv in
  match lines with
  | [] | [_] -> ()
  | _header :: rows ->
    List.iter (fun line ->
      let fields = parse_fields line in
      match parse_person fields with
      | Some p -> Printf.printf "Person: %s, %d, %s\n" p.name p.age p.city
      | None   -> Printf.printf "Could not parse: %s\n" line
    ) rows
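The OCaml `parse_person` matches on the list shape `[name; age_s; city]`. Rust slice patterns allow the same shape-based style. The sketch below uses a hypothetical `parse_person` and a `Person` with a `u32` age. It accepts exactly three fields, whereas `Person::from_row` in the Rust walkthrough accepts three or more.

```rust
#[derive(Debug, PartialEq)]
struct Person { name: String, age: u32, city: String }

/// Mirror of the OCaml `parse_person`: succeed only for exactly
/// three fields with a numeric age.
fn parse_person(fields: &[String]) -> Option<Person> {
    match fields {
        [name, age, city] => Some(Person {
            name: name.clone(),
            age: age.trim().parse().ok()?, // `?` returns None on a bad age
            city: city.clone(),
        }),
        _ => None,
    }
}

fn main() {
    let row = vec!["Bob, Jr.".to_string(), "25".to_string(), "New York".to_string()];
    match parse_person(&row) {
        Some(p) => println!("Person: {}, {}, {}", p.name, p.age, p.city),
        None => println!("Could not parse: {:?}", row),
    }
}
```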