163 Advanced

Whitespace Parser

The Problem

Most text formats are whitespace-insensitive: {"key": "value"} and { "key" : "value" } are equivalent JSON. Parsers must skip whitespace between tokens without interfering with token recognition. ws0 (zero or more whitespace characters) and ws1 (one or more) are standard utilities. Wrapping content parsers with ws_wrap allows callers to ignore whitespace concerns entirely, keeping individual token parsers clean and focused.

🎯 Learning Outcomes

• Implement ws0, ws1, and ws_wrap as standard whitespace-handling utilities

• Understand why ws0 always succeeds (zero whitespace is valid) while ws1 can fail

• Learn the "wrap with whitespace" pattern for building whitespace-insensitive parsers

• See how comment-skipping extends whitespace handling for real languages

Code Example

Rust:

fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        let trimmed = input.trim_start();
        Ok(((), trimmed))
    })
}

OCaml:

let ws0 : unit parser = fun input ->
  match many0 (satisfy is_ws "whitespace") input with
  | Ok (_, rest) -> Ok ((), rest)
  | Error e -> Error e

Key Differences

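Efficiency: Angstrom's skip_while advances past matching characters without accumulating them. The hand-written OCaml ws0 in this example instead uses many0 (satisfy ...), which builds a char list and discards it. The Rust ws0 calls trim_start, which returns a subslice and allocates nothing; a Rust ws0 built from many0(satisfy(...)) would likewise create a Vec<char> only to discard it.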

Optimization: A production Rust parser would bypass the combinator overhead, either by calling input.trim_start() directly (as the ws0 in this example does) or by scanning with str::find(|c: char| !c.is_whitespace()) and slicing at the returned index.
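The str::find variant mentioned above can be sketched as follows. The function name skip_ws is hypothetical and not part of the example's source; the sketch only assumes the standard library.

```rust
/// Skip leading whitespace by scanning for the first non-whitespace char.
/// `skip_ws` is an illustrative name, not taken from the example source.
fn skip_ws(input: &str) -> &str {
    match input.find(|c: char| !c.is_whitespace()) {
        // `pos` is the byte index of the first non-whitespace character.
        Some(pos) => &input[pos..],
        // The whole input is whitespace (or empty): return the empty tail.
        None => &input[input.len()..],
    }
}
```

For any input this should return the same slice as input.trim_start(); the explicit scan is mainly useful when the index itself is needed, for example to report how many bytes were skipped.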

Line counting: Neither basic ws0 tracks line numbers; adding line/column tracking requires threading a position state through the parser.
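One way to thread position state through whitespace skipping is sketched below. The Pos struct and the ws0_tracked function are assumptions made for illustration; the example's source has no position tracking, and a full parser would carry the position through every combinator, not just this one.

```rust
/// Hypothetical 1-based source position (not part of the example source).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pos {
    line: usize,
    col: usize,
}

/// Skip leading whitespace and return the remaining input together with
/// the position of its first character.
fn ws0_tracked(input: &str, mut pos: Pos) -> (&str, Pos) {
    for (i, c) in input.char_indices() {
        if !c.is_whitespace() {
            return (&input[i..], pos);
        }
        if c == '\n' {
            // A newline moves to the start of the next line.
            pos.line += 1;
            pos.col = 1;
        } else {
            pos.col += 1;
        }
    }
    // Everything was whitespace: the rest is empty.
    (&input[input.len()..], pos)
}
```

Like ws0, this never fails; it only reports where the next token starts.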

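Comment handling: Both can extend ws0 to also skip comments; the typical approach is many0(choice([whitespace, line_comment, block_comment])). Each alternative must consume at least one character when it succeeds (so whitespace here means ws1, not ws0); otherwise a naive many0 loops forever.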
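A minimal sketch of comment-aware skipping, written as a plain loop instead of many0(choice([...])). The name skip_trivia is hypothetical, '#' line comments follow the example's line_comment, and the /* ... */ block-comment syntax is an assumption chosen for illustration.

```rust
/// Skip whitespace, '#' line comments, and /* ... */ block comments.
/// Fails only on an unterminated block comment.
fn skip_trivia(input: &str) -> Result<&str, String> {
    let mut rest = input;
    loop {
        rest = rest.trim_start();
        if rest.starts_with('#') {
            // Line comment: drop everything up to (not including) the newline.
            rest = match rest.find('\n') {
                Some(pos) => &rest[pos..],
                None => &rest[rest.len()..],
            };
        } else if let Some(body) = rest.strip_prefix("/*") {
            // Block comment: drop everything through the closing "*/".
            match body.find("*/") {
                Some(pos) => rest = &body[pos + 2..],
                None => return Err("Unterminated block comment".to_string()),
            }
        } else {
            // Neither whitespace nor a comment: done.
            return Ok(rest);
        }
    }
}
```

Every iteration that continues has consumed at least the comment opener, so the loop terminates, which is the same progress requirement that many0 places on its alternatives.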

OCaml Approach

Angstrom provides skip_while : (char -> bool) -> unit t, which skips matching characters without collecting them:

let ws = skip_while (fun c -> c = ' ' || c = '\t' || c = '\n' || c = '\r')
let ws_wrap p = ws *> p <* ws

skip_while advances through the buffer without building a list of the matched characters, which makes whitespace skipping more efficient than many0 (satisfy ...).
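To make the contrast concrete, here is a sketch of both styles in Rust, using the Parser type from this example. The satisfy, many0, skip_while, and ws0_combinator functions below are illustrative ports written for this comparison, not Angstrom's implementation and not part of the example's Rust source.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Parse one character that satisfies `pred`.
fn satisfy<'a, F>(pred: F, desc: &'a str) -> Parser<'a, char>
where
    F: Fn(char) -> bool + 'a,
{
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
        _ => Err(format!("Expected {}", desc)),
    })
}

/// Apply `parser` zero or more times, collecting every result.
fn many0<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
    Box::new(move |mut input: &'a str| {
        let mut items = Vec::new();
        while let Ok((item, rest)) = parser(input) {
            // Stop if the inner parser made no progress (avoids looping forever).
            if rest.len() == input.len() {
                break;
            }
            items.push(item);
            input = rest;
        }
        Ok((items, input))
    })
}

/// Combinator-style ws0: builds a Vec<char> and throws it away.
fn ws0_combinator<'a>() -> Parser<'a, ()> {
    let spaces = many0(satisfy(|c| c.is_whitespace(), "whitespace"));
    Box::new(move |input: &'a str| {
        let (_chars, rest) = spaces(input)?;
        Ok(((), rest))
    })
}

/// skip_while-style: find where the run ends and slice, with no allocation.
fn skip_while<'a, F>(pred: F) -> Parser<'a, ()>
where
    F: Fn(char) -> bool + 'a,
{
    Box::new(move |input: &'a str| {
        let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
        Ok(((), &input[end..]))
    })
}
```

Both parsers accept the same inputs and leave the same remainder; the difference is only in the intermediate Vec<char> that the combinator version allocates.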

Full Source

#![allow(clippy::all)]
// Example 163: Whitespace Parser
// Parse and skip whitespace: ws0, ws1, ws_wrap

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// ============================================================
// Approach 1: ws0 — skip zero or more whitespace (always succeeds)
// ============================================================

fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        let trimmed = input.trim_start();
        Ok(((), trimmed))
    })
}

// ============================================================
// Approach 2: ws1 — require at least one whitespace
// ============================================================

fn ws1<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| match input.chars().next() {
        // Use the same Unicode-aware test as trim_start so ws0 and ws1 agree
        Some(c) if c.is_whitespace() => {
            let trimmed = input.trim_start();
            Ok(((), trimmed))
        }
        _ => Err("Expected whitespace".to_string()),
    })
}

// ============================================================
// Approach 3: ws_wrap — parse with surrounding whitespace
// ============================================================

fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    Box::new(move |input: &'a str| {
        let trimmed = input.trim_start();
        let (value, rest) = parser(trimmed)?;
        let trimmed_rest = rest.trim_start();
        Ok((value, trimmed_rest))
    })
}

/// Line comment: skip from '#' to end of line (the '\n' itself is left in the input)
fn line_comment<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        if input.starts_with('#') {
            match input.find('\n') {
                Some(pos) => Ok(((), &input[pos..])),
                None => Ok(((), "")),
            }
        } else {
            Err("Expected '#'".to_string())
        }
    })
}

fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
    let exp = expected.to_string();
    Box::new(move |input: &'a str| {
        if input.starts_with(&exp) {
            Ok((&input[..exp.len()], &input[exp.len()..]))
        } else {
            Err(format!("Expected \"{}\"", exp))
        }
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ws0_spaces() {
        assert_eq!(ws0()("  hello"), Ok(((), "hello")));
    }

    #[test]
    fn test_ws0_no_spaces() {
        assert_eq!(ws0()("hello"), Ok(((), "hello")));
    }

    #[test]
    fn test_ws0_empty() {
        assert_eq!(ws0()(""), Ok(((), "")));
    }

    #[test]
    fn test_ws0_tabs_newlines() {
        assert_eq!(ws0()("\t\n  x"), Ok(((), "x")));
    }

    #[test]
    fn test_ws1_success() {
        assert_eq!(ws1()("  hello"), Ok(((), "hello")));
    }

    #[test]
    fn test_ws1_fail() {
        assert!(ws1()("hello").is_err());
    }

    #[test]
    fn test_ws_wrap() {
        let p = ws_wrap(tag("hello"));
        assert_eq!(p("  hello  rest"), Ok(("hello", "rest")));
    }

    #[test]
    fn test_ws_wrap_no_spaces() {
        let p = ws_wrap(tag("hello"));
        assert_eq!(p("hello"), Ok(("hello", "")));
    }

    #[test]
    fn test_line_comment() {
        assert_eq!(line_comment()("# comment\ncode"), Ok(((), "\ncode")));
    }

    #[test]
    fn test_line_comment_eof() {
        assert_eq!(line_comment()("# comment"), Ok(((), "")));
    }

    #[test]
    fn test_line_comment_not_hash() {
        assert!(line_comment()("code").is_err());
    }
}

(* Example 163: Whitespace Parser *)
(* Parse and skip whitespace: ws0, ws1, ws_wrap *)

type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result

let satisfy pred desc : char parser = fun input ->
  if String.length input > 0 && pred input.[0] then
    Ok (input.[0], String.sub input 1 (String.length input - 1))
  else Error (Printf.sprintf "Expected %s" desc)

let many0 p : 'a list parser = fun input ->
  let rec go acc r = match p r with Ok (v, r') -> go (v::acc) r' | Error _ -> Ok (List.rev acc, r)
  in go [] input

let many1 p : 'a list parser = fun input ->
  match p input with
  | Error e -> Error e
  | Ok (v, r) -> match many0 p r with Ok (vs, r') -> Ok (v::vs, r') | Error e -> Error e

let is_ws c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Approach 1: ws0 — skip zero or more whitespace *)
let ws0 : unit parser = fun input ->
  match many0 (satisfy is_ws "whitespace") input with
  | Ok (_, rest) -> Ok ((), rest)
  | Error e -> Error e

(* Approach 2: ws1 — require at least one whitespace *)
let ws1 : unit parser = fun input ->
  match many1 (satisfy is_ws "whitespace") input with
  | Ok (_, rest) -> Ok ((), rest)
  | Error e -> Error e

(* Approach 3: ws_wrap — parse p surrounded by optional whitespace *)
let ws_wrap (p : 'a parser) : 'a parser = fun input ->
  match ws0 input with
  | Ok ((), r1) ->
    (match p r1 with
     | Ok (v, r2) ->
       (match ws0 r2 with
        | Ok ((), r3) -> Ok (v, r3)
        | Error e -> Error e)
     | Error e -> Error e)
  | Error e -> Error e

(* line comment: skip from # up to, but not including, the newline *)
let line_comment : unit parser = fun input ->
  if String.length input > 0 && input.[0] = '#' then
    let rec skip i =
      if i >= String.length input || input.[i] = '\n' then i
      else skip (i + 1) in
    let end_pos = skip 1 in
    Ok ((), String.sub input end_pos (String.length input - end_pos))
  else Error "Expected '#'"

(* Tests *)
let () =
  assert (ws0 "  hello" = Ok ((), "hello"));
  assert (ws0 "hello" = Ok ((), "hello"));
  assert (ws0 "" = Ok ((), ""));

  assert (ws1 "  hello" = Ok ((), "hello"));
  assert (Result.is_error (ws1 "hello"));

  let tag s : string parser = fun input ->
    let len = String.length s in
    if String.length input >= len && String.sub input 0 len = s then
      Ok (s, String.sub input len (String.length input - len))
    else Error (Printf.sprintf "Expected \"%s\"" s) in

  assert (ws_wrap (tag "hello") "  hello  rest" = Ok ("hello", "rest"));
  assert (ws_wrap (tag "hello") "hello" = Ok ("hello", ""));

  assert (line_comment "# comment\ncode" = Ok ((), "\ncode"));

  print_endline "✓ All tests passed"

Deep Comparison

Comparison: Example 163 — Whitespace Parser

ws0

OCaml:

let ws0 : unit parser = fun input ->
  match many0 (satisfy is_ws "whitespace") input with
  | Ok (_, rest) -> Ok ((), rest)
  | Error e -> Error e

Rust:

fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        let trimmed = input.trim_start();
        Ok(((), trimmed))
    })
}

ws_wrap

OCaml:

let ws_wrap (p : 'a parser) : 'a parser = fun input ->
  match ws0 input with
  | Ok ((), r1) ->
    (match p r1 with
     | Ok (v, r2) ->
       (match ws0 r2 with
        | Ok ((), r3) -> Ok (v, r3)
        | Error e -> Error e)
     | Error e -> Error e)
  | Error e -> Error e

Rust:

fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    Box::new(move |input: &'a str| {
        let trimmed = input.trim_start();
        let (value, rest) = parser(trimmed)?;
        let trimmed_rest = rest.trim_start();
        Ok((value, trimmed_rest))
    })
}
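As a usage sketch, ws_wrap lets a small key = value parser accept any spacing around its tokens. The tag and ws_wrap functions are copied from this example; ident and key_value are hypothetical helpers introduced only for illustration.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
    let exp = expected.to_string();
    Box::new(move |input: &'a str| {
        if input.starts_with(&exp) {
            Ok((&input[..exp.len()], &input[exp.len()..]))
        } else {
            Err(format!("Expected \"{}\"", exp))
        }
    })
}

fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    Box::new(move |input: &'a str| {
        let (value, rest) = parser(input.trim_start())?;
        Ok((value, rest.trim_start()))
    })
}

/// Hypothetical token parser: one or more ASCII letters.
fn ident<'a>() -> Parser<'a, &'a str> {
    Box::new(|input: &'a str| {
        let end = input
            .find(|c: char| !c.is_ascii_alphabetic())
            .unwrap_or(input.len());
        if end == 0 {
            Err("Expected identifier".to_string())
        } else {
            Ok((&input[..end], &input[end..]))
        }
    })
}

/// Hypothetical composite parser: `key = value` with any spacing.
fn key_value<'a>() -> Parser<'a, (&'a str, &'a str)> {
    // Each token parser is wrapped once; none of them mentions whitespace.
    let key = ws_wrap(ident());
    let eq = ws_wrap(tag("="));
    let value = ws_wrap(ident());
    Box::new(move |input: &'a str| {
        let (k, rest) = key(input)?;
        let (_, rest) = eq(rest)?;
        let (v, rest) = value(rest)?;
        Ok(((k, v), rest))
    })
}
```

With this wrapping, "a=b" and "  a  =  b  " both parse to the pair ("a", "b"), and the token parsers themselves stay free of whitespace logic.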

Exercises

Extend ws0 to also skip line comments: // ...until end of line.

Implement ws_between(open: Parser<A>, sep: Parser<B>, close: Parser<C>) -> Parser<Vec<B>> that handles whitespace around separators.

Write a lexeme(p: Parser<T>) -> Parser<T> combinator that skips whitespace after p (a common pattern in language parsers).

Open Source Repos

functional-rust

The source for this example is on GitHub, with the OCaml and Rust versions side by side in the repo.
