๐Ÿฆ€ Functional Rust

151: Introduction to Parser Combinators

Difficulty: โญโญโญ Level: Foundations A parser is just a function: take a string, return `(parsed_value, remaining_string)` or an error โ€” and from that one idea, you can build a parser for any language.

The Problem This Solves

You need to read structured text: a config file, a JSON document, a math expression, some custom DSL your team invented. You reach for regex โ€” and it works, until the input gets nested. Regex can't count matching brackets. It can't parse `(1 + (2 * 3))` and know where the inner expression ends. The moment your format has any nesting at all, regex hits a wall. You could use a parser generator (ANTLR, Yacc), but those require a separate grammar file, a build step, and learning a new notation just to get a `Vec<Token>` back. What if you could write your parser in pure Rust, with no external tools, no codegen, no macros โ€” just functions composed together? Parser combinators give you exactly that. The core insight is almost comically simple: a parser is a function. It takes a string slice and returns either a success (value + what's left to parse) or a failure. Once you have that definition, every other concept โ€” repetition, sequencing, alternation โ€” is just a function that takes parsers and returns a new parser. No magic, no framework. Just functions all the way down.

The Intuition

You already know this pattern. Imagine you're reading a sentence word by word. You look at the first word, recognize it, and remember where you stopped. Then you pass "where you stopped" to the next step, which does the same. If any step fails, you stop and report what went wrong. That's exactly what a parser does. The "where you stopped" is `&str` โ€” a slice into the original input. This is zero-copy: we never duplicate the string, we just move a pointer forward. The type alias captures everything:
type ParseResult<'a, T> = Result<(T, &'a str), String>;
A `Parser` is just a boxed function with that signature:
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
`Box<dyn Fn(...)>` is how Rust stores a function when you don't know its exact type at compile time. Closures have unique anonymous types โ€” boxing erases that type so different parsers can be stored and passed around uniformly.

How It Works in Rust

The simplest parser โ€” a plain function:
fn parse_char_a(input: &str) -> ParseResult<char> {
 match input.chars().next() {
     Some('a') => Ok(('a', &input[1..])),  // consume the 'a', return rest
     Some(c)   => Err(format!("Expected 'a', got '{}'", c)),
     None      => Err("Expected 'a', got end of input".to_string()),
 }
}
`input.chars().next()` peeks at the first Unicode character. `&input[1..]` moves forward by one byte (fine for ASCII; for multi-byte chars, use `c.len_utf8()`). A factory function that builds parsers:
fn char_p<'a>(expected: char) -> Parser<'a, char> {
 Box::new(move |input: &'a str| {  // 'move' captures `expected` by value
     match input.chars().next() {
         Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
         Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
         None    => Err(format!("Expected '{}', got end of input", expected)),
     }
 })
}
`move` is required: the closure must own `expected` because it will outlive the function call that created it. Utility parsers for any type:
fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
 Box::new(move |input| Ok((value.clone(), input)))  // always succeeds, consumes nothing
}

fn fail<'a, T: 'a>(msg: &str) -> Parser<'a, T> {
 let msg = msg.to_string();
 Box::new(move |_| Err(msg.clone()))  // always fails
}
These seem trivial, but `pure` becomes essential when you need a parser that "returns a default value" and `fail` is useful for testing and error recovery. Running a parser:
let p = char_p('h');
match p("hello") {
 Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
 Err(e)         => println!("Error: {}", e),
}
// โ†’ Parsed 'h', remaining: "ello"

What This Unlocks

Key Differences

ConceptOCamlRust
Parser type`string -> ('a * string, string) result``Box<dyn Fn(&'a str) -> Result<(T, &'a str), String> + 'a>`
Why boxed?Not needed โ€” GC handles function valuesClosures have unique types; `Box<dyn Fn>` erases them
Lifetime `'a`Handled by GCExplicit: ties output slice to input lifetime
Closure captureAutomatic`move` keyword required for owned captures
Library`Angstrom`, `opal``nom`, `combine`, `winnow` (or roll your own)
// Example 151: Introduction to Parser Combinators
// Defining the core Parser type and running basic parsers

/// Core type: a parse result is either (value, remaining_input) or an error
type ParseResult<'a, T> = Result<(T, &'a str), String>;

// ============================================================
// Approach 1: Parser as a plain function
// ============================================================

fn parse_char_a(input: &str) -> ParseResult<char> {
    match input.chars().next() {
        Some('a') => Ok(('a', &input[1..])),
        Some(c) => Err(format!("Expected 'a', got '{}'", c)),
        None => Err("Expected 'a', got end of input".to_string()),
    }
}

// ============================================================
// Approach 2: Parser as a boxed closure (our standard type)
// ============================================================

type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Create a parser that always succeeds with the given value
fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
    Box::new(move |input| Ok((value.clone(), input)))
}

/// Create a parser that always fails with the given message
fn fail<'a, T: 'a>(msg: &str) -> Parser<'a, T> {
    let msg = msg.to_string();
    Box::new(move |_input| Err(msg.clone()))
}

/// Create a parser that matches a specific character
fn char_p<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got end of input", expected)),
        }
    })
}

/// Run a parser on input
fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
    parser(input)
}

// ============================================================
// Approach 3: Parser as a trait object (alternative design)
// ============================================================

trait Parse<T> {
    fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, T>;
}

struct CharParser {
    expected: char,
}

impl Parse<char> for CharParser {
    fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, char> {
        match input.chars().next() {
            Some(c) if c == self.expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", self.expected, c)),
            None => Err(format!("Expected '{}', got EOF", self.expected)),
        }
    }
}

fn main() {
    // Approach 1: Plain function
    println!("=== Approach 1: Plain function ===");
    match parse_char_a("abc") {
        Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
        Err(e) => println!("Error: {}", e),
    }

    // Approach 2: Boxed closure
    println!("\n=== Approach 2: Boxed closure ===");
    let p = char_p('h');
    match run_parser(&p, "hello") {
        Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
        Err(e) => println!("Error: {}", e),
    }

    let p = pure(42);
    match run_parser(&p, "anything") {
        Ok((val, rest)) => println!("Pure value: {}, remaining: \"{}\"", val, rest),
        Err(e) => println!("Error: {}", e),
    }

    // Approach 3: Trait object
    println!("\n=== Approach 3: Trait object ===");
    let p = CharParser { expected: 'x' };
    match p.parse("xyz") {
        Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
        Err(e) => println!("Error: {}", e),
    }

    println!("\nโœ“ All examples completed");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_plain_function_success() {
        assert_eq!(parse_char_a("abc"), Ok(('a', "bc")));
    }

    #[test]
    fn test_plain_function_failure() {
        assert!(parse_char_a("xyz").is_err());
    }

    #[test]
    fn test_plain_function_empty() {
        assert!(parse_char_a("").is_err());
    }

    #[test]
    fn test_char_parser_success() {
        let p = char_p('h');
        assert_eq!(run_parser(&p, "hello"), Ok(('h', "ello")));
    }

    #[test]
    fn test_char_parser_failure() {
        let p = char_p('h');
        assert!(run_parser(&p, "world").is_err());
    }

    #[test]
    fn test_pure() {
        let p = pure(42);
        assert_eq!(run_parser(&p, "hello"), Ok((42, "hello")));
    }

    #[test]
    fn test_fail() {
        let p: Parser<i32> = fail("oops");
        assert_eq!(run_parser(&p, "hello"), Err("oops".to_string()));
    }

    #[test]
    fn test_trait_parser() {
        let p = CharParser { expected: 'z' };
        assert_eq!(p.parse("zoo"), Ok(('z', "oo")));
    }

    #[test]
    fn test_trait_parser_failure() {
        let p = CharParser { expected: 'z' };
        assert!(p.parse("abc").is_err());
    }
}
(* Example 151: Introduction to Parser Combinators *)
(* OCaml approach using Angstrom-style concepts *)

(* Approach 1: Basic parser type as a function *)
type 'a parse_result = Ok of 'a * string | Error of string

type 'a parser = string -> 'a parse_result

let run_parser (p : 'a parser) (input : string) : 'a parse_result =
  p input

(* A parser that consumes a single character 'a' *)
let parse_a : char parser = fun input ->
  if String.length input > 0 && input.[0] = 'a' then
    Ok ('a', String.sub input 1 (String.length input - 1))
  else
    Error (Printf.sprintf "Expected 'a', got '%s'" input)

(* Approach 2: Using a module to encapsulate parser operations *)
module Parser = struct
  type 'a t = string -> 'a parse_result

  let return (x : 'a) : 'a t = fun input -> Ok (x, input)

  let fail (msg : string) : 'a t = fun _input -> Error msg

  let run (p : 'a t) (input : string) = p input

  (* Parse a specific character *)
  let char (c : char) : char t = fun input ->
    if String.length input > 0 && input.[0] = c then
      Ok (c, String.sub input 1 (String.length input - 1))
    else
      Error (Printf.sprintf "Expected '%c'" c)
end

(* Approach 3: Using Angstrom-like combinators *)
(* In real OCaml, you'd use: open Angstrom *)
(* let p = char 'a' *)
(* let result = parse_string ~consume:All p "a" *)

(* Simulating Angstrom's interface *)
module Angstrom_like = struct
  type 'a t = string -> ('a * string, string) result

  let char c : char t = fun input ->
    if String.length input > 0 && input.[0] = c then
      Result.ok (c, String.sub input 1 (String.length input - 1))
    else
      Result.error (Printf.sprintf "Expected '%c'" c)

  let parse_string p input =
    match p input with
    | Result.Ok (v, rest) ->
      if String.length rest = 0 then Result.Ok v
      else Result.Error "Unconsumed input"
    | Result.Error e -> Result.Error e
end

(* Tests *)
let () =
  (* Test Approach 1 *)
  (match run_parser parse_a "abc" with
   | Ok ('a', "bc") -> ()
   | _ -> failwith "Test 1 failed");

  (match run_parser parse_a "xyz" with
   | Error _ -> ()
   | _ -> failwith "Test 2 failed");

  (* Test Approach 2 *)
  (match Parser.run (Parser.char 'b') "bcd" with
   | Ok ('b', "cd") -> ()
   | _ -> failwith "Test 3 failed");

  (match Parser.run (Parser.return 42) "hello" with
   | Ok (42, "hello") -> ()
   | _ -> failwith "Test 4 failed");

  (* Test Approach 3 *)
  (match Angstrom_like.parse_string (Angstrom_like.char 'x') "x" with
   | Result.Ok 'x' -> ()
   | _ -> failwith "Test 5 failed");

  print_endline "โœ“ All tests passed"

๐Ÿ“Š Detailed Comparison

Comparison: Example 151 โ€” Parser Combinator Introduction

Core Parser Type

OCaml:

๐Ÿช Show OCaml equivalent
type 'a parse_result = Ok of 'a * string | Error of string
type 'a parser = string -> 'a parse_result

Rust:

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

Character Parser

OCaml:

๐Ÿช Show OCaml equivalent
let char (c : char) : char parser = fun input ->
if String.length input > 0 && input.[0] = c then
 Ok (c, String.sub input 1 (String.length input - 1))
else
 Error (Printf.sprintf "Expected '%c'" c)

Rust:

fn char_p<'a>(expected: char) -> Parser<'a, char> {
 Box::new(move |input: &'a str| {
     match input.chars().next() {
         Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
         Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
         None => Err(format!("Expected '{}', got end of input", expected)),
     }
 })
}

Pure / Return

OCaml:

๐Ÿช Show OCaml equivalent
let return (x : 'a) : 'a parser = fun input -> Ok (x, input)

Rust:

fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
 Box::new(move |input| Ok((value.clone(), input)))
}

Running a Parser

OCaml:

๐Ÿช Show OCaml equivalent
let run (p : 'a parser) (input : string) = p input

Rust:

fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
 parser(input)
}