151: Introduction to Parser Combinators
Difficulty: โญโญโญ Level: Foundations A parser is just a function: take a string, return `(parsed_value, remaining_string)` or an error โ and from that one idea, you can build a parser for any language.The Problem This Solves
You need to read structured text: a config file, a JSON document, a math expression, some custom DSL your team invented. You reach for regex โ and it works, until the input gets nested. Regex can't count matching brackets. It can't parse `(1 + (2 * 3))` and know where the inner expression ends. The moment your format has any nesting at all, regex hits a wall. You could use a parser generator (ANTLR, Yacc), but those require a separate grammar file, a build step, and learning a new notation just to get a `Vec<Token>` back. What if you could write your parser in pure Rust, with no external tools, no codegen, no macros โ just functions composed together? Parser combinators give you exactly that. The core insight is almost comically simple: a parser is a function. It takes a string slice and returns either a success (value + what's left to parse) or a failure. Once you have that definition, every other concept โ repetition, sequencing, alternation โ is just a function that takes parsers and returns a new parser. No magic, no framework. Just functions all the way down.The Intuition
You already know this pattern. Imagine you're reading a sentence word by word. You look at the first word, recognize it, and remember where you stopped. Then you pass "where you stopped" to the next step, which does the same. If any step fails, you stop and report what went wrong. That's exactly what a parser does. The "where you stopped" is `&str` โ a slice into the original input. This is zero-copy: we never duplicate the string, we just move a pointer forward. The type alias captures everything:type ParseResult<'a, T> = Result<(T, &'a str), String>;
- `T` is whatever you parsed (a `char`, a `u64`, a `Vec<Token>`, anything)
- `&'a str` is the remaining input โ what the next parser will see
- `String` is the error message if parsing failed
- The lifetime `'a` ties the remaining slice to the original input (it's a view into the same memory)
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
`Box<dyn Fn(...)>` is how Rust stores a function when you don't know its exact type at compile time. Closures have unique anonymous types โ boxing erases that type so different parsers can be stored and passed around uniformly.
How It Works in Rust
The simplest parser โ a plain function:fn parse_char_a(input: &str) -> ParseResult<char> {
match input.chars().next() {
Some('a') => Ok(('a', &input[1..])), // consume the 'a', return rest
Some(c) => Err(format!("Expected 'a', got '{}'", c)),
None => Err("Expected 'a', got end of input".to_string()),
}
}
`input.chars().next()` peeks at the first Unicode character. `&input[1..]` moves forward by one byte (fine for ASCII; for multi-byte chars, use `c.len_utf8()`).
A factory function that builds parsers:
fn char_p<'a>(expected: char) -> Parser<'a, char> {
Box::new(move |input: &'a str| { // 'move' captures `expected` by value
match input.chars().next() {
Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
None => Err(format!("Expected '{}', got end of input", expected)),
}
})
}
`move` is required: the closure must own `expected` because it will outlive the function call that created it.
Utility parsers for any type:
fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
Box::new(move |input| Ok((value.clone(), input))) // always succeeds, consumes nothing
}
fn fail<'a, T: 'a>(msg: &str) -> Parser<'a, T> {
let msg = msg.to_string();
Box::new(move |_| Err(msg.clone())) // always fails
}
These seem trivial, but `pure` becomes essential when you need a parser that "returns a default value" and `fail` is useful for testing and error recovery.
Running a parser:
let p = char_p('h');
match p("hello") {
Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
Err(e) => println!("Error: {}", e),
}
// โ Parsed 'h', remaining: "ello"
What This Unlocks
- No external dependencies โ your parser is pure Rust, no crates needed, no build magic.
- Full composability โ every example from 152 onward builds on this one type definition; combining parsers is just combining functions.
- Easy error handling โ `Result` gives you structured failure for free; combinators can enrich errors as they propagate.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Parser type | `string -> ('a * string, string) result` | `Box<dyn Fn(&'a str) -> Result<(T, &'a str), String> + 'a>` |
| Why boxed? | Not needed โ GC handles function values | Closures have unique types; `Box<dyn Fn>` erases them |
| Lifetime `'a` | Handled by GC | Explicit: ties output slice to input lifetime |
| Closure capture | Automatic | `move` keyword required for owned captures |
| Library | `Angstrom`, `opal` | `nom`, `combine`, `winnow` (or roll your own) |
// Example 151: Introduction to Parser Combinators
// Defining the core Parser type and running basic parsers
/// Core type: a parse result is either (value, remaining_input) or an error
type ParseResult<'a, T> = Result<(T, &'a str), String>;
// ============================================================
// Approach 1: Parser as a plain function
// ============================================================
fn parse_char_a(input: &str) -> ParseResult<char> {
match input.chars().next() {
Some('a') => Ok(('a', &input[1..])),
Some(c) => Err(format!("Expected 'a', got '{}'", c)),
None => Err("Expected 'a', got end of input".to_string()),
}
}
// ============================================================
// Approach 2: Parser as a boxed closure (our standard type)
// ============================================================
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
/// Create a parser that always succeeds with the given value
fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
Box::new(move |input| Ok((value.clone(), input)))
}
/// Create a parser that always fails with the given message
fn fail<'a, T: 'a>(msg: &str) -> Parser<'a, T> {
let msg = msg.to_string();
Box::new(move |_input| Err(msg.clone()))
}
/// Create a parser that matches a specific character
fn char_p<'a>(expected: char) -> Parser<'a, char> {
Box::new(move |input: &'a str| {
match input.chars().next() {
Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
None => Err(format!("Expected '{}', got end of input", expected)),
}
})
}
/// Run a parser on input
fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
parser(input)
}
// ============================================================
// Approach 3: Parser as a trait object (alternative design)
// ============================================================
trait Parse<T> {
fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, T>;
}
struct CharParser {
expected: char,
}
impl Parse<char> for CharParser {
fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, char> {
match input.chars().next() {
Some(c) if c == self.expected => Ok((c, &input[c.len_utf8()..])),
Some(c) => Err(format!("Expected '{}', got '{}'", self.expected, c)),
None => Err(format!("Expected '{}', got EOF", self.expected)),
}
}
}
fn main() {
// Approach 1: Plain function
println!("=== Approach 1: Plain function ===");
match parse_char_a("abc") {
Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
Err(e) => println!("Error: {}", e),
}
// Approach 2: Boxed closure
println!("\n=== Approach 2: Boxed closure ===");
let p = char_p('h');
match run_parser(&p, "hello") {
Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
Err(e) => println!("Error: {}", e),
}
let p = pure(42);
match run_parser(&p, "anything") {
Ok((val, rest)) => println!("Pure value: {}, remaining: \"{}\"", val, rest),
Err(e) => println!("Error: {}", e),
}
// Approach 3: Trait object
println!("\n=== Approach 3: Trait object ===");
let p = CharParser { expected: 'x' };
match p.parse("xyz") {
Ok((ch, rest)) => println!("Parsed '{}', remaining: \"{}\"", ch, rest),
Err(e) => println!("Error: {}", e),
}
println!("\nโ All examples completed");
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_plain_function_success() {
assert_eq!(parse_char_a("abc"), Ok(('a', "bc")));
}
#[test]
fn test_plain_function_failure() {
assert!(parse_char_a("xyz").is_err());
}
#[test]
fn test_plain_function_empty() {
assert!(parse_char_a("").is_err());
}
#[test]
fn test_char_parser_success() {
let p = char_p('h');
assert_eq!(run_parser(&p, "hello"), Ok(('h', "ello")));
}
#[test]
fn test_char_parser_failure() {
let p = char_p('h');
assert!(run_parser(&p, "world").is_err());
}
#[test]
fn test_pure() {
let p = pure(42);
assert_eq!(run_parser(&p, "hello"), Ok((42, "hello")));
}
#[test]
fn test_fail() {
let p: Parser<i32> = fail("oops");
assert_eq!(run_parser(&p, "hello"), Err("oops".to_string()));
}
#[test]
fn test_trait_parser() {
let p = CharParser { expected: 'z' };
assert_eq!(p.parse("zoo"), Ok(('z', "oo")));
}
#[test]
fn test_trait_parser_failure() {
let p = CharParser { expected: 'z' };
assert!(p.parse("abc").is_err());
}
}
(* Example 151: Introduction to Parser Combinators *)
(* OCaml approach using Angstrom-style concepts *)
(* Approach 1: Basic parser type as a function *)
type 'a parse_result = Ok of 'a * string | Error of string
type 'a parser = string -> 'a parse_result
let run_parser (p : 'a parser) (input : string) : 'a parse_result =
p input
(* A parser that consumes a single character 'a' *)
let parse_a : char parser = fun input ->
if String.length input > 0 && input.[0] = 'a' then
Ok ('a', String.sub input 1 (String.length input - 1))
else
Error (Printf.sprintf "Expected 'a', got '%s'" input)
(* Approach 2: Using a module to encapsulate parser operations *)
module Parser = struct
type 'a t = string -> 'a parse_result
let return (x : 'a) : 'a t = fun input -> Ok (x, input)
let fail (msg : string) : 'a t = fun _input -> Error msg
let run (p : 'a t) (input : string) = p input
(* Parse a specific character *)
let char (c : char) : char t = fun input ->
if String.length input > 0 && input.[0] = c then
Ok (c, String.sub input 1 (String.length input - 1))
else
Error (Printf.sprintf "Expected '%c'" c)
end
(* Approach 3: Using Angstrom-like combinators *)
(* In real OCaml, you'd use: open Angstrom *)
(* let p = char 'a' *)
(* let result = parse_string ~consume:All p "a" *)
(* Simulating Angstrom's interface *)
module Angstrom_like = struct
type 'a t = string -> ('a * string, string) result
let char c : char t = fun input ->
if String.length input > 0 && input.[0] = c then
Result.ok (c, String.sub input 1 (String.length input - 1))
else
Result.error (Printf.sprintf "Expected '%c'" c)
let parse_string p input =
match p input with
| Result.Ok (v, rest) ->
if String.length rest = 0 then Result.Ok v
else Result.Error "Unconsumed input"
| Result.Error e -> Result.Error e
end
(* Tests *)
let () =
(* Test Approach 1 *)
(match run_parser parse_a "abc" with
| Ok ('a', "bc") -> ()
| _ -> failwith "Test 1 failed");
(match run_parser parse_a "xyz" with
| Error _ -> ()
| _ -> failwith "Test 2 failed");
(* Test Approach 2 *)
(match Parser.run (Parser.char 'b') "bcd" with
| Ok ('b', "cd") -> ()
| _ -> failwith "Test 3 failed");
(match Parser.run (Parser.return 42) "hello" with
| Ok (42, "hello") -> ()
| _ -> failwith "Test 4 failed");
(* Test Approach 3 *)
(match Angstrom_like.parse_string (Angstrom_like.char 'x') "x" with
| Result.Ok 'x' -> ()
| _ -> failwith "Test 5 failed");
print_endline "โ All tests passed"
๐ Detailed Comparison
Comparison: Example 151 โ Parser Combinator Introduction
Core Parser Type
OCaml:
๐ช Show OCaml equivalent
type 'a parse_result = Ok of 'a * string | Error of string
type 'a parser = string -> 'a parse_result
Rust:
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;Character Parser
OCaml:
๐ช Show OCaml equivalent
let char (c : char) : char parser = fun input ->
if String.length input > 0 && input.[0] = c then
Ok (c, String.sub input 1 (String.length input - 1))
else
Error (Printf.sprintf "Expected '%c'" c)
Rust:
fn char_p<'a>(expected: char) -> Parser<'a, char> {
Box::new(move |input: &'a str| {
match input.chars().next() {
Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
None => Err(format!("Expected '{}', got end of input", expected)),
}
})
}Pure / Return
OCaml:
๐ช Show OCaml equivalent
let return (x : 'a) : 'a parser = fun input -> Ok (x, input)
Rust:
fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
Box::new(move |input| Ok((value.clone(), input)))
}Running a Parser
OCaml:
๐ช Show OCaml equivalent
let run (p : 'a parser) (input : string) = p input
Rust:
fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
parser(input)
}