770: Recursive Descent Parser from Scratch
Difficulty: 4 Level: Advanced Turn a context-free grammar directly into mutually recursive functions โ each grammar rule becomes a method, and the parser walks the input building an evaluatable AST.The Problem This Solves
Regular expressions can match patterns, but they can't parse nested structure. `"(1 + 2) * (3 + 4)"` has parentheses that can be arbitrarily deep โ you need a parser that calls itself recursively to handle them. Recursive descent is the most readable parsing technique: each grammar rule maps to one function, and you can read the code as documentation of the grammar. Understanding recursive descent unlocks a whole class of tools: you can write calculators, template engines, configuration DSLs, query languages, and custom command parsers. It's also the foundation for understanding how Rust's own compiler parses source code, and how tools like `nom` and `winnow` organize their combinators. The grammar for arithmetic expressions also teaches operator precedence by structure. `*` binds tighter than `+` because multiplication is handled at a deeper recursion level. This structural encoding of precedence is cleaner than looking up a precedence table.The Intuition
Imagine the grammar written out:expr โ term (('+'|'-') term)*
term โ factor (('*'|'/') factor)*
factor โ '(' expr ')' | number
Each line becomes one method. `parse_expr` calls `parse_term`. `parse_term` calls `parse_factor`. `parse_factor` either calls `parse_expr` (for parentheses) or reads a number. The recursion in the grammar becomes recursion in the code.
In Python, you'd write the same structure โ a class with methods for each rule. The difference in Rust is that the parser is position-tracking (`pos: usize`) and all error cases return `Result`. This matches production parsers like `syn` (Rust's macro parser) exactly.
How It Works in Rust
// Grammar:
// expr โ term (('+'|'-') term)*
// term โ factor (('*'|'/') factor)*
// factor โ '(' expr ')' | number
#[derive(Debug, Clone)]
pub enum Expr {
Num(f64),
BinOp { op: char, left: Box<Expr>, right: Box<Expr> },
}
pub struct Parser<'a> { input: &'a [u8], pos: usize }
impl<'a> Parser<'a> {
// expr: lowest precedence โ handles + and -
pub fn parse_expr(&mut self) -> Result<Expr, ParseError> {
let mut left = self.parse_term()?; // always parse a term first
self.skip_ws();
while matches!(self.peek(), Some('+') | Some('-')) {
let op = self.advance_op();
let right = self.parse_term()?; // right side is also a term
left = Expr::BinOp {
op, left: Box::new(left), right: Box::new(right)
};
}
Ok(left)
}
// term: higher precedence โ handles * and /
fn parse_term(&mut self) -> Result<Expr, ParseError> {
let mut left = self.parse_factor()?; // always parse a factor first
self.skip_ws();
while matches!(self.peek(), Some('*') | Some('/')) {
let op = self.advance_op();
let right = self.parse_factor()?;
left = Expr::BinOp { op, left: Box::new(left), right: Box::new(right) };
}
Ok(left)
}
// factor: highest precedence โ parentheses or a literal number
fn parse_factor(&mut self) -> Result<Expr, ParseError> {
self.skip_ws();
if self.peek() == Some('(') {
self.advance(); // consume '('
let expr = self.parse_expr()?; // RECURSE โ full expression inside parens
self.skip_ws();
self.expect(')')?; // consume ')'
return Ok(expr);
}
Ok(Expr::Num(self.parse_number()?)) // base case: a literal
}
}
// The AST evaluates itself
impl Expr {
pub fn eval(&self) -> f64 {
match self {
Expr::Num(n) => *n,
Expr::BinOp { op, left, right } => {
let (l, r) = (left.eval(), right.eval());
match op { '+' => l+r, '-' => l-r, '*' => l*r, '/' => l/r, _ => 0.0 }
}
}
}
}
// Usage
pub fn eval(input: &str) -> Result<f64, ParseError> {
Ok(Parser::new(input).parse_expr()?.eval())
}
// "2 + 3 * 4" โ 14 (not 20 โ precedence is correct because term is called before +)
// "(2 + 3) * 4" โ 20
Key points:
- `Box<Expr>` is required for recursive enum variants โ otherwise the type would be infinitely sized
- The precedence hierarchy is: `expr` โ `term` โ `factor` โ number/`(expr)` โ tighter binding = deeper recursion level
- `parse_factor` calling `parse_expr` creates the recursive cycle that handles nested parentheses
- Error type `ParseError(String)` keeps it simple โ production parsers add span information (byte offset, line/column)
- `while matches!(...)` handles left-associativity: `1 - 2 - 3` = `(1-2)-3`, not `1-(2-3)`
What This Unlocks
- Custom DSLs: a query language, template engine, or expression evaluator follows this exact pattern โ grammar โ rules โ methods โ AST โ evaluator
- Foundation for Pratt parsing: once you understand recursive descent, the Pratt technique (example 771) is a natural evolution for handling operator precedence tables
- Production parsing: `syn` (Rust macro AST), `swc` (JavaScript compiler), and many language servers use recursive descent โ this is the same technique at scale
Key Differences
| Concept | OCaml | Rust | |
|---|---|---|---|
| Parser state | Functional: thread input through returns | Mutable `Parser` struct with `pos: usize` | |
| AST recursive type | `type expr = Num of float \ | BinOp of ...` | `enum Expr` with `Box<Expr>` for recursion |
| Precedence | Separate parser functions per level | Same โ `parse_expr` calls `parse_term` calls `parse_factor` | |
| Error handling | Exceptions or `result` | `Result<Expr, ParseError>` โ propagated with `?` | |
| Left associativity | Accumulate via function recursion | `while` loop with left-accumulation | |
| Production library | `menhir`, `angstrom` | `nom`, `winnow`, `pest`, `lalrpop` |
// 770. Recursive Descent Parser from Scratch
// Grammar: expr โ term (('+'|'-') term)*
// term โ factor (('*'|'/') factor)*
// factor โ '(' expr ')' | number
// โโ AST โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
Num(f64),
BinOp { op: char, left: Box<Expr>, right: Box<Expr> },
}
impl Expr {
pub fn eval(&self) -> f64 {
match self {
Expr::Num(n) => *n,
Expr::BinOp { op, left, right } => {
let l = left.eval();
let r = right.eval();
match op {
'+' => l + r,
'-' => l - r,
'*' => l * r,
'/' => l / r,
_ => panic!("unknown op {op}"),
}
}
}
}
}
// โโ Parser โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
#[derive(Debug)]
pub struct ParseError(pub String);
pub struct Parser<'a> {
input: &'a [u8],
pos: usize,
}
impl<'a> Parser<'a> {
pub fn new(input: &'a str) -> Self {
Self { input: input.as_bytes(), pos: 0 }
}
fn peek(&self) -> Option<char> {
self.input.get(self.pos).map(|&b| b as char)
}
fn advance(&mut self) { self.pos += 1; }
fn skip_ws(&mut self) {
while matches!(self.peek(), Some(' ') | Some('\t')) {
self.advance();
}
}
fn parse_number(&mut self) -> Result<f64, ParseError> {
self.skip_ws();
let start = self.pos;
if self.peek() == Some('-') { self.advance(); }
while matches!(self.peek(), Some('0'..='9') | Some('.')) {
self.advance();
}
let tok = std::str::from_utf8(&self.input[start..self.pos]).unwrap();
tok.parse::<f64>().map_err(|_| ParseError(format!("bad number: '{tok}'")))
}
fn parse_factor(&mut self) -> Result<Expr, ParseError> {
self.skip_ws();
if self.peek() == Some('(') {
self.advance();
let e = self.parse_expr()?;
self.skip_ws();
if self.peek() != Some(')') {
return Err(ParseError("expected ')'".into()));
}
self.advance();
Ok(e)
} else {
Ok(Expr::Num(self.parse_number()?))
}
}
fn parse_term(&mut self) -> Result<Expr, ParseError> {
let mut left = self.parse_factor()?;
loop {
self.skip_ws();
match self.peek() {
Some(op @ ('*' | '/')) => {
self.advance();
let right = self.parse_factor()?;
left = Expr::BinOp { op, left: Box::new(left), right: Box::new(right) };
}
_ => break,
}
}
Ok(left)
}
pub fn parse_expr(&mut self) -> Result<Expr, ParseError> {
let mut left = self.parse_term()?;
loop {
self.skip_ws();
match self.peek() {
Some(op @ ('+' | '-')) => {
self.advance();
let right = self.parse_term()?;
left = Expr::BinOp { op, left: Box::new(left), right: Box::new(right) };
}
_ => break,
}
}
Ok(left)
}
}
pub fn parse(input: &str) -> Result<Expr, ParseError> {
Parser::new(input).parse_expr()
}
fn main() {
let tests = [
("1 + 2 * 3", 7.0),
("(1 + 2) * 3", 9.0),
("10 / 2 - 3", 2.0),
("2 * (3 + 4) / 2", 7.0),
("100", 100.0),
];
for (expr, expected) in tests {
let ast = parse(expr).expect("parse failed");
let result = ast.eval();
println!("{expr:25} = {result:8.1} (expected {expected:.1}) {}",
if (result - expected).abs() < 1e-9 { "โ" } else { "โ" });
}
}
#[cfg(test)]
mod tests {
use super::*;
fn eval(s: &str) -> f64 { parse(s).unwrap().eval() }
#[test]
fn addition() { assert_eq!(eval("1 + 2"), 3.0); }
#[test]
fn precedence() { assert_eq!(eval("1 + 2 * 3"), 7.0); }
#[test]
fn parens_override() { assert_eq!(eval("(1 + 2) * 3"), 9.0); }
#[test]
fn nested_parens() { assert_eq!(eval("((2 + 3) * (4 - 1))"), 15.0); }
#[test]
fn division() { assert!((eval("10 / 4") - 2.5).abs() < 1e-9); }
}
(* Recursive descent parser in OCaml *)
(* โโ AST โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ *)
type expr =
| Num of float
| Add of expr * expr
| Sub of expr * expr
| Mul of expr * expr
| Div of expr * expr
let rec eval = function
| Num n -> n
| Add (a,b) -> eval a +. eval b
| Sub (a,b) -> eval a -. eval b
| Mul (a,b) -> eval a *. eval b
| Div (a,b) -> eval a /. eval b
(* โโ Lexer (character-level) โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ *)
let s = ref ""
let pos = ref 0
let peek () = if !pos < String.length !s then Some !s.[!pos] else None
let next () = incr pos
let skip_ws () =
while match peek () with Some ' ' | Some '\t' -> true | _ -> false do next () done
let parse_number () =
skip_ws ();
let start = !pos in
(match peek () with Some '-' -> next () | _ -> ());
while (match peek () with Some c when c >= '0' && c <= '9' || c = '.' -> true | _ -> false) do next () done;
let tok = String.sub !s start (!pos - start) in
float_of_string tok
(* โโ Grammar โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ *)
let rec parse_expr () =
let left = ref (parse_term ()) in
skip_ws ();
let continue = ref true in
while !continue do
match peek () with
| Some '+' -> next (); let r = parse_term () in left := Add (!left, r); skip_ws ()
| Some '-' -> next (); let r = parse_term () in left := Sub (!left, r); skip_ws ()
| _ -> continue := false
done;
!left
and parse_term () =
let left = ref (parse_factor ()) in
skip_ws ();
let continue = ref true in
while !continue do
match peek () with
| Some '*' -> next (); let r = parse_factor () in left := Mul (!left, r); skip_ws ()
| Some '/' -> next (); let r = parse_factor () in left := Div (!left, r); skip_ws ()
| _ -> continue := false
done;
!left
and parse_factor () =
skip_ws ();
match peek () with
| Some '(' ->
next ();
let e = parse_expr () in
skip_ws ();
(match peek () with Some ')' -> next () | _ -> failwith "expected ')'");
e
| _ -> Num (parse_number ())
let parse input =
s := input; pos := 0;
parse_expr ()
let () =
let tests = [
"1 + 2 * 3", (* = 7 *)
"(1 + 2) * 3", (* = 9 *)
"10 / 2 - 3", (* = 2 *)
"2 * (3 + 4) / 2", (* = 7 *)
] in
List.iter (fun expr ->
let result = eval (parse expr) in
Printf.printf "%s = %g\n" expr result
) tests