๐Ÿฆ€ Functional Rust

770: Recursive Descent Parser from Scratch

Difficulty: 4 Level: Advanced Turn a context-free grammar directly into mutually recursive functions โ€” each grammar rule becomes a method, and the parser walks the input building an evaluatable AST.

The Problem This Solves

Regular expressions can match patterns, but they can't parse nested structure. `"(1 + 2) * (3 + 4)"` has parentheses that can be arbitrarily deep โ€” you need a parser that calls itself recursively to handle them. Recursive descent is the most readable parsing technique: each grammar rule maps to one function, and you can read the code as documentation of the grammar. Understanding recursive descent unlocks a whole class of tools: you can write calculators, template engines, configuration DSLs, query languages, and custom command parsers. It's also the foundation for understanding how Rust's own compiler parses source code, and how tools like `nom` and `winnow` organize their combinators. The grammar for arithmetic expressions also teaches operator precedence by structure. `*` binds tighter than `+` because multiplication is handled at a deeper recursion level. This structural encoding of precedence is cleaner than looking up a precedence table.

The Intuition

Imagine the grammar written out:
expr   โ†’ term  (('+'|'-') term)*
term   โ†’ factor (('*'|'/') factor)*
factor โ†’ '(' expr ')' | number
Each line becomes one method. `parse_expr` calls `parse_term`. `parse_term` calls `parse_factor`. `parse_factor` either calls `parse_expr` (for parentheses) or reads a number. The recursion in the grammar becomes recursion in the code. In Python, you'd write the same structure โ€” a class with methods for each rule. The difference in Rust is that the parser is position-tracking (`pos: usize`) and all error cases return `Result`. This matches production parsers like `syn` (Rust's macro parser) exactly.

How It Works in Rust

// Grammar:
// expr   โ†’ term (('+'|'-') term)*
// term   โ†’ factor (('*'|'/') factor)*
// factor โ†’ '(' expr ')' | number

#[derive(Debug, Clone)]
pub enum Expr {
 Num(f64),
 BinOp { op: char, left: Box<Expr>, right: Box<Expr> },
}

pub struct Parser<'a> { input: &'a [u8], pos: usize }

impl<'a> Parser<'a> {
 // expr: lowest precedence โ€” handles + and -
 pub fn parse_expr(&mut self) -> Result<Expr, ParseError> {
     let mut left = self.parse_term()?;        // always parse a term first
     self.skip_ws();
     while matches!(self.peek(), Some('+') | Some('-')) {
         let op = self.advance_op();
         let right = self.parse_term()?;       // right side is also a term
         left = Expr::BinOp {
             op, left: Box::new(left), right: Box::new(right)
         };
     }
     Ok(left)
 }

 // term: higher precedence โ€” handles * and /
 fn parse_term(&mut self) -> Result<Expr, ParseError> {
     let mut left = self.parse_factor()?;      // always parse a factor first
     self.skip_ws();
     while matches!(self.peek(), Some('*') | Some('/')) {
         let op = self.advance_op();
         let right = self.parse_factor()?;
         left = Expr::BinOp { op, left: Box::new(left), right: Box::new(right) };
     }
     Ok(left)
 }

 // factor: highest precedence โ€” parentheses or a literal number
 fn parse_factor(&mut self) -> Result<Expr, ParseError> {
     self.skip_ws();
     if self.peek() == Some('(') {
         self.advance();                        // consume '('
         let expr = self.parse_expr()?;         // RECURSE โ€” full expression inside parens
         self.skip_ws();
         self.expect(')')?;                     // consume ')'
         return Ok(expr);
     }
     Ok(Expr::Num(self.parse_number()?))        // base case: a literal
 }
}

// The AST evaluates itself
impl Expr {
 pub fn eval(&self) -> f64 {
     match self {
         Expr::Num(n) => *n,
         Expr::BinOp { op, left, right } => {
             let (l, r) = (left.eval(), right.eval());
             match op { '+' => l+r, '-' => l-r, '*' => l*r, '/' => l/r, _ => 0.0 }
         }
     }
 }
}

// Usage
pub fn eval(input: &str) -> Result<f64, ParseError> {
 Ok(Parser::new(input).parse_expr()?.eval())
}

// "2 + 3 * 4" โ†’ 14 (not 20 โ€” precedence is correct because term is called before +)
// "(2 + 3) * 4" โ†’ 20
Key points:

What This Unlocks

Key Differences

ConceptOCamlRust
Parser stateFunctional: thread input through returnsMutable `Parser` struct with `pos: usize`
AST recursive type`type expr = Num of float \BinOp of ...``enum Expr` with `Box<Expr>` for recursion
PrecedenceSeparate parser functions per levelSame โ€” `parse_expr` calls `parse_term` calls `parse_factor`
Error handlingExceptions or `result``Result<Expr, ParseError>` โ€” propagated with `?`
Left associativityAccumulate via function recursion`while` loop with left-accumulation
Production library`menhir`, `angstrom``nom`, `winnow`, `pest`, `lalrpop`
// 770. Recursive Descent Parser from Scratch
// Grammar: expr โ†’ term (('+'|'-') term)*
//          term โ†’ factor (('*'|'/') factor)*
//          factor โ†’ '(' expr ')' | number

// โ”€โ”€ AST โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(f64),
    BinOp { op: char, left: Box<Expr>, right: Box<Expr> },
}

impl Expr {
    pub fn eval(&self) -> f64 {
        match self {
            Expr::Num(n) => *n,
            Expr::BinOp { op, left, right } => {
                let l = left.eval();
                let r = right.eval();
                match op {
                    '+' => l + r,
                    '-' => l - r,
                    '*' => l * r,
                    '/' => l / r,
                    _   => panic!("unknown op {op}"),
                }
            }
        }
    }
}

// โ”€โ”€ Parser โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

#[derive(Debug)]
pub struct ParseError(pub String);

pub struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(input: &'a str) -> Self {
        Self { input: input.as_bytes(), pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.input.get(self.pos).map(|&b| b as char)
    }

    fn advance(&mut self) { self.pos += 1; }

    fn skip_ws(&mut self) {
        while matches!(self.peek(), Some(' ') | Some('\t')) {
            self.advance();
        }
    }

    fn parse_number(&mut self) -> Result<f64, ParseError> {
        self.skip_ws();
        let start = self.pos;
        if self.peek() == Some('-') { self.advance(); }
        while matches!(self.peek(), Some('0'..='9') | Some('.')) {
            self.advance();
        }
        let tok = std::str::from_utf8(&self.input[start..self.pos]).unwrap();
        tok.parse::<f64>().map_err(|_| ParseError(format!("bad number: '{tok}'")))
    }

    fn parse_factor(&mut self) -> Result<Expr, ParseError> {
        self.skip_ws();
        if self.peek() == Some('(') {
            self.advance();
            let e = self.parse_expr()?;
            self.skip_ws();
            if self.peek() != Some(')') {
                return Err(ParseError("expected ')'".into()));
            }
            self.advance();
            Ok(e)
        } else {
            Ok(Expr::Num(self.parse_number()?))
        }
    }

    fn parse_term(&mut self) -> Result<Expr, ParseError> {
        let mut left = self.parse_factor()?;
        loop {
            self.skip_ws();
            match self.peek() {
                Some(op @ ('*' | '/')) => {
                    self.advance();
                    let right = self.parse_factor()?;
                    left = Expr::BinOp { op, left: Box::new(left), right: Box::new(right) };
                }
                _ => break,
            }
        }
        Ok(left)
    }

    pub fn parse_expr(&mut self) -> Result<Expr, ParseError> {
        let mut left = self.parse_term()?;
        loop {
            self.skip_ws();
            match self.peek() {
                Some(op @ ('+' | '-')) => {
                    self.advance();
                    let right = self.parse_term()?;
                    left = Expr::BinOp { op, left: Box::new(left), right: Box::new(right) };
                }
                _ => break,
            }
        }
        Ok(left)
    }
}

pub fn parse(input: &str) -> Result<Expr, ParseError> {
    Parser::new(input).parse_expr()
}

fn main() {
    let tests = [
        ("1 + 2 * 3",        7.0),
        ("(1 + 2) * 3",      9.0),
        ("10 / 2 - 3",       2.0),
        ("2 * (3 + 4) / 2",  7.0),
        ("100",             100.0),
    ];

    for (expr, expected) in tests {
        let ast = parse(expr).expect("parse failed");
        let result = ast.eval();
        println!("{expr:25} = {result:8.1}  (expected {expected:.1}) {}",
            if (result - expected).abs() < 1e-9 { "โœ“" } else { "โœ—" });
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn eval(s: &str) -> f64 { parse(s).unwrap().eval() }

    #[test]
    fn addition()         { assert_eq!(eval("1 + 2"), 3.0); }
    #[test]
    fn precedence()       { assert_eq!(eval("1 + 2 * 3"), 7.0); }
    #[test]
    fn parens_override()  { assert_eq!(eval("(1 + 2) * 3"), 9.0); }
    #[test]
    fn nested_parens()    { assert_eq!(eval("((2 + 3) * (4 - 1))"), 15.0); }
    #[test]
    fn division()         { assert!((eval("10 / 4") - 2.5).abs() < 1e-9); }
}
(* Recursive descent parser in OCaml *)

(* โ”€โ”€ AST โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ *)
type expr =
  | Num of float
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr
  | Div of expr * expr

let rec eval = function
  | Num n     -> n
  | Add (a,b) -> eval a +. eval b
  | Sub (a,b) -> eval a -. eval b
  | Mul (a,b) -> eval a *. eval b
  | Div (a,b) -> eval a /. eval b

(* โ”€โ”€ Lexer (character-level) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ *)
let s = ref ""
let pos = ref 0

let peek () = if !pos < String.length !s then Some !s.[!pos] else None
let next () = incr pos

let skip_ws () =
  while match peek () with Some ' ' | Some '\t' -> true | _ -> false do next () done

let parse_number () =
  skip_ws ();
  let start = !pos in
  (match peek () with Some '-' -> next () | _ -> ());
  while (match peek () with Some c when c >= '0' && c <= '9' || c = '.' -> true | _ -> false) do next () done;
  let tok = String.sub !s start (!pos - start) in
  float_of_string tok

(* โ”€โ”€ Grammar โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ *)
let rec parse_expr () =
  let left = ref (parse_term ()) in
  skip_ws ();
  let continue = ref true in
  while !continue do
    match peek () with
    | Some '+' -> next (); let r = parse_term () in left := Add (!left, r); skip_ws ()
    | Some '-' -> next (); let r = parse_term () in left := Sub (!left, r); skip_ws ()
    | _ -> continue := false
  done;
  !left

and parse_term () =
  let left = ref (parse_factor ()) in
  skip_ws ();
  let continue = ref true in
  while !continue do
    match peek () with
    | Some '*' -> next (); let r = parse_factor () in left := Mul (!left, r); skip_ws ()
    | Some '/' -> next (); let r = parse_factor () in left := Div (!left, r); skip_ws ()
    | _ -> continue := false
  done;
  !left

and parse_factor () =
  skip_ws ();
  match peek () with
  | Some '(' ->
    next ();
    let e = parse_expr () in
    skip_ws ();
    (match peek () with Some ')' -> next () | _ -> failwith "expected ')'");
    e
  | _ -> Num (parse_number ())

let parse input =
  s := input; pos := 0;
  parse_expr ()

let () =
  let tests = [
    "1 + 2 * 3",        (* = 7 *)
    "(1 + 2) * 3",      (* = 9 *)
    "10 / 2 - 3",       (* = 2 *)
    "2 * (3 + 4) / 2",  (* = 7 *)
  ] in
  List.iter (fun expr ->
    let result = eval (parse expr) in
    Printf.printf "%s = %g\n" expr result
  ) tests