415 Fundamental

415: Token Tree Munching

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "415: Token Tree Munching" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Complex macro DSLs need to parse arbitrary syntax that doesn't fit standard repetition patterns. Key difference from OCaml: 1. **Token level**: Rust TT munching operates on raw tokens; OCaml PPX operates on parsed AST nodes, making it more structured.

Tutorial

The Problem

Complex macro DSLs need to parse arbitrary syntax that doesn't fit standard repetition patterns. Token tree (TT) munching processes input one token tree at a time: one arm peels off the first $tt and processes it, recursing with the remainder. This enables parsing heterogeneous sequences, implementing mini-parsers within macros, and supporting complex field definition syntax like field: Type = default. TT munching is the technique behind the bitflags!, clap::arg!, and pest grammar macros.

Understanding TT munching unlocks the ability to write DSL macros that parse natural, human-readable syntax rather than forcing callers into rigid comma-separated patterns.

🎯 Learning Outcomes

• Understand the token tree munching technique: peel one $tt per recursive step

• Learn how @internal_name naming conventions mark internal macro arms

• See how define_config! parses field: Type = default, syntax incrementally

• Understand when TT munching is needed vs. simpler repetition patterns

• Learn the trade-off: TT munching is powerful but slower to compile than simple repetition

Code Example

// Parse: struct Name { field: Type = default, ... }
macro_rules! define_config {
    // Done: emit the struct
    (@fields $name:ident {} -> { $($fields:tt)* }) => {
        struct $name { $($fields)* }
    };

    // Munch one field
    (@fields $name:ident {
        $field:ident : $ty:ty = $default:expr,
        $($rest:tt)*
    } -> { $($fields:tt)* }) => {
        define_config!(@fields $name { $($rest)* } -> {
            $($fields)*
            $field: $ty,
        });
    };

    // Entry point
    (struct $name:ident { $($body:tt)* }) => {
        define_config!(@fields $name { $($body)* } -> {});
    };
}

define_config!(struct Config {
    port: u16 = 8080,
    debug: bool = false,
});

(* Using parser combinators *)
let rec parse_fields = function
  | [] -> []
  | Field(name, ty, default) :: rest ->
      (name, ty, default) :: parse_fields rest

Key Differences

Token level: Rust TT munching operates on raw tokens; OCaml PPX operates on parsed AST nodes, making it more structured.

Accumulation: Rust uses $($acc:tt)* accumulator arms; OCaml PPX uses mutable buffers or immutable list accumulation in OCaml code.

Compile time: TT munching macros can be slow to compile for large inputs due to recursive expansion; OCaml PPX runs as a separate program once.

Error messages: TT munching errors appear as "no rules matched" at the point of failure; OCaml PPX can emit custom error messages using Location.error.

OCaml Approach

OCaml's PPX approach to DSL parsing is more direct: it operates on the already-parsed OCaml AST. A PPX extension receives a Parsetree.structure (sequence of items) and transforms it. For custom syntax, OCaml uses Menhir parser generators or angstrom combinator parsers at runtime. True DSL parsing during OCaml compilation requires camlp5 or ppx extensions, which are more powerful than macro_rules! TT munching but require more infrastructure.

Full Source

#![allow(clippy::all)]
//! Token Tree Munching
//!
//! Parsing complex syntax by consuming token trees one at a time.

/// Parse a struct definition DSL with defaults.
#[macro_export]
macro_rules! define_config {
    // Done: emit the struct
    (@fields $name:ident {} -> { $($fields:tt)* } defaults: { $($defaults:tt)* }) => {
        #[derive(Debug, Clone)]
        pub struct $name {
            $($fields)*
        }

        impl Default for $name {
            fn default() -> Self {
                $name {
                    $($defaults)*
                }
            }
        }
    };

    // Munch one field: name: Type = default,
    (@fields $name:ident {
        $field:ident : $ty:ty = $default:expr,
        $($rest:tt)*
    } -> { $($fields:tt)* } defaults: { $($defaults:tt)* }) => {
        define_config!(@fields $name { $($rest)* } -> {
            $($fields)*
            pub $field: $ty,
        } defaults: {
            $($defaults)*
            $field: $default,
        });
    };

    // Entry point
    (struct $name:ident { $($body:tt)* }) => {
        define_config!(@fields $name { $($body)* } -> {} defaults: {});
    };
}

/// Simple calculator DSL.
#[macro_export]
macro_rules! calc {
    // Base: single literal
    ($n:literal) => { $n };

    // Addition
    ($a:literal + $($rest:tt)+) => {
        $a + calc!($($rest)+)
    };

    // Subtraction
    ($a:literal - $($rest:tt)+) => {
        $a - calc!($($rest)+)
    };

    // Multiplication (simple case)
    ($a:literal * $b:literal) => {
        $a * $b
    };
}

/// Parse key=value pairs into a HashMap.
#[macro_export]
macro_rules! parse_pairs {
    // Done
    (@acc $map:ident) => {};

    // Munch one pair
    (@acc $map:ident $key:ident = $val:expr; $($rest:tt)*) => {
        $map.insert(stringify!($key).to_string(), $val.to_string());
        parse_pairs!(@acc $map $($rest)*);
    };

    // Entry
    ($($key:ident = $val:expr;)*) => {{
        let mut map = ::std::collections::HashMap::new();
        parse_pairs!(@acc map $($key = $val;)*);
        map
    }};
}

/// Process command-like DSL.
#[macro_export]
macro_rules! commands {
    // Done
    (@acc $results:ident) => {};

    // set command
    (@acc $results:ident set $var:ident = $val:expr; $($rest:tt)*) => {
        $results.push(format!("SET {} = {}", stringify!($var), $val));
        commands!(@acc $results $($rest)*);
    };

    // get command
    (@acc $results:ident get $var:ident; $($rest:tt)*) => {
        $results.push(format!("GET {}", stringify!($var)));
        commands!(@acc $results $($rest)*);
    };

    // delete command
    (@acc $results:ident delete $var:ident; $($rest:tt)*) => {
        $results.push(format!("DELETE {}", stringify!($var)));
        commands!(@acc $results $($rest)*);
    };

    // Entry
    ($($cmd:tt)*) => {{
        let mut results = Vec::new();
        commands!(@acc results $($cmd)*);
        results
    }};
}

/// Count specific tokens.
#[macro_export]
macro_rules! count_tokens {
    // Done
    (@counting $count:expr,) => { $count };

    // Skip commas, count everything else
    (@counting $count:expr, , $($rest:tt)*) => {
        count_tokens!(@counting $count, $($rest)*)
    };

    (@counting $count:expr, $head:tt $($rest:tt)*) => {
        count_tokens!(@counting $count + 1, $($rest)*)
    };

    // Entry
    ($($tokens:tt)*) => {
        count_tokens!(@counting 0usize, $($tokens)*)
    };
}

define_config!(
    struct ServerConfig {
    host: String = "localhost".to_string(),
    port: u16 = 8080,
    timeout_secs: u32 = 30,
}
);

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_define_config() {
        let cfg = ServerConfig::default();
        assert_eq!(cfg.host, "localhost");
        assert_eq!(cfg.port, 8080);
        assert_eq!(cfg.timeout_secs, 30);
    }

    #[test]
    fn test_define_config_override() {
        let cfg = ServerConfig {
            port: 9090,
            ..Default::default()
        };
        assert_eq!(cfg.port, 9090);
        assert_eq!(cfg.host, "localhost");
    }

    #[test]
    fn test_calc_add() {
        assert_eq!(calc!(2 + 3), 5);
    }

    #[test]
    fn test_calc_sub() {
        assert_eq!(calc!(10 - 4), 6);
    }

    #[test]
    fn test_calc_mul() {
        assert_eq!(calc!(3 * 4), 12);
    }

    #[test]
    fn test_parse_pairs() {
        let pairs = parse_pairs! {
            name = "Alice";
            age = 30;
            city = "NYC";
        };
        assert_eq!(pairs["name"], "Alice");
        assert_eq!(pairs["age"], "30");
        assert_eq!(pairs["city"], "NYC");
    }

    #[test]
    fn test_commands() {
        let cmds = commands! {
            set x = 10;
            get y;
            delete z;
        };
        assert_eq!(cmds.len(), 3);
        assert!(cmds[0].contains("SET x"));
        assert!(cmds[1].contains("GET y"));
        assert!(cmds[2].contains("DELETE z"));
    }

    #[test]
    fn test_count_tokens() {
        assert_eq!(count_tokens!(a b c), 3);
        assert_eq!(count_tokens!(1 + 2 + 3), 5);
    }
}

(* Token tree munching concept in OCaml *)
(* Simulated via a parser combinator approach *)

type token = Int of int | Plus | Minus | Star | LParen | RParen | EOF

let tokenize s =
  let tokens = ref [] in
  String.iter (fun c -> match c with
    | '+' -> tokens := Plus :: !tokens
    | '-' -> tokens := Minus :: !tokens
    | '*' -> tokens := Star :: !tokens
    | '(' -> tokens := LParen :: !tokens
    | ')' -> tokens := RParen :: !tokens
    | ' ' -> ()
    | c when c >= '0' && c <= '9' ->
      tokens := Int (Char.code c - Char.code '0') :: !tokens
    | _ -> ()
  ) s;
  List.rev !tokens @ [EOF]

let () =
  let toks = tokenize "1 + 2 * 3" in
  Printf.printf "Tokens: %d\n" (List.length toks - 1)  (* -1 for EOF *)

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_define_config() {
        let cfg = ServerConfig::default();
        assert_eq!(cfg.host, "localhost");
        assert_eq!(cfg.port, 8080);
        assert_eq!(cfg.timeout_secs, 30);
    }

    #[test]
    fn test_define_config_override() {
        let cfg = ServerConfig {
            port: 9090,
            ..Default::default()
        };
        assert_eq!(cfg.port, 9090);
        assert_eq!(cfg.host, "localhost");
    }

    #[test]
    fn test_calc_add() {
        assert_eq!(calc!(2 + 3), 5);
    }

    #[test]
    fn test_calc_sub() {
        assert_eq!(calc!(10 - 4), 6);
    }

    #[test]
    fn test_calc_mul() {
        assert_eq!(calc!(3 * 4), 12);
    }

    #[test]
    fn test_parse_pairs() {
        let pairs = parse_pairs! {
            name = "Alice";
            age = 30;
            city = "NYC";
        };
        assert_eq!(pairs["name"], "Alice");
        assert_eq!(pairs["age"], "30");
        assert_eq!(pairs["city"], "NYC");
    }

    #[test]
    fn test_commands() {
        let cmds = commands! {
            set x = 10;
            get y;
            delete z;
        };
        assert_eq!(cmds.len(), 3);
        assert!(cmds[0].contains("SET x"));
        assert!(cmds[1].contains("GET y"));
        assert!(cmds[2].contains("DELETE z"));
    }

    #[test]
    fn test_count_tokens() {
        assert_eq!(count_tokens!(a b c), 3);
        assert_eq!(count_tokens!(1 + 2 + 3), 5);
    }
}

Deep Comparison

OCaml vs Rust: Token Tree Munching

What is TT Munching?

Token Tree Munching (TTM) is a macro technique that:

Consumes input tokens one at a time

Accumulates results in an internal state

Recurses until input is exhausted

It's the macro equivalent of recursive descent parsing.

Example: Parsing a DSL

// Parse: struct Name { field: Type = default, ... }
macro_rules! define_config {
    // Done: emit the struct
    (@fields $name:ident {} -> { $($fields:tt)* }) => {
        struct $name { $($fields)* }
    };

    // Munch one field
    (@fields $name:ident {
        $field:ident : $ty:ty = $default:expr,
        $($rest:tt)*
    } -> { $($fields:tt)* }) => {
        define_config!(@fields $name { $($rest)* } -> {
            $($fields)*
            $field: $ty,
        });
    };

    // Entry point
    (struct $name:ident { $($body:tt)* }) => {
        define_config!(@fields $name { $($body)* } -> {});
    };
}

define_config!(struct Config {
    port: u16 = 8080,
    debug: bool = false,
});

The Pattern

macro_rules! muncher {
    // Base case: input exhausted
    (@internal $acc:tt) => { /* emit result */ };

    // Recursive case: consume one token, accumulate, recurse
    (@internal $acc:tt $head:tt $($tail:tt)*) => {
        muncher!(@internal (/* new acc with $head */) $($tail)*)
    };

    // Entry point
    ($($input:tt)*) => {
        muncher!(@internal () $($input)*)
    };
}

OCaml Equivalent

OCaml doesn't have macros, but parser combinators achieve similar goals:

(* Using parser combinators *)
let rec parse_fields = function
  | [] -> []
  | Field(name, ty, default) :: rest ->
      (name, ty, default) :: parse_fields rest

When to Use TTM

• DSL parsing: Config files, query languages

• Code generation: Structs with defaults

• Complex syntax: When simple repetition isn't enough

5 Takeaways

TTM is recursive descent for macros.

Consume tokens left-to-right, accumulate results.

**Use @internal prefix for helper rules.**

Keeps the public API clean.

**$($rest:tt)* captures remaining tokens.**

The "rest of input" to process recursively.

Accumulator pattern collects results.

Build up output as you consume input.

Complex parsing needs TTM.

Simple $(...)* repetition can't handle context-sensitive syntax.

Exercises

Enum with methods: Write define_enum!(Status { Active => "active", Inactive => "inactive" }) that generates an enum and a fn as_str(&self) -> &str method using TT munching to parse each variant-to-string mapping.

Builder DSL: Implement build_struct!(Point { x: f64 required, y: f64 required, label: String optional = "".to_string() }) where required fields must be set and optional fields have defaults.

State machine: Create state_machine!(start: Idle { on(Event::Start) => Running }, Running { on(Event::Stop) => Idle }) using TT munching to generate a state enum and transition method.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust