๐Ÿฆ€ Functional Rust

160: FlatMap Parser

Difficulty: โญโญโญ Level: Foundations `and_then(parser, f)` lets the next parser depend on the result of the previous one โ€” context-sensitive parsing without giving up composability.

The Problem This Solves

`map` transforms a parsed value, but always produces a fixed type using a pure function. What if what you need to parse next depends on what you just parsed? Classic example: a length-prefixed string in a protocol like `"3:abc"`. First you parse the number `3`. Then โ€” and this is the key โ€” you parse exactly three characters. You can't express "parse N characters" as a fixed parser, because N changes based on the input. You need to parse one thing, look at its value, and decide what to parse next. `and_then` (also known as `flatMap` or monadic bind) is exactly this. It says: "run this parser, pass its result to a function, and that function produces the next parser to run." The second parser isn't fixed โ€” it's computed from the first result.

The Intuition

You know `Option::and_then`:
let s = Some("42");
let n = s.and_then(|x| x.parse::<u32>().ok());
// n = Some(42) โ€” the second step depends on the first
Parser `and_then` is the same idea. The closure receives the parsed value and returns a new parser โ€” not a new value, a new parser. That new parser then runs on the remaining input. This is the difference between `map` and `and_then`: In functional programming terms, `and_then` is monadic bind (`>>=`). If `map` makes parsers functors, `and_then` makes them monads โ€” which means they can express any context-sensitive grammar.

How It Works in Rust

`and_then` โ€” monadic bind:
fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a  // f takes A and RETURNS a Parser, not just B
{
 Box::new(move |input: &'a str| {
     let (value, rest) = parser(input)?;  // run first parser
     (f(value))(rest)                     // f produces a new parser; run it on `rest`
 })
}
The type signature is the key: `F: Fn(A) -> Parser<'a, B>`. The function `f` returns a `Parser`, not just a `B`. Then `(f(value))(rest)` calls that returned parser on the remaining input. Length-prefixed strings โ€” a real use case:
fn length_prefixed<'a>() -> Parser<'a, &'a str> {
 and_then(parse_nat(), |n| {
     // `n` is the length we just parsed
     Box::new(move |input: &'a str| {
         if input.starts_with(':') {
             let rest = &input[1..];  // skip the ':'
             if rest.len() >= n {
                 Ok((&rest[..n], &rest[n..]))  // take exactly n bytes
             } else {
                 Err("Not enough characters".to_string())
             }
         } else {
             Err("Expected ':'".to_string())
         }
     })
 })
}
// parse_nat()  โ†’  and_then  โ†’  parse n chars after ':'
// "3:abcrest"  โ†’  n=3       โ†’  Ok(("abc", "rest"))
Notice: the inner closure captures `n` from the outer scope. The parser it creates knows how many characters to consume because it was created with that knowledge baked in. Conditional parsing โ€” choose the parser based on a tag:
fn conditional_parser<'a>() -> Parser<'a, String> {
 and_then(
     satisfy(|c| c == 'i' || c == 's', "type tag"),
     |tag_char| {
         if tag_char == 'i' {
             // 'i' means: parse digits
             map(many1(satisfy(|c| c.is_ascii_digit(), "digit")),
                 |chars| chars.into_iter().collect())
         } else {
             // 's' means: parse letters
             map(many1(satisfy(|c| c.is_ascii_lowercase(), "letter")),
                 |chars| chars.into_iter().collect())
         }
     },
 )
}
// "i42rest" โ†’ tag='i' โ†’ parse digits โ†’ Ok(("42", "rest"))
// "sabcREST" โ†’ tag='s' โ†’ parse letters โ†’ Ok(("abc", "REST"))

What This Unlocks

Key Differences

ConceptOCamlRust
Bind operator`p >>= fun x -> ...` (infix)`and_then(p, \x\...)` (function call)
Infix bind`let (>>=) = and_then` is idiomaticNot idiomatic in Rust; trait method preferred
Closure returns parserNatural โ€” functions return functions`F: Fn(A) -> Parser<'a, B> + 'a` explicit
vs. `map``>>=` is strictly more powerful`and_then` is strictly more powerful than `map`
Context sensitivityEasy โ€” functions close over anythingSame, with explicit lifetime bounds
// Example 160: FlatMap Parser
// flat_map / and_then: monadic chaining of parsers

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn satisfy<'a, F>(pred: F, desc: &str) -> Parser<'a, char>
where F: Fn(char) -> bool + 'a {
    let desc = desc.to_string();
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
        _ => Err(format!("Expected {}", desc)),
    })
}

fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
    Box::new(move |input: &'a str| {
        let (first, mut rem) = parser(input)?;
        let mut results = vec![first];
        while let Ok((v, rest)) = parser(rem) { results.push(v); rem = rest; }
        Ok((results, rem))
    })
}

fn map<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> B + 'a {
    Box::new(move |input: &'a str| {
        let (v, rest) = parser(input)?;
        Ok((f(v), rest))
    })
}

// ============================================================
// Approach 1: and_then / bind โ€” monadic chaining
// ============================================================

fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a {
    Box::new(move |input: &'a str| {
        let (value, rest) = parser(input)?;
        (f(value))(rest)
    })
}

// ============================================================
// Approach 2: Context-sensitive โ€” length-prefixed string "3:abc"
// ============================================================

fn parse_nat<'a>() -> Parser<'a, usize> {
    map(
        many1(satisfy(|c| c.is_ascii_digit(), "digit")),
        |digits| digits.iter().fold(0usize, |acc, &d| acc * 10 + (d as usize - '0' as usize)),
    )
}

fn length_prefixed<'a>() -> Parser<'a, &'a str> {
    and_then(parse_nat(), |n| {
        Box::new(move |input: &'a str| {
            if input.starts_with(':') {
                let rest = &input[1..];
                if rest.len() >= n {
                    Ok((&rest[..n], &rest[n..]))
                } else {
                    Err("Not enough characters".to_string())
                }
            } else {
                Err("Expected ':'".to_string())
            }
        })
    })
}

// ============================================================
// Approach 3: Conditional parsing based on tag
// ============================================================

fn conditional_parser<'a>() -> Parser<'a, String> {
    and_then(
        satisfy(|c| c == 'i' || c == 's', "type tag"),
        |tag_char| {
            if tag_char == 'i' {
                map(
                    many1(satisfy(|c| c.is_ascii_digit(), "digit")),
                    |chars| chars.into_iter().collect(),
                )
            } else {
                map(
                    many1(satisfy(|c| c.is_ascii_lowercase(), "letter")),
                    |chars| chars.into_iter().collect(),
                )
            }
        },
    )
}

fn main() {
    println!("=== and_then basic ===");
    let p = and_then(
        satisfy(|c| c.is_ascii_digit(), "digit"),
        |_d| satisfy(|c| c.is_ascii_digit(), "digit"),
    );
    println!("{:?}", p("12rest")); // Ok(('2', "rest"))

    println!("\n=== length_prefixed ===");
    let p = length_prefixed();
    println!("{:?}", p("3:abcrest")); // Ok(("abc", "rest"))
    println!("{:?}", p("5:helloworld")); // Ok(("hello", "world"))

    println!("\n=== conditional ===");
    let p = conditional_parser();
    println!("{:?}", p("i42rest")); // Ok(("42", "rest"))
    println!("{:?}", p("sabcREST")); // Ok(("abc", "REST"))

    println!("\nโœ“ All examples completed");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_and_then_basic() {
        let p = and_then(
            satisfy(|c| c.is_ascii_digit(), "digit"),
            |d| map(satisfy(|c| c.is_ascii_digit(), "digit"), move |d2| format!("{}{}", d, d2)),
        );
        assert_eq!(p("12x"), Ok(("12".to_string(), "x")));
    }

    #[test]
    fn test_length_prefixed() {
        let p = length_prefixed();
        assert_eq!(p("3:abcrest"), Ok(("abc", "rest")));
    }

    #[test]
    fn test_length_prefixed_5() {
        let p = length_prefixed();
        assert_eq!(p("5:helloworld"), Ok(("hello", "world")));
    }

    #[test]
    fn test_length_prefixed_too_short() {
        let p = length_prefixed();
        assert!(p("5:hi").is_err());
    }

    #[test]
    fn test_conditional_int() {
        let p = conditional_parser();
        assert_eq!(p("i42rest"), Ok(("42".to_string(), "rest")));
    }

    #[test]
    fn test_conditional_string() {
        let p = conditional_parser();
        assert_eq!(p("sabcREST"), Ok(("abc".to_string(), "REST")));
    }

    #[test]
    fn test_conditional_invalid_tag() {
        let p = conditional_parser();
        assert!(p("x123").is_err());
    }

    #[test]
    fn test_and_then_error_propagation() {
        let p = and_then(
            satisfy(|c| c.is_ascii_digit(), "digit"),
            |_| satisfy(|c| c.is_ascii_uppercase(), "upper"),
        );
        assert!(p("1a").is_err()); // second parser fails
    }
}
(* Example 160: FlatMap Parser *)
(* flat_map / and_then: monadic chaining of parsers *)

type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result

let satisfy pred desc : char parser = fun input ->
  if String.length input > 0 && pred input.[0] then
    Ok (input.[0], String.sub input 1 (String.length input - 1))
  else Error (Printf.sprintf "Expected %s" desc)

let tag expected : string parser = fun input ->
  let len = String.length expected in
  if String.length input >= len && String.sub input 0 len = expected then
    Ok (expected, String.sub input len (String.length input - len))
  else Error (Printf.sprintf "Expected \"%s\"" expected)

let many1 (p : 'a parser) : 'a list parser = fun input ->
  match p input with
  | Error e -> Error e
  | Ok (v, rest) ->
    let rec go acc rem = match p rem with
      | Ok (v, r) -> go (v :: acc) r
      | Error _ -> Ok (List.rev acc, rem)
    in match go [v] rest with
    | Ok (vs, r) -> Ok (vs, r)
    | Error e -> Error e

let map f p : 'b parser = fun input ->
  match p input with Ok (v, r) -> Ok (f v, r) | Error e -> Error e

(* Approach 1: and_then / bind โ€” monadic chaining *)
let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
  match p input with
  | Error e -> Error e
  | Ok (v, rest) -> (f v) rest

(* Infix operator for bind *)
let ( >>= ) = and_then

(* Approach 2: flat_map with context-sensitive parsing *)
(* Parse a length-prefixed string: "3:abc" *)
let parse_nat : int parser =
  map (fun chars ->
    List.fold_left (fun acc c -> acc * 10 + (Char.code c - Char.code '0')) 0 chars
  ) (many1 (satisfy (fun c -> c >= '0' && c <= '9') "digit"))

let length_prefixed : string parser =
  parse_nat >>= fun n ->
  (satisfy (fun c -> c = ':') "colon") >>= fun _ ->
  (fun input ->
    if String.length input >= n then
      Ok (String.sub input 0 n, String.sub input n (String.length input - n))
    else Error "Not enough characters")

(* Approach 3: Conditional parsing based on first result *)
let conditional_parser : string parser = fun input ->
  let type_parser = satisfy (fun c -> c = 'i' || c = 's') "type tag" in
  (type_parser >>= fun tag ->
    if tag = 'i' then
      map (fun ds -> String.init (List.length ds) (List.nth ds))
        (many1 (satisfy (fun c -> c >= '0' && c <= '9') "digit"))
    else
      map (fun ls -> String.init (List.length ls) (List.nth ls))
        (many1 (satisfy (fun c -> c >= 'a' && c <= 'z') "letter"))
  ) input

(* Tests *)
let () =
  (* and_then basic *)
  let p = and_then
    (satisfy (fun c -> c >= '0' && c <= '9') "digit")
    (fun d -> tag (String.make 1 d)) in
  assert (p "11rest" = Ok ("1", "rest"));

  (* length_prefixed *)
  assert (length_prefixed "3:abcrest" = Ok ("abc", "rest"));
  assert (length_prefixed "5:helloworld" = Ok ("hello", "world"));

  (* conditional *)
  assert (conditional_parser "i42rest" = Ok ("42", "rest"));
  assert (conditional_parser "sabcREST" = Ok ("abc", "REST"));

  print_endline "โœ“ All tests passed"

๐Ÿ“Š Detailed Comparison

Comparison: Example 160 โ€” FlatMap Parser

and_then / bind

OCaml:

๐Ÿช Show OCaml equivalent
let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
match p input with
| Error e -> Error e
| Ok (v, rest) -> (f v) rest

let ( >>= ) = and_then

Rust:

fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a {
 Box::new(move |input: &'a str| {
     let (value, rest) = parser(input)?;
     (f(value))(rest)
 })
}

Length-prefixed string

OCaml:

๐Ÿช Show OCaml equivalent
let length_prefixed : string parser =
parse_nat >>= fun n ->
(satisfy (fun c -> c = ':') "colon") >>= fun _ ->
(fun input ->
 if String.length input >= n then
   Ok (String.sub input 0 n, String.sub input n (String.length input - n))
 else Error "Not enough characters")

Rust:

fn length_prefixed<'a>() -> Parser<'a, &'a str> {
 and_then(parse_nat(), |n| {
     Box::new(move |input: &'a str| {
         if input.starts_with(':') {
             let rest = &input[1..];
             if rest.len() >= n {
                 Ok((&rest[..n], &rest[n..]))
             } else {
                 Err("Not enough characters".to_string())
             }
         } else {
             Err("Expected ':'".to_string())
         }
     })
 })
}