160: FlatMap Parser
Difficulty: โญโญโญ Level: Foundations `and_then(parser, f)` lets the next parser depend on the result of the previous one โ context-sensitive parsing without giving up composability.The Problem This Solves
`map` transforms a parsed value, but always produces a fixed type using a pure function. What if what you need to parse next depends on what you just parsed? Classic example: a length-prefixed string in a protocol like `"3:abc"`. First you parse the number `3`. Then โ and this is the key โ you parse exactly three characters. You can't express "parse N characters" as a fixed parser, because N changes based on the input. You need to parse one thing, look at its value, and decide what to parse next. `and_then` (also known as `flatMap` or monadic bind) is exactly this. It says: "run this parser, pass its result to a function, and that function produces the next parser to run." The second parser isn't fixed โ it's computed from the first result.The Intuition
You know `Option::and_then`:let s = Some("42");
let n = s.and_then(|x| x.parse::<u32>().ok());
// n = Some(42) โ the second step depends on the first
Parser `and_then` is the same idea. The closure receives the parsed value and returns a new parser โ not a new value, a new parser. That new parser then runs on the remaining input.
This is the difference between `map` and `and_then`:
- `map(p, f)`: run p, apply `f` to the result โ new value
- `and_then(p, f)`: run p, apply `f` to the result โ new parser, then run that parser
How It Works in Rust
`and_then` โ monadic bind:fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a // f takes A and RETURNS a Parser, not just B
{
Box::new(move |input: &'a str| {
let (value, rest) = parser(input)?; // run first parser
(f(value))(rest) // f produces a new parser; run it on `rest`
})
}
The type signature is the key: `F: Fn(A) -> Parser<'a, B>`. The function `f` returns a `Parser`, not just a `B`. Then `(f(value))(rest)` calls that returned parser on the remaining input.
Length-prefixed strings โ a real use case:
fn length_prefixed<'a>() -> Parser<'a, &'a str> {
and_then(parse_nat(), |n| {
// `n` is the length we just parsed
Box::new(move |input: &'a str| {
if input.starts_with(':') {
let rest = &input[1..]; // skip the ':'
if rest.len() >= n {
Ok((&rest[..n], &rest[n..])) // take exactly n bytes
} else {
Err("Not enough characters".to_string())
}
} else {
Err("Expected ':'".to_string())
}
})
})
}
// parse_nat() โ and_then โ parse n chars after ':'
// "3:abcrest" โ n=3 โ Ok(("abc", "rest"))
Notice: the inner closure captures `n` from the outer scope. The parser it creates knows how many characters to consume because it was created with that knowledge baked in.
Conditional parsing โ choose the parser based on a tag:
fn conditional_parser<'a>() -> Parser<'a, String> {
and_then(
satisfy(|c| c == 'i' || c == 's', "type tag"),
|tag_char| {
if tag_char == 'i' {
// 'i' means: parse digits
map(many1(satisfy(|c| c.is_ascii_digit(), "digit")),
|chars| chars.into_iter().collect())
} else {
// 's' means: parse letters
map(many1(satisfy(|c| c.is_ascii_lowercase(), "letter")),
|chars| chars.into_iter().collect())
}
},
)
}
// "i42rest" โ tag='i' โ parse digits โ Ok(("42", "rest"))
// "sabcREST" โ tag='s' โ parse letters โ Ok(("abc", "REST"))
What This Unlocks
- Context-sensitive grammars โ anything where "what comes next" depends on "what came before": length-prefixed data, tagged unions, variable-length records.
- Full grammar power โ `map` + `and_then` together give you the full expressiveness of monadic parsers. Any grammar that can be expressed programmatically can be expressed with these two.
- Protocol parsing โ binary protocol fields with length prefixes, type-tagged payloads, and similar patterns are exactly what `and_then` was made for.
Key Differences
| Concept | OCaml | Rust | ||
|---|---|---|---|---|
| Bind operator | `p >>= fun x -> ...` (infix) | `and_then(p, \ | x\ | ...)` (function call) |
| Infix bind | `let (>>=) = and_then` is idiomatic | Not idiomatic in Rust; trait method preferred | ||
| Closure returns parser | Natural โ functions return functions | `F: Fn(A) -> Parser<'a, B> + 'a` explicit | ||
| vs. `map` | `>>=` is strictly more powerful | `and_then` is strictly more powerful than `map` | ||
| Context sensitivity | Easy โ functions close over anything | Same, with explicit lifetime bounds |
// Example 160: FlatMap Parser
// flat_map / and_then: monadic chaining of parsers
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
fn satisfy<'a, F>(pred: F, desc: &str) -> Parser<'a, char>
where F: Fn(char) -> bool + 'a {
let desc = desc.to_string();
Box::new(move |input: &'a str| match input.chars().next() {
Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
_ => Err(format!("Expected {}", desc)),
})
}
fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
Box::new(move |input: &'a str| {
let (first, mut rem) = parser(input)?;
let mut results = vec![first];
while let Ok((v, rest)) = parser(rem) { results.push(v); rem = rest; }
Ok((results, rem))
})
}
fn map<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> B + 'a {
Box::new(move |input: &'a str| {
let (v, rest) = parser(input)?;
Ok((f(v), rest))
})
}
// ============================================================
// Approach 1: and_then / bind โ monadic chaining
// ============================================================
fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a {
Box::new(move |input: &'a str| {
let (value, rest) = parser(input)?;
(f(value))(rest)
})
}
// ============================================================
// Approach 2: Context-sensitive โ length-prefixed string "3:abc"
// ============================================================
fn parse_nat<'a>() -> Parser<'a, usize> {
map(
many1(satisfy(|c| c.is_ascii_digit(), "digit")),
|digits| digits.iter().fold(0usize, |acc, &d| acc * 10 + (d as usize - '0' as usize)),
)
}
fn length_prefixed<'a>() -> Parser<'a, &'a str> {
and_then(parse_nat(), |n| {
Box::new(move |input: &'a str| {
if input.starts_with(':') {
let rest = &input[1..];
if rest.len() >= n {
Ok((&rest[..n], &rest[n..]))
} else {
Err("Not enough characters".to_string())
}
} else {
Err("Expected ':'".to_string())
}
})
})
}
// ============================================================
// Approach 3: Conditional parsing based on tag
// ============================================================
fn conditional_parser<'a>() -> Parser<'a, String> {
and_then(
satisfy(|c| c == 'i' || c == 's', "type tag"),
|tag_char| {
if tag_char == 'i' {
map(
many1(satisfy(|c| c.is_ascii_digit(), "digit")),
|chars| chars.into_iter().collect(),
)
} else {
map(
many1(satisfy(|c| c.is_ascii_lowercase(), "letter")),
|chars| chars.into_iter().collect(),
)
}
},
)
}
fn main() {
println!("=== and_then basic ===");
let p = and_then(
satisfy(|c| c.is_ascii_digit(), "digit"),
|_d| satisfy(|c| c.is_ascii_digit(), "digit"),
);
println!("{:?}", p("12rest")); // Ok(('2', "rest"))
println!("\n=== length_prefixed ===");
let p = length_prefixed();
println!("{:?}", p("3:abcrest")); // Ok(("abc", "rest"))
println!("{:?}", p("5:helloworld")); // Ok(("hello", "world"))
println!("\n=== conditional ===");
let p = conditional_parser();
println!("{:?}", p("i42rest")); // Ok(("42", "rest"))
println!("{:?}", p("sabcREST")); // Ok(("abc", "REST"))
println!("\nโ All examples completed");
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_and_then_basic() {
let p = and_then(
satisfy(|c| c.is_ascii_digit(), "digit"),
|d| map(satisfy(|c| c.is_ascii_digit(), "digit"), move |d2| format!("{}{}", d, d2)),
);
assert_eq!(p("12x"), Ok(("12".to_string(), "x")));
}
#[test]
fn test_length_prefixed() {
let p = length_prefixed();
assert_eq!(p("3:abcrest"), Ok(("abc", "rest")));
}
#[test]
fn test_length_prefixed_5() {
let p = length_prefixed();
assert_eq!(p("5:helloworld"), Ok(("hello", "world")));
}
#[test]
fn test_length_prefixed_too_short() {
let p = length_prefixed();
assert!(p("5:hi").is_err());
}
#[test]
fn test_conditional_int() {
let p = conditional_parser();
assert_eq!(p("i42rest"), Ok(("42".to_string(), "rest")));
}
#[test]
fn test_conditional_string() {
let p = conditional_parser();
assert_eq!(p("sabcREST"), Ok(("abc".to_string(), "REST")));
}
#[test]
fn test_conditional_invalid_tag() {
let p = conditional_parser();
assert!(p("x123").is_err());
}
#[test]
fn test_and_then_error_propagation() {
let p = and_then(
satisfy(|c| c.is_ascii_digit(), "digit"),
|_| satisfy(|c| c.is_ascii_uppercase(), "upper"),
);
assert!(p("1a").is_err()); // second parser fails
}
}
(* Example 160: FlatMap Parser *)
(* flat_map / and_then: monadic chaining of parsers *)
type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result
let satisfy pred desc : char parser = fun input ->
if String.length input > 0 && pred input.[0] then
Ok (input.[0], String.sub input 1 (String.length input - 1))
else Error (Printf.sprintf "Expected %s" desc)
let tag expected : string parser = fun input ->
let len = String.length expected in
if String.length input >= len && String.sub input 0 len = expected then
Ok (expected, String.sub input len (String.length input - len))
else Error (Printf.sprintf "Expected \"%s\"" expected)
let many1 (p : 'a parser) : 'a list parser = fun input ->
match p input with
| Error e -> Error e
| Ok (v, rest) ->
let rec go acc rem = match p rem with
| Ok (v, r) -> go (v :: acc) r
| Error _ -> Ok (List.rev acc, rem)
in match go [v] rest with
| Ok (vs, r) -> Ok (vs, r)
| Error e -> Error e
let map f p : 'b parser = fun input ->
match p input with Ok (v, r) -> Ok (f v, r) | Error e -> Error e
(* Approach 1: and_then / bind โ monadic chaining *)
let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
match p input with
| Error e -> Error e
| Ok (v, rest) -> (f v) rest
(* Infix operator for bind *)
let ( >>= ) = and_then
(* Approach 2: flat_map with context-sensitive parsing *)
(* Parse a length-prefixed string: "3:abc" *)
let parse_nat : int parser =
map (fun chars ->
List.fold_left (fun acc c -> acc * 10 + (Char.code c - Char.code '0')) 0 chars
) (many1 (satisfy (fun c -> c >= '0' && c <= '9') "digit"))
let length_prefixed : string parser =
parse_nat >>= fun n ->
(satisfy (fun c -> c = ':') "colon") >>= fun _ ->
(fun input ->
if String.length input >= n then
Ok (String.sub input 0 n, String.sub input n (String.length input - n))
else Error "Not enough characters")
(* Approach 3: Conditional parsing based on first result *)
let conditional_parser : string parser = fun input ->
let type_parser = satisfy (fun c -> c = 'i' || c = 's') "type tag" in
(type_parser >>= fun tag ->
if tag = 'i' then
map (fun ds -> String.init (List.length ds) (List.nth ds))
(many1 (satisfy (fun c -> c >= '0' && c <= '9') "digit"))
else
map (fun ls -> String.init (List.length ls) (List.nth ls))
(many1 (satisfy (fun c -> c >= 'a' && c <= 'z') "letter"))
) input
(* Tests *)
let () =
(* and_then basic *)
let p = and_then
(satisfy (fun c -> c >= '0' && c <= '9') "digit")
(fun d -> tag (String.make 1 d)) in
assert (p "11rest" = Ok ("1", "rest"));
(* length_prefixed *)
assert (length_prefixed "3:abcrest" = Ok ("abc", "rest"));
assert (length_prefixed "5:helloworld" = Ok ("hello", "world"));
(* conditional *)
assert (conditional_parser "i42rest" = Ok ("42", "rest"));
assert (conditional_parser "sabcREST" = Ok ("abc", "REST"));
print_endline "โ All tests passed"
๐ Detailed Comparison
Comparison: Example 160 โ FlatMap Parser
and_then / bind
OCaml:
๐ช Show OCaml equivalent
let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
match p input with
| Error e -> Error e
| Ok (v, rest) -> (f v) rest
let ( >>= ) = and_then
Rust:
fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a {
Box::new(move |input: &'a str| {
let (value, rest) = parser(input)?;
(f(value))(rest)
})
}Length-prefixed string
OCaml:
๐ช Show OCaml equivalent
let length_prefixed : string parser =
parse_nat >>= fun n ->
(satisfy (fun c -> c = ':') "colon") >>= fun _ ->
(fun input ->
if String.length input >= n then
Ok (String.sub input 0 n, String.sub input n (String.length input - n))
else Error "Not enough characters")
Rust:
fn length_prefixed<'a>() -> Parser<'a, &'a str> {
and_then(parse_nat(), |n| {
Box::new(move |input: &'a str| {
if input.starts_with(':') {
let rest = &input[1..];
if rest.len() >= n {
Ok((&rest[..n], &rest[n..]))
} else {
Err("Not enough characters".to_string())
}
} else {
Err("Expected ':'".to_string())
}
})
})
}