๐Ÿฆ€ Functional Rust

154: String Parser

Difficulty: โญโญโญ Level: Foundations `tag("hello")` matches an exact string literal โ€” the first multi-character parser, and a zero-copy one at that.

The Problem This Solves

Character parsers work one rune at a time. But real grammars need to match keywords: `"true"`, `"false"`, `"null"`, `"fn"`, `"let"`. You could chain five `char_parser` calls, but that's clunky. You want a single combinator that says "match this whole word." Beyond keywords, string parsers are the first place where Rust's zero-copy model pays off. When you match `"hello"` in a string, you don't need to allocate a new `String` โ€” you can return a `&str` that points directly into the original input. For a parser that might match millions of tokens, this avoids millions of allocations.

The Intuition

`tag("hello")` asks: "Does the input start with `hello`?" If yes, return a slice of the input that covers exactly those five bytes, and set the remaining input to start at byte 5. If no, report the mismatch. The key is `starts_with` โ€” a single O(n) check that handles all the character comparison at once. No loops, no intermediate strings. Case-insensitive matching (`tag_no_case`) is the same idea, but it compares lowercased versions. The returned slice is still from the original input โ€” you tell the caller "I matched something of the right length here" even if the casing differs.

How It Works in Rust

Exact match with `tag`:
fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
 let expected_owned = expected.to_string();  // own the string; closure will capture it
 Box::new(move |input: &'a str| {
     if input.starts_with(&expected_owned) {
         // Zero-copy: slice into the original input
         let matched   = &input[..expected_owned.len()];  // the matched portion
         let remaining = &input[expected_owned.len()..];  // what comes after
         Ok((matched, remaining))
     } else {
         Err(format!("Expected \"{}\"", expected_owned))
     }
 })
}
The return type `&'a str` means "a slice borrowed from the input `'a`" โ€” no allocation. Both `matched` and `remaining` are views into the same original string. Case-insensitive with `tag_no_case`:
fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
 let expected_lower = expected.to_lowercase();  // allocate once at parser creation
 let len = expected.len();
 Box::new(move |input: &'a str| {
     if input.len() >= len && input[..len].to_lowercase() == expected_lower {
         Ok((&input[..len], &input[len..]))  // return original casing from input
     } else {
         Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
     }
 })
}
Note: `input[..len].to_lowercase()` allocates a temporary `String` for comparison, but the returned value is still a zero-copy slice. Character-by-character matching (shows the composition):
fn string_from_chars<'a>(expected: &str) -> Parser<'a, String> {
 let expected = expected.to_string();
 Box::new(move |input: &'a str| {
     let mut remaining = input;
     for expected_char in expected.chars() {
         match remaining.chars().next() {
             Some(c) if c == expected_char => {
                 remaining = &remaining[c.len_utf8()..];  // advance one char at a time
             }
             Some(c) => return Err(format!("Expected '{}', got '{}'", expected_char, c)),
             None    => return Err(format!("Expected '{}', got EOF", expected_char)),
         }
     }
     Ok((expected.clone(), remaining))
 })
}
This version returns an owned `String` (allocated) rather than a borrowed `&str`. Less efficient, but shows how `tag` could be built from char parsers. Usage:
let p = tag("hello");
println!("{:?}", p("hello world")); // Ok(("hello", " world"))
println!("{:?}", p("world"));       // Err("Expected \"hello\"")
println!("{:?}", p("hel"));         // Err โ€” too short

let p = tag_no_case("Hello");
println!("{:?}", p("HELLO!"));  // Ok(("HELLO", "!"))
println!("{:?}", p("HeLLo"));   // Ok(("HeLLo", ""))

What This Unlocks

Key Differences

ConceptOCamlRust
Return type`string` (always a new copy)`&'a str` (zero-copy slice into input)
Prefix check`String.sub input 0 len = expected``input.starts_with(expected)`
Case conversion`String.lowercase_ascii s``s.to_lowercase()` (allocates)
Memory modelGC manages string copiesExplicit lifetimes; slicing borrows original
Char-by-charRecursive with `String.make`Iterative with `&remaining[c.len_utf8()..]`
// Example 154: String Parser
// Parse exact string literals

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// ============================================================
// Approach 1: Parse exact string (tag)
// ============================================================

fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
    let expected_owned = expected.to_string();
    Box::new(move |input: &'a str| {
        if input.starts_with(&expected_owned) {
            let rest = &input[expected_owned.len()..];
            Ok((&input[..expected_owned.len()], rest))
        } else {
            Err(format!("Expected \"{}\"", expected_owned))
        }
    })
}

// ============================================================
// Approach 2: Case-insensitive string match
// ============================================================

fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
    let expected_lower = expected.to_lowercase();
    let len = expected.len();
    Box::new(move |input: &'a str| {
        if input.len() >= len && input[..len].to_lowercase() == expected_lower {
            Ok((&input[..len], &input[len..]))
        } else {
            Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
        }
    })
}

// ============================================================
// Approach 3: Build from character-by-character matching
// ============================================================

fn string_from_chars<'a>(expected: &str) -> Parser<'a, String> {
    let expected = expected.to_string();
    Box::new(move |input: &'a str| {
        let mut remaining = input;
        for expected_char in expected.chars() {
            match remaining.chars().next() {
                Some(c) if c == expected_char => {
                    remaining = &remaining[c.len_utf8()..];
                }
                Some(c) => return Err(format!("Expected '{}', got '{}'", expected_char, c)),
                None => return Err(format!("Expected '{}', got EOF", expected_char)),
            }
        }
        Ok((expected.clone(), remaining))
    })
}

fn main() {
    println!("=== tag ===");
    let p = tag("hello");
    println!("{:?}", p("hello world")); // Ok(("hello", " world"))
    println!("{:?}", p("world"));       // Err(...)

    println!("\n=== tag_no_case ===");
    let p = tag_no_case("Hello");
    println!("{:?}", p("HELLO world")); // Ok(("HELLO", " world"))
    println!("{:?}", p("HeLLo!"));      // Ok(("HeLLo", "!"))

    println!("\n=== string_from_chars ===");
    let p = string_from_chars("abc");
    println!("{:?}", p("abcdef")); // Ok(("abc", "def"))

    println!("\nโœ“ All examples completed");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_tag_match() {
        let p = tag("hello");
        assert_eq!(p("hello world"), Ok(("hello", " world")));
    }

    #[test]
    fn test_tag_exact() {
        let p = tag("hello");
        assert_eq!(p("hello"), Ok(("hello", "")));
    }

    #[test]
    fn test_tag_no_match() {
        let p = tag("hello");
        assert!(p("world").is_err());
    }

    #[test]
    fn test_tag_too_short() {
        let p = tag("hello");
        assert!(p("hel").is_err());
    }

    #[test]
    fn test_tag_no_case_upper() {
        let p = tag_no_case("Hello");
        assert_eq!(p("HELLO world"), Ok(("HELLO", " world")));
    }

    #[test]
    fn test_tag_no_case_mixed() {
        let p = tag_no_case("hello");
        assert_eq!(p("HeLLo!"), Ok(("HeLLo", "!")));
    }

    #[test]
    fn test_string_from_chars() {
        let p = string_from_chars("abc");
        assert_eq!(p("abcdef"), Ok(("abc".to_string(), "def")));
    }

    #[test]
    fn test_string_from_chars_fail() {
        let p = string_from_chars("abc");
        assert!(p("axc").is_err());
    }

    #[test]
    fn test_tag_empty_string() {
        let p = tag("");
        assert_eq!(p("anything"), Ok(("", "anything")));
    }
}
(* Example 154: String Parser *)
(* Parse exact string literals *)

type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result

(* Approach 1: Parse exact string *)
let tag (expected : string) : string parser = fun input ->
  let len = String.length expected in
  if String.length input >= len && String.sub input 0 len = expected then
    Ok (expected, String.sub input len (String.length input - len))
  else
    Error (Printf.sprintf "Expected \"%s\"" expected)

(* Approach 2: Case-insensitive string match *)
let tag_no_case (expected : string) : string parser = fun input ->
  let len = String.length expected in
  if String.length input >= len &&
     String.lowercase_ascii (String.sub input 0 len) = String.lowercase_ascii expected then
    Ok (String.sub input 0 len, String.sub input len (String.length input - len))
  else
    Error (Printf.sprintf "Expected \"%s\" (case insensitive)" expected)

(* Approach 3: Build string parser from char parsers *)
let char_parser (c : char) : char parser = fun input ->
  if String.length input > 0 && input.[0] = c then
    Ok (c, String.sub input 1 (String.length input - 1))
  else
    Error (Printf.sprintf "Expected '%c'" c)

let string_from_chars (s : string) : string parser = fun input ->
  let len = String.length s in
  let rec go i remaining =
    if i >= len then Ok (s, remaining)
    else
      match char_parser s.[i] remaining with
      | Ok (_, rest) -> go (i + 1) rest
      | Error e -> Error e
  in
  go 0 input

(* Tests *)
let () =
  assert (tag "hello" "hello world" = Ok ("hello", " world"));
  assert (tag "hello" "hello" = Ok ("hello", ""));
  assert (Result.is_error (tag "hello" "world"));
  assert (Result.is_error (tag "hello" "hel"));

  assert (tag_no_case "Hello" "HELLO world" = Ok ("HELLO", " world"));
  assert (tag_no_case "hello" "HeLLo!" = Ok ("HeLLo", "!"));

  assert (string_from_chars "abc" "abcdef" = Ok ("abc", "def"));
  assert (Result.is_error (string_from_chars "abc" "axc"));

  print_endline "โœ“ All tests passed"

๐Ÿ“Š Detailed Comparison

Comparison: Example 154 โ€” String Parser

tag (exact match)

OCaml:

๐Ÿช Show OCaml equivalent
let tag (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len && String.sub input 0 len = expected then
 Ok (expected, String.sub input len (String.length input - len))
else
 Error (Printf.sprintf "Expected \"%s\"" expected)

Rust:

fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
 let expected_owned = expected.to_string();
 Box::new(move |input: &'a str| {
     if input.starts_with(&expected_owned) {
         let rest = &input[expected_owned.len()..];
         Ok((&input[..expected_owned.len()], rest))
     } else {
         Err(format!("Expected \"{}\"", expected_owned))
     }
 })
}

tag_no_case

OCaml:

๐Ÿช Show OCaml equivalent
let tag_no_case (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len &&
  String.lowercase_ascii (String.sub input 0 len) = String.lowercase_ascii expected then
 Ok (String.sub input 0 len, String.sub input len (String.length input - len))
else
 Error (Printf.sprintf "Expected \"%s\" (case insensitive)" expected)

Rust:

fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
 let expected_lower = expected.to_lowercase();
 let len = expected.len();
 Box::new(move |input: &'a str| {
     if input.len() >= len && input[..len].to_lowercase() == expected_lower {
         Ok((&input[..len], &input[len..]))
     } else {
         Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
     }
 })
}