154: String Parser
Difficulty: โญโญโญ Level: Foundations `tag("hello")` matches an exact string literal โ the first multi-character parser, and a zero-copy one at that.The Problem This Solves
Character parsers work one rune at a time. But real grammars need to match keywords: `"true"`, `"false"`, `"null"`, `"fn"`, `"let"`. You could chain five `char_parser` calls, but that's clunky. You want a single combinator that says "match this whole word." Beyond keywords, string parsers are the first place where Rust's zero-copy model pays off. When you match `"hello"` in a string, you don't need to allocate a new `String` โ you can return a `&str` that points directly into the original input. For a parser that might match millions of tokens, this avoids millions of allocations.The Intuition
`tag("hello")` asks: "Does the input start with `hello`?" If yes, return a slice of the input that covers exactly those five bytes, and set the remaining input to start at byte 5. If no, report the mismatch. The key is `starts_with` โ a single O(n) check that handles all the character comparison at once. No loops, no intermediate strings. Case-insensitive matching (`tag_no_case`) is the same idea, but it compares lowercased versions. The returned slice is still from the original input โ you tell the caller "I matched something of the right length here" even if the casing differs.How It Works in Rust
Exact match with `tag`:fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_owned = expected.to_string(); // own the string; closure will capture it
Box::new(move |input: &'a str| {
if input.starts_with(&expected_owned) {
// Zero-copy: slice into the original input
let matched = &input[..expected_owned.len()]; // the matched portion
let remaining = &input[expected_owned.len()..]; // what comes after
Ok((matched, remaining))
} else {
Err(format!("Expected \"{}\"", expected_owned))
}
})
}
The return type `&'a str` means "a slice borrowed from the input `'a`" โ no allocation. Both `matched` and `remaining` are views into the same original string.
Case-insensitive with `tag_no_case`:
fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_lower = expected.to_lowercase(); // allocate once at parser creation
let len = expected.len();
Box::new(move |input: &'a str| {
if input.len() >= len && input[..len].to_lowercase() == expected_lower {
Ok((&input[..len], &input[len..])) // return original casing from input
} else {
Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
}
})
}
Note: `input[..len].to_lowercase()` allocates a temporary `String` for comparison, but the returned value is still a zero-copy slice.
Character-by-character matching (shows the composition):
fn string_from_chars<'a>(expected: &str) -> Parser<'a, String> {
let expected = expected.to_string();
Box::new(move |input: &'a str| {
let mut remaining = input;
for expected_char in expected.chars() {
match remaining.chars().next() {
Some(c) if c == expected_char => {
remaining = &remaining[c.len_utf8()..]; // advance one char at a time
}
Some(c) => return Err(format!("Expected '{}', got '{}'", expected_char, c)),
None => return Err(format!("Expected '{}', got EOF", expected_char)),
}
}
Ok((expected.clone(), remaining))
})
}
This version returns an owned `String` (allocated) rather than a borrowed `&str`. Less efficient, but shows how `tag` could be built from char parsers.
Usage:
let p = tag("hello");
println!("{:?}", p("hello world")); // Ok(("hello", " world"))
println!("{:?}", p("world")); // Err("Expected \"hello\"")
println!("{:?}", p("hel")); // Err โ too short
let p = tag_no_case("Hello");
println!("{:?}", p("HELLO!")); // Ok(("HELLO", "!"))
println!("{:?}", p("HeLLo")); // Ok(("HeLLo", ""))
What This Unlocks
- Keyword and delimiter parsers โ match `"true"`, `"false"`, `"null"`, `"->"`, `"::"`, `"=>"` as single parser calls.
- Zero-cost abstractions โ `tag` returns `&str` slices, not `String` allocations; parsing a 1MB file doesn't mean 1MB of extra allocations.
- Building block for `choice` โ example 157 uses `choice(vec![tag("true"), tag("false"), tag("null")])` as a direct application.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Return type | `string` (always a new copy) | `&'a str` (zero-copy slice into input) |
| Prefix check | `String.sub input 0 len = expected` | `input.starts_with(expected)` |
| Case conversion | `String.lowercase_ascii s` | `s.to_lowercase()` (allocates) |
| Memory model | GC manages string copies | Explicit lifetimes; slicing borrows original |
| Char-by-char | Recursive with `String.make` | Iterative with `&remaining[c.len_utf8()..]` |
// Example 154: String Parser
// Parse exact string literals
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
// ============================================================
// Approach 1: Parse exact string (tag)
// ============================================================
fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_owned = expected.to_string();
Box::new(move |input: &'a str| {
if input.starts_with(&expected_owned) {
let rest = &input[expected_owned.len()..];
Ok((&input[..expected_owned.len()], rest))
} else {
Err(format!("Expected \"{}\"", expected_owned))
}
})
}
// ============================================================
// Approach 2: Case-insensitive string match
// ============================================================
fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_lower = expected.to_lowercase();
let len = expected.len();
Box::new(move |input: &'a str| {
if input.len() >= len && input[..len].to_lowercase() == expected_lower {
Ok((&input[..len], &input[len..]))
} else {
Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
}
})
}
// ============================================================
// Approach 3: Build from character-by-character matching
// ============================================================
fn string_from_chars<'a>(expected: &str) -> Parser<'a, String> {
let expected = expected.to_string();
Box::new(move |input: &'a str| {
let mut remaining = input;
for expected_char in expected.chars() {
match remaining.chars().next() {
Some(c) if c == expected_char => {
remaining = &remaining[c.len_utf8()..];
}
Some(c) => return Err(format!("Expected '{}', got '{}'", expected_char, c)),
None => return Err(format!("Expected '{}', got EOF", expected_char)),
}
}
Ok((expected.clone(), remaining))
})
}
fn main() {
println!("=== tag ===");
let p = tag("hello");
println!("{:?}", p("hello world")); // Ok(("hello", " world"))
println!("{:?}", p("world")); // Err(...)
println!("\n=== tag_no_case ===");
let p = tag_no_case("Hello");
println!("{:?}", p("HELLO world")); // Ok(("HELLO", " world"))
println!("{:?}", p("HeLLo!")); // Ok(("HeLLo", "!"))
println!("\n=== string_from_chars ===");
let p = string_from_chars("abc");
println!("{:?}", p("abcdef")); // Ok(("abc", "def"))
println!("\nโ All examples completed");
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_tag_match() {
let p = tag("hello");
assert_eq!(p("hello world"), Ok(("hello", " world")));
}
#[test]
fn test_tag_exact() {
let p = tag("hello");
assert_eq!(p("hello"), Ok(("hello", "")));
}
#[test]
fn test_tag_no_match() {
let p = tag("hello");
assert!(p("world").is_err());
}
#[test]
fn test_tag_too_short() {
let p = tag("hello");
assert!(p("hel").is_err());
}
#[test]
fn test_tag_no_case_upper() {
let p = tag_no_case("Hello");
assert_eq!(p("HELLO world"), Ok(("HELLO", " world")));
}
#[test]
fn test_tag_no_case_mixed() {
let p = tag_no_case("hello");
assert_eq!(p("HeLLo!"), Ok(("HeLLo", "!")));
}
#[test]
fn test_string_from_chars() {
let p = string_from_chars("abc");
assert_eq!(p("abcdef"), Ok(("abc".to_string(), "def")));
}
#[test]
fn test_string_from_chars_fail() {
let p = string_from_chars("abc");
assert!(p("axc").is_err());
}
#[test]
fn test_tag_empty_string() {
let p = tag("");
assert_eq!(p("anything"), Ok(("", "anything")));
}
}
(* Example 154: String Parser *)
(* Parse exact string literals *)
type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result
(* Approach 1: Parse exact string *)
let tag (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len && String.sub input 0 len = expected then
Ok (expected, String.sub input len (String.length input - len))
else
Error (Printf.sprintf "Expected \"%s\"" expected)
(* Approach 2: Case-insensitive string match *)
let tag_no_case (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len &&
String.lowercase_ascii (String.sub input 0 len) = String.lowercase_ascii expected then
Ok (String.sub input 0 len, String.sub input len (String.length input - len))
else
Error (Printf.sprintf "Expected \"%s\" (case insensitive)" expected)
(* Approach 3: Build string parser from char parsers *)
let char_parser (c : char) : char parser = fun input ->
if String.length input > 0 && input.[0] = c then
Ok (c, String.sub input 1 (String.length input - 1))
else
Error (Printf.sprintf "Expected '%c'" c)
let string_from_chars (s : string) : string parser = fun input ->
let len = String.length s in
let rec go i remaining =
if i >= len then Ok (s, remaining)
else
match char_parser s.[i] remaining with
| Ok (_, rest) -> go (i + 1) rest
| Error e -> Error e
in
go 0 input
(* Tests *)
let () =
assert (tag "hello" "hello world" = Ok ("hello", " world"));
assert (tag "hello" "hello" = Ok ("hello", ""));
assert (Result.is_error (tag "hello" "world"));
assert (Result.is_error (tag "hello" "hel"));
assert (tag_no_case "Hello" "HELLO world" = Ok ("HELLO", " world"));
assert (tag_no_case "hello" "HeLLo!" = Ok ("HeLLo", "!"));
assert (string_from_chars "abc" "abcdef" = Ok ("abc", "def"));
assert (Result.is_error (string_from_chars "abc" "axc"));
print_endline "โ All tests passed"
๐ Detailed Comparison
Comparison: Example 154 โ String Parser
tag (exact match)
OCaml:
๐ช Show OCaml equivalent
let tag (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len && String.sub input 0 len = expected then
Ok (expected, String.sub input len (String.length input - len))
else
Error (Printf.sprintf "Expected \"%s\"" expected)
Rust:
fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_owned = expected.to_string();
Box::new(move |input: &'a str| {
if input.starts_with(&expected_owned) {
let rest = &input[expected_owned.len()..];
Ok((&input[..expected_owned.len()], rest))
} else {
Err(format!("Expected \"{}\"", expected_owned))
}
})
}tag_no_case
OCaml:
๐ช Show OCaml equivalent
let tag_no_case (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len &&
String.lowercase_ascii (String.sub input 0 len) = String.lowercase_ascii expected then
Ok (String.sub input 0 len, String.sub input len (String.length input - len))
else
Error (Printf.sprintf "Expected \"%s\" (case insensitive)" expected)
Rust:
fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_lower = expected.to_lowercase();
let len = expected.len();
Box::new(move |input: &'a str| {
if input.len() >= len && input[..len].to_lowercase() == expected_lower {
Ok((&input[..len], &input[len..]))
} else {
Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
}
})
}