152: Character Parsers
Difficulty: ★★★ Level: Foundations
The atom of parsing: consume exactly one character from the input, whether a specific one, any one, or one from a set.
The Problem This Solves
Every parser, no matter how complex, eventually bottoms out at "read one character." JSON parsers read `{` and `}`. Number parsers read digits. Language parsers read letters. Before you can build anything bigger, you need these atomic parsers: the smallest possible parsers that do meaningful work.
Single-character parsers also expose a real Rust concern: Unicode. A Rust `&str` is UTF-8, meaning a single "character" like `é` takes 2 bytes, `€` takes 3, and most emoji take 4. If you naively slice with `&input[1..]`, you'll panic whenever the first character is non-ASCII, because byte 1 is not a character boundary. The right way is `&input[c.len_utf8()..]`: advance by the actual byte count of the character you just consumed.
The Intuition
Think of a parser as a cursor reading a document. `char_parser('h')` checks: "Is the next character `h`? If yes, advance past it and return `h`. If no, stop and report what you found." `any_char` is the same, but it accepts whatever character is there and fails only at end of input. `none_of` and `one_of` work like character class filters: `none_of(vec!['x', 'y', 'z'])` accepts any character except those three, and `one_of(vec!['a', 'b', 'c'])` accepts only those three. This mirrors regex's `[^xyz]` and `[abc]` syntax, but as composable functions instead of pattern strings.
How It Works in Rust
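Every snippet in this section returns a `Parser<'a, char>`. The two aliases are defined in the full listing further down; they are repeated here next to a deliberately trivial parser so the shape of the type is visible. `constant` is a hypothetical helper used only for illustration and is not part of the example.

```rust
// Aliases copied from the full listing of this example.
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Hypothetical helper, for illustration only: succeeds with a fixed value
// and consumes no input.
fn constant<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
    Box::new(move |input: &'a str| -> ParseResult<'a, T> {
        Ok((value.clone(), input))
    })
}

fn main() {
    let p = constant(7);
    // Nothing is consumed, so the remainder equals the whole input.
    assert_eq!(p("abc"), Ok((7, "abc")));
}
```

A parser is just a boxed function from input to either `(value, remaining input)` or an error message. The character parsers differ only in how they decide whether to accept the first character.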
Parse a specific character:
fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            // c.len_utf8() correctly handles multi-byte Unicode
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got EOF", expected)),
        }
    })
}
The `move` captures `expected` by value. `input.chars().next()` decodes and returns the first Unicode scalar value without allocating or copying the string. The `if c == expected` on the first arm is a "match guard": a pattern with an extra condition attached.
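Why advance by `c.len_utf8()` rather than by one byte? A minimal sketch using only standard-library calls makes the byte widths concrete. `byte_widths` is a hypothetical helper name, and the characters are written as escapes so the example does not depend on file encoding.

```rust
// Hypothetical helper: the UTF-8 width in bytes of each character in `s`.
fn byte_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

fn main() {
    // 'h', U+00E9 (e with acute), U+20AC (euro sign), U+1F980 (crab emoji)
    assert_eq!(byte_widths("h\u{e9}\u{20ac}\u{1F980}"), vec![1, 2, 3, 4]);

    let input = "\u{e9}cole";
    // Byte 1 falls inside the two-byte encoding of the first character, so it
    // is not a valid slice boundary. `get` returns None where `&input[1..]`
    // would panic.
    assert!(!input.is_char_boundary(1));
    assert_eq!(input.get(1..), None);

    // Advancing by the width of the consumed character lands on a boundary.
    let c = input.chars().next().unwrap();
    assert_eq!(&input[c.len_utf8()..], "cole");
}
```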
Parse any single character:
fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| {
        match input.chars().next() {
            Some(c) => Ok((c, &input[c.len_utf8()..])),
            None => Err("Expected any character, got EOF".to_string()),
        }
    })
}
No `move` needed here: there's nothing to capture.
Parse a character NOT in a set:
fn none_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Unexpected character '{}'", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}
`chars` is moved into the closure (`move`), so the returned parser owns its set. The `!chars.contains(&c)` check is a linear scan, `O(n)` in the size of the set. For large sets, consider `HashSet<char>`.
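The `HashSet<char>` suggestion could look like the sketch below. `none_of_set` is a hypothetical name rather than something the example defines, and the error messages are simply copied from `none_of`.

```rust
use std::collections::HashSet;

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Hypothetical variant of `none_of`: membership is an average O(1) hash
// lookup instead of a linear scan over a Vec.
fn none_of_set<'a>(chars: HashSet<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
        Some(c) => Err(format!("Unexpected character '{}'", c)),
        None => Err("Expected a character, got EOF".to_string()),
    })
}

fn main() {
    // Build the set from the characters of a string.
    let p = none_of_set("xyz".chars().collect());
    assert_eq!(p("abc"), Ok(('a', "bc")));
    assert!(p("xyz").is_err());
    assert!(p("").is_err());
}
```

For a handful of characters the `Vec` version is likely just as fast, since a short linear scan avoids hashing. The set only pays off once the excluded set grows.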
Usage:
let p = char_parser('h');
println!("{:?}", p("hello")); // Ok(('h', "ello"))
println!("{:?}", p("world")); // Err("Expected 'h', got 'w'")
// Unicode works correctly
let p = char_parser('é');
println!("{:?}", p("école")); // Ok(('é', "cole")), advanced 2 bytes
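Because each parser hands back the unconsumed remainder, you can chain parsers by hand by feeding that remainder to the next call. The sketch below reuses `char_parser` and `any_char` as defined above. `parse_two` is a hypothetical helper written for this illustration and is not one of the combinators from later examples.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Same behavior as the definitions in this example.
fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
        Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
        None => Err(format!("Expected '{}', got EOF", expected)),
    })
}

fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| match input.chars().next() {
        Some(c) => Ok((c, &input[c.len_utf8()..])),
        None => Err("Expected any character, got EOF".to_string()),
    })
}

// Hypothetical helper: run `first`, then run `second` on whatever is left.
fn parse_two<'a>(
    first: &Parser<'a, char>,
    second: &Parser<'a, char>,
    input: &'a str,
) -> ParseResult<'a, (char, char)> {
    let (a, rest) = first(input)?;
    let (b, rest) = second(rest)?;
    Ok(((a, b), rest))
}

fn main() {
    let h = char_parser('h');
    let any = any_char();
    assert_eq!(parse_two(&h, &any, "hello"), Ok((('h', 'e'), "llo")));
    // A failure in the first step short-circuits through `?`.
    assert!(parse_two(&h, &any, "world").is_err());
    // Running out of input in the second step is also an error.
    assert!(parse_two(&h, &any, "h").is_err());
}
```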
What This Unlocks
- Foundation for all other parsers: every combinator in examples 153–162 is built on top of functions exactly like these.
- Safe Unicode handling: advancing by `c.len_utf8()` always lands on a character boundary, so your parser works on any valid UTF-8 input without panics.
- Character class parsing: `one_of` and `none_of` let you express "match any vowel" or "match anything except a quote" without regex.
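The "any vowel" and "anything except a quote" cases from the last bullet can be sketched with the two set parsers from this example. `take_until_quote` is a hypothetical helper that repeats `none_of` by hand. Later examples presumably provide proper repetition combinators, so treat this loop as illustration only.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Same behavior as the definitions in this example.
fn one_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
        Some(c) => Err(format!("Character '{}' not in allowed set", c)),
        None => Err("Expected a character, got EOF".to_string()),
    })
}

fn none_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
        Some(c) => Err(format!("Unexpected character '{}'", c)),
        None => Err("Expected a character, got EOF".to_string()),
    })
}

// Hypothetical helper: collect characters until a double quote or the end
// of input, returning the collected text and the unconsumed remainder.
fn take_until_quote<'a>(input: &'a str) -> (String, &'a str) {
    let not_quote = none_of(vec!['"']);
    let mut body = String::new();
    let mut rest = input;
    while let Ok((c, next)) = not_quote(rest) {
        body.push(c);
        rest = next;
    }
    (body, rest)
}

fn main() {
    // "Match any vowel", the counterpart of the regex class [aeiou].
    let vowel = one_of(vec!['a', 'e', 'i', 'o', 'u']);
    assert_eq!(vowel("echo"), Ok(('e', "cho")));
    assert!(vowel("xyz").is_err());

    // "Match anything except a quote", the counterpart of [^"].
    assert_eq!(take_until_quote("abc\" tail"), ("abc".to_string(), "\" tail"));
}
```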
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| First character | `input.[0]` (byte index) | `input.chars().next()` (Unicode scalar) |
| Advance past char | `String.sub input 1 (len-1)` | `&input[c.len_utf8()..]` |
| Set membership | `List.mem ch chars` | `chars.contains(&c)` |
| Unicode safety | Manual (byte-level strings) | Built-in (UTF-8 guaranteed by type system) |
| Closure capture | Automatic | `move` needed so the returned closure owns what it captures |
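The first row of the table is the difference most likely to bite when porting. OCaml's `input.[0]` looks at a single byte. The closest Rust equivalent of that byte view is `as_bytes()`, which is contrasted with `chars()` in the sketch below. `first_byte` and `first_scalar` are hypothetical helper names.

```rust
// Hypothetical helper: the byte-level view, comparable to OCaml's `input.[0]`.
fn first_byte(input: &str) -> Option<u8> {
    input.as_bytes().first().copied()
}

// Hypothetical helper: the view the Rust parsers in this example use.
fn first_scalar(input: &str) -> Option<char> {
    input.chars().next()
}

fn main() {
    // For ASCII input the two views agree.
    assert_eq!(first_byte("hello"), Some(b'h'));
    assert_eq!(first_scalar("hello"), Some('h'));

    // U+00E9 is encoded as the two bytes 0xC3 0xA9. The byte view sees only
    // the lead byte, while the scalar view decodes the whole character.
    let accented = "\u{e9}cole";
    assert_eq!(first_byte(accented), Some(0xC3));
    assert_eq!(first_scalar(accented), Some('\u{e9}'));
}
```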
// Example 152: Character Parsers
// Parse single characters: char_parser, any_char, none_of, one_of
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
// ============================================================
// Approach 1: Parse a specific character
// ============================================================
fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got EOF", expected)),
        }
    })
}
// ============================================================
// Approach 2: Parse any character
// ============================================================
fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| {
        match input.chars().next() {
            Some(c) => Ok((c, &input[c.len_utf8()..])),
            None => Err("Expected any character, got EOF".to_string()),
        }
    })
}
// ============================================================
// Approach 3: Parse char NOT in set / IN set
// ============================================================
fn none_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Unexpected character '{}'", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}
fn one_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Character '{}' not in allowed set", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}
fn main() {
    println!("=== char_parser ===");
    let p = char_parser('h');
    println!("{:?}", p("hello")); // Ok(('h', "ello"))
    println!("{:?}", p("world")); // Err(...)
    println!("\n=== any_char ===");
    let p = any_char();
    println!("{:?}", p("abc")); // Ok(('a', "bc"))
    println!("{:?}", p("")); // Err(...)
    println!("\n=== none_of ===");
    let p = none_of(vec!['x', 'y', 'z']);
    println!("{:?}", p("abc")); // Ok(('a', "bc"))
    println!("{:?}", p("xyz")); // Err(...)
    println!("\n=== one_of ===");
    let p = one_of(vec!['a', 'b', 'c']);
    println!("{:?}", p("beta")); // Ok(('b', "eta"))
    println!("\n✓ All examples completed");
}
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_char_parser_match() {
        let p = char_parser('a');
        assert_eq!(p("abc"), Ok(('a', "bc")));
    }
    #[test]
    fn test_char_parser_no_match() {
        let p = char_parser('a');
        assert!(p("xyz").is_err());
    }
    #[test]
    fn test_char_parser_empty() {
        let p = char_parser('a');
        assert!(p("").is_err());
    }
    #[test]
    fn test_any_char_success() {
        let p = any_char();
        assert_eq!(p("hello"), Ok(('h', "ello")));
    }
    #[test]
    fn test_any_char_single() {
        let p = any_char();
        assert_eq!(p("x"), Ok(('x', "")));
    }
    #[test]
    fn test_any_char_empty() {
        let p = any_char();
        assert!(p("").is_err());
    }
    #[test]
    fn test_none_of_allowed() {
        let p = none_of(vec!['x', 'y', 'z']);
        assert_eq!(p("abc"), Ok(('a', "bc")));
    }
    #[test]
    fn test_none_of_blocked() {
        let p = none_of(vec!['a', 'b']);
        assert!(p("abc").is_err());
    }
    #[test]
    fn test_one_of_match() {
        let p = one_of(vec!['a', 'b', 'c']);
        assert_eq!(p("beta"), Ok(('b', "eta")));
    }
    #[test]
    fn test_one_of_no_match() {
        let p = one_of(vec!['x', 'y']);
        assert!(p("abc").is_err());
    }
    #[test]
    fn test_unicode_char() {
        let p = char_parser('é');
        assert_eq!(p("école"), Ok(('é', "cole")));
    }
}
(* Example 152: Character Parsers *)
(* Parse single characters: char_parser, any_char, none_of, one_of *)
type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result
(* Helper to advance input by one character *)
let advance input =
  if String.length input > 0 then
    Some (input.[0], String.sub input 1 (String.length input - 1))
  else
    None
(* Approach 1: Parse a specific character *)
let char_parser (c : char) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) when ch = c -> Ok (ch, rest)
  | Some (ch, _) -> Error (Printf.sprintf "Expected '%c', got '%c'" c ch)
  | None -> Error (Printf.sprintf "Expected '%c', got EOF" c)
(* Approach 2: Parse any character *)
let any_char : char parser = fun input ->
  match advance input with
  | Some (ch, rest) -> Ok (ch, rest)
  | None -> Error "Expected any character, got EOF"
(* Approach 3: Parse any character NOT in the given set *)
let none_of (chars : char list) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) ->
    if List.mem ch chars then
      Error (Printf.sprintf "Unexpected character '%c'" ch)
    else
      Ok (ch, rest)
  | None -> Error "Expected a character, got EOF"
(* one_of: parse any character IN the given set *)
let one_of (chars : char list) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) when List.mem ch chars -> Ok (ch, rest)
  | Some (ch, _) -> Error (Printf.sprintf "Character '%c' not in allowed set" ch)
  | None -> Error "Expected a character, got EOF"
(* Tests *)
let () =
  (* char_parser tests *)
  assert (char_parser 'a' "abc" = Ok ('a', "bc"));
  assert (Result.is_error (char_parser 'a' "xyz"));
  assert (Result.is_error (char_parser 'a' ""));
  (* any_char tests *)
  assert (any_char "hello" = Ok ('h', "ello"));
  assert (any_char "x" = Ok ('x', ""));
  assert (Result.is_error (any_char ""));
  (* none_of tests *)
  assert (none_of ['x'; 'y'; 'z'] "abc" = Ok ('a', "bc"));
  assert (Result.is_error (none_of ['a'; 'b'] "abc"));
  (* one_of tests *)
  assert (one_of ['a'; 'b'; 'c'] "beta" = Ok ('b', "eta"));
  assert (Result.is_error (one_of ['x'; 'y'] "abc"));
  print_endline "✓ All tests passed"
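The OCaml version factors the "first character plus rest" step into `advance`, while the Rust listing repeats `input.chars().next()` and the slice in every parser. A Rust counterpart of that helper might look like the sketch below. It is not part of the example's Rust code, and unlike the byte-based OCaml helper it steps by a whole Unicode scalar value.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Hypothetical counterpart of the OCaml `advance`: the first character and
// the remaining input, or None when the input is empty.
fn advance(input: &str) -> Option<(char, &str)> {
    let c = input.chars().next()?;
    Some((c, &input[c.len_utf8()..]))
}

// The parsers from this example, rewritten on top of `advance`.
fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| {
        advance(input).ok_or_else(|| "Expected any character, got EOF".to_string())
    })
}

fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| match advance(input) {
        Some((c, rest)) if c == expected => Ok((c, rest)),
        Some((c, _)) => Err(format!("Expected '{}', got '{}'", expected, c)),
        None => Err(format!("Expected '{}', got EOF", expected)),
    })
}

fn main() {
    assert_eq!(advance("\u{e9}cole"), Some(('\u{e9}', "cole")));
    assert_eq!(advance(""), None);
    assert_eq!(char_parser('a')("abc"), Ok(('a', "bc")));
    assert!(any_char()("").is_err());
}
```

With the helper in place, the match arms in Rust line up almost one-to-one with the OCaml ones.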
Detailed Comparison
Comparison: Example 152, Character Parsers
char_parser
OCaml:
let char_parser (c : char) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) when ch = c -> Ok (ch, rest)
  | Some (ch, _) -> Error (Printf.sprintf "Expected '%c', got '%c'" c ch)
  | None -> Error (Printf.sprintf "Expected '%c', got EOF" c)
Rust:
fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got EOF", expected)),
        }
    })
}
any_char
OCaml:
let any_char : char parser = fun input ->
  match advance input with
  | Some (ch, rest) -> Ok (ch, rest)
  | None -> Error "Expected any character, got EOF"
Rust:
fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| {
        match input.chars().next() {
            Some(c) => Ok((c, &input[c.len_utf8()..])),
            None => Err("Expected any character, got EOF".to_string()),
        }
    })
}
none_of
OCaml:
let none_of (chars : char list) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) ->
    if List.mem ch chars then Error (Printf.sprintf "Unexpected character '%c'" ch)
    else Ok (ch, rest)
  | None -> Error "Expected a character, got EOF"
Rust:
fn none_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Unexpected character '{}'", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}