🦀 Functional Rust

152: Character Parsers

Difficulty: ⭐⭐⭐ Level: Foundations

The atom of parsing: consume exactly one character from the input, whether a specific one, any one, or one from a set.

The Problem This Solves

Every parser, no matter how complex, eventually bottoms out at "read one character." JSON parsers read `{` and `}`. Number parsers read digits. Language parsers read letters. Before you can build anything bigger, you need these atomic parsers: the smallest possible parsers that do meaningful work.

Single-character parsers also expose a real Rust concern: Unicode. A Rust `&str` is UTF-8, so a single "character" like `é` takes 2 bytes, `€` takes 3, and most emoji take 4. If you naively slice with `&input[1..]`, you'll panic whenever the first character is multi-byte, because byte index 1 falls in the middle of it. The right way is `&input[c.len_utf8()..]`: advance by the actual byte count of the character you just consumed.
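To make the byte counts concrete, here is a minimal sketch that uses only standard-library calls (`char::len_utf8`, `str::is_char_boundary`, `str::get`). The helper name `split_first_char` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: split off the first character,
/// advancing by its UTF-8 width rather than by one byte.
fn split_first_char(input: &str) -> Option<(char, &str)> {
    let c = input.chars().next()?;
    Some((c, &input[c.len_utf8()..]))
}

fn main() {
    // UTF-8 widths of the kinds of characters mentioned above.
    assert_eq!('h'.len_utf8(), 1);
    assert_eq!('é'.len_utf8(), 2);
    assert_eq!('€'.len_utf8(), 3);
    assert_eq!('🦀'.len_utf8(), 4);

    let input = "école";
    // Byte index 1 falls inside the two-byte 'é', so `&input[1..]` would panic.
    assert!(!input.is_char_boundary(1));
    // `get` is the non-panicking form of slicing: it returns None here.
    assert_eq!(input.get(1..), None);

    // Advancing by len_utf8() always lands on a character boundary.
    assert_eq!(split_first_char(input), Some(('é', "cole")));
    assert_eq!(split_first_char(""), None);
}
```

Using `get` instead of indexing is one way to probe a boundary without risking a panic; the parsers in this example avoid the problem entirely by never guessing the width.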

The Intuition

Think of a parser as a cursor reading a document. `char_parser('h')` checks: "Is the next character `h`? If yes, advance past it and return `h`. If no, stop and report what you found." `any_char` is the same, but it accepts whatever character is there and fails only on empty input. `none_of` and `one_of` work like character class filters: `none_of(vec!['x', 'y', 'z'])` accepts any character except those three, and `one_of(vec!['a', 'b', 'c'])` accepts only those three. This mirrors regex's `[^xyz]` and `[abc]` syntax, but as composable functions instead of pattern strings.

How It Works in Rust

Parse a specific character:
fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            // c.len_utf8() correctly handles multi-byte Unicode
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got EOF", expected)),
        }
    })
}
The `move` captures `expected` by value. `input.chars().next()` returns the first Unicode scalar value without copying the string. The `if c == expected` condition on the match arm is a "match guard": a pattern with an extra condition attached.

Parse any single character:
fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| {
        match input.chars().next() {
            Some(c) => Ok((c, &input[c.len_utf8()..])),
            None => Err("Expected any character, got EOF".to_string()),
        }
    })
}
No `move` needed here: there's nothing to capture.

Parse a character NOT in a set:
fn none_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Unexpected character '{}'", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}
`chars` is moved into the closure (`move`). The `!chars.contains(&c)` check is `O(n)`; for large sets, consider `HashSet<char>`.

Usage:
let p = char_parser('h');
println!("{:?}", p("hello")); // Ok(('h', "ello"))
println!("{:?}", p("world")); // Err("Expected 'h', got 'w'")

// Unicode works correctly
let p = char_parser('é');
println!("{:?}", p("école")); // Ok(('é', "cole")), advanced 2 bytes
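The `HashSet<char>` suggestion for large sets could look roughly like the sketch below. The function name `none_of_set` and its slice parameter are assumptions made for this illustration, not part of the example's code; the `Parser` and `ParseResult` aliases are repeated from the full listing so the block compiles on its own.

```rust
use std::collections::HashSet;

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Hypothetical variant of `none_of` backed by a HashSet,
/// giving O(1) average-case membership checks.
fn none_of_set<'a>(chars: &[char]) -> Parser<'a, char> {
    // Build the set once, when the parser is constructed,
    // not on every call to the parser.
    let set: HashSet<char> = chars.iter().copied().collect();
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if !set.contains(&c) => Ok((c, &input[c.len_utf8()..])),
        Some(c) => Err(format!("Unexpected character '{}'", c)),
        None => Err("Expected a character, got EOF".to_string()),
    })
}

fn main() {
    let p = none_of_set(&['x', 'y', 'z']);
    assert_eq!(p("abc"), Ok(('a', "bc")));
    assert!(p("xyz").is_err());
    assert!(p("").is_err());
}
```

For sets of only a few characters, the linear scan over a `Vec<char>` is likely just as fast, so the set-backed version is mainly worth considering when the excluded set is large.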

What This Unlocks

Key Differences

| Concept | OCaml | Rust |
|---|---|---|
| First character | `input.[0]` (byte index) | `input.chars().next()` (Unicode scalar) |
| Advance past char | `String.sub input 1 (len-1)` | `&input[c.len_utf8()..]` |
| Set membership | `List.mem ch chars` | `chars.contains(&c)` |
| Unicode safety | Manual (byte-level strings) | Built-in (UTF-8 guaranteed by type system) |
| Closure capture | Automatic | `move` keyword required |

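The closure-capture row can be illustrated in isolation. In these parsers `move` is needed because the closure is returned from the function that creates it, so it has to own what it captures. The sketch below uses a hypothetical `make_matcher` function that is not part of the example's code.

```rust
/// Hypothetical example: a boxed closure returned from a function.
fn make_matcher(expected: char) -> Box<dyn Fn(char) -> bool> {
    // `move` copies `expected` into the closure, so the closure owns it.
    // Without `move`, the closure would only borrow a local variable that
    // goes away when `make_matcher` returns, and the compiler rejects that.
    Box::new(move |c| c == expected)
}

fn main() {
    let is_h = make_matcher('h');
    assert!(is_h('h'));
    assert!(!is_h('w'));
}
```

A closure that captures nothing, such as the one in `any_char`, has no such requirement, which is why that function omits `move`.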
// Example 152: Character Parsers
// Parse single characters: char_parser, any_char, none_of, one_of

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// ============================================================
// Approach 1: Parse a specific character
// ============================================================

fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got EOF", expected)),
        }
    })
}

// ============================================================
// Approach 2: Parse any character
// ============================================================

fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| {
        match input.chars().next() {
            Some(c) => Ok((c, &input[c.len_utf8()..])),
            None => Err("Expected any character, got EOF".to_string()),
        }
    })
}

// ============================================================
// Approach 3: Parse char NOT in set / IN set
// ============================================================

fn none_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Unexpected character '{}'", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}

fn one_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Character '{}' not in allowed set", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}

fn main() {
    println!("=== char_parser ===");
    let p = char_parser('h');
    println!("{:?}", p("hello"));  // Ok(('h', "ello"))
    println!("{:?}", p("world"));  // Err(...)

    println!("\n=== any_char ===");
    let p = any_char();
    println!("{:?}", p("abc"));    // Ok(('a', "bc"))
    println!("{:?}", p(""));       // Err(...)

    println!("\n=== none_of ===");
    let p = none_of(vec!['x', 'y', 'z']);
    println!("{:?}", p("abc"));    // Ok(('a', "bc"))
    println!("{:?}", p("xyz"));    // Err(...)

    println!("\n=== one_of ===");
    let p = one_of(vec!['a', 'b', 'c']);
    println!("{:?}", p("beta"));   // Ok(('b', "eta"))

    println!("\n✓ All examples completed");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_char_parser_match() {
        let p = char_parser('a');
        assert_eq!(p("abc"), Ok(('a', "bc")));
    }

    #[test]
    fn test_char_parser_no_match() {
        let p = char_parser('a');
        assert!(p("xyz").is_err());
    }

    #[test]
    fn test_char_parser_empty() {
        let p = char_parser('a');
        assert!(p("").is_err());
    }

    #[test]
    fn test_any_char_success() {
        let p = any_char();
        assert_eq!(p("hello"), Ok(('h', "ello")));
    }

    #[test]
    fn test_any_char_single() {
        let p = any_char();
        assert_eq!(p("x"), Ok(('x', "")));
    }

    #[test]
    fn test_any_char_empty() {
        let p = any_char();
        assert!(p("").is_err());
    }

    #[test]
    fn test_none_of_allowed() {
        let p = none_of(vec!['x', 'y', 'z']);
        assert_eq!(p("abc"), Ok(('a', "bc")));
    }

    #[test]
    fn test_none_of_blocked() {
        let p = none_of(vec!['a', 'b']);
        assert!(p("abc").is_err());
    }

    #[test]
    fn test_one_of_match() {
        let p = one_of(vec!['a', 'b', 'c']);
        assert_eq!(p("beta"), Ok(('b', "eta")));
    }

    #[test]
    fn test_one_of_no_match() {
        let p = one_of(vec!['x', 'y']);
        assert!(p("abc").is_err());
    }

    #[test]
    fn test_unicode_char() {
        let p = char_parser('é');
        assert_eq!(p("école"), Ok(('é', "cole")));
    }
}
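As a sketch of how these atoms combine, the remaining input returned by one parser can be passed by hand to the next. The `char_parser` and `any_char` definitions are repeated from the listing above so the block compiles on its own; `parse_h_then_any` is a hypothetical name introduced only for this illustration.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
        Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
        None => Err(format!("Expected '{}', got EOF", expected)),
    })
}

fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| match input.chars().next() {
        Some(c) => Ok((c, &input[c.len_utf8()..])),
        None => Err("Expected any character, got EOF".to_string()),
    })
}

/// Hypothetical helper: match 'h', then accept whatever character follows.
fn parse_h_then_any<'a>(input: &'a str) -> ParseResult<'a, (char, char)> {
    // Each step returns the unconsumed rest, which feeds the next step.
    // The `?` operator stops at the first failure and returns its error.
    let (first, rest) = char_parser('h')(input)?;
    let (second, rest) = any_char()(rest)?;
    Ok(((first, second), rest))
}

fn main() {
    assert_eq!(parse_h_then_any("hello"), Ok((('h', 'e'), "llo")));
    assert!(parse_h_then_any("world").is_err());
    assert!(parse_h_then_any("h").is_err());
}
```

Threading `rest` manually like this is the pattern that sequencing combinators would normally wrap; it is shown here only to make the flow of the remaining input visible.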
(* Example 152: Character Parsers *)
(* Parse single characters: char_parser, any_char, none_of, one_of *)

type 'a parse_result = ('a * string, string) result

type 'a parser = string -> 'a parse_result

(* Helper to advance input by one character *)
let advance input =
  if String.length input > 0 then
    Some (input.[0], String.sub input 1 (String.length input - 1))
  else
    None

(* Approach 1: Parse a specific character *)
let char_parser (c : char) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) when ch = c -> Ok (ch, rest)
  | Some (ch, _) -> Error (Printf.sprintf "Expected '%c', got '%c'" c ch)
  | None -> Error (Printf.sprintf "Expected '%c', got EOF" c)

(* Approach 2: Parse any character *)
let any_char : char parser = fun input ->
  match advance input with
  | Some (ch, rest) -> Ok (ch, rest)
  | None -> Error "Expected any character, got EOF"

(* Approach 3: Parse any character NOT in the given set *)
let none_of (chars : char list) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) ->
    if List.mem ch chars then
      Error (Printf.sprintf "Unexpected character '%c'" ch)
    else
      Ok (ch, rest)
  | None -> Error "Expected a character, got EOF"

(* one_of: parse any character IN the given set *)
let one_of (chars : char list) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) when List.mem ch chars -> Ok (ch, rest)
  | Some (ch, _) -> Error (Printf.sprintf "Character '%c' not in allowed set" ch)
  | None -> Error "Expected a character, got EOF"

(* Tests *)
let () =
  (* char_parser tests *)
  assert (char_parser 'a' "abc" = Ok ('a', "bc"));
  assert (Result.is_error (char_parser 'a' "xyz"));
  assert (Result.is_error (char_parser 'a' ""));

  (* any_char tests *)
  assert (any_char "hello" = Ok ('h', "ello"));
  assert (any_char "x" = Ok ('x', ""));
  assert (Result.is_error (any_char ""));

  (* none_of tests *)
  assert (none_of ['x'; 'y'; 'z'] "abc" = Ok ('a', "bc"));
  assert (Result.is_error (none_of ['a'; 'b'] "abc"));

  (* one_of tests *)
  assert (one_of ['a'; 'b'; 'c'] "beta" = Ok ('b', "eta"));
  assert (Result.is_error (one_of ['x'; 'y'] "abc"));

  print_endline "✓ All tests passed"

📊 Detailed Comparison

Comparison: Example 152 β€” Character Parsers

char_parser

OCaml:

let char_parser (c : char) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) when ch = c -> Ok (ch, rest)
  | Some (ch, _) -> Error (Printf.sprintf "Expected '%c', got '%c'" c ch)
  | None -> Error (Printf.sprintf "Expected '%c', got EOF" c)

Rust:

fn char_parser<'a>(expected: char) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got EOF", expected)),
        }
    })
}

any_char

OCaml:

let any_char : char parser = fun input ->
  match advance input with
  | Some (ch, rest) -> Ok (ch, rest)
  | None -> Error "Expected any character, got EOF"

Rust:

fn any_char<'a>() -> Parser<'a, char> {
    Box::new(|input: &'a str| {
        match input.chars().next() {
            Some(c) => Ok((c, &input[c.len_utf8()..])),
            None => Err("Expected any character, got EOF".to_string()),
        }
    })
}

none_of

OCaml:

let none_of (chars : char list) : char parser = fun input ->
  match advance input with
  | Some (ch, rest) ->
    if List.mem ch chars then
      Error (Printf.sprintf "Unexpected character '%c'" ch)
    else
      Ok (ch, rest)
  | None -> Error "Expected a character, got EOF"

Rust:

fn none_of<'a>(chars: Vec<char>) -> Parser<'a, char> {
    Box::new(move |input: &'a str| {
        match input.chars().next() {
            Some(c) if !chars.contains(&c) => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Unexpected character '{}'", c)),
            None => Err("Expected a character, got EOF".to_string()),
        }
    })
}