480 Fundamental

String Chars

Functional Programming

Tutorial

The Problem

Strings in memory are byte sequences, but humans think in characters. For ASCII, bytes and characters coincide; for any other Unicode text they diverge, because UTF-8 encodes a code point in one to four bytes. Iterating bytes and treating each as a character corrupts emoji, accented letters, CJK characters, and anything else outside ASCII. Rust's char type is a Unicode scalar value (U+0000 to U+10FFFF, excluding surrogates), and .chars() decodes UTF-8 on the fly, yielding the right unit for code-point-level counting, filtering, reversal, and indexing. A char is not always a user-perceived character: a letter followed by a combining mark spans several chars and needs grapheme segmentation.
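To make the divergence concrete, here is a minimal sketch that compares byte length with char count for a few strings. The helper name byte_and_char_counts is hypothetical and not part of the original example; the é is written as the escape \u{e9} so the precomposed form is unambiguous.

```rust
// Hypothetical helper for illustration: (byte length, char count) of a string.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // len() is the UTF-8 byte length; chars().count() decodes and counts scalars.
    (s.len(), s.chars().count())
}

fn main() {
    for s in ["hello", "caf\u{e9}", "日本語", "🌍"] {
        let (bytes, chars) = byte_and_char_counts(s);
        println!("{:?}: {} bytes, {} chars", s, bytes, chars);
    }
}
```

Only the pure-ASCII string reports equal counts. The others differ by the extra bytes UTF-8 needs: two for é, three for each CJK character here, four for the emoji.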

🎯 Learning Outcomes

• Iterate characters with .chars() vs. iterating bytes with .bytes()

• Count characters correctly for non-ASCII text with .chars().count()

• Filter characters by predicate and collect back to String

• Reverse a string character-by-character with .chars().rev().collect()

• Access the Nth character with .chars().nth(n) (O(N), not O(1))
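The last outcome deserves a short illustration: because UTF-8 is variable-width, .chars().nth(n) has to decode from the start of the string, and the char index it uses is not the byte index that slicing expects. The sketch below assumes a hypothetical helper, nth_char_byte_offset, built on char_indices() to bridge the two.

```rust
// Hypothetical helper for illustration: byte offset at which the n-th char starts.
fn nth_char_byte_offset(s: &str, n: usize) -> Option<usize> {
    // char_indices() yields (byte_offset, char) pairs; nth() walks n of them.
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let s = "caf\u{e9}s"; // c, a, f, é (2 bytes), s

    // Char indexing is zero-based and returns None past the end.
    assert_eq!(s.chars().nth(3), Some('\u{e9}'));
    assert_eq!(s.chars().nth(4), Some('s'));
    assert_eq!(s.chars().nth(5), None);

    // The fifth char starts at byte 5, not byte 4, because é takes two bytes.
    assert_eq!(nth_char_byte_offset(s, 4), Some(5));

    // Slicing uses byte offsets, so the offset from char_indices is safe here.
    let tail = &s[5..];
    println!("tail = {:?}", tail);
}
```

When many positions are needed, one pass over char_indices() is usually preferable to repeated nth() calls, each of which restarts from the beginning.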

Code Example

#![allow(clippy::all)]
// 480. chars() and char-level operations

#[cfg(test)]
mod tests {
    #[test]
    fn test_count() {
        // 'é' is one char but two UTF-8 bytes
        assert_eq!("café".chars().count(), 4);
        assert_eq!("café".len(), 5);
    }
    #[test]
    fn test_filter() {
        // filter receives &char; collect rebuilds a String from the kept chars
        let s: String = "Hello123".chars().filter(|c| c.is_ascii_digit()).collect();
        assert_eq!(s, "123");
    }
    #[test]
    fn test_rev() {
        // Chars is a DoubleEndedIterator, so rev() needs no intermediate Vec
        let s: String = "abcde".chars().rev().collect();
        assert_eq!(s, "edcba");
    }
    #[test]
    fn test_nth() {
        // nth is zero-based and walks the string from the start
        assert_eq!("hello".chars().nth(1), Some('e'));
    }
}

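(* 480. chars() – OCaml *)
let () =
  let s = "Hello, World! 🌍" in
  (* String.length counts bytes, so the emoji contributes 4 *)
  Printf.printf "byte_len=%d\n" (String.length s);
  (* Safe only because the first 7 bytes are ASCII *)
  String.iter (fun c -> Printf.printf "%c " c) (String.sub s 0 7); print_newline ();
  (* uppercase_ascii leaves non-ASCII bytes untouched *)
  let upper = String.map Char.uppercase_ascii s in
  Printf.printf "%s\n" upper;
  (* Keep ASCII letters only; every byte of the emoji is >= 128 and is dropped *)
  let alpha = String.concat "" (
    String.to_seq s |> Seq.filter (fun c -> Char.code c < 128 && (c>='a'&&c<='z'||c>='A'&&c<='Z'))
    |> Seq.map (String.make 1) |> List.of_seq) in
  Printf.printf "alpha: %s\n" alpha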

Key Differences

char semantics: Rust's char is a 4-byte Unicode scalar value; OCaml's char is a 1-byte value (0–255). True Unicode characters in OCaml require Uchar.t.

Standard Unicode support: Rust handles multibyte UTF-8 correctly via .chars() without any external crate; OCaml needs a library such as Uutf, or the standard library's String.get_utf_8_uchar on 4.14 and later.

Collect from chars: Rust's FromIterator<char> for String enables .chars().filter(...).collect::<String>(); OCaml requires String.of_seq (4.07+) which works on bytes, not Unicode scalars.

Reversal safety: chars().rev().collect() correctly reverses character by character; reversing bytes with OCaml's Bytes can corrupt multi-byte sequences.
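Two of these differences can be checked directly from Rust. The sketch below is illustrative only and its helper names (utf8_widths, reverse_chars, reverse_bytes) are hypothetical: it shows that a char occupies 4 bytes in memory while its UTF-8 encoding takes 1 to 4, and that reversing raw bytes, as a byte-oriented string type would, can produce invalid UTF-8.

```rust
use std::mem::size_of;

// Hypothetical helpers for illustration; not part of the original example.

/// Encoded UTF-8 width of each char in `s`.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}

/// Reverse by Unicode scalar value; the result is always valid UTF-8.
fn reverse_chars(s: &str) -> String {
    s.chars().rev().collect()
}

/// Reverse the raw bytes, then try to reinterpret them as UTF-8.
fn reverse_bytes(s: &str) -> Result<String, std::string::FromUtf8Error> {
    let mut bytes = s.as_bytes().to_vec();
    bytes.reverse();
    String::from_utf8(bytes)
}

fn main() {
    // In memory a char is always 4 bytes; encoded, it is 1 to 4.
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(utf8_widths("a\u{e9}\u{65e5}\u{1f30d}"), vec![1, 2, 3, 4]);

    let s = "caf\u{e9}";
    assert_eq!(reverse_chars(s), "\u{e9}fac");
    // Byte reversal puts é's continuation byte first, which is not valid UTF-8.
    assert!(reverse_bytes(s).is_err());
    // Pure ASCII survives byte reversal, which is why the bug often goes unnoticed.
    assert_eq!(reverse_bytes("abc").unwrap(), "cba");
    println!("all checks passed");
}
```

Char-level reversal is safe at the encoding level but still works on scalar values: a base letter and a following combining mark are separate chars and end up in swapped order.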

OCaml Approach

OCaml 4.07+ provides String.to_seq which yields char values (single bytes — not Unicode scalars):

String.to_seq "hello" |> Seq.filter (fun c -> c >= '0' && c <= '9')
                       |> String.of_seq  (* standard lib 4.07+ *)

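For true Unicode character iteration, a library such as Uutf is the usual choice (OCaml 4.14 and later can also decode with the standard library's String.get_utf_8_uchar):

(* Collects the decoded scalars in reverse order; apply List.rev for source order *)
Uutf.String.fold_utf_8 (fun acc _ d ->
  match d with `Uchar u -> u :: acc | _ -> acc) [] "café"

OCaml's char is a single byte; Uchar.t (in the standard library since 4.03) is the Unicode scalar equivalent.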

Exercises

Palindrome check: Write is_palindrome(s: &str) -> bool that compares the string to its character-reversed form, handling Unicode correctly.

Char frequency map: Build a HashMap<char, usize> counting character occurrences in a &str using .chars() and .entry().and_modify().or_insert().

Grapheme-aware reverse: Use the unicode-segmentation crate's graphemes iterator to correctly reverse "e\u{0301}nde" (e + combining accent + nde) and compare the result to .chars().rev().collect().
