481 Fundamental

String Bytes

The Problem

Network protocols, file formats, and cryptographic functions operate on bytes, not characters. A Rust String is a validated UTF-8 Vec<u8>, but sometimes you need the raw bytes: serialising to a binary protocol, computing a checksum, or interfacing with a C library that returns *const u8. The reverse — constructing a String from bytes — requires validation because not all byte sequences are valid UTF-8. Rust makes this validation explicit with from_utf8 (strict) and from_utf8_lossy (replaces invalid bytes with U+FFFD).
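A minimal sketch of the strict and lossy paths described above. The helper name `decode` is made up for illustration; only `std::str::from_utf8` and `String::from_utf8_lossy` come from the standard library.

```rust
/// Hypothetical helper: try strict decoding first, fall back to lossy.
/// The bool reports whether the input was valid UTF-8.
fn decode(bytes: &[u8]) -> (bool, String) {
    match std::str::from_utf8(bytes) {
        // Valid UTF-8: copy the validated text into an owned String.
        Ok(s) => (true, s.to_owned()),
        // Invalid UTF-8: each invalid sequence becomes U+FFFD.
        Err(_) => (false, String::from_utf8_lossy(bytes).into_owned()),
    }
}

fn main() {
    // Plain ASCII is always valid UTF-8.
    println!("{:?}", decode(b"hi"));
    // 0xFF can never occur in UTF-8, so this input takes the lossy path.
    println!("{:?}", decode(&[0x68, 0xFF, 0x69]));
}
```

Whether to fall back to lossy decoding or to reject the input outright depends on the application; a binary protocol parser would usually prefer the strict path.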

🎯 Learning Outcomes

• Iterate raw bytes with .bytes() yielding u8 values

• Convert a Vec<u8> to String with String::from_utf8, which returns Result

• Validate a &[u8] slice as UTF-8 with std::str::from_utf8

• Use String::from_utf8_lossy to convert potentially invalid bytes with replacement characters

• Understand the relationship between &str, &[u8], String, and Vec<u8>
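The relationship between the four types can be sketched as two round trips, one owned and one borrowed. The function names here are illustrative and not taken from the example source.

```rust
use std::str::Utf8Error;
use std::string::FromUtf8Error;

/// Owned round trip: String -> Vec<u8> -> String.
fn owned_round_trip(s: String) -> Result<String, FromUtf8Error> {
    let buf: Vec<u8> = s.into_bytes(); // takes over the buffer, no check needed
    String::from_utf8(buf) // validates, then reuses the same allocation
}

/// Borrowed round trip: &str -> &[u8] -> &str.
fn borrowed_round_trip(s: &str) -> Result<&str, Utf8Error> {
    let view: &[u8] = s.as_bytes(); // a view into the same memory
    std::str::from_utf8(view) // validates, result borrows from the input
}

fn main() {
    println!("{:?}", owned_round_trip(String::from("hi")));
    println!("{:?}", borrowed_round_trip("hi"));
    // Going from text to bytes never fails; only the reverse returns Result.
    println!("{}", "hi".bytes().count());
}
```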

Code Example

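#![allow(clippy::all)]
// 481. bytes() and byte-level operations

#[cfg(test)]
mod tests {
    // &str -> bytes: .bytes() yields each UTF-8 code unit as a u8.
    #[test]
    fn test_bytes() {
        assert_eq!("hi".bytes().collect::<Vec<_>>(), vec![104, 105]);
    }
    // Vec<u8> -> String: succeeds because the bytes are valid UTF-8.
    #[test]
    fn test_from() {
        assert_eq!(String::from_utf8(vec![104, 105]).unwrap(), "hi");
    }
    // 0xFF never appears in valid UTF-8, so strict conversion fails.
    #[test]
    fn test_invalid() {
        assert!(String::from_utf8(vec![0xFF]).is_err());
    }
    // Lossy conversion keeps valid bytes and replaces 0xFF with U+FFFD.
    #[test]
    fn test_lossy() {
        let s = String::from_utf8_lossy(&[104, 0xFF, 105]);
        assert_eq!(s, "h\u{FFFD}i");
    }
}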

(* 481. Byte-level strings – OCaml *)
let () =
  let s = "Hello, World!" in
  let b = Bytes.of_string s in
  Printf.printf "bytes: ";
  Bytes.iter (fun c -> Printf.printf "%02x " (Char.code c)) b;
  print_newline ();
  let csum = Bytes.fold_left (fun a c -> a + Char.code c) 0 b in
  Printf.printf "checksum=%d\n" csum;
  let low = Bytes.map (fun c ->
    if c >= 'A' && c <= 'Z' then Char.chr (Char.code c + 32) else c) b in
  Printf.printf "lower: %s\n" (Bytes.to_string low);
  let raw = [|72;101;108;108;111|] in
  let s2 = Bytes.init (Array.length raw) (fun i -> Char.chr raw.(i)) in
  Printf.printf "from bytes: %s\n" (Bytes.to_string s2)
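For comparison, the same four steps as the OCaml program (hex dump, additive checksum, ASCII lowercasing, building a string from raw bytes) might look like this in Rust. This is a sketch, not the repository's code; `hex_dump` and `checksum` are names chosen for illustration.

```rust
/// Formats every byte as two lowercase hex digits followed by a space.
fn hex_dump(s: &str) -> String {
    s.bytes().map(|b| format!("{:02x} ", b)).collect()
}

/// Sums all byte values, mirroring the OCaml fold.
fn checksum(s: &str) -> u32 {
    s.bytes().map(u32::from).sum()
}

fn main() {
    let s = "Hello, World!";
    println!("bytes: {}", hex_dump(s));
    println!("checksum={}", checksum(s));

    // to_ascii_lowercase only changes A-Z, like the Bytes.map in the OCaml version.
    println!("lower: {}", s.to_ascii_lowercase());

    // Unlike Bytes.to_string, this conversion validates and can fail.
    let raw: Vec<u8> = vec![72, 101, 108, 108, 111];
    match String::from_utf8(raw) {
        Ok(s2) => println!("from bytes: {}", s2),
        Err(e) => println!("invalid UTF-8: {}", e),
    }
}
```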

Key Differences

Type-level guarantee: Rust's String/&str guarantee UTF-8 validity; OCaml's string is unchecked bytes.

Explicit conversion: Rust requires from_utf8 (returning Result) to go from bytes to string; OCaml's Bytes.to_string is unconditional.

from_utf8_lossy: Rust provides a built-in lossy decoder that replaces invalid bytes; OCaml needs Uutf or manual implementation.

&str as &[u8]: Rust's str::as_bytes() gives a &[u8] view with no copy; OCaml's String.to_bytes allocates a new Bytes.t.
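Two of the points above can be observed directly: as_bytes() returns a view of the same memory, and from_utf8_lossy returns a Cow that only allocates when it has to substitute U+FFFD. The helper `lossy_borrows` is a hypothetical name used only in this sketch.

```rust
use std::borrow::Cow;

/// True when lossy decoding could borrow the input, i.e. it was already valid.
fn lossy_borrows(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Borrowed(_))
}

fn main() {
    let s = "hi";
    // Same start address: as_bytes() copies nothing.
    assert_eq!(s.as_ptr(), s.as_bytes().as_ptr());

    // Valid input is borrowed as-is.
    println!("{}", lossy_borrows(b"hi"));
    // Invalid input forces a new String containing the replacement character.
    println!("{}", lossy_borrows(&[104, 0xFF, 105]));
}
```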

OCaml Approach

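OCaml's Bytes.t is a mutable byte sequence; string is an immutable byte sequence. Neither type validates UTF-8 on construction. Before OCaml 4.14 the standard library offered no UTF-8 validation at all; 4.14 added String.is_valid_utf_8, and the uutf library is a common choice on older versions: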

(* Bytes to string: copies the bytes and performs no UTF-8 validation *)
let bytes = Bytes.of_string "hi"
let s = Bytes.to_string bytes

(* For UTF-8 validation, use uutf *)
let is_valid_utf8 s =
  Uutf.String.fold_utf_8 (fun ok _ d ->
    match d with
    | `Malformed _ -> false   (* `Malformed carries the offending bytes *)
    | `Uchar _ -> ok) true s

OCaml makes no UTF-8 guarantees at the string type level — it is the programmer's responsibility.
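For contrast, Rust's strict decoder reports where validation failed as well as whether it failed. The sketch below uses Utf8Error::valid_up_to from the standard library; the wrapper `first_invalid_offset` is a name invented for this example.

```rust
/// Hypothetical helper: byte offset of the first invalid sequence, if any.
fn first_invalid_offset(bytes: &[u8]) -> Option<usize> {
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        // valid_up_to() is the length of the longest valid prefix.
        Err(e) => Some(e.valid_up_to()),
    }
}

fn main() {
    println!("{:?}", first_invalid_offset(b"hi"));
    // 0xFF is invalid anywhere in UTF-8.
    println!("{:?}", first_invalid_offset(&[104, 0xFF, 105]));
    // 0xC3 starts a two-byte sequence, so this input is truncated.
    println!("{:?}", first_invalid_offset(&[104, 0xC3]));
}
```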

Exercises

Hex encode: Write to_hex(s: &str) -> String that formats each byte as two lowercase hex digits using .bytes() and format!("{:02x}", b).

UTF-8 validator: Implement is_valid_utf8(bytes: &[u8]) -> bool using std::str::from_utf8 and write tests for valid ASCII, valid multibyte sequences, and truncated multibyte sequences.

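Null-terminated bytes: Write to_c_str(s: &str) that returns an error if s contains an embedded null byte and otherwise returns a Vec<u8> with a trailing null appended, producing a C-compatible byte string. Since the function can fail, wrap the Vec<u8> in a Result with an error type of your choice.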
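As a starting point for the last exercise, here is one possible shape. It assumes the function returns a Result so that embedded nulls can be reported, and the error type `EmbeddedNul` is hypothetical. The standard library's std::ffi::CString::new performs the same check and is the usual choice in real FFI code.

```rust
/// Hypothetical error type: position of the first embedded null byte.
#[derive(Debug, PartialEq)]
struct EmbeddedNul {
    position: usize,
}

fn to_c_str(s: &str) -> Result<Vec<u8>, EmbeddedNul> {
    // A null inside the text would end the C string early, so reject it.
    if let Some(position) = s.bytes().position(|b| b == 0) {
        return Err(EmbeddedNul { position });
    }
    // Copy the UTF-8 bytes and append the terminator.
    let mut out = Vec::with_capacity(s.len() + 1);
    out.extend_from_slice(s.as_bytes());
    out.push(0);
    Ok(out)
}

fn main() {
    println!("{:?}", to_c_str("hi"));
    println!("{:?}", to_c_str("a\0b"));
}
```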

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.
