๐Ÿฆ€ Functional Rust

500: String Compression (Run-Length Encoding)

Difficulty: 2 Level: Beginner-Intermediate Compress a string by replacing runs of identical characters with a count and character: `"aabbbcc"` โ†’ `"2a3b2c"`.

The Problem This Solves

Run-length encoding (RLE) is the simplest useful compression algorithm. Where data contains long runs of repeated values โ€” bitmap images, simple logs, test fixtures โ€” RLE can cut size dramatically with zero dependencies and O(n) time. Beyond compression, the encode/decode cycle is a clean exercise in iterator-driven state machines. Real-world uses include BMP pixel data, PCX image format, certain network protocols, and any domain where you want a human-readable compact form of repeated values. This example also shows functional-style accumulation with `fold` instead of imperative loops with manual index management.

The Intuition

Walk the string one character at a time. Keep a "current character" and a running count. When the character changes, flush the current run to the output and start a new one. At the end, flush the last run. For decoding, just repeat each character by its count.

How It Works in Rust

Encoding with `fold` โ€” treats the accumulator as `(current_char, count, output_vec)`:
fn encode(s: &str) -> Vec<(usize, char)> {
 let mut chars = s.chars();
 let first = match chars.next() {
     None => return vec![],
     Some(c) => c,
 };
 let (cur, count, mut acc) =
     chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
         if c == cur {
             (cur, count + 1, acc)
         } else {
             acc.push((count, cur));
             (c, 1, acc)
         }
     });
 acc.push((count, cur));
 acc
}
The `fold` carries three pieces of state without any mutable variable in scope โ€” it threads state purely through the accumulator tuple. Decoding with `fold` โ€” builds a `String` by repeating each char:
fn decode(pairs: &[(usize, char)]) -> String {
 pairs.iter().fold(String::new(), |mut s, &(n, c)| {
     for _ in 0..n { s.push(c); }
     s
 })
}
Formatting โ€” `map` + `collect` over the pairs:
fn show_encoded(pairs: &[(usize, char)]) -> String {
 pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}
The roundtrip invariant `decode(encode(s)) == s` holds for all valid ASCII and UTF-8 strings.

What This Unlocks

Key Differences

ConceptOCamlRust
Fold with tuple state`List.fold_left` with tuple`Iterator::fold` with tuple accumulator
Mutable output in fold`Buffer.t` passed through`mut` captured inside closure, or `Vec` in accumulator
Repeat a char n times`String.make n c``for _ in 0..n { s.push(c) }`
Char comparison`Char.(=)``==` on `char`
//! # 500. String Compression โ€” Run-Length Encoding
//! Encode/decode strings using run-length encoding via iterator fold.

/// Encode a string into (count, char) pairs.
/// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
fn encode(s: &str) -> Vec<(usize, char)> {
    let mut chars = s.chars();
    let first = match chars.next() {
        None => return vec![],
        Some(c) => c,
    };
    let (cur, count, mut acc) =
        chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
            if c == cur {
                (cur, count + 1, acc)
            } else {
                acc.push((count, cur));
                (c, 1, acc)
            }
        });
    acc.push((count, cur));
    acc
}

/// Decode (count, char) pairs back into a string.
fn decode(pairs: &[(usize, char)]) -> String {
    pairs
        .iter()
        .fold(String::new(), |mut s, &(n, c)| {
            for _ in 0..n {
                s.push(c);
            }
            s
        })
}

/// Format encoded pairs as human-readable "2a3b2c"
fn show_encoded(pairs: &[(usize, char)]) -> String {
    pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}

fn main() {
    let tests = ["aabbbcc", "aaaa", "abcde", "", "aabbcc"];
    for s in &tests {
        let enc = encode(s);
        let dec = decode(&enc);
        println!("Input:   {:?}", s);
        println!("Encoded: {}", show_encoded(&enc));
        println!("Decoded: {:?}", dec);
        println!("Match:   {}\n", s == &dec);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode_basic() {
        assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
    }

    #[test]
    fn test_encode_all_same() {
        assert_eq!(encode("aaaa"), vec![(4, 'a')]);
    }

    #[test]
    fn test_encode_no_repeats() {
        assert_eq!(encode("abcde"), vec![(1,'a'),(1,'b'),(1,'c'),(1,'d'),(1,'e')]);
    }

    #[test]
    fn test_encode_empty() {
        assert_eq!(encode(""), vec![]);
    }

    #[test]
    fn test_roundtrip() {
        let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
        for s in &inputs {
            assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
        }
    }
}
(* Run-length encoding in OCaml *)

(* Encode: "aabbbcc" -> [(2,'a');(3,'b');(2,'c')] *)
let encode (s : string) : (int * char) list =
  let chars = List.of_seq (String.to_seq s) in
  match chars with
  | [] -> []
  | first :: rest ->
    let (cur, count, acc) =
      List.fold_left
        (fun (cur, count, acc) c ->
          if c = cur then (cur, count + 1, acc)
          else (c, 1, (count, cur) :: acc))
        (first, 1, [])
        rest
    in
    List.rev ((count, cur) :: acc)

(* Decode: [(2,'a');(3,'b')] -> "aabbb" *)
let decode (pairs : (int * char) list) : string =
  List.fold_left
    (fun acc (n, c) -> acc ^ String.make n c)
    ""
    pairs

(* Pretty print encoded form *)
let show_encoded pairs =
  let parts = List.map (fun (n, c) -> Printf.sprintf "%d%c" n c) pairs in
  String.concat "" parts

let () =
  let tests = ["aabbbcc"; "aaaa"; "abcde"; ""; "aabbcc"] in
  List.iter (fun s ->
    let enc = encode s in
    let dec = decode enc in
    Printf.printf "Input:   %S\n" s;
    Printf.printf "Encoded: %s\n" (show_encoded enc);
    Printf.printf "Decoded: %S\n" dec;
    Printf.printf "Match:   %b\n\n" (s = dec)
  ) tests