500: String Compression (Run-Length Encoding)
Difficulty: 2 Level: Beginner-Intermediate Compress a string by replacing runs of identical characters with a count and character: `"aabbbcc"` โ `"2a3b2c"`.The Problem This Solves
Run-length encoding (RLE) is the simplest useful compression algorithm. Where data contains long runs of repeated values โ bitmap images, simple logs, test fixtures โ RLE can cut size dramatically with zero dependencies and O(n) time. Beyond compression, the encode/decode cycle is a clean exercise in iterator-driven state machines. Real-world uses include BMP pixel data, PCX image format, certain network protocols, and any domain where you want a human-readable compact form of repeated values. This example also shows functional-style accumulation with `fold` instead of imperative loops with manual index management.The Intuition
Walk the string one character at a time. Keep a "current character" and a running count. When the character changes, flush the current run to the output and start a new one. At the end, flush the last run. For decoding, just repeat each character by its count.How It Works in Rust
Encoding with `fold` โ treats the accumulator as `(current_char, count, output_vec)`:fn encode(s: &str) -> Vec<(usize, char)> {
let mut chars = s.chars();
let first = match chars.next() {
None => return vec![],
Some(c) => c,
};
let (cur, count, mut acc) =
chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
if c == cur {
(cur, count + 1, acc)
} else {
acc.push((count, cur));
(c, 1, acc)
}
});
acc.push((count, cur));
acc
}
The `fold` carries three pieces of state without any mutable variable in scope โ it threads state purely through the accumulator tuple.
Decoding with `fold` โ builds a `String` by repeating each char:
fn decode(pairs: &[(usize, char)]) -> String {
pairs.iter().fold(String::new(), |mut s, &(n, c)| {
for _ in 0..n { s.push(c); }
s
})
}
Formatting โ `map` + `collect` over the pairs:
fn show_encoded(pairs: &[(usize, char)]) -> String {
pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}
The roundtrip invariant `decode(encode(s)) == s` holds for all valid ASCII and UTF-8 strings.
What This Unlocks
- Functional state machines โ `fold` with a compound accumulator models step-by-step state transitions without mutable variables.
- Codec pattern โ encode/decode as inverse functions tested by roundtrip property.
- Iterator composition โ building output strings from iterators without intermediate allocations for each element.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Fold with tuple state | `List.fold_left` with tuple | `Iterator::fold` with tuple accumulator |
| Mutable output in fold | `Buffer.t` passed through | `mut` captured inside closure, or `Vec` in accumulator |
| Repeat a char n times | `String.make n c` | `for _ in 0..n { s.push(c) }` |
| Char comparison | `Char.(=)` | `==` on `char` |
//! # 500. String Compression โ Run-Length Encoding
//! Encode/decode strings using run-length encoding via iterator fold.
/// Encode a string into (count, char) pairs.
/// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
fn encode(s: &str) -> Vec<(usize, char)> {
let mut chars = s.chars();
let first = match chars.next() {
None => return vec![],
Some(c) => c,
};
let (cur, count, mut acc) =
chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
if c == cur {
(cur, count + 1, acc)
} else {
acc.push((count, cur));
(c, 1, acc)
}
});
acc.push((count, cur));
acc
}
/// Decode (count, char) pairs back into a string.
fn decode(pairs: &[(usize, char)]) -> String {
pairs
.iter()
.fold(String::new(), |mut s, &(n, c)| {
for _ in 0..n {
s.push(c);
}
s
})
}
/// Format encoded pairs as human-readable "2a3b2c"
fn show_encoded(pairs: &[(usize, char)]) -> String {
pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}
fn main() {
let tests = ["aabbbcc", "aaaa", "abcde", "", "aabbcc"];
for s in &tests {
let enc = encode(s);
let dec = decode(&enc);
println!("Input: {:?}", s);
println!("Encoded: {}", show_encoded(&enc));
println!("Decoded: {:?}", dec);
println!("Match: {}\n", s == &dec);
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode_basic() {
assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
}
#[test]
fn test_encode_all_same() {
assert_eq!(encode("aaaa"), vec![(4, 'a')]);
}
#[test]
fn test_encode_no_repeats() {
assert_eq!(encode("abcde"), vec![(1,'a'),(1,'b'),(1,'c'),(1,'d'),(1,'e')]);
}
#[test]
fn test_encode_empty() {
assert_eq!(encode(""), vec![]);
}
#[test]
fn test_roundtrip() {
let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
for s in &inputs {
assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
}
}
}
(* Run-length encoding in OCaml *)
(* Encode: "aabbbcc" -> [(2,'a');(3,'b');(2,'c')] *)
let encode (s : string) : (int * char) list =
let chars = List.of_seq (String.to_seq s) in
match chars with
| [] -> []
| first :: rest ->
let (cur, count, acc) =
List.fold_left
(fun (cur, count, acc) c ->
if c = cur then (cur, count + 1, acc)
else (c, 1, (count, cur) :: acc))
(first, 1, [])
rest
in
List.rev ((count, cur) :: acc)
(* Decode: [(2,'a');(3,'b')] -> "aabbb" *)
let decode (pairs : (int * char) list) : string =
List.fold_left
(fun acc (n, c) -> acc ^ String.make n c)
""
pairs
(* Pretty print encoded form *)
let show_encoded pairs =
let parts = List.map (fun (n, c) -> Printf.sprintf "%d%c" n c) pairs in
String.concat "" parts
let () =
let tests = ["aabbbcc"; "aaaa"; "abcde"; ""; "aabbcc"] in
List.iter (fun s ->
let enc = encode s in
let dec = decode enc in
Printf.printf "Input: %S\n" s;
Printf.printf "Encoded: %s\n" (show_encoded enc);
Printf.printf "Decoded: %S\n" dec;
Printf.printf "Match: %b\n\n" (s = dec)
) tests