ExamplesBy LevelBy TopicLearning Paths
101 Advanced

Huffman Encoding

TreesGreedy Algorithms

Tutorial

The Problem

Build an optimal prefix-free binary code for a set of characters with given frequencies using Huffman's greedy algorithm: repeatedly merge the two lowest-frequency nodes until a single tree remains, then assign "0"/"1" on left/right branches to derive each character's code.

🎯 Learning Outcomes

  • • How to turn an OCaml algebraic type (type htree = Leaf | Node) into an owned
  • Rust enum with Box<HTree> children

  • • Using BinaryHeap<Reverse<W>> for a min-heap without unsafe or external crates
  • • Why a custom Ord wrapper (FreqOrd) keeps ordering logic separate from the
  • data type

  • • How OCaml's list-sort loop maps to a sorted Vec with remove(0) in Rust,
  • and why the BinaryHeap version is the idiomatic upgrade

    🦀 The Rust Way

    The idiomatic Rust solution wraps each HTree in a FreqOrd newtype that implements Ord by frequency, then uses BinaryHeap<Reverse<FreqOrd>> to get O(log n) insert/remove instead of O(n) re-sort. The recursive solution mirrors OCaml's list-sort style exactly, trading efficiency for directness. Both share the same codes traversal function.

    Code Example

    use std::cmp::{Ordering, Reverse};
    use std::collections::BinaryHeap;
    
    pub enum HTree {
        Leaf(char, u32),
        Node(Box<HTree>, Box<HTree>, u32),
    }
    
    // Newtype that orders by frequency only, enabling min-heap via Reverse<FreqOrd>
    struct FreqOrd(HTree);
    impl Ord for FreqOrd {
        fn cmp(&self, other: &Self) -> Ordering {
            self.0.freq().cmp(&other.0.freq())
        }
    }
    
    pub fn build_tree(freqs: &[(char, u32)]) -> Option<HTree> {
        let mut heap: BinaryHeap<Reverse<FreqOrd>> = freqs
            .iter()
            .map(|&(c, f)| Reverse(FreqOrd(HTree::Leaf(c, f))))
            .collect();
        while heap.len() > 1 {
            let Reverse(FreqOrd(a)) = heap.pop()?;
            let Reverse(FreqOrd(b)) = heap.pop()?;
            let freq = a.freq() + b.freq();
            heap.push(Reverse(FreqOrd(HTree::Node(Box::new(a), Box::new(b), freq))));
        }
        heap.pop().map(|Reverse(FreqOrd(t))| t)
    }

    Key Differences

  • Recursive types: OCaml type htree = … | Node of htree * htree * int
  • is natively recursive; Rust requires Box<HTree> to give the enum a known size on the stack.

  • Min-heap: OCaml's List.sort re-sorts each step; Rust offers
  • BinaryHeap (max-heap) which becomes a min-heap via Reverse<W>.

  • Custom ordering: OCaml pattern-matches a comparison function inline;
  • Rust externalises ordering into a newtype (FreqOrd) implementing the Ord trait, keeping HTree itself free of ordering concerns.

  • String building: OCaml uses ^ (string concat) naturally in a recursive
  • call; Rust uses format!("{prefix}0") passing an owned String down the call stack to avoid repeated allocations at the top level.

    OCaml Approach

    OCaml represents the tree as a sum type and builds it by sorting a list on each iteration. List.sort re-runs on every merge step, giving O(n² log n) total work, but the code is compact and transparent. Code generation is a natural structural recursion over the tree, concatenating prefix strings.

    Full Source

    #![allow(dead_code)]
    //! 101: Huffman Encoding — Greedy Tree Building
    //! Builds an optimal prefix-free code tree by repeatedly merging the two
    //! lowest-frequency nodes, mirroring the OCaml pattern-match / list-sort approach.
    
    use std::cmp::{Ordering, Reverse};
    use std::collections::BinaryHeap;
    
    // ── Tree type ────────────────────────────────────────────────────────────────
    
    #[derive(Debug)]
    pub enum HTree {
        Leaf(char, u32),
        Node(Box<HTree>, Box<HTree>, u32),
    }
    
    impl HTree {
        pub fn freq(&self) -> u32 {
            match self {
                HTree::Leaf(_, f) | HTree::Node(_, _, f) => *f,
            }
        }
    }
    
    // Wrapper so BinaryHeap<Reverse<FreqOrd>> acts as a min-heap by frequency.
    // All four Ord traits must be consistent; we use frequency only.
    struct FreqOrd(HTree);
    
    impl PartialEq for FreqOrd {
        fn eq(&self, other: &Self) -> bool {
            self.0.freq() == other.0.freq()
        }
    }
    impl Eq for FreqOrd {}
    
    impl Ord for FreqOrd {
        fn cmp(&self, other: &Self) -> Ordering {
            self.0.freq().cmp(&other.0.freq())
        }
    }
    impl PartialOrd for FreqOrd {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            Some(self.cmp(other))
        }
    }
    
    // ── Solution 1: Idiomatic Rust — BinaryHeap min-heap ────────────────────────
    //
    // Uses std's BinaryHeap (max-heap) wrapped with Reverse to get a min-heap.
    // Each iteration pops the two cheapest nodes and pushes their merged parent.
    // O(n log n) — same asymptotic cost as the OCaml sort-per-step version but
    // with O(log n) insert/remove instead of O(n) re-sort.
    
    pub fn build_tree(freqs: &[(char, u32)]) -> Option<HTree> {
        if freqs.is_empty() {
            return None;
        }
        let mut heap: BinaryHeap<Reverse<FreqOrd>> = freqs
            .iter()
            .map(|&(c, f)| Reverse(FreqOrd(HTree::Leaf(c, f))))
            .collect();
    
        while heap.len() > 1 {
            let Reverse(FreqOrd(a)) = heap.pop()?;
            let Reverse(FreqOrd(b)) = heap.pop()?;
            let freq = a.freq() + b.freq();
            heap.push(Reverse(FreqOrd(HTree::Node(
                Box::new(a),
                Box::new(b),
                freq,
            ))));
        }
    
        heap.pop().map(|Reverse(FreqOrd(t))| t)
    }
    
    // ── Solution 2: Functional/recursive — mirrors OCaml list-sort style ─────────
    //
    // Keeps a sorted Vec and re-sorts after each merge, exactly as the OCaml
    // `go` function does.  More transparent but O(n² log n).
    
    pub fn build_tree_recursive(freqs: &[(char, u32)]) -> Option<HTree> {
        if freqs.is_empty() {
            return None;
        }
        let mut trees: Vec<HTree> = freqs.iter().map(|&(c, f)| HTree::Leaf(c, f)).collect();
        trees.sort_by_key(|t| t.freq());
    
        fn go(mut trees: Vec<HTree>) -> Option<HTree> {
            match trees.len() {
                0 => None,
                1 => trees.into_iter().next(),
                _ => {
                    // Remove two cheapest (front of sorted list)
                    let a = trees.remove(0);
                    let b = trees.remove(0);
                    let freq = a.freq() + b.freq();
                    let merged = HTree::Node(Box::new(a), Box::new(b), freq);
                    // Re-insert and re-sort, then recurse
                    let mut next: Vec<HTree> = std::iter::once(merged).chain(trees).collect();
                    next.sort_by_key(|t| t.freq());
                    go(next)
                }
            }
        }
    
        go(trees)
    }
    
    // ── Code generation ───────────────────────────────────────────────────────────
    //
    // Traverse the tree, prepending "0" for left branches and "1" for right.
    // Returns (char, code-string) pairs; leaves carry the final code.
    
    pub fn codes(tree: &HTree) -> Vec<(char, String)> {
        fn go(node: &HTree, prefix: String, out: &mut Vec<(char, String)>) {
            match node {
                HTree::Leaf(c, _) => out.push((*c, prefix)),
                HTree::Node(left, right, _) => {
                    go(left, format!("{prefix}0"), out);
                    go(right, format!("{prefix}1"), out);
                }
            }
        }
        let mut out = Vec::new();
        go(tree, String::new(), &mut out);
        out
    }
    
    // ── Tests ─────────────────────────────────────────────────────────────────────
    
    #[cfg(test)]
    mod tests {
        use super::*;
        use std::collections::HashSet;
    
        fn sample_freqs() -> Vec<(char, u32)> {
            vec![
                ('a', 5),
                ('b', 9),
                ('c', 12),
                ('d', 13),
                ('e', 16),
                ('f', 45),
            ]
        }
    
        #[test]
        fn test_single_char_produces_one_empty_code() {
            let tree = build_tree(&[('x', 7)]).unwrap();
            let c = codes(&tree);
            assert_eq!(c.len(), 1);
            assert_eq!(c[0].0, 'x');
            // A single leaf has no branching; the convention gives it an empty prefix.
            assert_eq!(c[0].1, "");
        }
    
        #[test]
        fn test_two_chars_get_single_bit_codes() {
            let tree = build_tree(&[('a', 3), ('b', 7)]).unwrap();
            let c = codes(&tree);
            assert_eq!(c.len(), 2);
            // Both codes must be exactly 1 bit
            assert!(c.iter().all(|(_, code)| code.len() == 1));
            // Codes must differ
            assert_ne!(c[0].1, c[1].1);
        }
    
        #[test]
        fn test_highest_frequency_gets_shortest_code() {
            let tree = build_tree(&sample_freqs()).unwrap();
            let c = codes(&tree);
            let f_len = c.iter().find(|(ch, _)| *ch == 'f').unwrap().1.len();
            let a_len = c.iter().find(|(ch, _)| *ch == 'a').unwrap().1.len();
            // f (freq 45) must have a strictly shorter code than a (freq 5)
            assert!(
                f_len < a_len,
                "f code len {f_len} should be < a code len {a_len}"
            );
        }
    
        #[test]
        fn test_codes_are_unique_and_correct_count() {
            let tree = build_tree(&sample_freqs()).unwrap();
            let c = codes(&tree);
            assert_eq!(c.len(), 6);
            let unique: HashSet<&str> = c.iter().map(|(_, s)| s.as_str()).collect();
            assert_eq!(unique.len(), 6, "all codes must be distinct");
        }
    
        #[test]
        fn test_root_frequency_equals_total() {
            let freqs = sample_freqs();
            let tree = build_tree(&freqs).unwrap();
            let total: u32 = freqs.iter().map(|(_, f)| f).sum();
            assert_eq!(tree.freq(), total);
        }
    
        #[test]
        fn test_recursive_and_heap_produce_same_total_frequency() {
            let freqs = sample_freqs();
            let t1 = build_tree(&freqs).unwrap();
            let t2 = build_tree_recursive(&freqs).unwrap();
            assert_eq!(t1.freq(), t2.freq());
            assert_eq!(codes(&t1).len(), codes(&t2).len());
        }
    
        #[test]
        fn test_empty_input_returns_none() {
            assert!(build_tree(&[]).is_none());
            assert!(build_tree_recursive(&[]).is_none());
        }
    
        #[test]
        fn test_all_codes_are_binary_strings() {
            let tree = build_tree(&sample_freqs()).unwrap();
            for (_, code) in codes(&tree) {
                assert!(
                    code.chars().all(|c| c == '0' || c == '1'),
                    "code {code:?} contains non-binary characters"
                );
            }
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
        use std::collections::HashSet;
    
        fn sample_freqs() -> Vec<(char, u32)> {
            vec![
                ('a', 5),
                ('b', 9),
                ('c', 12),
                ('d', 13),
                ('e', 16),
                ('f', 45),
            ]
        }
    
        #[test]
        fn test_single_char_produces_one_empty_code() {
            let tree = build_tree(&[('x', 7)]).unwrap();
            let c = codes(&tree);
            assert_eq!(c.len(), 1);
            assert_eq!(c[0].0, 'x');
            // A single leaf has no branching; the convention gives it an empty prefix.
            assert_eq!(c[0].1, "");
        }
    
        #[test]
        fn test_two_chars_get_single_bit_codes() {
            let tree = build_tree(&[('a', 3), ('b', 7)]).unwrap();
            let c = codes(&tree);
            assert_eq!(c.len(), 2);
            // Both codes must be exactly 1 bit
            assert!(c.iter().all(|(_, code)| code.len() == 1));
            // Codes must differ
            assert_ne!(c[0].1, c[1].1);
        }
    
        #[test]
        fn test_highest_frequency_gets_shortest_code() {
            let tree = build_tree(&sample_freqs()).unwrap();
            let c = codes(&tree);
            let f_len = c.iter().find(|(ch, _)| *ch == 'f').unwrap().1.len();
            let a_len = c.iter().find(|(ch, _)| *ch == 'a').unwrap().1.len();
            // f (freq 45) must have a strictly shorter code than a (freq 5)
            assert!(
                f_len < a_len,
                "f code len {f_len} should be < a code len {a_len}"
            );
        }
    
        #[test]
        fn test_codes_are_unique_and_correct_count() {
            let tree = build_tree(&sample_freqs()).unwrap();
            let c = codes(&tree);
            assert_eq!(c.len(), 6);
            let unique: HashSet<&str> = c.iter().map(|(_, s)| s.as_str()).collect();
            assert_eq!(unique.len(), 6, "all codes must be distinct");
        }
    
        #[test]
        fn test_root_frequency_equals_total() {
            let freqs = sample_freqs();
            let tree = build_tree(&freqs).unwrap();
            let total: u32 = freqs.iter().map(|(_, f)| f).sum();
            assert_eq!(tree.freq(), total);
        }
    
        #[test]
        fn test_recursive_and_heap_produce_same_total_frequency() {
            let freqs = sample_freqs();
            let t1 = build_tree(&freqs).unwrap();
            let t2 = build_tree_recursive(&freqs).unwrap();
            assert_eq!(t1.freq(), t2.freq());
            assert_eq!(codes(&t1).len(), codes(&t2).len());
        }
    
        #[test]
        fn test_empty_input_returns_none() {
            assert!(build_tree(&[]).is_none());
            assert!(build_tree_recursive(&[]).is_none());
        }
    
        #[test]
        fn test_all_codes_are_binary_strings() {
            let tree = build_tree(&sample_freqs()).unwrap();
            for (_, code) in codes(&tree) {
                assert!(
                    code.chars().all(|c| c == '0' || c == '1'),
                    "code {code:?} contains non-binary characters"
                );
            }
        }
    }

    Deep Comparison

    OCaml vs Rust: Huffman Encoding

    Side-by-Side Code

    OCaml

    type htree = Leaf of char * int | Node of htree * htree * int
    
    let freq = function Leaf (_,f) -> f | Node (_,_,f) -> f
    
    let build_tree freqs =
      let trees = List.map (fun (c,f) -> Leaf (c,f)) freqs
        |> List.sort (fun a b -> compare (freq a) (freq b)) in
      let rec go = function
        | [t] -> t
        | a :: b :: rest ->
          let merged = Node (a, b, freq a + freq b) in
          let trees = List.sort (fun a b -> compare (freq a) (freq b)) (merged :: rest) in
          go trees
        | [] -> failwith "empty"
      in go trees
    
    let rec codes prefix = function
      | Leaf (c, _) -> [(c, prefix)]
      | Node (l, r, _) -> codes (prefix ^ "0") l @ codes (prefix ^ "1") r
    

    Rust (idiomatic — BinaryHeap min-heap)

    use std::cmp::{Ordering, Reverse};
    use std::collections::BinaryHeap;
    
    pub enum HTree {
        Leaf(char, u32),
        Node(Box<HTree>, Box<HTree>, u32),
    }
    
    // Newtype that orders by frequency only, enabling min-heap via Reverse<FreqOrd>
    struct FreqOrd(HTree);
    impl Ord for FreqOrd {
        fn cmp(&self, other: &Self) -> Ordering {
            self.0.freq().cmp(&other.0.freq())
        }
    }
    
    pub fn build_tree(freqs: &[(char, u32)]) -> Option<HTree> {
        let mut heap: BinaryHeap<Reverse<FreqOrd>> = freqs
            .iter()
            .map(|&(c, f)| Reverse(FreqOrd(HTree::Leaf(c, f))))
            .collect();
        while heap.len() > 1 {
            let Reverse(FreqOrd(a)) = heap.pop()?;
            let Reverse(FreqOrd(b)) = heap.pop()?;
            let freq = a.freq() + b.freq();
            heap.push(Reverse(FreqOrd(HTree::Node(Box::new(a), Box::new(b), freq))));
        }
        heap.pop().map(|Reverse(FreqOrd(t))| t)
    }
    

    Rust (functional/recursive — mirrors OCaml list-sort)

    pub fn build_tree_recursive(freqs: &[(char, u32)]) -> Option<HTree> {
        let mut trees: Vec<HTree> = freqs.iter()
            .map(|&(c, f)| HTree::Leaf(c, f))
            .collect();
        trees.sort_by_key(|t| t.freq());
    
        fn go(mut trees: Vec<HTree>) -> Option<HTree> {
            match trees.len() {
                0 => None,
                1 => trees.into_iter().next(),
                _ => {
                    let a = trees.remove(0);
                    let b = trees.remove(0);
                    let freq = a.freq() + b.freq();
                    let merged = HTree::Node(Box::new(a), Box::new(b), freq);
                    let mut next: Vec<HTree> = std::iter::once(merged).chain(trees).collect();
                    next.sort_by_key(|t| t.freq());
                    go(next)
                }
            }
        }
        go(trees)
    }
    

    Type Signatures

    ConceptOCamlRust
    Tree typetype htree = Leaf of char * int \| Node of htree * htree * intenum HTree { Leaf(char, u32), Node(Box<HTree>, Box<HTree>, u32) }
    Frequency accessorlet freq = function Leaf (_,f) -> f \| Node (_,_,f) -> ffn freq(&self) -> u32 { match self { Leaf(_, f) \| Node(_, _, f) => *f } }
    Build functionval build_tree : (char * int) list -> htreefn build_tree(freqs: &[(char, u32)]) -> Option<HTree>
    Code generatorval codes : string -> htree -> (char * string) listfn codes(tree: &HTree) -> Vec<(char, String)>
    Priority queueList.sort (re-sort each step)BinaryHeap<Reverse<FreqOrd>>

    Key Insights

  • Box for recursive types: OCaml's htree is natively recursive because
  • the runtime uses heap-allocated values uniformly. Rust's enum must know its size at compile time, so child nodes must be Box<HTree>. This is not boilerplate — it is an explicit ownership declaration.

  • Min-heap without a crate: OCaml has no built-in priority queue so the
  • idiomatic choice is List.sort. Rust's std::collections::BinaryHeap is a max-heap; wrapping values in Reverse<T> flips the ordering to give an efficient O(log n) min-heap, replacing the O(n log n) re-sort in the recursive version.

  • Trait-based ordering vs. inline comparator: OCaml threads a comparison
  • function (fun a b -> compare (freq a) (freq b)) directly into List.sort. Rust encodes ordering in the type system via impl Ord for FreqOrd, keeping comparison logic declared once and reused implicitly wherever the type is ordered. The FreqOrd newtype isolates this concern so HTree itself carries no ordering assumption.

  • Owned strings vs. string slices: OCaml's ^ operator copies both
  • operands into a new string on every recursive call — O(depth) allocation per leaf. The Rust version passes an owned String down the call stack with format!("{prefix}0"), which is the same cost but makes the allocation explicit and borrow-checker-safe without needing &str lifetime annotations.

  • Pattern matching on slice vs. list: The OCaml go function matches
  • | a :: b :: rest on a list head. Rust uses trees.remove(0) on a Vec, which is O(n) but equivalent. A slice-pattern version [a, b, rest @ ..] is possible with Box<[HTree]> but adds complexity for no algorithmic gain.

    When to Use Each Style

    Use idiomatic Rust (BinaryHeap) when: building or processing large priority queues where O(log n) insert/remove matters — real compression pipelines, scheduler implementations, or any greedy algorithm over many elements.

    Use recursive Rust (Vec + sort) when: the data set is small (< 256 symbols for standard Huffman), the directness of the OCaml translation aids comprehension, or you are porting OCaml code and want a mechanical first-pass before optimising.

    Open Source Repos