ExamplesBy LevelBy TopicLearning Paths
1165 Advanced

1165-building-a-huffman-tree-from-character-frequencies — Huffman Encoding: Greedy Tree Building

Functional Programming

Tutorial

The Problem

Huffman coding, invented by David Huffman in 1952, is an optimal prefix-free encoding: common characters get short bit sequences, rare characters get long ones. It is used in ZIP, gzip, PNG, JPEG, and MP3 — virtually every compression format incorporates Huffman coding.

The algorithm builds a binary tree greedily using a priority queue: repeatedly merge the two lowest-frequency nodes until one tree remains. This greedy strategy provably produces the optimal prefix code (Shannon entropy bound).

🎯 Learning Outcomes

  • • Build a Huffman tree using a min-heap (priority queue)
  • • Understand the greedy merge strategy: always merge the two smallest
  • • Generate prefix-free codes by walking the tree (left=0, right=1)
  • • Encode and decode text using the generated codebook
  • • Understand why Huffman codes are optimal (proof by exchange argument)
  • Code Example

    #![allow(clippy::all)]
    // placeholder

    Key Differences

  • Priority queue: Rust uses BinaryHeap<Reverse<(freq, id, Node)>> with a tie-breaking counter; OCaml uses Set.Make with a custom comparator.
  • Recursive type: Rust's HuffNode::Internal uses Box<HuffNode> for the children (recursive type); OCaml's huffnode is recursive without annotation.
  • **Ord requirement**: Rust's BinaryHeap requires Ord on the element type — the counter id breaks ties for equal frequencies; OCaml's Set comparator handles this directly.
  • Codebook generation: Both walk the tree recursively, passing a prefix bit string down: left branch appends 0, right branch appends 1.
  • OCaml Approach

    type huffnode =
      | Leaf of char * int
      | Internal of int * huffnode * huffnode
    
    module PQ = Set.Make(struct
      type t = int * int * huffnode
      let compare (f1, i1, _) (f2, i2, _) = compare (f1, i1) (f2, i2)
    end)
    

    OCaml uses Set.Make as a priority queue (sorted by frequency), or the psq library for a proper priority queue. The algorithm structure is identical.

    Full Source

    #![allow(clippy::all)]
    // placeholder

    Deep Comparison

    <!-- Comparison will be generated by Claude Code -->

    Exercises

  • Implement generate_codes(tree: &HuffNode) -> HashMap<char, String> that produces the codebook by walking the tree.
  • Write encode(text: &str, codes: &HashMap<char, String>) -> String and decode(bits: &str, tree: &HuffNode) -> String.
  • Compute the compression ratio: compare the number of bits in the Huffman encoding to the 8-bit ASCII encoding of the same text.
  • Open Source Repos