1033-binary-heap-topk — Top-K with BinaryHeap
Tutorial
The Problem
Finding the k largest (or smallest) elements from a large stream is a common problem in data processing: top-10 most visited pages from a billion-row log, top-k nearest neighbors in machine learning, k most recent events. Sorting the entire dataset is O(n log n), but maintaining a heap of size k gives O(n log k) — much better when k is small relative to n.
Rust's BinaryHeap is a max-heap by default. For top-k (keeping the k largest), use a min-heap of size k with Reverse<T>: when a new element is larger than the heap minimum, evict the minimum and insert the new element.
🎯 Learning Outcomes
Reverse<T> to turn BinaryHeap into a min-heapBinaryHeap as a priority queue for task schedulingCode Example
//! 1033: Top-K Elements with `BinaryHeap`
//!
//! Rust ships a binary heap in the standard library — `std::collections::BinaryHeap`
//! is a max-heap. Wrap values in `std::cmp::Reverse` to flip it into a min-heap,
//! which is exactly what the Top-K pattern needs: keep a bounded min-heap and
//! evict the smallest whenever a new, larger element arrives.
use std::cmp::Reverse;
use std::collections::BinaryHeap;
/// Return the `k` largest elements from `data` in descending order.
///
/// Uses a bounded min-heap of size `k`, giving `O(n log k)` time.
/// If `k == 0` the result is empty; if `k >= data.len()` all elements are returned sorted.
pub fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
if k == 0 {
return Vec::new();
}
let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
for &val in data {
heap.push(Reverse(val));
if heap.len() > k {
heap.pop();
}
}
let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
result.sort_unstable_by(|a, b| b.cmp(a));
result
}
/// Return the `k` elements of `data` with the largest keys, ordered by key descending.
///
/// `key_fn` extracts the ordering key from each element. Ties are broken by the
/// original index so the result is deterministic.
pub fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
where
K: Ord,
F: Fn(&T) -> K,
{
if k == 0 {
return Vec::new();
}
let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
for (i, item) in data.iter().enumerate() {
heap.push(Reverse((key_fn(item), i)));
if heap.len() > k {
heap.pop();
}
}
let mut entries: Vec<(K, usize)> = heap.into_iter().map(|Reverse(pair)| pair).collect();
entries.sort_by(|(ka, ia), (kb, ib)| kb.cmp(ka).then(ia.cmp(ib)));
entries.into_iter().map(|(_, i)| &data[i]).collect()
}
/// Demonstrate `BinaryHeap` behaving as a max-heap with natural ordering.
pub fn max_heap_demo() {
let mut heap = BinaryHeap::new();
for x in [3, 1, 4, 1, 5] {
heap.push(x);
}
assert_eq!(heap.pop(), Some(5));
assert_eq!(heap.pop(), Some(4));
assert_eq!(heap.pop(), Some(3));
assert_eq!(heap.peek(), Some(&1));
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn top_k_basic() {
let data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
assert_eq!(top_k(3, &data), vec![9, 6, 5]);
assert_eq!(top_k(1, &data), vec![9]);
assert_eq!(top_k(5, &data), vec![9, 6, 5, 5, 4]);
}
#[test]
fn top_k_edge_cases() {
assert_eq!(top_k(0, &[1, 2, 3]), Vec::<i32>::new());
assert_eq!(top_k(3, &[]), Vec::<i32>::new());
assert_eq!(top_k(10, &[2, 1]), vec![2, 1]);
}
#[test]
fn top_k_by_length() {
let words = ["hi", "hello", "hey", "howdy", "h"];
let longest = top_k_by(3, &words, |w| w.len());
assert_eq!(longest.len(), 3);
assert_eq!(*longest[0], "howdy");
assert_eq!(*longest[1], "hello");
}
#[test]
fn max_heap_works() {
max_heap_demo();
}
#[test]
fn into_sorted_vec_is_ascending() {
let heap: BinaryHeap<_> = [5, 3, 8, 1, 9].into_iter().collect();
assert_eq!(heap.into_sorted_vec(), vec![1, 3, 5, 8, 9]);
}
#[test]
fn reverse_gives_min_heap() {
let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
for x in [5, 1, 3] {
min_heap.push(Reverse(x));
}
assert_eq!(min_heap.pop(), Some(Reverse(1)));
}
}Key Differences
BinaryHeap is a max-heap by default; a min-heap requires Reverse<T>. OCaml's CCHeap takes a custom comparison function.BinaryHeap is mutable; OCaml's functional priority queues (CCFQueue) are persistent.BinaryHeap is in Rust's std; OCaml requires a third-party crate for heap structures.Reverse wrapper**: Rust's Reverse<T> inverts the Ord comparison without allocating; OCaml inverts comparison via a custom functor argument.OCaml Approach
OCaml lacks a standard heap. The algorithmic pattern uses sorting or a functional priority queue:
(* Simple O(n log n) approach via sort *)
let top_k k data =
List.sort (fun a b -> compare b a) data
|> (fun sorted -> List.filteri (fun i _ -> i < k) sorted)
The containers library provides CCHeap and CCFQueue (functional priority queues) for the O(n log k) approach.
Full Source
//! 1033: Top-K Elements with `BinaryHeap`
//!
//! Rust ships a binary heap in the standard library — `std::collections::BinaryHeap`
//! is a max-heap. Wrap values in `std::cmp::Reverse` to flip it into a min-heap,
//! which is exactly what the Top-K pattern needs: keep a bounded min-heap and
//! evict the smallest whenever a new, larger element arrives.
use std::cmp::Reverse;
use std::collections::BinaryHeap;
/// Return the `k` largest elements from `data` in descending order.
///
/// Uses a bounded min-heap of size `k`, giving `O(n log k)` time.
/// If `k == 0` the result is empty; if `k >= data.len()` all elements are returned sorted.
pub fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
if k == 0 {
return Vec::new();
}
let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
for &val in data {
heap.push(Reverse(val));
if heap.len() > k {
heap.pop();
}
}
let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
result.sort_unstable_by(|a, b| b.cmp(a));
result
}
/// Return the `k` elements of `data` with the largest keys, ordered by key descending.
///
/// `key_fn` extracts the ordering key from each element. Ties are broken by the
/// original index so the result is deterministic.
pub fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
where
K: Ord,
F: Fn(&T) -> K,
{
if k == 0 {
return Vec::new();
}
let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
for (i, item) in data.iter().enumerate() {
heap.push(Reverse((key_fn(item), i)));
if heap.len() > k {
heap.pop();
}
}
let mut entries: Vec<(K, usize)> = heap.into_iter().map(|Reverse(pair)| pair).collect();
entries.sort_by(|(ka, ia), (kb, ib)| kb.cmp(ka).then(ia.cmp(ib)));
entries.into_iter().map(|(_, i)| &data[i]).collect()
}
/// Demonstrate `BinaryHeap` behaving as a max-heap with natural ordering.
pub fn max_heap_demo() {
let mut heap = BinaryHeap::new();
for x in [3, 1, 4, 1, 5] {
heap.push(x);
}
assert_eq!(heap.pop(), Some(5));
assert_eq!(heap.pop(), Some(4));
assert_eq!(heap.pop(), Some(3));
assert_eq!(heap.peek(), Some(&1));
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn top_k_basic() {
let data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
assert_eq!(top_k(3, &data), vec![9, 6, 5]);
assert_eq!(top_k(1, &data), vec![9]);
assert_eq!(top_k(5, &data), vec![9, 6, 5, 5, 4]);
}
#[test]
fn top_k_edge_cases() {
assert_eq!(top_k(0, &[1, 2, 3]), Vec::<i32>::new());
assert_eq!(top_k(3, &[]), Vec::<i32>::new());
assert_eq!(top_k(10, &[2, 1]), vec![2, 1]);
}
#[test]
fn top_k_by_length() {
let words = ["hi", "hello", "hey", "howdy", "h"];
let longest = top_k_by(3, &words, |w| w.len());
assert_eq!(longest.len(), 3);
assert_eq!(*longest[0], "howdy");
assert_eq!(*longest[1], "hello");
}
#[test]
fn max_heap_works() {
max_heap_demo();
}
#[test]
fn into_sorted_vec_is_ascending() {
let heap: BinaryHeap<_> = [5, 3, 8, 1, 9].into_iter().collect();
assert_eq!(heap.into_sorted_vec(), vec![1, 3, 5, 8, 9]);
}
#[test]
fn reverse_gives_min_heap() {
let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
for x in [5, 1, 3] {
min_heap.push(Reverse(x));
}
assert_eq!(min_heap.pop(), Some(Reverse(1)));
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn top_k_basic() {
let data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
assert_eq!(top_k(3, &data), vec![9, 6, 5]);
assert_eq!(top_k(1, &data), vec![9]);
assert_eq!(top_k(5, &data), vec![9, 6, 5, 5, 4]);
}
#[test]
fn top_k_edge_cases() {
assert_eq!(top_k(0, &[1, 2, 3]), Vec::<i32>::new());
assert_eq!(top_k(3, &[]), Vec::<i32>::new());
assert_eq!(top_k(10, &[2, 1]), vec![2, 1]);
}
#[test]
fn top_k_by_length() {
let words = ["hi", "hello", "hey", "howdy", "h"];
let longest = top_k_by(3, &words, |w| w.len());
assert_eq!(longest.len(), 3);
assert_eq!(*longest[0], "howdy");
assert_eq!(*longest[1], "hello");
}
#[test]
fn max_heap_works() {
max_heap_demo();
}
#[test]
fn into_sorted_vec_is_ascending() {
let heap: BinaryHeap<_> = [5, 3, 8, 1, 9].into_iter().collect();
assert_eq!(heap.into_sorted_vec(), vec![1, 3, 5, 8, 9]);
}
#[test]
fn reverse_gives_min_heap() {
let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
for x in [5, 1, 3] {
min_heap.push(Reverse(x));
}
assert_eq!(min_heap.pop(), Some(Reverse(1)));
}
}
Deep Comparison
Top-K Elements with BinaryHeap — Comparison
Core Insight
The top-K problem is a classic priority queue application. Maintain a min-heap of size K; for each element, push it and evict the smallest if heap exceeds K. Result: only the K largest remain. Rust has BinaryHeap in std; OCaml has no built-in heap.
OCaml Approach
Rust Approach
BinaryHeap<T> — max-heap by defaultBinaryHeap<Reverse<T>> — min-heap via newtype wrapperpush + pop for bounded heap patterninto_sorted_vec() for sorted extractionpeek() for O(1) max/min accessComparison Table
| Feature | OCaml | Rust (BinaryHeap) |
|---|---|---|
| Stdlib heap | No | Yes |
| Default order | N/A | Max-heap |
| Min-heap | N/A | Reverse<T> wrapper |
| Top-K complexity | O(n log n) sort | O(n log k) heap |
| Heap sort | Manual | into_sorted_vec() |
| Peek | N/A | peek() O(1) |
| Custom ordering | N/A | Implement Ord |
Exercises
bottom_k(k: usize, data: &[i32]) -> Vec<i32> (k smallest elements) using a max-heap instead of a min-heap.kth_largest(k: usize, data: &[i32]) -> Option<i32> function that returns only the kth largest element without collecting all k.BinaryHeap<(Priority, Task)> where higher priority numbers are processed first.