๐Ÿฆ€ Functional Rust

356: Advanced HashMap Patterns

Difficulty: 2 Level: Intermediate Grouping, counting, caching, and inverting โ€” the idioms that turn HashMap from "key โ†’ value" into a data processing workhorse.

The Problem This Solves

You know how to insert and look up values in a `HashMap`. But real programs need more: group a list of items by category, count word frequencies, build an inverted index from value back to key, or cache expensive computation results. Done naively, these all involve awkward "does this key exist?" checks followed by either insert or update. Rust's `HashMap` has a richer API than most developers discover. The entry API (`entry().or_insert()`) handles insert-or-update in a single lookup. The `or_insert_with` and `and_modify` variants let you run closures only when needed. Understanding these patterns turns what would be 5-line conditional blocks into single expressive statements. A second issue is performance. Rust's default hasher (SipHash) is DoS-resistant but not the fastest for integer keys or short strings. When you need raw throughput and the keys come from a trusted source, swapping the hasher with a type alias costs one line of code.

The Intuition

Python's `collections.Counter` and `collections.defaultdict` are the closest equivalent. `Counter` auto-initializes counts to zero; `defaultdict` auto-initializes any missing key with a factory. Rust's entry API gives you both behaviors on a plain `HashMap` โ€” the `or_insert(0)` idiom is the Counter pattern, and `or_insert_with(Vec::new)` is the defaultdict pattern. The key mental shift: stop thinking of `.get()` + `.insert()` as the fundamental operations. The entry API is the idiomatic foundation โ€” it enters the map slot once and lets you read, modify, or initialize it in one coherent operation.

How It Works in Rust

use std::collections::HashMap;

// Pattern 1: Word frequency count (the Counter pattern)
let words = vec!["apple", "banana", "apple", "cherry", "banana", "apple"];
let mut freq: HashMap<&str, u32> = HashMap::new();
for word in &words {
 *freq.entry(word).or_insert(0) += 1; // insert 0 if missing, then increment
}
// freq: {"apple": 3, "banana": 2, "cherry": 1}

// Pattern 2: Group items by key (defaultdict(list) pattern)
let items = vec![("fruit", "apple"), ("veggie", "carrot"), ("fruit", "banana")];
let mut groups: HashMap<&str, Vec<&str>> = HashMap::new();
for (category, item) in &items {
 groups.entry(category).or_insert_with(Vec::new).push(item);
}
// groups: {"fruit": ["apple", "banana"], "veggie": ["carrot"]}

// Pattern 3: Conditional update (and_modify chained with or_insert)
let mut scores: HashMap<&str, i32> = HashMap::new();
scores.entry("alice")
 .and_modify(|s| *s += 10) // update if present
 .or_insert(10);            // insert if absent

// Pattern 4: Invert a map (value โ†’ key)
let original: HashMap<&str, u32> = [("alice", 1), ("bob", 2)].into_iter().collect();
let inverted: HashMap<u32, &str> = original.iter().map(|(&k, &v)| (v, k)).collect();

// Pattern 5: Memoization / lazy caching
fn expensive(n: u64) -> u64 { n * n } // placeholder
let mut cache: HashMap<u64, u64> = HashMap::new();
let result = *cache.entry(42).or_insert_with(|| expensive(42));

What This Unlocks

Key Differences

ConceptOCamlRust
Hash map`Hashtbl``HashMap<K, V>`
Auto-init missing key`find` + `add` manually`.entry().or_insert(default)`
Group-bymanual`.entry().or_insert_with(Vec::new).push(v)`
Conditional update`find` + `replace``.entry().and_modify(f).or_insert(v)`
Iteration`Hashtbl.iter``.iter()` / `.keys()` / `.values()`
Custom hasherN/A`HashMap<K,V,BuildHasher>` type param
use std::collections::HashMap;

fn word_count(text: &str) -> HashMap<String, usize> {
    let mut map = HashMap::new();
    for word in text.split_whitespace() {
        *map.entry(word.to_string()).or_insert(0) += 1;
    }
    map
}

fn group_by<T, K, F>(items: Vec<T>, key: F) -> HashMap<K, Vec<T>>
where K: Eq + std::hash::Hash, F: Fn(&T) -> K
{
    let mut map: HashMap<K, Vec<T>> = HashMap::new();
    for item in items { map.entry(key(&item)).or_default().push(item); }
    map
}

fn frequency_top_n(map: &HashMap<String,usize>, n: usize) -> Vec<(&str, usize)> {
    let mut pairs: Vec<_> = map.iter().map(|(k,&v)|(k.as_str(),v)).collect();
    pairs.sort_by(|a,b| b.1.cmp(&a.1));
    pairs.into_iter().take(n).collect()
}

fn main() {
    let text = "the cat sat on the mat the cat";
    let mut wc = word_count(text);
    let mut sorted: Vec<_> = wc.iter().collect();
    sorted.sort_by_key(|(k,_)| k.clone());
    for (w,n) in sorted { println!("{w}: {n}"); }

    println!("Top 2: {:?}", frequency_top_n(&wc, 2));

    let nums: Vec<i32> = (1..=10).collect();
    let grouped = group_by(nums, |&x| if x%2==0 {"even"} else {"odd"});
    let mut evens = grouped["even"].clone(); evens.sort();
    let mut odds = grouped["odd"].clone(); odds.sort();
    println!("Evens: {evens:?}");
    println!("Odds: {odds:?}");
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test] fn word_count_basic() {
        let wc = word_count("a b a c a b");
        assert_eq!(wc["a"], 3); assert_eq!(wc["b"], 2); assert_eq!(wc["c"], 1);
    }
    #[test] fn group_by_parity() {
        let g = group_by(vec![1,2,3,4,5], |&x| x%2);
        assert_eq!(g[&0].iter().filter(|&&x|x%2==0).count(), g[&0].len());
    }
}
(* OCaml: Hashtbl patterns *)

let word_count text =
  let tbl = Hashtbl.create 16 in
  String.split_on_char ' ' text |> List.filter ((<>) "") |> List.iter (fun w ->
    let n = try Hashtbl.find tbl w with Not_found -> 0 in
    Hashtbl.replace tbl w (n+1)
  );
  tbl

let group_by f lst =
  let tbl = Hashtbl.create 8 in
  List.iter (fun x ->
    let k = f x in
    let group = try Hashtbl.find tbl k with Not_found -> [] in
    Hashtbl.replace tbl k (x::group)
  ) lst; tbl

let () =
  let wc = word_count "the cat sat on the mat the cat" in
  Hashtbl.iter (fun w n -> Printf.printf "%s: %d\n" w n) wc