068: Frequency Counter
Difficulty: โญ Level: Foundations Count how many times each word (or any value) appears โ using `HashMap` and the `entry` API.The Problem This Solves
You have a list of items and you want to know how often each one appears. Word counts in a document. Event types in a log file. HTTP status codes in a request trace. Every programmer needs this pattern constantly. The painful way: for each unique item, scan the whole list and count. O(nยฒ). The dictionary approach is O(n) and every modern language has it. Python: `collections.Counter`. JavaScript: build an object with `{}`. Java: `HashMap<String, Integer>` with a null-check. Rust's `HashMap` is the same idea, but the update pattern is more explicit โ and more expressive. The `entry()` API is idiomatic Rust: "get the entry for this key, inserting a default if it's absent, then give me a mutable reference to the value."The Intuition
The key insight is `entry().or_insert(0)`. It means: "look up this key; if it's not there, insert 0; either way, give me a mutable reference to the value." Then you `+= 1` through that reference. In Python: `counts[word] = counts.get(word, 0) + 1`. In JavaScript: `counts[word] = (counts[word] || 0) + 1`. Rust's version is more structured but expresses the same idea, and it avoids two hash lookups (one to check, one to insert) by doing it in one step.How It Works in Rust
use std::collections::HashMap;
pub fn word_freq(text: &str) -> HashMap<String, usize> {
let mut freq = HashMap::new();
for word in text.split_whitespace() {
let lower = word.to_lowercase();
// entry(): find or create the slot for this key
// or_insert(0): if absent, insert 0 and return &mut 0
*freq.entry(lower).or_insert(0) += 1;
}
freq
}
Functional style with `fold`:
pub fn word_freq_functional(text: &str) -> HashMap<String, usize> {
text.split_whitespace()
.map(|w| w.to_lowercase())
.fold(HashMap::new(), |mut acc, word| {
*acc.entry(word).or_insert(0) += 1;
acc
})
}
Use `BTreeMap` when you need sorted output (like OCaml's `Map`, which is always ordered):
use std::collections::BTreeMap;
pub fn word_freq_sorted(text: &str) -> BTreeMap<String, usize> {
let mut freq = BTreeMap::new();
for word in text.split_whitespace() {
*freq.entry(word.to_lowercase()).or_insert(0) += 1;
}
freq
}
What This Unlocks
- Log analysis โ count error codes, HTTP methods, user agents from raw log lines
- Histogram building โ any "distribution of X" problem is a frequency counter
- Duplicate detection โ after counting, filter for `count > 1` to find duplicates
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Map type | `Map.Make(String)` (ordered, immutable) | `HashMap` (unordered) or `BTreeMap` (ordered) |
| Update pattern | `find` + `add` (two steps, replace whole map) | `entry().or_insert()` (one step, mutable ref) |
| Missing key | `try โฆ with Not_found -> 0` | `.or_insert(0)` handles it inline |
| Ordered output | Default (always ordered) | Use `BTreeMap` explicitly |
/// # Map Module โ Frequency Counter
///
/// Word frequency counting using HashMap โ the Rust equivalent of OCaml's Map.Make.
use std::collections::HashMap;
/// Idiomatic Rust: split, lowercase, count with entry API.
/// The `entry().or_insert(0)` pattern is the standard way to update-or-insert.
pub fn word_freq(text: &str) -> HashMap<String, usize> {
let mut freq = HashMap::new();
for word in text.split_whitespace() {
let lower = word.to_lowercase();
// `entry` API: get mutable reference, inserting default if absent
*freq.entry(lower).or_insert(0) += 1;
}
freq
}
/// Functional style using fold (iterator chain).
pub fn word_freq_functional(text: &str) -> HashMap<String, usize> {
text.split_whitespace()
.map(|w| w.to_lowercase())
.fold(HashMap::new(), |mut acc, word| {
*acc.entry(word).or_insert(0) += 1;
acc
})
}
/// Using BTreeMap for sorted output (like OCaml's Map which is ordered).
pub fn word_freq_sorted(text: &str) -> std::collections::BTreeMap<String, usize> {
let mut freq = std::collections::BTreeMap::new();
for word in text.split_whitespace() {
*freq.entry(word.to_lowercase()).or_insert(0) += 1;
}
freq
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
let freq = word_freq("the cat sat on the mat the cat");
assert_eq!(freq["the"], 3);
assert_eq!(freq["cat"], 2);
assert_eq!(freq["sat"], 1);
}
#[test]
fn test_empty() {
let freq = word_freq("");
assert!(freq.is_empty());
}
#[test]
fn test_single_word() {
let freq = word_freq("hello");
assert_eq!(freq["hello"], 1);
}
#[test]
fn test_case_insensitive() {
let freq = word_freq("Hello hello HELLO");
assert_eq!(freq["hello"], 3);
}
#[test]
fn test_functional_version() {
let freq = word_freq_functional("a b a c b a");
assert_eq!(freq["a"], 3);
assert_eq!(freq["b"], 2);
}
#[test]
fn test_sorted() {
let freq = word_freq_sorted("b a c a b");
let keys: Vec<&String> = freq.keys().collect();
assert_eq!(keys, vec!["a", "b", "c"]); // sorted!
}
}
fn main() {
println!("{:?}", freq["the"], 3);
println!("{:?}", freq["cat"], 2);
println!("{:?}", freq["sat"], 1);
}
(* Map Module โ Frequency Counter *)
module SMap = Map.Make(String)
let word_freq text =
text |> String.split_on_char ' '
|> List.map String.lowercase_ascii
|> List.fold_left (fun acc w ->
let count = try SMap.find w acc with Not_found -> 0 in
SMap.add w (count + 1) acc
) SMap.empty
let () =
let freq = word_freq "the cat sat on the mat the cat" in
SMap.iter (Printf.printf "%s: %d\n") freq;
assert (SMap.find "the" freq = 3);
assert (SMap.find "cat" freq = 2);
Printf.printf "Frequency counter tests passed!\n"
๐ Detailed Comparison
Frequency Counter โ OCaml vs Rust Comparison
Core Insight
Counting frequencies is the classic map use case. OCaml's `Map.Make` functor creates an immutable balanced tree map โ each `add` creates a new map. Rust's `HashMap` is mutable with an `entry` API that makes update-or-insert a one-liner. For sorted output, Rust offers `BTreeMap`.
OCaml Approach
`Map.Make(String)` creates a module with an ordered map backed by balanced binary trees. Each update via `SMap.add` returns a new immutable map (structural sharing keeps this efficient). The `find` + `Not_found` exception pattern is common but slightly awkward.
Rust Approach
`HashMap::entry()` returns an `Entry` enum (Occupied or Vacant) โ calling `.or_insert(0)` provides a mutable reference to the count, which can be incremented in place. No exception handling needed. For ordered iteration, use `BTreeMap` instead.
Comparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Memory | Immutable tree (shared nodes) | Mutable hash table |
| Null safety | `Not_found` exception | `entry` API (no missing key panic) |
| Errors | Exception on missing key | `Entry` enum handles presence |
| Iteration | `SMap.iter` (sorted by key) | Unordered (HashMap) or sorted (BTreeMap) |
| Update | `find` + `add` (new map) | `entry().or_insert()` (in-place) |
Things Rust Learners Should Notice
1. `entry()` API โ the killer feature for maps; avoids double-lookup (check then insert) 2. `*freq.entry(k).or_insert(0) += 1` โ dereference the mutable ref, then increment 3. `HashMap` vs `BTreeMap` โ hash is O(1) average but unordered; BTree is O(log n) but sorted 4. OCaml's Map is immutable โ functional purity at the cost of creating new nodes on each update 5. `split_whitespace()` โ handles multiple spaces, tabs, newlines; better than `split(' ')`
Further Reading
- [HashMap::entry](https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.entry)
- [BTreeMap](https://doc.rust-lang.org/std/collections/struct.BTreeMap.html)