028: Sort By Length
Difficulty: 1 Level: Foundations Sort a list of lists by length โ and optionally by length frequency (rarest lengths first).The Problem This Solves
You have a list of word lists, file paths, or token sequences of varying lengths and need to sort them: shortest first (like a spell-checker ranking by word length), or by how rare each length is (useful for prioritizing unusual patterns). The task: given `[['a','b','c'], ['d','e'], ['f'], ['g','h']]`, sort by length to get `[['f'], ['d','e'], ['g','h'], ['a','b','c']]`. In Python: `sorted(lst, key=len)`. One line. Done. Rust's version is equally concise โ `.sort_by_key(|v| v.len())` โ but it works on generic `Vec<T>` elements and is guaranteed to be a stable sort (preserving relative order of equal-length lists). The frequency-sort variant shows how to combine a HashMap with a custom comparator in a clean, readable way.The Intuition
In Python:sorted(lists, key=len) # sort by length
sorted(lists, key=lambda lst: (lists.count(len(lst)), len(lst))) # sort by freq
In Rust, `.sort_by_key(|v| v.len())` is the direct translation of `key=len`. The key is extracted once per element and used for comparison โ same semantics as Python's `key=` argument.
For the frequency sort: first build a frequency map (how many lists have each length), then sort by `(frequency_of_this_length, length_itself)`. This two-level sort means: rarest length first; among equal-frequency lengths, shorter ones come first.
How It Works in Rust
// Part a: sort by length, shortest first
fn sort_by_length<T: Clone>(lists: &[Vec<T>]) -> Vec<Vec<T>> {
let mut result = lists.to_vec();
result.sort_by_key(|v| v.len()); // stable sort by length
result
}
// Part b: sort by how rare each length is (rarest first)
fn sort_by_length_freq<T: Clone>(lists: &[Vec<T>]) -> Vec<Vec<T>> {
// Step 1: count how many lists have each length
let mut freq: HashMap<usize, usize> = HashMap::new();
for lst in lists {
*freq.entry(lst.len()).or_insert(0) += 1;
}
let mut result = lists.to_vec();
// Step 2: sort by (frequency of this length, length itself)
result.sort_by(|a, b| {
let fa = freq[&a.len()];
let fb = freq[&b.len()];
fa.cmp(&fb).then_with(|| a.len().cmp(&b.len()))
});
result
}
- `sort_by_key` โ stable sort using an extracted key (like Python's `key=`)
- `HashMap::entry(...).or_insert(0)` โ count occurrences idiomatically
- `fa.cmp(&fb).then_with(...)` โ multi-level comparison: first by frequency, then by length as tiebreaker
- `sort_by` vs `sort_by_key` โ use `sort_by` when you need a comparison involving both elements simultaneously
What This Unlocks
- NLP preprocessing โ sort token sequences by length before batching (common in ML pipelines).
- Log analysis โ sort log entry groups by length to find outlier patterns.
- Priority ranking โ sort candidates by a derived metric (length as proxy for complexity).
Key Differences
| Concept | OCaml | Rust | ||
|---|---|---|---|---|
| Sort by key | `List.sort (fun a b -> compare (len a) (len b))` | `.sort_by_key(\ | v\ | v.len())` |
| Stability | `List.stable_sort` | `.sort_by_key` is always stable | ||
| Frequency count | `List.fold_left` to build a map | `HashMap::entry().or_insert(0)` | ||
| Multi-level sort | Nested `compare` | `.cmp().then_with(...)` chained comparison | ||
| Generic over T | Polymorphic `'a list list` | `Vec<Vec<T>>` with `T: Clone` bound |
// Sort by Length โ 99 Problems #28
// a) Sort a list of lists by length (shortest first).
// b) Sort by length frequency (rarest length first).
fn sort_by_length<T: Clone>(lists: &[Vec<T>]) -> Vec<Vec<T>> {
let mut result = lists.to_vec();
result.sort_by_key(|v| v.len());
result
}
/// Sort by length frequency: shorter-frequency groups come first.
fn sort_by_length_freq<T: Clone>(lists: &[Vec<T>]) -> Vec<Vec<T>> {
// Count how often each length appears
let mut freq: std::collections::HashMap<usize, usize> = std::collections::HashMap::new();
for lst in lists {
*freq.entry(lst.len()).or_insert(0) += 1;
}
let mut result = lists.to_vec();
// Sort by frequency of the length, then by length itself for stability
result.sort_by(|a, b| {
let fa = freq[&a.len()];
let fb = freq[&b.len()];
fa.cmp(&fb).then_with(|| a.len().cmp(&b.len()))
});
result
}
fn main() {
let input: Vec<Vec<char>> = vec![
vec!['a', 'b', 'c'],
vec!['d', 'e'],
vec!['f', 'g', 'h', 'i'],
vec!['j'],
vec!['k', 'l'],
vec!['m', 'n', 'o'],
];
println!("Input:");
for l in &input { println!(" {:?}", l); }
println!("Sorted by length:");
for l in sort_by_length(&input) { println!(" {:?}", l); }
println!("Sorted by length frequency:");
for l in sort_by_length_freq(&input) { println!(" {:?}", l); }
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_sort_by_length_basic() {
let input = vec![vec![1, 2, 3], vec![4], vec![5, 6]];
let sorted = sort_by_length(&input);
assert_eq!(sorted[0], vec![4]);
assert_eq!(sorted[1], vec![5, 6]);
assert_eq!(sorted[2], vec![1, 2, 3]);
}
#[test]
fn test_sort_preserves_all_elements() {
let input = vec![vec!['a', 'b', 'c'], vec!['d', 'e'], vec!['f']];
let sorted = sort_by_length(&input);
assert_eq!(sorted.len(), 3);
let total: usize = sorted.iter().map(|v| v.len()).sum();
assert_eq!(total, 6);
}
#[test]
fn test_sort_by_freq() {
// Two lists of len 2, one of len 3 โ len-3 has freq=1 (rarest), should come first
let input = vec![vec![1, 2, 3], vec![4, 5], vec![6, 7]];
let sorted = sort_by_length_freq(&input);
assert_eq!(sorted[0].len(), 3); // rarest
}
#[test]
fn test_empty() {
let empty: Vec<Vec<i32>> = vec![];
assert_eq!(sort_by_length(&empty), vec![] as Vec<Vec<i32>>);
}
}
(* Sort By Length *)
(* OCaml 99 Problems #28 *)
(* Implementation for example 28 *)
(* Tests *)
let () =
(* Add tests *)
print_endline "โ OCaml tests passed"