075: Merge Sort — Functional Divide and Conquer
Difficulty: 3 Level: Intermediate Split, sort recursively, merge — the canonical divide-and-conquer algorithm expressed with slices and `Vec`.The Problem This Solves
Merge sort is the textbook algorithm for stable, predictable O(n log n) sorting. It's especially valuable when you need: stability (equal elements preserve order), external sorting (data doesn't fit in memory), or parallel sorting (each half can be sorted independently). More importantly for Rust learners, merge sort is where ownership patterns become visible. OCaml's list-based merge uses structural sharing and pattern matching on cons cells. Rust's slice-based merge owns its inputs and must manage allocation explicitly — you see exactly where memory is created and destroyed.The Intuition
Three steps: 1. Split the input in half. 2. Sort each half recursively (base case: length ≤ 1 is already sorted). 3. Merge the two sorted halves by comparing front elements. Merging is the core operation: walk two sorted lists in lockstep, always taking the smaller front element. When one list is exhausted, append the rest of the other. OCaml's merge uses list cons (`h1 :: merge ...`) which is O(1) per step with structural sharing. Rust's `Vec::push` and `extend_from_slice` allocate contiguously — fewer cache misses, but each level of recursion creates new allocations.How It Works in Rust
// Merge two sorted slices into one sorted Vec
pub fn merge<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len()); // pre-allocate
let (mut i, mut j) = (0, 0);
while i < left.len() && j < right.len() {
if left[i] <= right[j] { // stable: left wins on tie
result.push(left[i].clone()); i += 1;
} else {
result.push(right[j].clone()); j += 1;
}
}
result.extend_from_slice(&left[i..]); // append remaining
result.extend_from_slice(&right[j..]);
result
}
// Recursive sort: split at midpoint, sort each half, merge
pub fn merge_sort<T: Ord + Clone>(list: &[T]) -> Vec<T> {
if list.len() <= 1 { return list.to_vec(); } // base case
let mid = list.len() / 2;
let left = merge_sort(&list[..mid]); // borrow left half
let right = merge_sort(&list[mid..]); // borrow right half
merge(&left, &right)
}
// Iterator-based merge: more functional, uses Peekable for lookahead
pub fn merge_iter<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len());
let mut li = left.iter().peekable();
let mut ri = right.iter().peekable();
loop {
match (li.peek(), ri.peek()) {
(Some(&l), Some(&r)) => {
if l <= r { result.push(li.next().unwrap().clone()); }
else { result.push(ri.next().unwrap().clone()); }
}
(Some(_), None) => result.extend(li.cloned()),
(None, _) => { result.extend(ri.cloned()); break; }
}
if li.peek().is_none() { result.extend(ri.cloned()); break; }
}
result
}
For production sorting, prefer `slice.sort()` (Timsort) or `slice.sort_unstable()`. Merge sort here is pedagogical — it shows ownership patterns, recursion on slices, and the merge step cleanly.
What This Unlocks
- External sort: merge-sort's structure extends naturally to file-based sorting (sort chunks, merge files).
- Parallel sort: each recursive half is independent — easy to parallelize with `rayon`.
- Generic sort utilities: `merge<T: Ord>` works on anything orderable.
Key Differences
| Concept | OCaml | Rust | |
|---|---|---|---|
| Split | `List.filteri` or manual | `&list[..mid]` / `&list[mid..]` | |
| Merge step | `h1 :: merge ...` (cons, O(1)) | `Vec::push` + `extend_from_slice` | |
| Memory model | Structural sharing (GC) | Explicit allocation per level | |
| Base case | `[] \ | [_]` pattern match | `list.len() <= 1` |
| In-place option | No (immutable) | `slice.sort()` for in-place Timsort |
/// Merge Sort — Functional Divide and Conquer
///
/// Pure functional merge sort: split the list, sort each half, merge.
/// OCaml uses pattern matching on lists; Rust uses slices and Vec.
/// Merge two sorted slices into a new sorted Vec.
pub fn merge<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len());
let (mut i, mut j) = (0, 0);
while i < left.len() && j < right.len() {
if left[i] <= right[j] {
result.push(left[i].clone());
i += 1;
} else {
result.push(right[j].clone());
j += 1;
}
}
result.extend_from_slice(&left[i..]);
result.extend_from_slice(&right[j..]);
result
}
/// Functional merge sort — recursive, creates new Vecs at each level.
pub fn merge_sort<T: Ord + Clone>(list: &[T]) -> Vec<T> {
if list.len() <= 1 {
return list.to_vec();
}
let mid = list.len() / 2;
let left = merge_sort(&list[..mid]);
let right = merge_sort(&list[mid..]);
merge(&left, &right)
}
/// Merge using iterators — more functional style.
pub fn merge_iter<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len());
let mut li = left.iter().peekable();
let mut ri = right.iter().peekable();
loop {
match (li.peek(), ri.peek()) {
(Some(l), Some(r)) => {
if l <= r {
result.push((*li.next().unwrap()).clone());
} else {
result.push((*ri.next().unwrap()).clone());
}
}
(Some(_), None) => {
result.extend(li.cloned());
break;
}
(None, Some(_)) => {
result.extend(ri.cloned());
break;
}
(None, None) => break,
}
}
result
}
/// Custom comparator version (like OCaml's `cmp` parameter).
pub fn merge_sort_by<T: Clone>(list: &[T], cmp: &impl Fn(&T, &T) -> std::cmp::Ordering) -> Vec<T> {
if list.len() <= 1 {
return list.to_vec();
}
let mid = list.len() / 2;
let left = merge_sort_by(&list[..mid], cmp);
let right = merge_sort_by(&list[mid..], cmp);
let mut result = Vec::with_capacity(left.len() + right.len());
let (mut i, mut j) = (0, 0);
while i < left.len() && j < right.len() {
if cmp(&left[i], &right[j]) != std::cmp::Ordering::Greater {
result.push(left[i].clone());
i += 1;
} else {
result.push(right[j].clone());
j += 1;
}
}
result.extend_from_slice(&left[i..]);
result.extend_from_slice(&right[j..]);
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
assert_eq!(merge_sort(&[5, 2, 8, 1, 9, 3]), vec![1, 2, 3, 5, 8, 9]);
}
#[test]
fn test_empty() {
assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
}
#[test]
fn test_single() {
assert_eq!(merge_sort(&[42]), vec![42]);
}
#[test]
fn test_already_sorted() {
assert_eq!(merge_sort(&[1, 2, 3, 4, 5]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_reverse_sorted() {
assert_eq!(merge_sort(&[5, 4, 3, 2, 1]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_duplicates() {
assert_eq!(merge_sort(&[3, 1, 2, 1, 3]), vec![1, 1, 2, 3, 3]);
}
#[test]
fn test_strings() {
assert_eq!(
merge_sort(&["banana", "apple", "cherry"]),
vec!["apple", "banana", "cherry"]
);
}
#[test]
fn test_custom_cmp() {
// Sort descending
let result = merge_sort_by(&[1, 5, 3, 2, 4], &|a: &i32, b: &i32| b.cmp(a));
assert_eq!(result, vec![5, 4, 3, 2, 1]);
}
}
fn main() {
println!("{:?}", merge_sort(&[5, 2, 8, 1, 9, 3]), vec![1, 2, 3, 5, 8, 9]);
println!("{:?}", merge_sort::<i32>(&[]), Vec::<i32>::new());
println!("{:?}", merge_sort(&[42]), vec![42]);
}
let rec merge cmp l1 l2 = match l1, l2 with
| [], l | l, [] -> l
| h1 :: t1, h2 :: t2 ->
if cmp h1 h2 <= 0 then h1 :: merge cmp t1 l2
else h2 :: merge cmp l1 t2
let rec split = function
| [] -> [], []
| [x] -> [x], []
| a :: b :: rest ->
let l, r = split rest in
a :: l, b :: r
let rec merge_sort cmp = function
| ([] | [_]) as l -> l
| l ->
let left, right = split l in
merge cmp (merge_sort cmp left) (merge_sort cmp right)
let () =
let sorted = merge_sort compare [5; 2; 8; 1; 9; 3] in
assert (sorted = [1; 2; 3; 5; 8; 9]);
assert (merge_sort compare [] = []);
assert (merge_sort compare [42] = [42]);
assert (merge_sort compare [3; 1; 2; 1; 3] = [1; 1; 2; 3; 3]);
print_endline "All assertions passed."
📊 Detailed Comparison
Merge Sort — Functional Divide and Conquer: OCaml vs Rust
The Core Insight
Merge sort is the quintessential functional sorting algorithm because it's naturally recursive and doesn't require mutation. The comparison reveals how OCaml's linked lists and Rust's contiguous Vecs lead to different implementation strategies with the same O(n log n) complexity but different constant factors.OCaml Approach
OCaml's merge sort is textbook elegant. `split` alternates elements into two lists using pattern matching on pairs (`a :: b :: rest`). `merge` recursively cons-es the smaller head onto the merged tail. The `compare` function is passed as a first-class value for custom ordering. All intermediate lists share structure through the GC — no copying needed. The `([] | [_]) as l -> l` or-pattern cleanly handles base cases.Rust Approach
Rust uses slices (`&[T]`) for input and `Vec<T>` for output. Splitting is trivial with slice indexing: `&list[..mid]` and `&list[mid..]`. Merging uses index-based iteration with `push` and `extend_from_slice` for efficient bulk copying. `Vec::with_capacity` pre-allocates the exact needed size, avoiding reallocation. The `Ord` trait bound replaces OCaml's `compare` function parameter. A peekable iterator variant shows the more functional style.Side-by-Side
| Concept | OCaml | Rust | |
|---|---|---|---|
| Split | Pattern match `a :: b :: rest` | Slice indexing `&list[..mid]` | |
| Merge | Recursive cons `h1 :: merge ...` | `Vec::push` + `extend_from_slice` | |
| Base case | `([] \ | [_]) as l -> l` | `if list.len() <= 1` |
| Comparator | `cmp` function parameter | `Ord` trait bound or closure | |
| Memory | GC, structural sharing | `Vec` allocation, contiguous | |
| Stability | Stable (preserves order) | Stable (preserves order) |
What Rust Learners Should Notice
- `Vec::with_capacity(n)` is a crucial optimization — it avoids O(log n) reallocations during the merge phase
- Slice references (`&[T]`) give you zero-cost views into data without copying — similar to how OCaml lists share tails
- `extend_from_slice` copies a contiguous block efficiently, much faster than pushing elements one by one
- The `Ord` trait bound means any type that implements comparison can be sorted — no need to pass a comparator function (though `merge_sort_by` with a closure also works)
- In production Rust, use `slice::sort()` or `slice::sort_unstable()` — they use Timsort / pdqsort respectively, both highly optimized
Further Reading
- [The Rust Book — Generics and Traits](https://doc.rust-lang.org/book/ch10-02-traits.html)
- [OCaml List Sorting](https://cs3110.github.io/textbook/chapters/ds/bst.html)