🦀 Functional Rust

075: Merge Sort — Functional Divide and Conquer

Difficulty: 3 Level: Intermediate Split, sort recursively, merge — the canonical divide-and-conquer algorithm expressed with slices and `Vec`.

The Problem This Solves

Merge sort is the textbook algorithm for stable, predictable O(n log n) sorting. It's especially valuable when you need: stability (equal elements preserve order), external sorting (data doesn't fit in memory), or parallel sorting (each half can be sorted independently). More importantly for Rust learners, merge sort is where ownership patterns become visible. OCaml's list-based merge uses structural sharing and pattern matching on cons cells. Rust's slice-based merge owns its inputs and must manage allocation explicitly — you see exactly where memory is created and destroyed.

The Intuition

Three steps: 1. Split the input in half. 2. Sort each half recursively (base case: length ≤ 1 is already sorted). 3. Merge the two sorted halves by comparing front elements. Merging is the core operation: walk two sorted lists in lockstep, always taking the smaller front element. When one list is exhausted, append the rest of the other. OCaml's merge uses list cons (`h1 :: merge ...`) which is O(1) per step with structural sharing. Rust's `Vec::push` and `extend_from_slice` allocate contiguously — fewer cache misses, but each level of recursion creates new allocations.

How It Works in Rust

// Merge two sorted slices into one sorted Vec
pub fn merge<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
 let mut result = Vec::with_capacity(left.len() + right.len()); // pre-allocate
 let (mut i, mut j) = (0, 0);
 while i < left.len() && j < right.len() {
     if left[i] <= right[j] {   // stable: left wins on tie
         result.push(left[i].clone()); i += 1;
     } else {
         result.push(right[j].clone()); j += 1;
     }
 }
 result.extend_from_slice(&left[i..]);   // append remaining
 result.extend_from_slice(&right[j..]);
 result
}

// Recursive sort: split at midpoint, sort each half, merge
pub fn merge_sort<T: Ord + Clone>(list: &[T]) -> Vec<T> {
 if list.len() <= 1 { return list.to_vec(); }  // base case
 let mid = list.len() / 2;
 let left  = merge_sort(&list[..mid]);   // borrow left half
 let right = merge_sort(&list[mid..]);   // borrow right half
 merge(&left, &right)
}

// Iterator-based merge: more functional, uses Peekable for lookahead
pub fn merge_iter<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
 let mut result = Vec::with_capacity(left.len() + right.len());
 let mut li = left.iter().peekable();
 let mut ri = right.iter().peekable();
 loop {
     match (li.peek(), ri.peek()) {
         (Some(&l), Some(&r)) => {
             if l <= r { result.push(li.next().unwrap().clone()); }
             else       { result.push(ri.next().unwrap().clone()); }
         }
         (Some(_), None) => result.extend(li.cloned()),
         (None, _)       => { result.extend(ri.cloned()); break; }
     }
     if li.peek().is_none() { result.extend(ri.cloned()); break; }
 }
 result
}
For production sorting, prefer `slice.sort()` (Timsort) or `slice.sort_unstable()`. Merge sort here is pedagogical — it shows ownership patterns, recursion on slices, and the merge step cleanly.

What This Unlocks

Key Differences

ConceptOCamlRust
Split`List.filteri` or manual`&list[..mid]` / `&list[mid..]`
Merge step`h1 :: merge ...` (cons, O(1))`Vec::push` + `extend_from_slice`
Memory modelStructural sharing (GC)Explicit allocation per level
Base case`[] \[_]` pattern match`list.len() <= 1`
In-place optionNo (immutable)`slice.sort()` for in-place Timsort
/// Merge Sort — Functional Divide and Conquer
///
/// Pure functional merge sort: split the list, sort each half, merge.
/// OCaml uses pattern matching on lists; Rust uses slices and Vec.

/// Merge two sorted slices into a new sorted Vec.
pub fn merge<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
    let mut result = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            result.push(left[i].clone());
            i += 1;
        } else {
            result.push(right[j].clone());
            j += 1;
        }
    }
    result.extend_from_slice(&left[i..]);
    result.extend_from_slice(&right[j..]);
    result
}

/// Functional merge sort — recursive, creates new Vecs at each level.
pub fn merge_sort<T: Ord + Clone>(list: &[T]) -> Vec<T> {
    if list.len() <= 1 {
        return list.to_vec();
    }
    let mid = list.len() / 2;
    let left = merge_sort(&list[..mid]);
    let right = merge_sort(&list[mid..]);
    merge(&left, &right)
}

/// Merge using iterators — more functional style.
pub fn merge_iter<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
    let mut result = Vec::with_capacity(left.len() + right.len());
    let mut li = left.iter().peekable();
    let mut ri = right.iter().peekable();
    loop {
        match (li.peek(), ri.peek()) {
            (Some(l), Some(r)) => {
                if l <= r {
                    result.push((*li.next().unwrap()).clone());
                } else {
                    result.push((*ri.next().unwrap()).clone());
                }
            }
            (Some(_), None) => {
                result.extend(li.cloned());
                break;
            }
            (None, Some(_)) => {
                result.extend(ri.cloned());
                break;
            }
            (None, None) => break,
        }
    }
    result
}

/// Custom comparator version (like OCaml's `cmp` parameter).
pub fn merge_sort_by<T: Clone>(list: &[T], cmp: &impl Fn(&T, &T) -> std::cmp::Ordering) -> Vec<T> {
    if list.len() <= 1 {
        return list.to_vec();
    }
    let mid = list.len() / 2;
    let left = merge_sort_by(&list[..mid], cmp);
    let right = merge_sort_by(&list[mid..], cmp);
    let mut result = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if cmp(&left[i], &right[j]) != std::cmp::Ordering::Greater {
            result.push(left[i].clone());
            i += 1;
        } else {
            result.push(right[j].clone());
            j += 1;
        }
    }
    result.extend_from_slice(&left[i..]);
    result.extend_from_slice(&right[j..]);
    result
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic() {
        assert_eq!(merge_sort(&[5, 2, 8, 1, 9, 3]), vec![1, 2, 3, 5, 8, 9]);
    }

    #[test]
    fn test_empty() {
        assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
    }

    #[test]
    fn test_single() {
        assert_eq!(merge_sort(&[42]), vec![42]);
    }

    #[test]
    fn test_already_sorted() {
        assert_eq!(merge_sort(&[1, 2, 3, 4, 5]), vec![1, 2, 3, 4, 5]);
    }

    #[test]
    fn test_reverse_sorted() {
        assert_eq!(merge_sort(&[5, 4, 3, 2, 1]), vec![1, 2, 3, 4, 5]);
    }

    #[test]
    fn test_duplicates() {
        assert_eq!(merge_sort(&[3, 1, 2, 1, 3]), vec![1, 1, 2, 3, 3]);
    }

    #[test]
    fn test_strings() {
        assert_eq!(
            merge_sort(&["banana", "apple", "cherry"]),
            vec!["apple", "banana", "cherry"]
        );
    }

    #[test]
    fn test_custom_cmp() {
        // Sort descending
        let result = merge_sort_by(&[1, 5, 3, 2, 4], &|a: &i32, b: &i32| b.cmp(a));
        assert_eq!(result, vec![5, 4, 3, 2, 1]);
    }
}

fn main() {
    println!("{:?}", merge_sort(&[5, 2, 8, 1, 9, 3]), vec![1, 2, 3, 5, 8, 9]);
    println!("{:?}", merge_sort::<i32>(&[]), Vec::<i32>::new());
    println!("{:?}", merge_sort(&[42]), vec![42]);
}
let rec merge cmp l1 l2 = match l1, l2 with
  | [], l | l, [] -> l
  | h1 :: t1, h2 :: t2 ->
    if cmp h1 h2 <= 0 then h1 :: merge cmp t1 l2
    else h2 :: merge cmp l1 t2

let rec split = function
  | [] -> [], []
  | [x] -> [x], []
  | a :: b :: rest ->
    let l, r = split rest in
    a :: l, b :: r

let rec merge_sort cmp = function
  | ([] | [_]) as l -> l
  | l ->
    let left, right = split l in
    merge cmp (merge_sort cmp left) (merge_sort cmp right)

let () =
  let sorted = merge_sort compare [5; 2; 8; 1; 9; 3] in
  assert (sorted = [1; 2; 3; 5; 8; 9]);
  assert (merge_sort compare [] = []);
  assert (merge_sort compare [42] = [42]);
  assert (merge_sort compare [3; 1; 2; 1; 3] = [1; 1; 2; 3; 3]);
  print_endline "All assertions passed."

📊 Detailed Comparison

Merge Sort — Functional Divide and Conquer: OCaml vs Rust

The Core Insight

Merge sort is the quintessential functional sorting algorithm because it's naturally recursive and doesn't require mutation. The comparison reveals how OCaml's linked lists and Rust's contiguous Vecs lead to different implementation strategies with the same O(n log n) complexity but different constant factors.

OCaml Approach

OCaml's merge sort is textbook elegant. `split` alternates elements into two lists using pattern matching on pairs (`a :: b :: rest`). `merge` recursively cons-es the smaller head onto the merged tail. The `compare` function is passed as a first-class value for custom ordering. All intermediate lists share structure through the GC — no copying needed. The `([] | [_]) as l -> l` or-pattern cleanly handles base cases.

Rust Approach

Rust uses slices (`&[T]`) for input and `Vec<T>` for output. Splitting is trivial with slice indexing: `&list[..mid]` and `&list[mid..]`. Merging uses index-based iteration with `push` and `extend_from_slice` for efficient bulk copying. `Vec::with_capacity` pre-allocates the exact needed size, avoiding reallocation. The `Ord` trait bound replaces OCaml's `compare` function parameter. A peekable iterator variant shows the more functional style.

Side-by-Side

ConceptOCamlRust
SplitPattern match `a :: b :: rest`Slice indexing `&list[..mid]`
MergeRecursive cons `h1 :: merge ...``Vec::push` + `extend_from_slice`
Base case`([] \[_]) as l -> l``if list.len() <= 1`
Comparator`cmp` function parameter`Ord` trait bound or closure
MemoryGC, structural sharing`Vec` allocation, contiguous
StabilityStable (preserves order)Stable (preserves order)

What Rust Learners Should Notice

  • `Vec::with_capacity(n)` is a crucial optimization — it avoids O(log n) reallocations during the merge phase
  • Slice references (`&[T]`) give you zero-cost views into data without copying — similar to how OCaml lists share tails
  • `extend_from_slice` copies a contiguous block efficiently, much faster than pushing elements one by one
  • The `Ord` trait bound means any type that implements comparison can be sorted — no need to pass a comparator function (though `merge_sort_by` with a closure also works)
  • In production Rust, use `slice::sort()` or `slice::sort_unstable()` — they use Timsort / pdqsort respectively, both highly optimized

Further Reading

  • [The Rust Book — Generics and Traits](https://doc.rust-lang.org/book/ch10-02-traits.html)
  • [OCaml List Sorting](https://cs3110.github.io/textbook/chapters/ds/bst.html)