075 Intermediate

075 — Merge Sort

Functional Programming

Tutorial

The Problem

Merge sort (John von Neumann, 1945) is the canonical divide-and-conquer sorting algorithm: split the list in half, sort each half recursively, merge the sorted halves. It guarantees O(n log n) in all cases — unlike quicksort which degrades to O(n²) on sorted input. It is the algorithm used in Rust's Vec::sort (TimSort) and Java's Arrays.sort for objects.

Merge sort is naturally recursive and maps directly to functional programming patterns: the merge function is a fold over two sorted lists, and the sort function is a structural recursion on list halves. Understanding merge sort deeply unlocks intuition for all divide-and-conquer algorithms.

🎯 Learning Outcomes

• Implement the two-phase merge sort: split, recurse, merge

• Write a merge function for combining two sorted slices in O(n+m)

• Implement the generic version with a custom comparator F: Fn(&T, &T) -> Ordering

• Understand why the base case is len <= 1 (a list of 0 or 1 elements is already sorted)

• Compare recursive merge sort with Rust's built-in sort (TimSort)

Code Example

//! 075: Merge Sort
//!
//! Three variations on divide-and-conquer sorting:
//! 1. Classic recursive merge sort on `Ord` slices.
//! 2. Generic version parameterised by a comparator closure.
//! 3. Bottom-up fold-style sort that pairwise-merges singletons.
//!
//! The implementations mirror the OCaml originals by returning fresh `Vec`s
//! rather than sorting in place — this keeps the code purely functional and
//! easy to compare with the OCaml source.

use std::cmp::Ordering;

/// Merge two already-sorted slices into a new sorted `Vec`.
pub fn merge<T: Ord + Clone>(l1: &[T], l2: &[T]) -> Vec<T> {
    merge_by(l1, l2, |a, b| a.cmp(b))
}

/// Classic recursive merge sort: split in half, recurse, merge.
pub fn merge_sort<T: Ord + Clone>(xs: &[T]) -> Vec<T> {
    merge_sort_by(xs, |a, b| a.cmp(b))
}

/// Merge two sorted slices using a custom comparator.
pub fn merge_by<T, F>(l1: &[T], l2: &[T], cmp: F) -> Vec<T>
where
    T: Clone,
    F: Fn(&T, &T) -> Ordering,
{
    let mut out = Vec::with_capacity(l1.len() + l2.len());
    let (mut i, mut j) = (0, 0);
    while i < l1.len() && j < l2.len() {
        if cmp(&l1[i], &l2[j]) != Ordering::Greater {
            out.push(l1[i].clone());
            i += 1;
        } else {
            out.push(l2[j].clone());
            j += 1;
        }
    }
    out.extend_from_slice(&l1[i..]);
    out.extend_from_slice(&l2[j..]);
    out
}

/// Generic merge sort with a custom comparator.
pub fn merge_sort_by<T, F>(xs: &[T], cmp: F) -> Vec<T>
where
    T: Clone,
    F: Fn(&T, &T) -> Ordering + Copy,
{
    if xs.len() <= 1 {
        return xs.to_vec();
    }
    let (left, right) = xs.split_at(xs.len() / 2);
    merge_by(&merge_sort_by(left, cmp), &merge_sort_by(right, cmp), cmp)
}

/// Bottom-up merge sort: start with singletons and repeatedly pair-merge.
pub fn merge_sort_fold<T: Ord + Clone>(xs: &[T]) -> Vec<T> {
    if xs.is_empty() {
        return Vec::new();
    }
    let mut runs: Vec<Vec<T>> = xs.iter().map(|x| vec![x.clone()]).collect();
    while runs.len() > 1 {
        runs = runs
            .chunks(2)
            .map(|pair| match pair {
                [a, b] => merge(a, b),
                [a] => a.clone(),
                _ => unreachable!(),
            })
            .collect();
    }
    runs.pop().unwrap_or_default()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sorts_typical_list() {
        assert_eq!(
            merge_sort(&[5, 3, 8, 1, 9, 2, 7]),
            vec![1, 2, 3, 5, 7, 8, 9]
        );
    }

    #[test]
    fn handles_edge_cases() {
        assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
        assert_eq!(merge_sort(&[1]), vec![1]);
        assert_eq!(merge_sort(&[2, 1]), vec![1, 2]);
    }

    #[test]
    fn preserves_duplicates() {
        assert_eq!(merge_sort(&[3, 1, 3, 2, 1, 2]), vec![1, 1, 2, 2, 3, 3]);
    }

    #[test]
    fn merge_combines_two_sorted_slices() {
        assert_eq!(merge(&[1, 4, 7], &[2, 3, 8]), vec![1, 2, 3, 4, 7, 8]);
        assert_eq!(merge::<i32>(&[], &[1, 2]), vec![1, 2]);
        assert_eq!(merge(&[1, 2], &[]), vec![1, 2]);
    }

    #[test]
    fn ascending_with_default_comparator() {
        assert_eq!(
            merge_sort_by(&[5, 3, 8, 1], |a: &i32, b: &i32| a.cmp(b)),
            vec![1, 3, 5, 8]
        );
    }

    #[test]
    fn descending_with_reversed_comparator() {
        assert_eq!(
            merge_sort_by(&[5, 3, 8, 1], |a: &i32, b: &i32| b.cmp(a)),
            vec![8, 5, 3, 1]
        );
    }

    #[test]
    fn sorts_strings_by_length() {
        let words = ["elephant", "cat", "bumblebee", "dog"];
        let sorted = merge_sort_by(&words, |a, b| a.len().cmp(&b.len()));
        assert_eq!(sorted, vec!["cat", "dog", "elephant", "bumblebee"]);
    }

    #[test]
    fn bottom_up_matches_recursive() {
        let xs = [5, 3, 8, 1, 9, 2, 7];
        assert_eq!(merge_sort_fold(&xs), vec![1, 2, 3, 5, 7, 8, 9]);
        assert_eq!(merge_sort_fold::<i32>(&[]), Vec::<i32>::new());
        assert_eq!(merge_sort_fold(&[42]), vec![42]);
    }
}

(* 075: Merge Sort *)

(* Approach 1: Classic recursive merge sort *)
let rec merge l1 l2 =
  match l1, l2 with
  | [], l | l, [] -> l
  | x :: xs, y :: ys ->
    if x <= y then x :: merge xs (y :: ys)
    else y :: merge (x :: xs) ys

let rec split = function
  | [] -> ([], [])
  | [x] -> ([x], [])
  | x :: y :: rest ->
    let (l, r) = split rest in
    (x :: l, y :: r)

let rec merge_sort = function
  | [] -> []
  | [x] -> [x]
  | lst ->
    let (l, r) = split lst in
    merge (merge_sort l) (merge_sort r)

(* Approach 2: Generic merge sort with comparator *)
let rec merge_by cmp l1 l2 =
  match l1, l2 with
  | [], l | l, [] -> l
  | x :: xs, y :: ys ->
    if cmp x y <= 0 then x :: merge_by cmp xs (y :: ys)
    else y :: merge_by cmp (x :: xs) ys

let rec merge_sort_by cmp = function
  | [] -> []
  | [x] -> [x]
  | lst ->
    let (l, r) = split lst in
    merge_by cmp (merge_sort_by cmp l) (merge_sort_by cmp r)

(* Approach 3: Bottom-up style using fold *)
let merge_sort_fold lst =
  let singletons = List.map (fun x -> [x]) lst in
  let rec reduce = function
    | [] -> []
    | [x] -> x
    | pairs ->
      let rec pair_merge = function
        | [] -> []
        | [x] -> [x]
        | x :: y :: rest -> merge x y :: pair_merge rest
      in
      reduce (pair_merge pairs)
  in
  reduce singletons

(* Tests *)
let () =
  assert (merge_sort [5; 3; 8; 1; 9; 2; 7] = [1; 2; 3; 5; 7; 8; 9]);
  assert (merge_sort [] = []);
  assert (merge_sort [1] = [1]);
  assert (merge_sort [2; 1] = [1; 2]);
  assert (merge_sort_by compare [5; 3; 8; 1] = [1; 3; 5; 8]);
  assert (merge_sort_by (fun a b -> compare b a) [5; 3; 8; 1] = [8; 5; 3; 1]);
  assert (merge_sort_fold [5; 3; 8; 1; 9; 2; 7] = [1; 2; 3; 5; 7; 8; 9]);
  Printf.printf "✓ All tests passed\n"

Key Differences

Slice vs list: Rust sorts slices (&[i32]) and produces new Vec<i32>. OCaml sorts list and produces new lists. List-based merge sort is purely functional; slice-based requires cloning.

In-place option: Rust can implement in-place merge sort using Vec::split_at_mut and rotate_left. OCaml's immutable lists cannot be sorted in place.

Stability: Rust's sort is stable (preserves relative order of equal elements). sort_unstable is faster but not stable. OCaml's List.sort is not stable; List.stable_sort is.

**split_at**: Rust's v.split_at(mid) is O(1). OCaml must traverse to the midpoint: let rec split_at n = function ... | x :: t -> let (l, r) = split_at (n-1) t in (x :: l, r) — O(n/2).

OCaml Approach

OCaml's merge sort uses recursive pattern matching on lists:

let rec merge l1 l2 = match l1, l2 with
  | [], l | l, [] -> l
  | x :: t1, y :: t2 ->
    if x <= y then x :: merge t1 l2
    else y :: merge l1 t2

let rec merge_sort = function
  | ([] | [_]) as lst -> lst
  | lst ->
    let mid = List.length lst / 2 in
    let left = List.filteri (fun i _ -> i < mid) lst in
    let right = List.filteri (fun i _ -> i >= mid) lst in
    merge (merge_sort left) (merge_sort right)

The list-based version is purely functional with no mutation. List.stable_sort in OCaml's stdlib is an optimized merge sort.

Full Source

//! 075: Merge Sort
//!
//! Three variations on divide-and-conquer sorting:
//! 1. Classic recursive merge sort on `Ord` slices.
//! 2. Generic version parameterised by a comparator closure.
//! 3. Bottom-up fold-style sort that pairwise-merges singletons.
//!
//! The implementations mirror the OCaml originals by returning fresh `Vec`s
//! rather than sorting in place — this keeps the code purely functional and
//! easy to compare with the OCaml source.

use std::cmp::Ordering;

/// Merge two already-sorted slices into a new sorted `Vec`.
pub fn merge<T: Ord + Clone>(l1: &[T], l2: &[T]) -> Vec<T> {
    merge_by(l1, l2, |a, b| a.cmp(b))
}

/// Classic recursive merge sort: split in half, recurse, merge.
pub fn merge_sort<T: Ord + Clone>(xs: &[T]) -> Vec<T> {
    merge_sort_by(xs, |a, b| a.cmp(b))
}

/// Merge two sorted slices using a custom comparator.
pub fn merge_by<T, F>(l1: &[T], l2: &[T], cmp: F) -> Vec<T>
where
    T: Clone,
    F: Fn(&T, &T) -> Ordering,
{
    let mut out = Vec::with_capacity(l1.len() + l2.len());
    let (mut i, mut j) = (0, 0);
    while i < l1.len() && j < l2.len() {
        if cmp(&l1[i], &l2[j]) != Ordering::Greater {
            out.push(l1[i].clone());
            i += 1;
        } else {
            out.push(l2[j].clone());
            j += 1;
        }
    }
    out.extend_from_slice(&l1[i..]);
    out.extend_from_slice(&l2[j..]);
    out
}

/// Generic merge sort with a custom comparator.
pub fn merge_sort_by<T, F>(xs: &[T], cmp: F) -> Vec<T>
where
    T: Clone,
    F: Fn(&T, &T) -> Ordering + Copy,
{
    if xs.len() <= 1 {
        return xs.to_vec();
    }
    let (left, right) = xs.split_at(xs.len() / 2);
    merge_by(&merge_sort_by(left, cmp), &merge_sort_by(right, cmp), cmp)
}

/// Bottom-up merge sort: start with singletons and repeatedly pair-merge.
pub fn merge_sort_fold<T: Ord + Clone>(xs: &[T]) -> Vec<T> {
    if xs.is_empty() {
        return Vec::new();
    }
    let mut runs: Vec<Vec<T>> = xs.iter().map(|x| vec![x.clone()]).collect();
    while runs.len() > 1 {
        runs = runs
            .chunks(2)
            .map(|pair| match pair {
                [a, b] => merge(a, b),
                [a] => a.clone(),
                _ => unreachable!(),
            })
            .collect();
    }
    runs.pop().unwrap_or_default()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sorts_typical_list() {
        assert_eq!(
            merge_sort(&[5, 3, 8, 1, 9, 2, 7]),
            vec![1, 2, 3, 5, 7, 8, 9]
        );
    }

    #[test]
    fn handles_edge_cases() {
        assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
        assert_eq!(merge_sort(&[1]), vec![1]);
        assert_eq!(merge_sort(&[2, 1]), vec![1, 2]);
    }

    #[test]
    fn preserves_duplicates() {
        assert_eq!(merge_sort(&[3, 1, 3, 2, 1, 2]), vec![1, 1, 2, 2, 3, 3]);
    }

    #[test]
    fn merge_combines_two_sorted_slices() {
        assert_eq!(merge(&[1, 4, 7], &[2, 3, 8]), vec![1, 2, 3, 4, 7, 8]);
        assert_eq!(merge::<i32>(&[], &[1, 2]), vec![1, 2]);
        assert_eq!(merge(&[1, 2], &[]), vec![1, 2]);
    }

    #[test]
    fn ascending_with_default_comparator() {
        assert_eq!(
            merge_sort_by(&[5, 3, 8, 1], |a: &i32, b: &i32| a.cmp(b)),
            vec![1, 3, 5, 8]
        );
    }

    #[test]
    fn descending_with_reversed_comparator() {
        assert_eq!(
            merge_sort_by(&[5, 3, 8, 1], |a: &i32, b: &i32| b.cmp(a)),
            vec![8, 5, 3, 1]
        );
    }

    #[test]
    fn sorts_strings_by_length() {
        let words = ["elephant", "cat", "bumblebee", "dog"];
        let sorted = merge_sort_by(&words, |a, b| a.len().cmp(&b.len()));
        assert_eq!(sorted, vec!["cat", "dog", "elephant", "bumblebee"]);
    }

    #[test]
    fn bottom_up_matches_recursive() {
        let xs = [5, 3, 8, 1, 9, 2, 7];
        assert_eq!(merge_sort_fold(&xs), vec![1, 2, 3, 5, 7, 8, 9]);
        assert_eq!(merge_sort_fold::<i32>(&[]), Vec::<i32>::new());
        assert_eq!(merge_sort_fold(&[42]), vec![42]);
    }
}

(* 075: Merge Sort *)

(* Approach 1: Classic recursive merge sort *)
let rec merge l1 l2 =
  match l1, l2 with
  | [], l | l, [] -> l
  | x :: xs, y :: ys ->
    if x <= y then x :: merge xs (y :: ys)
    else y :: merge (x :: xs) ys

let rec split = function
  | [] -> ([], [])
  | [x] -> ([x], [])
  | x :: y :: rest ->
    let (l, r) = split rest in
    (x :: l, y :: r)

let rec merge_sort = function
  | [] -> []
  | [x] -> [x]
  | lst ->
    let (l, r) = split lst in
    merge (merge_sort l) (merge_sort r)

(* Approach 2: Generic merge sort with comparator *)
let rec merge_by cmp l1 l2 =
  match l1, l2 with
  | [], l | l, [] -> l
  | x :: xs, y :: ys ->
    if cmp x y <= 0 then x :: merge_by cmp xs (y :: ys)
    else y :: merge_by cmp (x :: xs) ys

let rec merge_sort_by cmp = function
  | [] -> []
  | [x] -> [x]
  | lst ->
    let (l, r) = split lst in
    merge_by cmp (merge_sort_by cmp l) (merge_sort_by cmp r)

(* Approach 3: Bottom-up style using fold *)
let merge_sort_fold lst =
  let singletons = List.map (fun x -> [x]) lst in
  let rec reduce = function
    | [] -> []
    | [x] -> x
    | pairs ->
      let rec pair_merge = function
        | [] -> []
        | [x] -> [x]
        | x :: y :: rest -> merge x y :: pair_merge rest
      in
      reduce (pair_merge pairs)
  in
  reduce singletons

(* Tests *)
let () =
  assert (merge_sort [5; 3; 8; 1; 9; 2; 7] = [1; 2; 3; 5; 7; 8; 9]);
  assert (merge_sort [] = []);
  assert (merge_sort [1] = [1]);
  assert (merge_sort [2; 1] = [1; 2]);
  assert (merge_sort_by compare [5; 3; 8; 1] = [1; 3; 5; 8]);
  assert (merge_sort_by (fun a b -> compare b a) [5; 3; 8; 1] = [8; 5; 3; 1]);
  assert (merge_sort_fold [5; 3; 8; 1; 9; 2; 7] = [1; 2; 3; 5; 7; 8; 9]);
  Printf.printf "✓ All tests passed\n"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sorts_typical_list() {
        assert_eq!(
            merge_sort(&[5, 3, 8, 1, 9, 2, 7]),
            vec![1, 2, 3, 5, 7, 8, 9]
        );
    }

    #[test]
    fn handles_edge_cases() {
        assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
        assert_eq!(merge_sort(&[1]), vec![1]);
        assert_eq!(merge_sort(&[2, 1]), vec![1, 2]);
    }

    #[test]
    fn preserves_duplicates() {
        assert_eq!(merge_sort(&[3, 1, 3, 2, 1, 2]), vec![1, 1, 2, 2, 3, 3]);
    }

    #[test]
    fn merge_combines_two_sorted_slices() {
        assert_eq!(merge(&[1, 4, 7], &[2, 3, 8]), vec![1, 2, 3, 4, 7, 8]);
        assert_eq!(merge::<i32>(&[], &[1, 2]), vec![1, 2]);
        assert_eq!(merge(&[1, 2], &[]), vec![1, 2]);
    }

    #[test]
    fn ascending_with_default_comparator() {
        assert_eq!(
            merge_sort_by(&[5, 3, 8, 1], |a: &i32, b: &i32| a.cmp(b)),
            vec![1, 3, 5, 8]
        );
    }

    #[test]
    fn descending_with_reversed_comparator() {
        assert_eq!(
            merge_sort_by(&[5, 3, 8, 1], |a: &i32, b: &i32| b.cmp(a)),
            vec![8, 5, 3, 1]
        );
    }

    #[test]
    fn sorts_strings_by_length() {
        let words = ["elephant", "cat", "bumblebee", "dog"];
        let sorted = merge_sort_by(&words, |a, b| a.len().cmp(&b.len()));
        assert_eq!(sorted, vec!["cat", "dog", "elephant", "bumblebee"]);
    }

    #[test]
    fn bottom_up_matches_recursive() {
        let xs = [5, 3, 8, 1, 9, 2, 7];
        assert_eq!(merge_sort_fold(&xs), vec![1, 2, 3, 5, 7, 8, 9]);
        assert_eq!(merge_sort_fold::<i32>(&[]), Vec::<i32>::new());
        assert_eq!(merge_sort_fold(&[42]), vec![42]);
    }
}

Deep Comparison

Core Insight

Merge sort: split into halves, recursively sort each half, merge sorted halves. O(n log n) guaranteed. The functional version is particularly elegant because merging sorted lists is a natural recursive operation.

OCaml Approach

• Pattern match on list to split

• Recursive merge of two sorted lists

• No mutation — creates new lists at each step

Rust Approach

• Split slice at midpoint

• Recursive sort on sub-slices

• Merge into new Vec — or use sort() (introsort)

Comparison Table

Feature	OCaml	Rust
Split	Pattern match / `List.nth`	`split_at(mid)`
Merge	Recursive cons	Push to Vec
In-place	No (functional)	Possible but complex
Built-in	`List.sort`	`.sort()` (introsort)

Exercises

In-place merge sort: Implement an in-place merge sort for &mut Vec<i32> using rotate_left for the merge step. This avoids cloning but is O(n² log n) due to rotation cost.

External sort: Merge sort naturally extends to external sorting (data too large for RAM). Design an algorithm that reads sorted chunks from disk and merges them. What data structure manages the merge heap?

Natural merge sort: Detect pre-sorted runs and use them as the base for merging (this is the core of TimSort). Implement timsort_lite that detects ascending runs and merges them.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust