🦀 Functional Rust

753: Benchmark Harness: Measuring Hot Functions

Difficulty: 3 | Level: Advanced

Measure function performance with warmup, per-iteration timing, and percentile statistics — the pattern behind `cargo bench` and Criterion.

The Problem This Solves

"My code is slow" is not actionable. "The p99 latency of `string_alloc` is 3× higher than `string_prealloc`" is. Performance measurement requires more care than functional testing: CPU frequency scaling, cache warmup, branch-predictor state, and OS scheduling all introduce noise, so a naive timing loop produces misleading results.

The standard solution is warmup iterations (let the CPU reach peak frequency and fill its caches), many measurement iterations (statistical stability), and percentile reporting (p99 catches the tail latency that the mean hides). This is exactly what Criterion (the standard Rust benchmarking crate) implements. Understanding the raw pattern — even without Criterion — teaches you what `#[bench]` and `criterion::Criterion::bench_function()` are doing internally, and for quick comparisons between two implementations a hand-rolled harness is often sufficient.

`std::hint::black_box` is critical: it prevents the compiler from optimizing away the function call entirely when the result isn't used. Without it, the benchmark measures nothing.
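A minimal illustration of the `black_box` point — a toy guarded timing loop (the function and counts here are illustrative). Without the `black_box` calls, an optimized build is free to constant-fold or delete the pure computation and report near-zero time:

```rust
use std::hint::black_box;
use std::time::Instant;

fn sum_to(n: u64) -> u64 {
    (0..=n).sum()
}

fn main() {
    let t0 = Instant::now();
    for _ in 0..1_000 {
        // black_box on the input discourages constant-folding the call;
        // black_box on the output keeps the result from being dead-code-eliminated.
        black_box(sum_to(black_box(10_000)));
    }
    println!("guarded: {:?} for 1000 calls", t0.elapsed());
}
```

Note that `black_box` is a hint, not a hard guarantee, but in practice it is what keeps a benchmark honest.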

The Intuition

Run the function `warmup` times to stabilize CPU state. Then run it `iters` times, recording the elapsed time for each call. Sort the timings and compute percentiles: min (best case), p50 (median), p90, p99, max (worst case). The mean can be misleading if there are occasional slow outliers; p99 reveals tail latency that affects real users in production systems.
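The mean-versus-percentile point can be seen with a toy sample set (durations chosen for illustration): one 100 ms outlier among 99 fast samples drags the mean up by an order of magnitude while the median does not move:

```rust
use std::time::Duration;

fn main() {
    // 99 samples of 100 µs plus a single 100 ms outlier
    let mut samples: Vec<Duration> = vec![Duration::from_micros(100); 99];
    samples.push(Duration::from_millis(100));
    samples.sort_unstable();

    let n = samples.len();
    let mean = samples.iter().sum::<Duration>() / n as u32;
    let p50 = samples[n * 50 / 100];   // nearest-rank index, as in the harness below
    let p99 = samples[n * 99 / 100];

    // mean = 1.099 ms, p50 = 100 µs, p99 = 100 ms
    println!("mean={:?} p50={:?} p99={:?}", mean, p50, p99);
}
```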

How It Works in Rust

use std::hint::black_box;            // prevents dead-code elimination
use std::time::{Duration, Instant};

fn bench<F, R>(name: &str, warmup: usize, iters: usize, mut f: F) -> Stats
where F: FnMut() -> R
{
    // Warmup: allow CPU frequency scaling and cache filling
    for _ in 0..warmup { black_box(f()); }

    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let t0 = Instant::now();
        black_box(f());    // black_box prevents optimizing f() away
        samples.push(t0.elapsed());
    }
    compute_stats(samples)
}

fn compute_stats(mut samples: Vec<Duration>) -> Stats {
    samples.sort_unstable();
    let n = samples.len();
    Stats {
        mean: samples.iter().sum::<Duration>() / n as u32,
        min:  samples[0],
        p50:  samples[n * 50 / 100],
        p90:  samples[n * 90 / 100],
        p99:  samples[n * 99 / 100],
        max:  *samples.last().unwrap(),
    }
}

// Compare two implementations
bench("sum_loop(1000)",    50, 5000, || sum_loop(black_box(1000)));
bench("sum_formula(1000)", 50, 5000, || sum_formula(black_box(1000)));

For real-world benchmarking, use the `criterion` crate: it adds statistical significance testing, automatic warmup management, and HTML reports. The `#[bench]` attribute that `cargo bench` runs requires nightly, but Criterion works on stable.
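As a sketch of what a Criterion bench file looks like — not runnable standalone; it assumes `criterion` as a dev-dependency and a `[[bench]]` entry with `harness = false` in Cargo.toml, and the file name is illustrative:

```rust
// benches/sums.rs — illustrative Criterion benchmark file
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn sum_formula(n: u64) -> u64 { n.wrapping_mul(n.wrapping_add(1)) / 2 }

fn bench_sums(c: &mut Criterion) {
    // Criterion handles warmup, sample counts, and outlier analysis itself
    c.bench_function("sum_formula(1000)", |b| b.iter(|| sum_formula(black_box(1000))));
}

criterion_group!(benches, bench_sums);
criterion_main!(benches);
```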

What This Unlocks

Key Differences

| Concept | OCaml | Rust |
|---|---|---|
| Prevent optimization | `Sys.opaque_identity` | `std::hint::black_box` — stable since Rust 1.66 |
| High-res timing | `Unix.gettimeofday` (µs) | `std::time::Instant` — nanosecond resolution |
| Benchmark framework | `core_bench` (Jane Street) | `criterion` crate (de facto standard) |
| Percentile computation | Manual sort + index | `sort_unstable()` then indexed access — same pattern |
//! 753: Benchmark Harness — measuring hot functions with percentiles

use std::hint::black_box;
use std::time::{Duration, Instant};

// ── Statistics ──────────────────────────────────────────────────────────────

struct Stats {
    mean:  Duration,
    min:   Duration,
    p50:   Duration,
    p90:   Duration,
    p99:   Duration,
    max:   Duration,
}

fn compute_stats(mut samples: Vec<Duration>) -> Stats {
    assert!(!samples.is_empty());
    samples.sort_unstable();
    let n = samples.len();
    let total: Duration = samples.iter().sum();
    Stats {
        mean: total / n as u32,
        min:  samples[0],
        p50:  samples[n * 50 / 100],
        p90:  samples[n * 90 / 100],
        p99:  samples[n * 99 / 100],
        max:  *samples.last().unwrap(),
    }
}

// ── Harness ─────────────────────────────────────────────────────────────────

fn bench<F, R>(name: &str, warmup: usize, iters: usize, mut f: F) -> Stats
where F: FnMut() -> R
{
    // Warmup: fill CPU caches, let frequency scale up
    for _ in 0..warmup { black_box(f()); }

    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let t0 = Instant::now();
        black_box(f());
        samples.push(t0.elapsed());
    }

    let stats = compute_stats(samples);
    print_bench(name, &stats);
    stats
}

fn print_bench(name: &str, s: &Stats) {
    println!(
        "{:<35} mean={:>8.2?} min={:>8.2?} p50={:>8.2?} p90={:>8.2?} p99={:>8.2?} max={:>8.2?}",
        name, s.mean, s.min, s.p50, s.p90, s.p99, s.max
    );
}

// ── Functions to benchmark ──────────────────────────────────────────────────

fn sum_loop(n: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..=n { acc = acc.wrapping_add(i); }
    acc
}

fn sum_iter(n: u64) -> u64 {
    (0..=n).sum()
}

fn sum_formula(n: u64) -> u64 {
    n.wrapping_mul(n.wrapping_add(1)) / 2
}

fn string_alloc_per_call(n: usize) -> String {
    (0..n).map(|i| format!("{}", i)).collect::<Vec<_>>().join(",")
}

fn string_prealloc(n: usize) -> String {
    let mut s = String::with_capacity(n * 4);
    for i in 0..n {
        if i > 0 { s.push(','); }
        s.push_str(&i.to_string());
    }
    s
}

fn main() {
    println!("{:=<80}", "");
    println!(" Benchmark Harness โ€” hot function comparison");
    println!("{:=<80}", "");
    println!();

    bench("sum_loop(1000)",    50, 5000, || sum_loop(black_box(1000)));
    bench("sum_iter(1000)",    50, 5000, || sum_iter(black_box(1000)));
    bench("sum_formula(1000)", 50, 5000, || sum_formula(black_box(1000)));
    println!();
    bench("string_alloc(50)",    20, 1000, || string_alloc_per_call(black_box(50)));
    bench("string_prealloc(50)", 20, 1000, || string_prealloc(black_box(50)));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn all_sum_functions_agree() {
        for n in [0u64, 1, 10, 100, 1000] {
            let loop_r    = sum_loop(n);
            let iter_r    = sum_iter(n);
            let formula_r = sum_formula(n);
            assert_eq!(loop_r, iter_r,    "loop vs iter at n={}", n);
            assert_eq!(iter_r, formula_r, "iter vs formula at n={}", n);
        }
    }

    #[test]
    fn string_functions_produce_same_output() {
        for n in [0, 1, 5, 20] {
            let alloc = string_alloc_per_call(n);
            let pre   = string_prealloc(n);
            assert_eq!(alloc, pre, "mismatch at n={}", n);
        }
    }

    #[test]
    fn bench_runs_without_panic() {
        bench("test_bench", 5, 20, || sum_formula(black_box(42)));
    }

    #[test]
    fn stats_percentiles_ordered() {
        let stats = compute_stats(
            (0..1000).map(|i| Duration::from_nanos(i)).collect()
        );
        assert!(stats.min <= stats.p50);
        assert!(stats.p50 <= stats.p90);
        assert!(stats.p90 <= stats.p99);
        assert!(stats.p99 <= stats.max);
    }
}
(* 753: Bench Harness — OCaml *)

let time_fn f =
  let t0 = Unix.gettimeofday () in
  f ();
  let t1 = Unix.gettimeofday () in
  (t1 -. t0) *. 1e9  (* nanoseconds *)

let bench_function ?(iters=10_000) name f =
  (* warmup *)
  for _ = 1 to 100 do ignore (f ()) done;
  (* measure *)
  let samples = Array.init iters (fun _ -> time_fn (fun () -> ignore (f ()))) in
  Array.sort compare samples;
  let n = float_of_int iters in
  let mean = Array.fold_left (+.) 0.0 samples /. n in
  let p50 = samples.(iters / 2) in
  let p90 = samples.(int_of_float (n *. 0.9)) in
  let p99 = samples.(int_of_float (n *. 0.99)) in
  Printf.printf "%-30s mean=%.1fns p50=%.1fns p90=%.1fns p99=%.1fns\n"
    name mean p50 p90 p99

(* Functions to benchmark *)
let sum_recursive n =
  let rec go acc i =
    if i > n then acc else go (acc + i) (i + 1)
  in go 0 0

let sum_formula n = n * (n + 1) / 2

let () =
  bench_function "sum_recursive(100)" (fun () -> sum_recursive 100);
  bench_function "sum_formula(100)"   (fun () -> sum_formula 100)