🦀 Functional Rust

732: Benchmarking Harness

Difficulty: 3 | Level: Advanced

Build a Criterion-style micro-benchmark harness from `std`: warm up, iterate, `black_box`, and report statistics.

The Problem This Solves

Running a function once and timing it tells you almost nothing useful. Your result is contaminated by instruction-cache cold starts, branch-predictor misses, CPU frequency scaling, OS scheduling jitter, and dynamic compiler optimisations. A single measurement is noise, not signal.

A proper benchmark harness addresses all of these. Warmup runs prime the instruction cache and branch predictor. Many iterations amortise scheduling jitter. `std::hint::black_box` prevents the compiler from optimising away the computation it's supposed to measure; without it, LLVM may realise the result is never used and delete the entire benchmark body. Statistical reporting (mean, standard deviation, min/max) gives you a signal-to-noise picture: a low standard deviation means consistent results you can trust; a high one means something else is interfering.

Understanding how benchmarking frameworks work internally is essential before reaching for `criterion` or `divan`. You need to know why `black_box` is non-optional, why warmup matters, and what the reported numbers actually mean.
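The role of `black_box` can be seen in miniature. It is an identity function at the value level but opaque to the optimiser, so the work feeding it must actually execute. A minimal sketch (the `opaque_sum` name is illustrative, not part of the harness):

```rust
use std::hint::black_box;

// black_box is a value-level identity; it only blinds the optimiser.
fn opaque_sum(n: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..n {
        // Without black_box, a release build may fold this loop into
        // the closed form n*(n-1)/2 at compile time.
        acc += black_box(i);
    }
    acc
}

fn main() {
    // Identity at the value level: the result is unchanged.
    assert_eq!(black_box(21) * 2, 42);
    let s = opaque_sum(1_000);
    assert_eq!(s, 499_500);
    println!("opaque_sum(1000) = {s}");
}
```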

The Intuition

Measuring performance is like weighing yourself: you don't just step on the scale once and declare the number accurate. You step on multiple times, discard the outliers, and average the rest. The scale (timer) has noise. Your weight (computation) has natural variation. `black_box` is making sure you're actually on the scale, not that the scale is printing a cached number from yesterday.

The `Instant::now()` / `elapsed()` pair is `std`'s monotonic wall-clock timer. Wrapping the measured function in `black_box` ensures its inputs and outputs are treated as opaque: the compiler cannot eliminate or constant-fold through the measurement.
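The `Instant::now()` / `elapsed()` pair in isolation looks like this; `time_once` is a hypothetical helper for the sketch, not part of the harness below:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Time an arbitrary closure once with std's monotonic timer.
fn time_once<R>(f: impl FnOnce() -> R) -> (R, Duration) {
    let t0 = Instant::now();
    let out = f();
    (out, t0.elapsed())
}

fn main() {
    let (_, dt) = time_once(|| thread::sleep(Duration::from_millis(10)));
    // Instant is monotonic, so elapsed() is at least the sleep length.
    assert!(dt >= Duration::from_millis(10));
    println!("slept for {dt:?}");
}
```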

How It Works in Rust

use std::hint::black_box;
use std::time::{Duration, Instant};

fn bench<T, F: FnMut() -> T>(label: &str, iters: u64, mut f: F) -> Duration {
    // Warmup: prime caches and branch predictor
    for _ in 0..iters / 10 {
        black_box(f());
    }

    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());  // black_box: result treated as observable
    }
    let total = start.elapsed();

    let mean = total / iters as u32;
    println!("{label}: mean={mean:?} (n={iters})");
    total
}

// Usage:
bench("sum_squares(1000)", 10_000, || {
    black_box(sum_squares(black_box(1000u64)))
});
Standard deviation over N samples: collect each iteration's duration into a `Vec<f64>` of nanoseconds, compute `mean`, then `sqrt(sum((x - mean)^2) / n)`. A high `σ` relative to the mean indicates a noisy environment or a pathological input dependency.
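That formula, written out over a slice of nanosecond samples (the `mean_and_stddev` name and the sample values are illustrative):

```rust
// Population mean and standard deviation over timings in nanoseconds.
fn mean_and_stddev(samples_ns: &[f64]) -> (f64, f64) {
    let n = samples_ns.len() as f64;
    let mean = samples_ns.iter().sum::<f64>() / n;
    let var = samples_ns
        .iter()
        .map(|x| (x - mean) * (x - mean))
        .sum::<f64>()
        / n;
    (mean, var.sqrt())
}

fn main() {
    // Five fake timings in nanoseconds.
    let samples = [100.0, 110.0, 90.0, 105.0, 95.0];
    let (mean, sd) = mean_and_stddev(&samples);
    assert_eq!(mean, 100.0);
    println!("mean={mean:.1}ns sigma={sd:.2}ns");
}
```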

What This Unlocks

Key Differences

| Concept | OCaml | Rust |
| --- | --- | --- |
| Wall-clock timer | `Unix.gettimeofday` | `std::time::Instant::now()` |
| Prevent dead-code elimination | `Sys.opaque_identity` | `std::hint::black_box` |
| Benchmark framework | `Core_bench` | `criterion`, `divan` |
| Warmup phase | Manual | Part of harness design |
| Statistical reporting | `Core_bench` columns | Custom mean/stddev from raw durations |
| Closure as workload | `fun () -> expr` | `\|\| black_box(expr)` |
/// 732: Benchmarking Harness (Criterion-style, std-only)

use std::hint::black_box;
use std::time::{Duration, Instant};

// ── Core Harness ────────────────────────────────────────────────────────────

struct BenchResult {
    label: &'static str,
    mean:  Duration,
    min:   Duration,
    max:   Duration,
    stddev_ns: f64,
    iters: u64,
}

impl BenchResult {
    fn print(&self) {
        println!(
"{:40} mean={:>10.2?} min={:>10.2?} max={:>10.2?} σ={:.0}ns  (n={})",
            self.label,
            self.mean,
            self.min,
            self.max,
            self.stddev_ns,
            self.iters,
        );
    }
}

fn bench<F, R>(label: &'static str, warmup: u64, iters: u64, mut f: F) -> BenchResult
where
    F: FnMut() -> R,
{
    // Warmup: fill CPU caches, allow CPU to ramp up frequency
    for _ in 0..warmup {
        black_box(f());
    }

    let mut samples = Vec::with_capacity(iters as usize);

    for _ in 0..iters {
        let t0 = Instant::now();
        // black_box inside the timed region: the compiler can neither
        // delete the call nor sink its work past elapsed()
        black_box(f());
        samples.push(t0.elapsed());
    }

    let total_ns: u128 = samples.iter().map(|d| d.as_nanos()).sum();
    let mean_ns = total_ns / iters as u128;
    let mean = Duration::from_nanos(mean_ns as u64);

    let min = *samples.iter().min().unwrap();
    let max = *samples.iter().max().unwrap();

    let variance_ns: f64 = samples
        .iter()
        .map(|d| {
            let diff = d.as_nanos() as f64 - mean_ns as f64;
            diff * diff
        })
        .sum::<f64>()
        / iters as f64;

    BenchResult {
        label,
        mean,
        min,
        max,
        stddev_ns: variance_ns.sqrt(),
        iters,
    }
}

// ── Functions to Benchmark ──────────────────────────────────────────────────

fn sum_naive(n: u64) -> u64 {
    (0..n).sum()
}

fn sum_formula(n: u64) -> u64 {
    // Gauss: 0 + 1 + ... + (n-1) = n(n-1)/2; guard n == 0 (u64 underflow)
    if n == 0 { 0 } else { n * (n - 1) / 2 }
}

fn string_push(n: usize) -> String {
    let mut s = String::with_capacity(n);
    for _ in 0..n { s.push('x'); }
    s
}

fn vec_collect(n: usize) -> Vec<u64> {
    (0..n as u64).collect()
}

fn main() {
    println!("{:=<80}", "");
    println!(" Benchmarking Harness (std-only) - 100 iters, 20 warmup");
    println!("{:=<80}", "");

    let warmup = 20;
    let iters = 100;

    bench("sum_naive(10_000)", warmup, iters, || sum_naive(black_box(10_000))).print();
    bench("sum_formula(10_000)", warmup, iters, || sum_formula(black_box(10_000))).print();
    bench("string_push(1_000)", warmup, iters, || string_push(black_box(1_000))).print();
    bench("vec_collect(1_000)", warmup, iters, || vec_collect(black_box(1_000))).print();

    println!("{:=<80}", "");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn bench_runs_warmup_and_iters() {
        let mut call_count = 0u64;
        let result = bench("test", 5, 10, || { call_count += 1; call_count });
        // warmup(5) + iters(10) = 15 calls
        assert_eq!(call_count, 15);
        assert_eq!(result.iters, 10);
    }

    #[test]
    fn bench_min_le_mean_le_max() {
        let r = bench("sum_naive_100", 2, 20, || sum_naive(black_box(100)));
        assert!(r.min <= r.mean);
        assert!(r.mean <= r.max);
    }

    #[test]
    fn sum_naive_correct() {
        assert_eq!(sum_naive(5), 10);   // 0+1+2+3+4
        assert_eq!(sum_naive(0), 0);
    }

    #[test]
    fn sum_formula_matches_naive() {
        for n in 1..=20u64 {
            assert_eq!(sum_naive(n), sum_formula(n), "n={}", n);
        }
    }
}
(* 732: Benchmarking Harness โ€” OCaml stdlib version *)

(* Optional helper: wall-clock time in integer nanoseconds *)
let time_ns () =
  let t = Unix.gettimeofday () in
  Int64.of_float (t *. 1e9)

let benchmark ?(warmup=10) ?(iters=1000) label f =
  (* Warmup *)
  for _ = 1 to warmup do ignore (f ()) done;
  (* Measure *)
  let times = Array.init iters (fun _ ->
    let t0 = Unix.gettimeofday () in
    ignore (f ());
    let t1 = Unix.gettimeofday () in
    (t1 -. t0) *. 1e6  (* microseconds *)
  ) in
  (* Stats *)
  let n = float_of_int iters in
  let mean = Array.fold_left (+.) 0.0 times /. n in
  let variance = Array.fold_left (fun acc t ->
    let d = t -. mean in acc +. d *. d) 0.0 times /. n in
  let stddev = Float.sqrt variance in
  Printf.printf "%-30s mean=%.2fµs stddev=%.2fµs\n" label mean stddev

let () =
  (* Example: benchmark list creation *)
  benchmark "List.init 1000" (fun () ->
    List.init 1000 (fun i -> i * i));
  benchmark "String.concat" (fun () ->
    String.concat "," (List.init 100 string_of_int))