732: Benchmarking Harness
Difficulty: 3 · Level: Advanced

Build a Criterion-style micro-benchmark harness from `std` — warm up, iterate, `black_box`, and report statistics.

The Problem This Solves
Running a function once and timing it tells you almost nothing useful. Your result is contaminated by instruction cache cold starts, branch predictor misses, CPU frequency scaling, OS scheduling jitter, and dynamic compiler optimisations. A single measurement is noise, not signal.

A proper benchmark harness addresses all of these. Warmup runs prime the instruction cache and branch predictor. Many iterations amortise scheduling jitter. `std::hint::black_box` prevents the compiler from optimising away the computation it's supposed to measure — without it, LLVM may realise the result is never used and delete the entire benchmark body. Statistical reporting (mean, standard deviation, min/max) gives you a signal-to-noise picture: a low standard deviation means consistent results you can trust; a high one means something else is interfering.

Understanding how benchmarking frameworks work internally is essential before reaching for `criterion` or `divan`. You need to know why `black_box` is non-optional, why warmup matters, and what the reported numbers actually mean.

The Intuition
Measuring performance is like weighing yourself: you don't just step on the scale once and declare the number accurate. You step on multiple times, discard the outliers, and average the rest. The scale (timer) has noise. Your weight (computation) has natural variation. `black_box` is making sure you're actually on the scale — not that the scale is printing a cached number from yesterday.

The `Instant::now()` / `elapsed()` pair is `std`'s monotonic wall-clock timer. Wrapping the measured function in `black_box` ensures its inputs and outputs are treated as opaque — the compiler cannot eliminate or constant-fold through the measurement.

How It Works in Rust
```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

fn bench<T, F: FnMut() -> T>(label: &str, iters: u64, mut f: F) -> Duration {
    // Warmup: prime caches and the branch predictor
    for _ in 0..iters / 10 {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f()); // black_box: result treated as observable
    }
    let total = start.elapsed();
    let mean = total / iters as u32;
    println!("{label}: mean={mean:?} (n={iters})");
    total
}

// Usage (assuming some workload `fn sum_squares(n: u64) -> u64`):
bench("sum_squares(1000)", 10_000, || {
    black_box(sum_squares(black_box(1000u64)))
});
```
Standard deviation over N samples: collect each iteration's duration into a `Vec<f64>` of nanoseconds, compute the `mean`, then `sqrt(sum((x - mean)^2) / n)`. A high σ relative to the mean signals a noisy environment or a pathological input dependency.
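That recipe can be sketched directly. This is a minimal sketch assuming the samples were already collected as per-iteration nanosecond values (the `stats` name and the example values are illustrative):

```rust
// Population mean and standard deviation over per-iteration samples (ns).
fn stats(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // sum((x - mean)^2) / n, then sqrt — population variance, matching the text
    let variance = samples.iter().map(|x| (x - mean) * (x - mean)).sum::<f64>() / n;
    (mean, variance.sqrt())
}

fn main() {
    let samples = vec![100.0, 110.0, 90.0, 100.0];
    let (mean, stddev) = stats(&samples);
    assert_eq!(mean, 100.0);
    // variance = (0 + 100 + 100 + 0) / 4 = 50, so σ = √50 ≈ 7.07
    assert!((stddev - 50.0_f64.sqrt()).abs() < 1e-9);
    println!("mean={mean:.1}ns σ={stddev:.2}ns");
}
```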
What This Unlocks
- Trustworthy performance numbers: Warmup + multiple iterations + `black_box` = numbers you can actually compare across code changes.
- Understanding `criterion`/`divan`: These frameworks follow the same structure โ once you've built the primitives, you understand what the libraries are doing and why.
- Catching regressions: A baseline benchmark run committed to the repo lets you detect performance regressions in CI before they reach production.
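For the regression-catching point, a CI gate can be as simple as a relative threshold against a stored baseline. A minimal sketch — the `within_budget` name, the baseline value, and the 20% tolerance are all illustrative, not part of the harness above:

```rust
// Hypothetical regression gate: accept up to `tolerance` relative slowdown.
fn within_budget(baseline_ns: f64, measured_ns: f64, tolerance: f64) -> bool {
    measured_ns <= baseline_ns * (1.0 + tolerance)
}

fn main() {
    let baseline_ns = 100.0; // committed from a previous trusted run
    assert!(within_budget(baseline_ns, 110.0, 0.20));  // 10% slower: within budget
    assert!(!within_budget(baseline_ns, 130.0, 0.20)); // 30% slower: regression
}
```

In practice you would widen the tolerance to cover the σ of the baseline run, otherwise ordinary machine noise triggers false alarms.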
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Wall-clock timer | `Unix.gettimeofday` | `std::time::Instant::now()` |
| Prevent dead-code elimination | `Sys.opaque_identity` | `std::hint::black_box` |
| Benchmark framework | `Core_bench` | `criterion`, `divan` |
| Warmup phase | Manual | Part of harness design |
| Statistical reporting | `Core_bench` columns | Custom mean/stddev from raw durations |
| Closure as workload | `fun () -> expr` | `\|\| { black_box(expr) }` |
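To see why the dead-code-elimination row matters, here is a minimal Rust sketch. Without the two `black_box` calls, the optimiser is free to constant-fold the sum at compile time or delete the loop entirely, and the benchmark would measure nothing:

```rust
use std::hint::black_box;

fn main() {
    // Input is opaque: the compiler cannot constant-fold `(0..n).sum()`.
    let n = black_box(1000u64);
    let sum: u64 = (0..n).sum();
    assert_eq!(sum, 499_500); // 0 + 1 + … + 999
    // Output is observable: the loop cannot be removed as dead code.
    black_box(sum);
}
```

`Sys.opaque_identity` plays the same role on the OCaml side: it is an identity function the optimiser is forbidden to see through.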
```rust
/// 732: Benchmarking Harness — Criterion-style, std-only
use std::hint::black_box;
use std::time::{Duration, Instant};

// ── Core Harness ──────────────────────────────────────────────────────────────

struct BenchResult {
    label: &'static str,
    mean: Duration,
    min: Duration,
    max: Duration,
    stddev_ns: f64,
    iters: u64,
}

impl BenchResult {
    fn print(&self) {
        println!(
            "{:40} mean={:>10.2?} min={:>10.2?} max={:>10.2?} σ={:.0}ns (n={})",
            self.label, self.mean, self.min, self.max, self.stddev_ns, self.iters,
        );
    }
}

fn bench<F, R>(label: &'static str, warmup: u64, iters: u64, mut f: F) -> BenchResult
where
    F: FnMut() -> R,
{
    // Warmup — fill CPU caches, allow the CPU to ramp up frequency
    for _ in 0..warmup {
        black_box(f());
    }
    let mut samples = Vec::with_capacity(iters as usize);
    for _ in 0..iters {
        let t0 = Instant::now();
        let result = f();
        let elapsed = t0.elapsed();
        black_box(result); // prevent dead-code elimination
        samples.push(elapsed);
    }
    let total_ns: u128 = samples.iter().map(|d| d.as_nanos()).sum();
    let mean_ns = total_ns / iters as u128;
    let mean = Duration::from_nanos(mean_ns as u64);
    let min = *samples.iter().min().unwrap();
    let max = *samples.iter().max().unwrap();
    let variance_ns: f64 = samples
        .iter()
        .map(|d| {
            let diff = d.as_nanos() as f64 - mean_ns as f64;
            diff * diff
        })
        .sum::<f64>()
        / iters as f64;
    BenchResult {
        label,
        mean,
        min,
        max,
        stddev_ns: variance_ns.sqrt(),
        iters,
    }
}

// ── Functions to Benchmark ────────────────────────────────────────────────────

fn sum_naive(n: u64) -> u64 {
    (0..n).sum()
}

fn sum_formula(n: u64) -> u64 {
    n * (n - 1) / 2
}

fn string_push(n: usize) -> String {
    let mut s = String::with_capacity(n);
    for _ in 0..n {
        s.push('x');
    }
    s
}

fn vec_collect(n: usize) -> Vec<u64> {
    (0..n as u64).collect()
}

fn main() {
    println!("{:=<80}", "");
    println!(" Benchmarking Harness (std-only) — 100 iters, 20 warmup");
    println!("{:=<80}", "");
    let warmup = 20;
    let iters = 100;
    bench("sum_naive(10_000)", warmup, iters, || sum_naive(black_box(10_000))).print();
    bench("sum_formula(10_000)", warmup, iters, || sum_formula(black_box(10_000))).print();
    bench("string_push(1_000)", warmup, iters, || string_push(black_box(1_000))).print();
    bench("vec_collect(1_000)", warmup, iters, || vec_collect(black_box(1_000))).print();
    println!("{:=<80}", "");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn bench_runs_warmup_and_iters() {
        let mut call_count = 0u64;
        let result = bench("test", 5, 10, || {
            call_count += 1;
            call_count
        });
        // warmup(5) + iters(10) = 15 calls
        assert_eq!(call_count, 15);
        assert_eq!(result.iters, 10);
    }

    #[test]
    fn bench_min_le_mean_le_max() {
        let r = bench("sum_naive(100)", 2, 20, || sum_naive(black_box(100)));
        assert!(r.min <= r.mean);
        assert!(r.mean <= r.max);
    }

    #[test]
    fn sum_naive_correct() {
        assert_eq!(sum_naive(5), 10); // 0+1+2+3+4
        assert_eq!(sum_naive(0), 0);
    }

    #[test]
    fn sum_formula_matches_naive() {
        for n in 1..=20u64 {
            assert_eq!(sum_naive(n), sum_formula(n), "n={}", n);
        }
    }
}
```
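The harness above pays one `Instant::now()`/`elapsed()` pair per iteration. For workloads in the low-nanosecond range, reading the clock can cost as much as the work itself, which is one reason frameworks like `criterion` time a *batch* of calls and divide. A minimal sketch of that variant — the `bench_batched` name and the batch sizes are illustrative, not part of the listing above:

```rust
use std::hint::black_box;
use std::time::Instant;

// Batched timing: one clock read per `batch` calls, amortising timer overhead.
fn bench_batched<R, F: FnMut() -> R>(batch: u32, samples: usize, mut f: F) -> Vec<f64> {
    (0..samples)
        .map(|_| {
            let t0 = Instant::now();
            for _ in 0..batch {
                black_box(f());
            }
            // Per-call cost in nanoseconds, averaged over this batch.
            t0.elapsed().as_nanos() as f64 / batch as f64
        })
        .collect()
}

fn main() {
    let per_call = bench_batched(1_000, 50, || (0..100u64).sum::<u64>());
    assert_eq!(per_call.len(), 50);
    assert!(per_call.iter().all(|&ns| ns >= 0.0));
    let best = per_call.iter().cloned().fold(f64::INFINITY, f64::min);
    println!("fastest batch: {best:.2}ns/call");
}
```

The trade-off: batching hides per-iteration variance (you only see per-batch means), so it pairs best with the min/max/σ reporting from the per-iteration harness.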
```ocaml
(* 732: Benchmarking Harness — OCaml stdlib version (requires the unix library) *)

(* Nanosecond timestamp helper (shown for reference; measurement below works
   in float seconds directly). *)
let time_ns () =
  let t = Unix.gettimeofday () in
  Int64.of_float (t *. 1e9)

let benchmark ?(warmup = 10) ?(iters = 1000) label f =
  (* Warmup *)
  for _ = 1 to warmup do ignore (f ()) done;
  (* Measure *)
  let times =
    Array.init iters (fun _ ->
      let t0 = Unix.gettimeofday () in
      ignore (f ());
      let t1 = Unix.gettimeofday () in
      (t1 -. t0) *. 1e6 (* microseconds *))
  in
  (* Stats *)
  let n = float_of_int iters in
  let mean = Array.fold_left (+.) 0.0 times /. n in
  let variance =
    Array.fold_left (fun acc t -> let d = t -. mean in acc +. d *. d) 0.0 times /. n
  in
  let stddev = Float.sqrt variance in
  Printf.printf "%-30s mean=%.2fµs stddev=%.2fµs\n" label mean stddev

let () =
  (* Example: benchmark list creation *)
  benchmark "List.init 1000" (fun () -> List.init 1000 (fun i -> i * i));
  benchmark "String.concat" (fun () ->
    String.concat "," (List.init 100 string_of_int))
```