723 Advanced

Branchless Programming

Functional Programming

Tutorial

The Problem

Modern CPUs speculatively execute both sides of a branch before knowing which path is taken, relying on a branch predictor trained on past behavior. When the prediction is correct, branches are nearly free. When the predictor is wrong—either because the branch outcome is data-dependent and unpredictable, or because the branch alternates irregularly—a misprediction flushes the pipeline and costs 10–20 cycles. For tight loops over large, unsorted data (sorting networks, cryptography, parsers, games), branch mispredictions can dominate execution time.

Branchless programming replaces conditional jumps with arithmetic: boolean conditions are converted to 0/1 integers, and the desired value is selected via multiplication or bitwise masking. The CPU computes both outcomes and selects one without speculating. The technique is standard in constant-time cryptographic code (preventing timing side channels), SIMD kernels, and performance-critical inner loops.

🎯 Learning Outcomes

• Explain branch prediction, misprediction penalties, and pipeline stalls

• Convert conditional expressions to arithmetic using sign-extension masks

• Implement branchless min, max, abs, clamp, and select for integers

• Recognize when the compiler already generates branchless code (check with objdump)

• Understand constant-time programming and why branches leak secret data

Code Example

#[inline(always)]
pub fn min_idiomatic(a: i64, b: i64) -> i64 { a.min(b) }

#[inline(always)]
pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 { x.clamp(lo, hi) }

#[inline(always)]
pub fn abs_idiomatic(x: i64) -> i64 { x.wrapping_abs() }

(* Branchless min: arithmetic right-shift produces 0 or all-ones mask *)
let min_branchless (a : int) (b : int) =
  let diff = a - b in
  b + (diff land (diff asr 62))

(* Branchless abs using sign-mask technique *)
let abs_branchless (x : int) =
  let mask = x asr 62 in
  (x + mask) lxor mask

(* Clamp via composition — no branches *)
let clamp_branchless lo hi x =
  min_branchless hi (max_branchless lo x)

Key Differences

Aspect	Rust	OCaml
Integer representation	Untagged `i64`/`i32`	Tagged 63-bit int
Arithmetic right shift	`>> 63` fills all bits	`asr 62` needed (tag bit)
Compiler branchless output	`a.min(b)` often branchless	Standard `min` may branch
Constant-time stdlib	`subtle` crate	External C (`Hacl-star`)
SIMD branchless blend	`_mm_blendv_epi8` via `std::arch`	Requires C FFI

OCaml Approach

OCaml integers are tagged (63-bit on 64-bit systems), so arithmetic right-shift tricks require care with the tag bit. The standard library uses branching min/max. For constant-time crypto, OCaml code typically relies on external C via FFI:

(* Tagged int: arithmetic shift includes tag bit — avoid for branchless tricks *)
let branchless_min_unsafe a b =
  (* Only safe if a and b are untagged ints (Bigarray, Bytes) *)
  let diff = a - b in
  let mask = diff asr 62 in   (* asr 62 to avoid tag bit *)
  b + (mask land diff)

(* Safe OCaml — uses compare, may branch *)
let safe_min a b = if a < b then a else b

For constant-time code, OCaml projects use Hacl-star bindings or C stubs.

Full Source

//! # Branchless Programming
//!
//! Demonstrates replacing conditional branches with arithmetic operations to
//! avoid branch misprediction penalties on modern out-of-order CPUs.
//!
//! The core technique: arithmetic right-shift on signed integers sign-extends
//! the MSB to all bits, producing a bitmask of 0 (if positive) or -1 (if negative).
//! This mask selects between two precomputed values without a branch.

// ── Branchless integer min/max ────────────────────────────────────────────────

/// Branchless min using arithmetic right-shift bitmask.
///
/// `(a - b) >> 63` produces 0x0000...0000 when a >= b,
/// or 0xFFFF...FFFF when a < b. ANDing `diff` with this mask
/// gives either 0 or `diff`, which we add to `b`.
///
/// **Note:** inputs must not differ by more than `i64::MAX` (i.e., both values
/// should lie in the same signed half-range) to avoid wrapping in the final add.
/// For general use, prefer `i64::min` which LLVM compiles to a CMOV.
#[inline(always)]
pub fn min_branchless(a: i64, b: i64) -> i64 {
    let diff = a.wrapping_sub(b);
    let mask = diff >> 63; // arithmetic shift: 0 or -1 (all bits set)
    b.wrapping_add(diff & mask)
}

/// Branchless max using arithmetic right-shift bitmask.
///
/// Same overflow caveat as `min_branchless`.
#[inline(always)]
pub fn max_branchless(a: i64, b: i64) -> i64 {
    let diff = a.wrapping_sub(b);
    let mask = diff >> 63;
    a.wrapping_sub(diff & mask)
}

/// Idiomatic Rust min — LLVM compiles this to a CMOV (conditional move)
/// instruction on x86-64, which is already branchless at the hardware level.
#[inline(always)]
pub fn min_idiomatic(a: i64, b: i64) -> i64 {
    a.min(b)
}

/// Idiomatic Rust max — same CMOV story as min_idiomatic.
#[inline(always)]
pub fn max_idiomatic(a: i64, b: i64) -> i64 {
    a.max(b)
}

// ── Clamp ─────────────────────────────────────────────────────────────────────

/// Branchless clamp: compose min/max without any conditional jumps.
#[inline(always)]
pub fn clamp_branchless(lo: i64, hi: i64, x: i64) -> i64 {
    min_branchless(hi, max_branchless(lo, x))
}

/// Idiomatic Rust clamp — also CMOV-friendly via LLVM.
#[inline(always)]
pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 {
    x.clamp(lo, hi)
}

// ── Absolute value ────────────────────────────────────────────────────────────

/// Branchless absolute value using the sign-mask technique.
///
/// `mask = x >> 63` is 0 for non-negative, -1 for negative.
/// `(x + mask) ^ mask` negates x when negative, leaves it alone otherwise.
#[inline(always)]
pub fn abs_branchless(x: i64) -> i64 {
    let mask = x >> 63;
    (x + mask) ^ mask
}

/// Idiomatic Rust abs — wrapping_abs avoids UB on i64::MIN.
#[inline(always)]
pub fn abs_idiomatic(x: i64) -> i64 {
    x.wrapping_abs()
}

// ── Select without branch ─────────────────────────────────────────────────────

/// Return `a` if `cond` is true, `b` otherwise — no branch.
///
/// `cond as i64` is 0 or 1; negating gives 0 or -1 (the mask).
#[inline(always)]
pub fn select_branchless(cond: bool, a: i64, b: i64) -> i64 {
    let mask = -(cond as i64); // 0 or -1
    (a & mask) | (b & !mask)
}

// ── Sign function ─────────────────────────────────────────────────────────────

/// Returns -1, 0, or 1 depending on the sign of x, branchlessly.
#[inline(always)]
pub fn sign_branchless(x: i64) -> i64 {
    // positive: (x > 0) as i64 = 1, -(x < 0) as i64 = 0  → 1
    // negative: (x > 0) as i64 = 0, -(x < 0) as i64 = -1 → -1
    // zero:     both 0 → 0
    (x > 0) as i64 - (x < 0) as i64
}

#[cfg(test)]
mod tests {
    use super::*;

    // ── min ───────────────────────────────────────────────────────────────────

    #[test]
    fn test_min_branchless_basic() {
        assert_eq!(min_branchless(3, 7), 3);
        assert_eq!(min_branchless(7, 3), 3);
        assert_eq!(min_branchless(5, 5), 5);
    }

    #[test]
    fn test_min_branchless_negatives() {
        assert_eq!(min_branchless(-10, -3), -10);
        assert_eq!(min_branchless(-3, -10), -10);
        assert_eq!(min_branchless(-1, 1), -1);
    }

    #[test]
    fn test_min_matches_idiomatic() {
        // Pairs must not differ by more than i64::MAX (documented limitation of
        // the bitmask technique). For arbitrary-range inputs, use min_idiomatic.
        let pairs = [
            (0_i64, 0),
            (42, -42),
            (-100, 100),
            (i64::MAX, i64::MAX - 1),
            (i64::MIN + 1, i64::MIN),
        ];
        for (a, b) in pairs {
            assert_eq!(min_branchless(a, b), min_idiomatic(a, b), "min({a}, {b})");
        }
    }

    // ── max ───────────────────────────────────────────────────────────────────

    #[test]
    fn test_max_branchless_basic() {
        assert_eq!(max_branchless(3, 7), 7);
        assert_eq!(max_branchless(7, 3), 7);
        assert_eq!(max_branchless(5, 5), 5);
    }

    #[test]
    fn test_max_matches_idiomatic() {
        // Same range restriction as min_branchless.
        let pairs = [
            (0_i64, 0),
            (42, -42),
            (i64::MAX, i64::MAX - 1),
            (i64::MIN + 1, i64::MIN),
        ];
        for (a, b) in pairs {
            assert_eq!(max_branchless(a, b), max_idiomatic(a, b), "max({a}, {b})");
        }
    }

    // ── clamp ─────────────────────────────────────────────────────────────────

    #[test]
    fn test_clamp_within_range() {
        assert_eq!(clamp_branchless(0, 10, 5), 5);
    }

    #[test]
    fn test_clamp_below_lo() {
        assert_eq!(clamp_branchless(0, 10, -5), 0);
    }

    #[test]
    fn test_clamp_above_hi() {
        assert_eq!(clamp_branchless(0, 10, 15), 10);
    }

    #[test]
    fn test_clamp_matches_idiomatic() {
        let cases = [(-5_i64, 5, -10), (-5, 5, 0), (-5, 5, 10), (0, 0, 0)];
        for (lo, hi, x) in cases {
            assert_eq!(
                clamp_branchless(lo, hi, x),
                clamp_idiomatic(lo, hi, x),
                "clamp({lo}, {hi}, {x})"
            );
        }
    }

    // ── abs ───────────────────────────────────────────────────────────────────

    #[test]
    fn test_abs_branchless_positive() {
        assert_eq!(abs_branchless(42), 42);
    }

    #[test]
    fn test_abs_branchless_negative() {
        assert_eq!(abs_branchless(-42), 42);
    }

    #[test]
    fn test_abs_branchless_zero() {
        assert_eq!(abs_branchless(0), 0);
    }

    #[test]
    fn test_abs_matches_idiomatic() {
        for x in [-1000_i64, -1, 0, 1, 1000, i64::MAX] {
            assert_eq!(abs_branchless(x), abs_idiomatic(x), "abs({x})");
        }
    }

    // ── select ────────────────────────────────────────────────────────────────

    #[test]
    fn test_select_true() {
        assert_eq!(select_branchless(true, 100, 200), 100);
    }

    #[test]
    fn test_select_false() {
        assert_eq!(select_branchless(false, 100, 200), 200);
    }

    // ── sign ──────────────────────────────────────────────────────────────────

    #[test]
    fn test_sign_positive() {
        assert_eq!(sign_branchless(42), 1);
    }

    #[test]
    fn test_sign_negative() {
        assert_eq!(sign_branchless(-42), -1);
    }

    #[test]
    fn test_sign_zero() {
        assert_eq!(sign_branchless(0), 0);
    }
}

(* OCaml: Branchless programming patterns *)

(* --- min/max --- *)
(* OCaml's `min` uses polymorphic comparison which can branch.
   For ints, the compiler may or may not emit CMOV. *)

let min_branch a b = if a < b then a else b
let max_branch a b = if a > b then a else b

(* Branchless integer min using arithmetic *)
let min_branchless (a : int) (b : int) =
  (* b + ((a - b) land ((a - b) asr 62)) — uses arithmetic right shift *)
  let diff = a - b in
  b + (diff land (diff asr 62))

let max_branchless (a : int) (b : int) =
  let diff = a - b in
  a - (diff land (diff asr 62))

(* --- clamp --- *)
let clamp_branch lo hi x =
  if x < lo then lo else if x > hi then hi else x

let clamp_branchless lo hi x =
  min_branchless hi (max_branchless lo x)

(* --- absolute value --- *)
let abs_branchless (x : int) =
  let mask = x asr 62 in  (* all-ones if negative, all-zeros if positive *)
  (x + mask) lxor mask

(* --- select without branch --- *)
(* Select a if cond else b, where cond is 0 or 1 *)
let select cond a b =
  (* Branchless: (a - b) * cond + b *)
  let mask = -cond in   (* 0 -> 0, 1 -> -1 (all-ones) *)
  (a land mask) lor (b land (lnot mask))

(* --- Benchmark --- *)
let time_it label f =
  let t0 = Sys.time () in
  let r = f () in
  Printf.printf "%s: %.6fs result=%d\n" label (Sys.time () -. t0) r;
  r

let () =
  let n = 10_000_000 in
  let data = Array.init n (fun i -> (i * 1234567 + 89) mod 1000) in

  let _r1 = time_it "min_branch" (fun () ->
    Array.fold_left (fun acc x -> min_branch acc x) max_int data) in
  let _r2 = time_it "min_branchless" (fun () ->
    Array.fold_left (fun acc x -> min_branchless acc x) max_int data) in

  (* Basic correctness *)
  assert (min_branchless 3 5 = 3);
  assert (min_branchless 7 2 = 2);
  assert (max_branchless 3 5 = 5);
  assert (abs_branchless (-42) = 42);
  assert (abs_branchless 42 = 42);
  assert (clamp_branchless 0 100 (-5) = 0);
  assert (clamp_branchless 0 100 150 = 100);
  assert (clamp_branchless 0 100 50 = 50);
  assert (select 1 10 20 = 10);
  assert (select 0 10 20 = 20);
  Printf.printf "All assertions passed.\n"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    // ── min ───────────────────────────────────────────────────────────────────

    #[test]
    fn test_min_branchless_basic() {
        assert_eq!(min_branchless(3, 7), 3);
        assert_eq!(min_branchless(7, 3), 3);
        assert_eq!(min_branchless(5, 5), 5);
    }

    #[test]
    fn test_min_branchless_negatives() {
        assert_eq!(min_branchless(-10, -3), -10);
        assert_eq!(min_branchless(-3, -10), -10);
        assert_eq!(min_branchless(-1, 1), -1);
    }

    #[test]
    fn test_min_matches_idiomatic() {
        // Pairs must not differ by more than i64::MAX (documented limitation of
        // the bitmask technique). For arbitrary-range inputs, use min_idiomatic.
        let pairs = [
            (0_i64, 0),
            (42, -42),
            (-100, 100),
            (i64::MAX, i64::MAX - 1),
            (i64::MIN + 1, i64::MIN),
        ];
        for (a, b) in pairs {
            assert_eq!(min_branchless(a, b), min_idiomatic(a, b), "min({a}, {b})");
        }
    }

    // ── max ───────────────────────────────────────────────────────────────────

    #[test]
    fn test_max_branchless_basic() {
        assert_eq!(max_branchless(3, 7), 7);
        assert_eq!(max_branchless(7, 3), 7);
        assert_eq!(max_branchless(5, 5), 5);
    }

    #[test]
    fn test_max_matches_idiomatic() {
        // Same range restriction as min_branchless.
        let pairs = [
            (0_i64, 0),
            (42, -42),
            (i64::MAX, i64::MAX - 1),
            (i64::MIN + 1, i64::MIN),
        ];
        for (a, b) in pairs {
            assert_eq!(max_branchless(a, b), max_idiomatic(a, b), "max({a}, {b})");
        }
    }

    // ── clamp ─────────────────────────────────────────────────────────────────

    #[test]
    fn test_clamp_within_range() {
        assert_eq!(clamp_branchless(0, 10, 5), 5);
    }

    #[test]
    fn test_clamp_below_lo() {
        assert_eq!(clamp_branchless(0, 10, -5), 0);
    }

    #[test]
    fn test_clamp_above_hi() {
        assert_eq!(clamp_branchless(0, 10, 15), 10);
    }

    #[test]
    fn test_clamp_matches_idiomatic() {
        let cases = [(-5_i64, 5, -10), (-5, 5, 0), (-5, 5, 10), (0, 0, 0)];
        for (lo, hi, x) in cases {
            assert_eq!(
                clamp_branchless(lo, hi, x),
                clamp_idiomatic(lo, hi, x),
                "clamp({lo}, {hi}, {x})"
            );
        }
    }

    // ── abs ───────────────────────────────────────────────────────────────────

    #[test]
    fn test_abs_branchless_positive() {
        assert_eq!(abs_branchless(42), 42);
    }

    #[test]
    fn test_abs_branchless_negative() {
        assert_eq!(abs_branchless(-42), 42);
    }

    #[test]
    fn test_abs_branchless_zero() {
        assert_eq!(abs_branchless(0), 0);
    }

    #[test]
    fn test_abs_matches_idiomatic() {
        for x in [-1000_i64, -1, 0, 1, 1000, i64::MAX] {
            assert_eq!(abs_branchless(x), abs_idiomatic(x), "abs({x})");
        }
    }

    // ── select ────────────────────────────────────────────────────────────────

    #[test]
    fn test_select_true() {
        assert_eq!(select_branchless(true, 100, 200), 100);
    }

    #[test]
    fn test_select_false() {
        assert_eq!(select_branchless(false, 100, 200), 200);
    }

    // ── sign ──────────────────────────────────────────────────────────────────

    #[test]
    fn test_sign_positive() {
        assert_eq!(sign_branchless(42), 1);
    }

    #[test]
    fn test_sign_negative() {
        assert_eq!(sign_branchless(-42), -1);
    }

    #[test]
    fn test_sign_zero() {
        assert_eq!(sign_branchless(0), 0);
    }
}

Deep Comparison

OCaml vs Rust: Branchless Programming

Side-by-Side Code

OCaml

(* Branchless min: arithmetic right-shift produces 0 or all-ones mask *)
let min_branchless (a : int) (b : int) =
  let diff = a - b in
  b + (diff land (diff asr 62))

(* Branchless abs using sign-mask technique *)
let abs_branchless (x : int) =
  let mask = x asr 62 in
  (x + mask) lxor mask

(* Clamp via composition — no branches *)
let clamp_branchless lo hi x =
  min_branchless hi (max_branchless lo x)

Rust (idiomatic — LLVM emits CMOV)

#[inline(always)]
pub fn min_idiomatic(a: i64, b: i64) -> i64 { a.min(b) }

#[inline(always)]
pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 { x.clamp(lo, hi) }

#[inline(always)]
pub fn abs_idiomatic(x: i64) -> i64 { x.wrapping_abs() }

Rust (explicit branchless bitmask)

#[inline(always)]
pub fn min_branchless(a: i64, b: i64) -> i64 {
    let diff = a.wrapping_sub(b);
    let mask = diff >> 63; // 0 or -1 (all bits set)
    b + (diff & mask)
}

#[inline(always)]
pub fn abs_branchless(x: i64) -> i64 {
    let mask = x >> 63;
    (x + mask) ^ mask
}

#[inline(always)]
pub fn select_branchless(cond: bool, a: i64, b: i64) -> i64 {
    let mask = -(cond as i64); // 0 or -1
    (a & mask) | (b & !mask)
}

Type Signatures

Concept	OCaml	Rust
Integer type	`int` (63-bit on 64-bit platform)	`i64` (explicit 64-bit)
Arithmetic right-shift	`asr` operator	`>> 63` (arithmetic on signed)
Bitwise AND	`land`	`&`
Bitwise XOR	`lxor`	`^`
Safe subtraction	`-` (may overflow on edge cases)	`wrapping_sub` (defined overflow)
Abs	`abs` (stdlib, may branch)	`wrapping_abs()`
Clamp	custom composition	`x.clamp(lo, hi)`

Key Insights

Arithmetic right-shift is the core primitive. Both OCaml (asr) and Rust (>> 63 on i64) sign-extend the MSB to produce a bitmask of 0 or -1. The difference: OCaml's int is 63-bit, so asr 62 is used; Rust's i64 is 64-bit, so >> 63 is correct.

Rust's idiomatic path is already branchless. LLVM compiles a.min(b), x.clamp(lo, hi), and x.wrapping_abs() to CMOV (conditional move) instructions on x86-64 — no conditional jump, no pipeline flush. The explicit bitmask technique is a teaching tool and a fallback for targets where CMOV is unavailable.

**wrapping_sub vs bare subtraction.** OCaml integers are 63-bit and overflow is technically undefined in some contexts; Rust makes the overflow semantics explicit with wrapping_sub, which is essential for the bitmask trick to be well-defined.

**select_branchless generalises the pattern.** Casting bool to i64 gives 0 or 1; negating gives 0 or -1. This is a portable, safe way to implement conditional selection without a branch — useful when the compiler fails to emit a CMOV on its own.

When branchy code wins. Branchless code always executes both computation paths. For data where one branch dominates (e.g., 99% positive inputs for abs), the CPU branch predictor wins and branchless is slower. For truly unpredictable data (random min/max of arbitrary inputs), branchless eliminates misprediction stalls and wins decisively.

When to Use Each Style

**Use idiomatic Rust (a.min(b), x.clamp(lo, hi)):** Always as the default — readable, safe, and LLVM produces CMOV. Profile before reaching for explicit bitmasks.

Use explicit branchless bitmasks when: Profiling confirms misprediction stalls, the target architecture lacks CMOV, or you need a select primitive that the compiler fails to optimise into a conditional move.

Exercises

Verify that i64::min(a, b) compiles to a cmov (conditional move) instruction

by inspecting cargo rustc --release -- --emit asm output. Does branchless change with -C target-cpu=native?

Implement a branchless clamp(x: i64, lo: i64, hi: i64) -> i64 using two

applications of branchless_min/branchless_max.

Benchmark sorting an array of 10,000 random i32 values using sort_unstable vs

a hand-written sorting network for N=8 elements. Measure branch mispredictions with perf stat -e branch-misses.

Implement ct_select from the subtle crate interface: a Choice type wrapping

a secret boolean and a ConditionallySelectable trait for u64.

Apply the arithmetic right-shift min to a scalar loop over Vec<i32> and compare

the generated assembly with the LLVM auto-vectorized version.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust