🦀 Functional Rust

453: Memory Ordering — Acquire, Release, SeqCst

Difficulty: 3 · Level: Intermediate

Control how atomic operations synchronise visibility across CPU cores — the difference between a correct lock-free algorithm and one that silently fails on ARM.

The Problem This Solves

Atomic operations prevent torn reads and lost writes on a single memory location. But they don't, by themselves, say anything about other memory. If thread A writes data to a slice and then sets a flag with `store(true, Relaxed)`, thread B might see the flag as `true` but still see the old (pre-write) data. This isn't a bug in the atomic — it's the CPU reordering memory operations for performance.

Modern CPUs (especially ARM) reorder reads and writes aggressively. This breaks the pattern at the core of every lock-free algorithm: "write data first, then publish a flag; reader sees flag, then reads data." Without ordering constraints, the reader might observe the flag before the data writes are visible. On x86 this is rare because x86 has a relatively strong memory model. On ARM (your phone, your Raspberry Pi, AWS Graviton) it happens regularly.

The fix is not a Mutex — the whole point of atomics is to avoid locks. The fix is `Release`/`Acquire` ordering: `store(true, Ordering::Release)` says "all my writes before this store are visible to any thread that does `load(Ordering::Acquire)` on the same location and sees `true`". This pair is the fundamental synchronisation primitive underneath every lock in every language.
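To make the failure mode concrete, here is a sketch of the broken version of that pattern: the flag is stored and loaded with `Relaxed`, so seeing `true` does not guarantee the data writes are visible. The `racy_publish` helper is ours, for illustration; on x86 it will usually print 285 anyway, while on weakly ordered hardware the consumer may legitimately observe stale zeros.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// BROKEN publication: Relaxed on the flag means "no torn value",
// but says nothing about the visibility of the data writes.
fn racy_publish() -> usize {
    let data: Arc<Vec<AtomicUsize>> =
        Arc::new((0..10).map(|_| AtomicUsize::new(0)).collect());
    let ready = Arc::new(AtomicBool::new(false));

    let (dw, rw) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        for (i, cell) in dw.iter().enumerate() {
            cell.store(i * i, Ordering::Relaxed);
        }
        rw.store(true, Ordering::Relaxed); // BUG: should be Release
    });

    let (dr, rr) = (Arc::clone(&data), Arc::clone(&ready));
    let consumer = thread::spawn(move || {
        while !rr.load(Ordering::Relaxed) { // BUG: should be Acquire
            thread::yield_now();
        }
        // Each cell is either its final i*i or a stale 0, so the sum
        // is at most 285 — but nothing forces it to equal 285.
        dr.iter().map(|c| c.load(Ordering::Relaxed)).sum()
    });

    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    println!("observed sum = {}", racy_publish());
}
```

Changing the two flag operations to `Release`/`Acquire`, as in "How It Works in Rust" below, restores the guarantee.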

The Intuition

Think of memory ordering as a contract between threads. A useful mental model: `Release` publishes, `Acquire` subscribes. Once you see the published value, you're guaranteed to see all the data that was prepared before publication.

How It Works in Rust

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

let data: Arc<Vec<AtomicUsize>> =
    Arc::new((0..10).map(|_| AtomicUsize::new(0)).collect());
let ready = Arc::new(AtomicBool::new(false));

// Producer: write data, then publish with Release
let (dw, rw) = (Arc::clone(&data), Arc::clone(&ready));
let producer = thread::spawn(move || {
    for (i, cell) in dw.iter().enumerate() {
        cell.store(i * i, Ordering::Relaxed); // fine — data writes are Relaxed
    }
    // Release: all writes above are now visible to any Acquire load that sees true
    rw.store(true, Ordering::Release);
});

// Consumer: spin on Acquire, then read data
let (dr, rr) = (Arc::clone(&data), Arc::clone(&ready));
let consumer = thread::spawn(move || {
    while !rr.load(Ordering::Acquire) { // Acquire: if true, data is visible
        thread::yield_now();
    }
    // Safe to read — Release/Acquire pair guarantees data visibility
    let sum: usize = dr.iter().map(|c| c.load(Ordering::Relaxed)).sum();
    println!("sum = {}", sum);
});

producer.join().unwrap();
consumer.join().unwrap();

For most application-level code: use `SeqCst` everywhere and never think about ordering. The performance cost of `SeqCst` over `Release`/`Acquire` is small in practice and the correctness benefit is large. Switch to `Acquire`/`Release` only when profiling shows atomic contention as a bottleneck.
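The claim above that Release/Acquire is the primitive underneath every lock can be made concrete with a hand-rolled spinlock. This is a minimal sketch, not std's `Mutex`; the `SpinLock` type and `run` helper are our names. Lock with `compare_exchange` using `Acquire`, unlock with a `Release` store — the same publish/subscribe pair, wrapped in an API.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Minimal spinlock: Acquire on lock, Release on unlock.
struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    fn lock(&self) {
        // Acquire on success: we see every write the previous holder
        // published with its Release store in unlock().
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            thread::yield_now();
        }
    }

    fn unlock(&self) {
        // Release: publish all writes made inside the critical section.
        self.locked.store(false, Ordering::Release);
    }
}

fn run() -> usize {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let (l, c) = (Arc::clone(&lock), Arc::clone(&counter));
            thread::spawn(move || {
                for _ in 0..1000 {
                    l.lock();
                    // Separate load + store (not fetch_add): this
                    // read-modify-write is only race-free because the
                    // lock serialises the critical sections.
                    let v = c.load(Ordering::Relaxed);
                    c.store(v + 1, Ordering::Relaxed);
                    l.unlock();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("spinlock counter = {}", run()); // 4000 every time
}
```

With `Relaxed` in `lock`/`unlock`, the same program could lose increments on weakly ordered hardware.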

What This Unlocks

The write-data-then-publish pattern at the heart of every lock-free structure, and, one level down, locks themselves: every Mutex ultimately rests on the same Release/Acquire handshake.

Key Differences

| Concept | OCaml | Rust |
|---|---|---|
| Relaxed | no direct equivalent (OCaml 5 is sequentially consistent) | `Ordering::Relaxed` — fastest, no cross-thread guarantees |
| Release | implicit in OCaml 5 | `Ordering::Release` on stores — publishes prior writes |
| Acquire | implicit in OCaml 5 | `Ordering::Acquire` on loads — sees all writes before Release |
| SeqCst | OCaml 5 default | `Ordering::SeqCst` — total order, safest, ~same cost as Acquire on x86 |
| When to use Relaxed | N/A | isolated counters with no data dependency |
| When to use SeqCst | always | default for correctness; optimise to Acq/Rel only if needed |
// 453. Memory ordering: Relaxed, Acquire, Release
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Release/Acquire: guard data with a flag
    let data: Arc<Vec<AtomicUsize>> = Arc::new((0..10).map(|_| AtomicUsize::new(0)).collect());
    let ready = Arc::new(AtomicBool::new(false));

    let (dw,rw) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        for (i,c) in dw.iter().enumerate() { c.store(i*i, Ordering::Relaxed); }
        rw.store(true, Ordering::Release);  // everything above is now visible
    });

    let (dr,rr) = (Arc::clone(&data), Arc::clone(&ready));
    let consumer = thread::spawn(move || {
        while !rr.load(Ordering::Acquire) { thread::yield_now(); }
        let sum: usize = dr.iter().map(|c| c.load(Ordering::Relaxed)).sum();
        println!("sum = {} (expected {})", sum, (0..10usize).map(|i| i*i).sum::<usize>());
    });

    producer.join().unwrap();
    consumer.join().unwrap();

    // Relaxed counter — no ordering needed between threads
    let c = Arc::new(AtomicUsize::new(0));
    let hs: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&c);
            thread::spawn(move || {
                for _ in 0..1000 {
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in hs { h.join().unwrap(); }
    println!("relaxed counter: {}", c.load(Ordering::SeqCst));
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_release_acquire() {
        let d = Arc::new(AtomicUsize::new(0));
        let f = Arc::new(AtomicBool::new(false));
        let (dc, fc) = (Arc::clone(&d), Arc::clone(&f));
        // Note: join() already synchronises with the spawned thread; the
        // orderings below mirror the Release/Acquire publication pattern.
        thread::spawn(move || {
            dc.store(42, Ordering::Relaxed);
            fc.store(true, Ordering::Release);
        })
        .join()
        .unwrap();
        assert!(f.load(Ordering::Acquire));
        assert_eq!(d.load(Ordering::Relaxed), 42);
    }
}
(* 453. Memory ordering – OCaml 5 note *)
(* All OCaml 5 atomics are sequentially consistent *)
let data  = Array.make 10 0
let ready = Atomic.make false

let producer () =
  Array.iteri (fun i _ -> data.(i) <- i*i) data;
  Atomic.set ready true  (* implicit Release *)

let consumer () =
  while not (Atomic.get ready) do Domain.cpu_relax () done;  (* implicit Acquire *)
  Printf.printf "sum=%d\n" (Array.fold_left (+) 0 data)

let () =
  let p = Domain.spawn producer in
  let c = Domain.spawn consumer in
  Domain.join p; Domain.join c