453: Memory Ordering

Functional Programming

Tutorial

The Problem

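Modern CPUs and compilers reorder instructions for performance. On a multi-core system, one thread's operations may therefore appear to another thread in a different order than the program text suggests. A memory ordering, passed to each atomic operation, specifies which reorderings are forbidden around it: Relaxed (atomicity only, no ordering guarantees), Release (for stores: earlier memory accesses cannot move after it), Acquire (for loads: later memory accesses cannot move before it), AcqRel (both, for read-modify-write operations), and SeqCst (Acquire/Release semantics plus a single total order over all SeqCst operations). An ordering that is too weak lets another thread see a flag without the data it guards; one that is too strong costs performance for no benefit. The Release-Acquire pair is the key idiom: a Release store "publishes" all writes made before it; an Acquire load that reads the stored value "subscribes" to them.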
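
The publish/subscribe idiom described above can be sketched as follows. This is a minimal illustration, not part of the original example: the function name `publish_and_consume` and its variable names are hypothetical. Unlike a test that joins the writer before reading, the reader here runs concurrently and spins on the flag, so the Release-Acquire pair is what makes the payload visible.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: one thread publishes `value`, another waits for it.
fn publish_and_consume(value: usize) -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (data_w, ready_w) = (Arc::clone(&data), Arc::clone(&ready));
    let writer = thread::spawn(move || {
        // Write the payload; Relaxed is enough because the Release
        // store below orders it.
        data_w.store(value, Ordering::Relaxed);
        // Publish: everything written above becomes visible to any
        // thread that observes `true` with an Acquire load.
        ready_w.store(true, Ordering::Release);
    });

    let (data_r, ready_r) = (Arc::clone(&data), Arc::clone(&ready));
    let reader = thread::spawn(move || {
        // Subscribe: spin until the flag is observed as set.
        while !ready_r.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        // The payload write happens-before this load.
        data_r.load(Ordering::Relaxed)
    });

    writer.join().unwrap();
    reader.join().unwrap()
}
```

Calling `publish_and_consume(42)` returns 42: once the reader has seen the flag, it cannot see the old payload.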

Memory ordering is foundational to all lock-free programming, Arc's reference counting, message-passing channel internals, and spinlock implementations.

🎯 Learning Outcomes

• Understand the five memory ordering modes: Relaxed, Acquire, Release, AcqRel, SeqCst

• Learn the Release-Acquire pattern: a store(..., Release) and a load(..., Acquire) that reads the stored value form a happens-before edge

• See how Relaxed is sufficient for independent counters where ordering doesn't matter

• Understand why SeqCst is the safest default but has the highest cost

• Learn how Rust's atomics follow the C++ memory model (introduced in C++11, refined in C++20)
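
As an illustration of the Relaxed outcome above, here is a small sketch of an independent counter. The function `count_events` and its parameters are hypothetical names chosen for this example. Each increment is atomic, and no other data depends on the counter, so no ordering stronger than Relaxed is needed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: `threads` workers each bump a shared counter
/// `per_thread` times and the final total is returned.
fn count_events(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Atomic read-modify-write: no increment is lost,
                    // even though no ordering is requested.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // join() synchronizes with each worker, so the final load
    // observes every increment.
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

For instance, `count_events(4, 1000)` returns 4000. Relaxed stops being sufficient as soon as the counter's value is used to decide whether some other memory is safe to read.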

Code Example

#![allow(clippy::all)]
// 453. Memory ordering: Relaxed, Acquire, Release
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
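    fn test_release_acquire() {
        let d = Arc::new(AtomicUsize::new(0)); // payload
        let f = Arc::new(AtomicBool::new(false)); // "ready" flag
        let (dc, fc) = (Arc::clone(&d), Arc::clone(&f));
        thread::spawn(move || {
            dc.store(42, Ordering::Relaxed); // write the payload first
            fc.store(true, Ordering::Release); // then publish it
        })
        .join() // note: join() by itself already synchronizes with the thread
        .unwrap();
        // Observing `true` with Acquire guarantees the payload write is visible.
        assert!(f.load(Ordering::Acquire));
        assert_eq!(d.load(Ordering::Relaxed), 42);
    }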
}

(* 453. Memory ordering – OCaml 5 note *)
(* All OCaml 5 atomics are sequentially consistent *)
let data  = Array.make 10 0
let ready = Atomic.make false

let producer () =
  Array.iteri (fun i _ -> data.(i) <- i*i) data;
  Atomic.set ready true  (* implicit Release *)

let consumer () =
  while not (Atomic.get ready) do () done;  (* implicit Acquire *)
  Printf.printf "sum=%d\n" (Array.fold_left (+) 0 data)

let () =
  let p = Domain.spawn producer in
  let c = Domain.spawn consumer in
  Domain.join p; Domain.join c

Key Differences

Explicit control: Rust exposes all five ordering modes; OCaml's atomics are always SeqCst.

Complexity: Rust's ordering flexibility enables optimization but requires expertise; OCaml's simplicity trades performance for safety.

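C11 correspondence: Rust's orderings map directly to the C11/C++11 orderings, except that Rust has no equivalent of memory_order_consume; OCaml has its own memory model.

Non-atomic accesses: In Rust, unsynchronized conflicting non-atomic accesses are data races and undefined behavior, which safe Rust rules out at compile time; in OCaml 5.x, a race on a non-atomic location is permitted and has bounded, well-defined behavior.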

OCaml Approach

OCaml 5.x's Atomic module uses sequential consistency for all operations; there is no explicit ordering control. The simplicity reduces bug potential but prevents optimizations that weaker orderings enable. OCaml 5's memory model follows the paper "Bounding Data Races in Space and Time" (Dolan, Sivaramakrishnan, and Madhavapeddy, PLDI 2018) and differs from C11's in how it treats non-atomic accesses: data-race-free programs behave sequentially consistently, and a race on a non-atomic location yields a bounded, well-defined result rather than undefined behavior.
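
To make concrete what always-SeqCst atomics guarantee, the following Rust sketch runs the classic "store buffering" litmus test. The function name `store_buffering` is hypothetical. Each thread sets its own flag and then reads the other's. Because all four operations are SeqCst, they fall into one total order, so at least one thread must see the other's store. With Release stores and Acquire loads instead, both threads reading `false` would be an allowed outcome.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: returns what each thread observed of the
/// other thread's flag.
fn store_buffering() -> (bool, bool) {
    let x = Arc::new(AtomicBool::new(false));
    let y = Arc::new(AtomicBool::new(false));

    let (x1, y1) = (Arc::clone(&x), Arc::clone(&y));
    let t1 = thread::spawn(move || {
        x1.store(true, Ordering::SeqCst); // announce thread 1
        y1.load(Ordering::SeqCst) // did thread 2 announce yet?
    });

    let (x2, y2) = (Arc::clone(&x), Arc::clone(&y));
    let t2 = thread::spawn(move || {
        y2.store(true, Ordering::SeqCst); // announce thread 2
        x2.load(Ordering::SeqCst) // did thread 1 announce yet?
    });

    (t1.join().unwrap(), t2.join().unwrap())
}
```

Every call returns a pair in which at least one element is `true`; which one depends on scheduling. This is the guarantee that weaker orderings give up in exchange for cheaper stores on most hardware.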

Exercises

Spinlock: Implement a spinlock using AtomicBool with compare_exchange(false, true, Acquire, Relaxed) for lock and store(false, Release) for unlock. Explain in a comment why these orderings are sufficient.

Seqlock: Implement a sequence lock (seqlock). The writer increments a counter (odd = write in progress), copies the data, then increments the counter again (even = done). A reader loads the counter with Acquire and retries if it is odd, reads the data, then loads the counter again and retries if the value changed. Use correct orderings.

Ordering violation: Write a test that demonstrates what can go wrong with Relaxed on a flag without the Release-Acquire pattern: have one thread write data then set a Relaxed flag, another spin on the Relaxed flag then read data. Document what incorrect result the reader might observe on weakly-ordered CPUs (ARM/POWER).
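
As a possible starting point for the Spinlock exercise, here is one hedged sketch using the orderings the exercise names. The type `SpinLock`, its method `with_lock`, and the helper `locked_sum` are hypothetical names, and the closure-based API is a simplification chosen to keep the example short (a guard type would be the more usual design).

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical minimal spinlock protecting a value of type `T`.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: `value` is only accessed while `locked` is held, so shared
// references to the lock can be used from several threads.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub fn new(value: T) -> Self {
        SpinLock {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Runs `f` with exclusive access to the protected value.
    /// If `f` panics, the lock stays held (no poisoning in this sketch).
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Acquire on success: the critical section cannot move above
        // the lock and sees writes from the previous holder.
        // Relaxed on failure: nothing is read after a failed attempt.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // SAFETY: the successful compare_exchange gives exclusive access.
        let result = f(unsafe { &mut *self.value.get() });
        // Release: writes made in the critical section are published
        // to the next thread that acquires the lock.
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Hypothetical helper: several threads increment a lock-protected sum.
fn locked_sum(threads: usize, per_thread: usize) -> usize {
    let lock = Arc::new(SpinLock::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    lock.with_lock(|n| *n += 1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    lock.with_lock(|n| *n)
}
```

`locked_sum(4, 1000)` returns 4000. The exercise's written explanation of why Acquire and Release suffice is still left to the reader; the comments above only state what each ordering does.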

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.
