455 Expert

455: Lock-Free Stack

Functional Programming

Tutorial

The Problem

A mutex-based stack serializes all push/pop operations — a contention bottleneck under high concurrency. A lock-free stack uses compare_and_swap on the head pointer: push allocates a new node, sets its next to the current head, then CAS-swaps the head; pop reads the head, follows next, then CAS-swaps the head to next. Retrying on CAS failure makes it correct without locks. Treiber's lock-free stack (1986) is the canonical algorithm and remains relevant in high-frequency trading, lock-free allocators, and concurrent data structures research.

Lock-free stacks appear in memory allocators (free list), lock-free work queues, concurrent GC systems, and any high-throughput concurrent LIFO structure.

🎯 Learning Outcomes

• Understand Treiber's lock-free stack algorithm using CAS on the head pointer

• Learn how AtomicPtr<Node<T>> stores the stack head with atomic pointer swaps

• See how the push CAS loop handles concurrent modifications correctly

• Understand the ABA problem: node reuse can fool CAS even with correct values

• Learn why unsafe Rust is needed for raw pointer manipulation in lock-free structures

Code Example

#![allow(clippy::all)]
// 455. Lock-free stack with atomics
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}
pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}
unsafe impl<T: Send> Send for Stack<T> {}
unsafe impl<T: Send> Sync for Stack<T> {}

impl<T> Stack<T> {
    pub fn new() -> Self {
        Stack {
            head: AtomicPtr::new(ptr::null_mut()),
        }
    }
    pub fn push(&self, v: T) {
        let n = Box::into_raw(Box::new(Node {
            value: v,
            next: ptr::null_mut(),
        }));
        loop {
            let h = self.head.load(Ordering::Relaxed);
            unsafe {
                (*n).next = h;
            }
            match self
                .head
                .compare_exchange_weak(h, n, Ordering::Release, Ordering::Relaxed)
            {
                Ok(_) => break,
                Err(_) => {}
            }
        }
    }
    pub fn pop(&self) -> Option<T> {
        loop {
            let h = self.head.load(Ordering::Acquire);
            if h.is_null() {
                return None;
            }
            let next = unsafe { (*h).next };
            match self
                .head
                .compare_exchange_weak(h, next, Ordering::AcqRel, Ordering::Relaxed)
            {
                Ok(_) => {
                    let v = unsafe { ptr::read(&(*h).value) };
                    unsafe {
                        drop(Box::from_raw(h));
                    }
                    return Some(v);
                }
                Err(_) => {}
            }
        }
    }
}
impl<T> Drop for Stack<T> {
    fn drop(&mut self) {
        while self.pop().is_some() {}
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::Arc;
    use std::thread;
    #[test]
    fn test_lifo() {
        let s = Stack::new();
        s.push(1);
        s.push(2);
        s.push(3);
        assert_eq!(s.pop(), Some(3));
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), None.or(Some(1)));
        assert_eq!(s.pop(), None);
    }
    #[test]
    fn test_concurrent() {
        let s = Arc::new(Stack::<u32>::new());
        let hs: Vec<std::thread::JoinHandle<()>> = (0..4)
            .map(|_| {
                let s = Arc::clone(&s);
                thread::spawn(move || {
                    for i in 0..100 {
                        s.push(i);
                    }
                })
            })
            .collect();
        for h in hs {
            h.join().unwrap();
        }
        let mut c = 0;
        while s.pop().is_some() {
            c += 1;
        }
        assert_eq!(c, 400);
    }
}

(* 455. Lock-free stack – OCaml with Mutex (stdlib) *)
type 'a t = { mutable head: 'a list; m: Mutex.t }
let create () = { head=[]; m=Mutex.create () }
let push s v = Mutex.lock s.m; s.head <- v::s.head; Mutex.unlock s.m
let pop  s   = Mutex.lock s.m; let r = match s.head with []->None|x::t->s.head<-t;Some x in Mutex.unlock s.m; r
let () =
  let s = create () in
  let ps = List.init 4 (fun id ->
    Thread.create (fun () -> for i=0 to 9 do push s (id*10+i) done) ()
  ) in
  List.iter Thread.join ps;
  let n = ref 0 in while Option.is_some (pop s) do incr n done;
  Printf.printf "popped %d\n" !n

Key Differences

Memory safety: Rust requires unsafe for pointer manipulation; OCaml's GC provides automatic memory management for lock-free structures.

ABA problem: Rust's lock-free stack is susceptible to ABA (need hazard pointers or epoch GC); OCaml's GC prevents ABA for pointer-based structures.

Type safety: Rust's AtomicPtr<T> is typed; OCaml's Atomic.t holds any value.

Unsafe boundary: Rust isolates unsafe code; OCaml's lock-free code is safe but depends on GC memory model guarantees.

OCaml Approach

OCaml's lock-free stack for OCaml 5.x uses Atomic.t for the head pointer combined with OCaml's GC handling memory. The GC eliminates ABA problems for pointer-based structures since allocated nodes are never reused at the same address while still referenced. A lock-free stack: let push s v = let n = { v; next = Atomic.get s } in while not (Atomic.compare_and_set s n.next n) do n.next <- Atomic.get s done.

Full Source

#![allow(clippy::all)]
// 455. Lock-free stack with atomics
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}
pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}
unsafe impl<T: Send> Send for Stack<T> {}
unsafe impl<T: Send> Sync for Stack<T> {}

impl<T> Stack<T> {
    pub fn new() -> Self {
        Stack {
            head: AtomicPtr::new(ptr::null_mut()),
        }
    }
    pub fn push(&self, v: T) {
        let n = Box::into_raw(Box::new(Node {
            value: v,
            next: ptr::null_mut(),
        }));
        loop {
            let h = self.head.load(Ordering::Relaxed);
            unsafe {
                (*n).next = h;
            }
            match self
                .head
                .compare_exchange_weak(h, n, Ordering::Release, Ordering::Relaxed)
            {
                Ok(_) => break,
                Err(_) => {}
            }
        }
    }
    pub fn pop(&self) -> Option<T> {
        loop {
            let h = self.head.load(Ordering::Acquire);
            if h.is_null() {
                return None;
            }
            let next = unsafe { (*h).next };
            match self
                .head
                .compare_exchange_weak(h, next, Ordering::AcqRel, Ordering::Relaxed)
            {
                Ok(_) => {
                    let v = unsafe { ptr::read(&(*h).value) };
                    unsafe {
                        drop(Box::from_raw(h));
                    }
                    return Some(v);
                }
                Err(_) => {}
            }
        }
    }
}
impl<T> Drop for Stack<T> {
    fn drop(&mut self) {
        while self.pop().is_some() {}
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::Arc;
    use std::thread;
    #[test]
    fn test_lifo() {
        let s = Stack::new();
        s.push(1);
        s.push(2);
        s.push(3);
        assert_eq!(s.pop(), Some(3));
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), None.or(Some(1)));
        assert_eq!(s.pop(), None);
    }
    #[test]
    fn test_concurrent() {
        let s = Arc::new(Stack::<u32>::new());
        let hs: Vec<std::thread::JoinHandle<()>> = (0..4)
            .map(|_| {
                let s = Arc::clone(&s);
                thread::spawn(move || {
                    for i in 0..100 {
                        s.push(i);
                    }
                })
            })
            .collect();
        for h in hs {
            h.join().unwrap();
        }
        let mut c = 0;
        while s.pop().is_some() {
            c += 1;
        }
        assert_eq!(c, 400);
    }
}

(* 455. Lock-free stack – OCaml with Mutex (stdlib) *)
type 'a t = { mutable head: 'a list; m: Mutex.t }
let create () = { head=[]; m=Mutex.create () }
let push s v = Mutex.lock s.m; s.head <- v::s.head; Mutex.unlock s.m
let pop  s   = Mutex.lock s.m; let r = match s.head with []->None|x::t->s.head<-t;Some x in Mutex.unlock s.m; r
let () =
  let s = create () in
  let ps = List.init 4 (fun id ->
    Thread.create (fun () -> for i=0 to 9 do push s (id*10+i) done) ()
  ) in
  List.iter Thread.join ps;
  let n = ref 0 in while Option.is_some (pop s) do incr n done;
  Printf.printf "popped %d\n" !n

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::Arc;
    use std::thread;
    #[test]
    fn test_lifo() {
        let s = Stack::new();
        s.push(1);
        s.push(2);
        s.push(3);
        assert_eq!(s.pop(), Some(3));
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), None.or(Some(1)));
        assert_eq!(s.pop(), None);
    }
    #[test]
    fn test_concurrent() {
        let s = Arc::new(Stack::<u32>::new());
        let hs: Vec<std::thread::JoinHandle<()>> = (0..4)
            .map(|_| {
                let s = Arc::clone(&s);
                thread::spawn(move || {
                    for i in 0..100 {
                        s.push(i);
                    }
                })
            })
            .collect();
        for h in hs {
            h.join().unwrap();
        }
        let mut c = 0;
        while s.pop().is_some() {
            c += 1;
        }
        assert_eq!(c, 400);
    }
}

Exercises

Correct stack test: Write tests for the lock-free stack with 8 concurrent producer threads pushing 1000 items each, and 8 consumer threads popping until empty. Verify no items are lost or duplicated.

ABA demonstration: Devise a scenario where the ABA problem could manifest in the current implementation. Propose a fix using an epoch counter packed with the pointer (if on a 64-bit platform with spare pointer bits).

Lock-free queue: Extend the lock-free stack concept to implement a lock-free FIFO queue (Michael-Scott queue). The key difference: maintain both head and tail pointers, each protected by separate CAS operations.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust