924 Advanced

924-work-stealing — Work Stealing

Functional Programming

Tutorial

The Problem

A fixed thread pool with a single shared work queue can become a bottleneck: all workers contend for the same lock. Work-stealing distributes this: each worker has its own local queue; when idle, a worker "steals" tasks from another worker's queue. This approach was invented for the Cilk parallel runtime at MIT and is used in Java's ForkJoinPool, .NET's ThreadPool, Go's goroutine scheduler, and Rust's rayon. Local queues use lock-free algorithms for pushes; stealing requires only occasional locking. The result is near-linear scaling on multi-core processors for divide-and-conquer workloads.

🎯 Learning Outcomes

• Understand the work-stealing algorithm and why it scales better than a shared queue

• Recognize divide-and-conquer as the canonical work-stealing workload

• Understand why local queues use LIFO (stack) order — better cache locality

• Understand why stealing takes from the other end (FIFO) — avoids the biggest tasks being stolen

• Compare with OCaml's Domainslib and Moonpool for parallel work stealing

Code Example

#![allow(clippy::all)]
// Placeholder — Work Stealing

(* Work Stealing *)

Key Differences

Lock-free deques: Rust work-stealing implementations use lock-free double-ended queues for near-contention-free local operations; OCaml requires explicit domain synchronization.

rayon integration: Rust's rayon provides work-stealing transparently via par_iter() — no manual thread management; OCaml requires explicit Domainslib API calls.

Recursion and fork-join: Rust rayon::join(f, g) is the idiomatic parallel divide-and-conquer entry point; OCaml Task.async is the equivalent.

Production readiness: rayon is the production Rust solution for CPU-bound parallel work; OCaml's parallel ecosystem is still maturing (OCaml 5 is relatively new).

OCaml Approach

OCaml 5 introduces Domain.spawn for true parallelism. Domainslib builds work-stealing on top of domains. Task.pool creates a pool; Task.async pool f submits a task; Task.await retrieves the result. Moonpool provides an alternative with configurable scheduling. OCaml's cooperative (Lwt, Eio) runtimes do not benefit from work-stealing (they run on one domain unless using Eio.Fiber). The biggest difference: OCaml 5 domains are heavyweight compared to Rust threads.

Full Source

#![allow(clippy::all)]
// Placeholder — Work Stealing

(* Work Stealing *)

Deep Comparison

924-work-stealing — Language Comparison

std vs tokio

Aspect	std version	tokio version
Runtime	OS threads via `std::thread`	Async tasks on tokio runtime
Synchronization	`std::sync::Mutex`, `Condvar`	`tokio::sync::Mutex`, channels
Channels	`std::sync::mpsc` (unbounded)	`tokio::sync::mpsc` (bounded, async)
Blocking	Thread blocks on lock/recv	Task yields, runtime switches tasks
Overhead	One OS thread per task	Many tasks per thread (M:N)
Best for	CPU-bound, simple concurrency	I/O-bound, high-concurrency servers

Exercises

Implement parallel merge sort using a work-stealing approach: spawn two recursive tasks for the two halves.

Write a parallel find_any that searches for a value across N worker threads, cancelling others when found.

Measure the speedup of a parallel sum over a large array using 1, 2, 4, and 8 workers with different chunk sizes.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust