734: Typestate Pattern: Encode State in the Type
Difficulty: 4 Level: Expert Use zero-sized type parameters to make invalid state transitions a compile error โ the type system enforces your protocol, with zero runtime overhead.The Problem This Solves
State machines are everywhere: a traffic light that can only go Red โ Green โ Yellow โ Red. A TCP connection that must Connect before it can Send. A file handle that must be opened before it can be read. Traditionally, you enforce these transitions at runtime: check a flag, return an error, maybe panic. This means bugs are caught at runtime โ or worse, in production. With the typestate pattern, the state is encoded in the type, not in a field. `Light<Red>` and `Light<Green>` are different types. The `go()` method only exists on `Light<Red>`, not on `Light<Green>`. You can't call `.stop()` on a green light because there is no `stop` method on `Light<Green>` โ the compiler simply doesn't know what you're talking about. The result: invalid state transitions don't compile. There's no runtime check, no error handling, no test needed for the "impossible" case. The bug cannot exist in a compiled binary.The Intuition
In Python, you'd use an enum for the state and check it in every method: `if self.state != State.RED: raise ValueError("not red")`. This works but requires defensive programming everywhere and can only catch errors at runtime. The typestate pattern moves that check to the type system. Instead of `self.state == State.RED`, the type `Light<Red>` itself is the check. If you have a `Light<Red>`, it's red โ the type guarantees it. No flag needed. `PhantomData<State>` is the key: it's a zero-sized field that holds the type parameter without occupying any bytes. `Light<Red>` and `Light<Green>` are distinct types despite being structurally identical at the byte level.How It Works in Rust
use std::marker::PhantomData;
// Zero-sized state marker types โ no data, just type identity
pub struct Red;
pub struct Green;
pub struct Yellow;
// The light struct carries the state in its type parameter
// PhantomData<State> is 0 bytes at runtime
pub struct Light<State> {
_state: PhantomData<State>,
}
// Methods only defined on specific states
impl Light<Red> {
pub fn new() -> Self { // can only start Red
Light { _state: PhantomData }
}
pub fn go(self) -> Light<Green> { // consumes self: Red is GONE
Light { _state: PhantomData } // returns a new Light<Green>
}
// No `slow()` or `stop()` methods โ can't call them on Red
}
impl Light<Green> {
pub fn slow(self) -> Light<Yellow> {
Light { _state: PhantomData }
}
// No `go()` โ can't transition Green โ Green
}
impl Light<Yellow> {
pub fn stop(self) -> Light<Red> {
Light { _state: PhantomData }
}
}
// Valid โ the full cycle compiles
let red = Light::<Red>::new();
let green = red.go();
let yellow = green.slow();
let red2 = yellow.stop();
// These would NOT compile:
// red.slow(); // ERROR: no method `slow` found for `Light<Red>`
// green.go(); // ERROR: no method `go` found for `Light<Green>`
// red.stop(); // ERROR: no method `stop` found for `Light<Red>`
// Size check โ PhantomData is zero bytes
assert_eq!(std::mem::size_of::<Light<Red>>(), 0);
Key points:
- State transition methods take `self` (not `&self`) โ they consume the old state and return the new type
- Once you call `red.go()`, the `red` variable is moved and cannot be used again โ the old state is gone
- `PhantomData<T>` tells the compiler "this type owns or is associated with T" without storing any T data
- State marker types (`Red`, `Green`, `Yellow`) are zero-sized โ no memory cost at all
- The pattern composes: a file `handle` can be `Open`, `Locked`, or `Closed`, with reads only on `Open` handles
What This Unlocks
- Protocol enforcement without runtime cost: HTTP clients, database connections, file handles, network sockets โ all can use typestate to enforce correct usage order
- Impossible states are unrepresentable: "opened-twice" or "closed-before-opened" bugs literally cannot compile โ no test needed for those cases
- Self-documenting APIs: the type signature `fn connect(self: Session<Disconnected>) -> Session<Connected>` tells users the protocol without reading documentation
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| State in type | Phantom types with GADTs | `PhantomData<State>` with zero-sized markers |
| Invalid transitions | Compile error via type mismatch | Compile error โ method doesn't exist on that type |
| State consumption | Functional style: return new state | Move semantics: `self` is consumed, new type returned |
| Runtime overhead | Zero with phantom types | Zero โ `PhantomData` and markers are erased |
| State marker type | `type 'red light` phantom | `struct Red;` zero-sized struct |
| Protocol machines | Type-indexed by phantom | Generic struct `Session<State>` + per-state `impl` |
/// 734: Typestate Basics โ compile-time state machine
/// Invalid transitions DO NOT compile. No runtime checks needed.
use std::marker::PhantomData;
// โโ State marker types (zero-sized) โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
pub struct Red;
pub struct Green;
pub struct Yellow;
// โโ Traffic Light โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
/// `PhantomData<State>` is zero bytes at runtime.
/// The type parameter encodes which state we're in.
pub struct Light<State> {
_state: PhantomData<State>,
}
impl Light<Red> {
/// Only way to create a light โ must start Red.
pub fn new() -> Self {
println!("Light: Red");
Light { _state: PhantomData }
}
/// Red โ Green (consumes self, returns different type)
pub fn go(self) -> Light<Green> {
println!("Light: Green");
Light { _state: PhantomData }
}
}
impl Light<Green> {
/// Green โ Yellow
pub fn slow(self) -> Light<Yellow> {
println!("Light: Yellow");
Light { _state: PhantomData }
}
}
impl Light<Yellow> {
/// Yellow โ Red
pub fn stop(self) -> Light<Red> {
println!("Light: Red");
Light { _state: PhantomData }
}
}
// โโ Size check โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
fn main() {
println!("Size of Light<Red> = {} bytes", std::mem::size_of::<Light<Red>>());
println!("Size of Light<Green> = {} bytes", std::mem::size_of::<Light<Green>>());
println!();
// Valid state machine
let red = Light::<Red>::new();
let green = red.go();
let yellow = green.slow();
let red2 = yellow.stop();
// Cycle again
let _green2 = red2.go();
// โโ The following would NOT compile: โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
// let r = Light::<Red>::new();
// let _ = r.slow(); // ERROR: no method `slow` on `Light<Red>`
// let _ = r.stop(); // ERROR: no method `stop` on `Light<Red>`
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn full_cycle_compiles() {
let red = Light::<Red>::new();
let green = red.go();
let yellow = green.slow();
let _red2 = yellow.stop();
}
#[test]
fn light_is_zero_sized() {
assert_eq!(std::mem::size_of::<Light<Red>>(), 0);
assert_eq!(std::mem::size_of::<Light<Green>>(), 0);
assert_eq!(std::mem::size_of::<Light<Yellow>>(), 0);
}
// Uncommenting the test below would fail to COMPILE โ which is the point:
//
// #[test]
// fn invalid_transition_compile_error() {
// let r = Light::<Red>::new();
// let _ = r.slow(); // compile error: method not found
// }
}
(* 734: Typestate Basics โ OCaml GADT approach
OCaml doesn't have zero-cost phantom types like Rust, but GADTs let us
encode state in the type safely. *)
(* State phantom types *)
type red = Red
type green = Green
type yellow = Yellow
(* Traffic light parameterized by phantom state *)
(* We use a simple record + witness value *)
type 'state light = {
state_name: string;
_phantom: 'state; (* phantom field to carry state type *)
}
(* Type-safe constructors and transitions *)
let red_light () : red light = { state_name = "Red"; _phantom = Red }
let green_of_red (l : red light) : green light =
let _ = l in
{ state_name = "Green"; _phantom = Green }
let yellow_of_green (l : green light) : yellow light =
let _ = l in
{ state_name = "Yellow"; _phantom = Yellow }
let red_of_yellow (l : yellow light) : red light =
let _ = l in
{ state_name = "Red"; _phantom = Red }
let print_light l = Printf.printf "Light is: %s\n" l.state_name
let () =
let r = red_light () in
print_light r;
let g = green_of_red r in
print_light g;
let y = yellow_of_green g in
print_light y;
let r2 = red_of_yellow y in
print_light r2;
(* Invalid: green_of_red y โ would be a type error in OCaml too! *)
ignore r2