433: State Machine via Macro
Difficulty: 4 Level: Expert
Generate typestate state machine boilerplate (states as zero-sized types, transitions as consuming methods) so that invalid sequences are compile errors, not runtime panics.
The Problem This Solves
Protocol implementations (network connections, file handles, authentication flows, payment processors) have strict sequencing rules. You can only `send()` after `connect()`. You can only `commit()` inside a transaction. Calling methods out of order is a logic error that should be caught at compile time, not at runtime in production when a user hits an unexpected state.
Encoding these rules as runtime enums plus guards (`if self.state != State::Connected { panic!() }`) is fragile: the guard is easy to forget, every method carries boilerplate, and the compiler won't help you find callers that got the sequence wrong.
The typestate pattern moves state into the type system: each state is a zero-sized type, and methods are defined only on the correct state. Call `send()` on a `Connection<Disconnected>` and the compiler refuses. The compiler is your state machine validator.
Implementing this by hand requires state structs, a generic struct, `PhantomData`, and an `impl` block per state. A state machine macro generates all of this from a concise declaration.
The Intuition
Each state (`Disconnected`, `Connected`, `Authenticated`, `Closed`) is a zero-sized struct: it carries no data and costs nothing at runtime. The connection struct `Connection<S>` is generic over state `S`. Methods exist only on specific `Connection<ConcreteState>` types:
Connection<Disconnected> → .connect() → Connection<Connected>
Connection<Connected> → .authenticate() → Connection<Authenticated>
Connection<Authenticated> → .send() → same state
Connection<Authenticated> → .disconnect() → Connection<Closed>
Each state-changing transition consumes `self` and returns a value of the new state's type. Once the old value is consumed, its methods can no longer be called, because the original variable has been moved. Invalid sequences are simply unrepresentable.
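The two claims above (markers cost nothing, transitions consume the old value) can be checked with a small sketch. The `Door`, `Open`, and `Shut` names are hypothetical and chosen only for illustration; they are not part of the connection example.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical marker states, used only for this illustration.
struct Open;
struct Shut;

struct Door<State> {
    openings: u32,
    _state: PhantomData<State>,
}

impl Door<Shut> {
    fn new() -> Self {
        Door { openings: 0, _state: PhantomData }
    }
    // Consumes the shut door; the caller gets back an open one.
    fn open(self) -> Door<Open> {
        Door { openings: self.openings + 1, _state: PhantomData }
    }
}

impl Door<Open> {
    // Consumes the open door and returns a shut one.
    fn shut(self) -> Door<Shut> {
        Door { openings: self.openings, _state: PhantomData }
    }
}

fn main() {
    // The marker types and PhantomData occupy no memory.
    assert_eq!(size_of::<Open>(), 0);
    assert_eq!(size_of::<PhantomData<Shut>>(), 0);
    // The state parameter does not change the size of the struct.
    assert_eq!(size_of::<Door<Open>>(), size_of::<u32>());
    assert_eq!(size_of::<Door<Shut>>(), size_of::<Door<Open>>());

    let door = Door::new().open().shut().open();
    assert_eq!(door.openings, 2);
    // door.open(); // would not compile: Door<Open> has no open()
}
```

Because every transition takes `self` by value, a door that has been opened cannot be opened again through the old binding; the only value left to call methods on is the one the transition returned.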
How It Works in Rust
use std::marker::PhantomData;
// State marker types: zero bytes at runtime
struct Disconnected;
struct Connected;
struct Authenticated;
struct Closed;
// The state machine struct, generic over the current state
struct Connection<State> {
host: String,
port: u16,
messages_sent: u32,
_state: PhantomData<State>, // zero cost, carries type info
}
// Methods only on Disconnected
impl Connection<Disconnected> {
fn new(host: &str, port: u16) -> Self {
Connection { host: host.into(), port, messages_sent: 0, _state: PhantomData }
}
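// Consuming transition: Disconnected → Connected
fn connect(self) -> Connection<Connected> {
println!("Connecting to {}:{}", self.host, self.port);
// `..self` cannot change the state type parameter on stable Rust, so each field is moved explicitly
Connection { host: self.host, port: self.port, messages_sent: self.messages_sent, _state: PhantomData }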
}
}
// Methods only on Connected
impl Connection<Connected> {
// Consuming transition: Connected → Authenticated
fn authenticate(self, _token: &str) -> Connection<Authenticated> {
Connection { host: self.host, port: self.port, messages_sent: self.messages_sent, _state: PhantomData }
}
}
// Methods only on Authenticated
impl Connection<Authenticated> {
fn send(&mut self, msg: &str) { // &mut self: stays in the same state
println!("→ {}", msg);
self.messages_sent += 1;
}
fn disconnect(self) -> Connection<Closed> {
Connection { host: self.host, port: self.port, messages_sent: self.messages_sent, _state: PhantomData }
}
}
// Valid sequence: compiles (inside a function body)
let mut conn = Connection::new("api.example.com", 443)
.connect()
.authenticate("secret-token");
conn.send("hello"); // allowed: conn is a Connection<Authenticated>
let closed = conn.disconnect();
// These DO NOT COMPILE:
// Connection::new("h", 80).send("x"); // Disconnected has no send()
// closed.send("x"); // Closed has no send()
The state machine macro (`state_machine!` in the example) generates the state structs, the generic struct with `PhantomData`, and the `impl` blocks for each transition from the same concise DSL.
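As a rough illustration of what a declaration in such a DSL could look like, here is a minimal sketch. The grammar (`states { ... }`, `transitions { From => method => To { ... } }`) mirrors the macro in the full listing; the `into_state` helper, the `Job` machine, and its fields are assumptions made for this example, not something the original defines.

```rust
use std::marker::PhantomData;

// Sketch of a typestate generator. `into_state` is a hypothetical helper
// that moves all fields into the same struct tagged with a new state.
macro_rules! state_machine {
    (
        struct $name:ident<$state_param:ident> {
            $($field:ident : $fty:ty),* $(,)?
        }
        states { $($state:ident),* $(,)? }
        transitions {
            $( $from:ident => $method:ident => $to:ident { $($body:tt)* } )*
        }
    ) => {
        // One zero-sized marker type per declared state.
        $( #[derive(Debug)] struct $state; )*

        #[derive(Debug)]
        struct $name<$state_param> {
            $($field: $fty,)*
            _state: PhantomData<$state_param>,
        }

        impl<$state_param> $name<$state_param> {
            fn into_state<Next>(self) -> $name<Next> {
                $name { $($field: self.$field,)* _state: PhantomData }
            }
        }

        // One consuming method per declared transition.
        $(
            impl $name<$from> {
                fn $method(self) -> $name<$to> {
                    $($body)*
                    self.into_state()
                }
            }
        )*
    };
}

// Hypothetical machine: a job that is queued, then runs, then finishes.
state_machine! {
    struct Job<S> { id: u32, attempts: u32 }
    states { Queued, Running, Finished }
    transitions {
        Queued => start => Running { println!("starting"); }
        Running => finish => Finished { println!("finishing"); }
    }
}

// Constructors are still written by hand in this sketch.
impl Job<Queued> {
    fn new(id: u32) -> Self {
        Job { id, attempts: 0, _state: PhantomData }
    }
}

fn main() {
    let done: Job<Finished> = Job::new(7).start().finish();
    assert_eq!(done.id, 7);
    println!("{:?}", done);
    // Job::new(7).finish(); // would not compile: Job<Queued> has no finish()
}
```

The field-moving code lives in one generic helper rather than inside each generated transition. Methods that take arguments or stay in the same state (such as `authenticate` and `send` in the connection example) fall outside this small grammar and would still be written by hand.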
What This Unlocks
- Protocol correctness at compile time: any multi-step protocol (TCP, TLS, HTTP, OAuth, database transactions) can be modelled so that misuse is a build failure.
- Self-documenting APIs: the type signature of a function that accepts `Connection<Authenticated>` is documentation that cannot go stale.
- Macro-generated machines: a `state_machine!` DSL lets you declare dozens of states and transitions concisely without hand-writing every `PhantomData` impl.
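The self-documenting point can be sketched with a helper whose signature states its precondition. This is a simplified stand-in for the connection type (it starts in `Connected` and keeps a single field), and `sync_profile` is a hypothetical function invented for the example.

```rust
use std::marker::PhantomData;

struct Connected;
struct Authenticated;

// Simplified connection: only the counter is kept for this sketch.
struct Connection<State> {
    messages_sent: u32,
    _state: PhantomData<State>,
}

impl Connection<Connected> {
    fn new() -> Self {
        Connection { messages_sent: 0, _state: PhantomData }
    }
    fn authenticate(self, _token: &str) -> Connection<Authenticated> {
        Connection { messages_sent: self.messages_sent, _state: PhantomData }
    }
}

impl Connection<Authenticated> {
    fn send(&mut self, _msg: &str) {
        self.messages_sent += 1;
    }
}

// The parameter type is the precondition: callers must supply an
// authenticated connection. Returns the running message count.
fn sync_profile(conn: &mut Connection<Authenticated>) -> u32 {
    conn.send("GET /profile");
    conn.send("GET /settings");
    conn.messages_sent
}

fn main() {
    let mut conn = Connection::new().authenticate("secret-token");
    assert_eq!(sync_profile(&mut conn), 2);
    // sync_profile(&mut Connection::new()); // would not compile: expected Connection<Authenticated>
}
```

If the authentication requirement is ever dropped or changed, the signature has to change with it, which is why this kind of documentation cannot drift away from the code.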
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| State encoding | Variant in a GADT or module type | Zero-sized struct as phantom type parameter |
| Invalid transitions | Runtime exception or explicit result type | Compile error: the method doesn't exist on the wrong type |
| Transition cost | Pattern match + allocation if boxing | Zero cost: the state type is erased, only the generic parameter changes |
| State machine DSL | GADT or first-class modules | `macro_rules!` generating `impl` blocks per state |
| Mutating within a state | Record update syntax | `&mut self` methods on specific `impl Connection<State>` |
// State machine via macro in Rust: typestate pattern
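// Macro that generates a typestate state machine
macro_rules! state_machine {
(
struct $name:ident<$state_param:ident> {
$($field:ident : $fty:ty),* $(,)?
}
states { $($state:ident),* $(,)? }
transitions {
$( $from:ident => $method:ident => $to:ident { $($body:tt)* } )*
}
) => {
// State marker types
$(
#[derive(Debug)]
struct $state;
)*
// The state machine struct, generic over the declared state parameter
#[derive(Debug)]
struct $name<$state_param> {
$($field: $fty,)*
_state: std::marker::PhantomData<$state_param>,
}
// Shared helper that moves every field into the struct tagged with a new state.
// It sits outside the transitions loop because `$field` and `$from` repeat
// independently and macro_rules! cannot nest one repetition inside the other.
impl<$state_param> $name<$state_param> {
fn into_state<Next>(self) -> $name<Next> {
$name {
$($field: self.$field,)*
_state: std::marker::PhantomData,
}
}
}
// Transition impls: statements from the DSL body run first, then the state changes
$(
impl $name<$from> {
fn $method(self) -> $name<$to> {
$($body)*
self.into_state()
}
}
)*
};
}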
// Define a Connection state machine
#[derive(Debug)]
struct Disconnected;
#[derive(Debug)]
struct Connected;
#[derive(Debug)]
struct Authenticated;
#[derive(Debug)]
struct Closed;
#[derive(Debug)]
struct Connection<State> {
host: String,
port: u16,
messages_sent: u32,
_state: std::marker::PhantomData<State>,
}
impl Connection<Disconnected> {
fn new(host: &str, port: u16) -> Self {
Connection {
host: host.to_string(),
port,
messages_sent: 0,
_state: std::marker::PhantomData,
}
}
fn connect(self) -> Connection<Connected> {
println!("Connecting to {}:{}", self.host, self.port);
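// `..self` cannot change the state type parameter on stable Rust, so each field is moved explicitly
Connection { host: self.host, port: self.port, messages_sent: self.messages_sent, _state: std::marker::PhantomData }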
}
}
impl Connection<Connected> {
fn authenticate(self, _token: &str) -> Connection<Authenticated> {
println!("Authenticating...");
Connection { host: self.host, port: self.port, messages_sent: self.messages_sent, _state: std::marker::PhantomData }
}
fn disconnect(self) -> Connection<Closed> {
println!("Disconnecting (unauthenticated)");
Connection { host: self.host, port: self.port, messages_sent: self.messages_sent, _state: std::marker::PhantomData }
}
}
impl Connection<Authenticated> {
fn send(&mut self, message: &str) {
println!("Sending: {}", message);
self.messages_sent += 1;
}
fn disconnect(self) -> Connection<Closed> {
println!("Disconnecting (sent {} messages)", self.messages_sent);
Connection { host: self.host, port: self.port, messages_sent: self.messages_sent, _state: std::marker::PhantomData }
}
}
impl Connection<Closed> {
fn stats(&self) {
println!("Connection to {} closed. Messages sent: {}",
self.host, self.messages_sent);
}
}
fn main() {
let conn = Connection::new("api.example.com", 443);
let conn = conn.connect();
let mut conn = conn.authenticate("secret-token");
conn.send("GET /users HTTP/1.1");
conn.send("Host: api.example.com");
let closed = conn.disconnect();
closed.stats();
// Type safety: these would NOT compile:
// conn.send("too late!"); // `conn` was moved by disconnect()
// closed.send("too late!"); // Connection<Closed> has no send()
// let unauthenticated = Connection::new("h", 80).connect();
// unauthenticated.send("no auth!"); // Connection<Connected> has no send()
println!("\nState machine enforced at compile time!");
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_state_machine() {
let conn = Connection::new("test.com", 80)
.connect()
.authenticate("token");
let closed = conn.disconnect();
// Can only call stats() on Closed
closed.stats();
}
fn assert_send_type<T: Send>() {}
#[test]
fn test_connection_types_distinct() {
// These are different types at compile time:
let _disc: Connection<Disconnected> = Connection::new("h", 80);
let _conn: Connection<Connected> = Connection::new("h", 80).connect();
// They don't have the same methods: enforced by the type system
}
}
(* State machine in OCaml with phantom types *)
(* Phantom types for state tracking *)
type unconnected
type connected
type closed
type 'state connection = {
host: string;
port: int;
mutable data: string list;
}
(* Each function returns the "next state" *)
let connect host port : connected connection =
Printf.printf "Connecting to %s:%d\n" host port;
{ host; port; data = [] }
let send (conn : connected connection) msg : connected connection =
Printf.printf "Sending: %s\n" msg;
{ conn with data = msg :: conn.data }
let disconnect (conn : connected connection) : closed connection =
Printf.printf "Disconnecting from %s\n" conn.host;
{ conn with data = [] }
(* Cannot send on closed connection: type error! *)
(* let bad = send (disconnect (connect "localhost" 80)) "oops" *)
let () =
let conn = connect "api.example.com" 443 in
let conn2 = send conn "GET / HTTP/1.1" in
let conn3 = send conn2 "Host: api.example.com" in
let _closed = disconnect conn3 in
Printf.printf "Protocol followed correctly\n"