176: Introduction to GADTs
Difficulty: โญโญโญ Level: Advanced Simulate Generalized Algebraic Data Types in Rust using phantom types, so different enum variants can carry provably different return types.The Problem This Solves
In a normal Rust enum, every variant collapses to the same type. If you build an expression tree with `enum Expr { Int(i64), Bool(bool), Add(Box<Expr>, Box<Expr>) }`, the `eval` function must return `Box<dyn Any>` โ you've thrown away type information and now need runtime casts. This means `Add(Bool(true), Int(5))` is representable, and its wrongness is only caught at runtime. Generalized Algebraic Data Types (GADTs) solve this by letting each constructor carry its own type index. In OCaml, `Int : int -> int expr` is a constructor whose type says "I produce an `int expr`". The type system can then prove that `eval : 'a expr -> 'a` โ call `eval` on an `int expr` and you get an `int`; call it on a `bool expr` and you get a `bool`. No casts, no `Any`. The payoff is an expression tree where the type of the result is part of the node's type. You literally cannot write `Add(Bool(true), Int(5))` โ the compiler rejects it before your program ever runs.The Intuition
Think of a vending machine with typed buttons. A standard enum is like a machine where all buttons return a bag of "something" โ you have to reach in and hope it's what you wanted. A GADT is like a machine where button A returns a `Drink` and button B returns a `Snack`, guaranteed by the machine's own type. The return type of each button is built into the button's type signature. In Rust we don't have native GADTs, but we can get close with phantom types: `Expr<T>` is a struct that carries a `PhantomData<T>`. The type `T` is never stored โ it takes zero bytes โ but the compiler tracks it statically. Smart constructors enforce the rules: `Expr::<i64>::int(5)` only compiles when building an integer node. `Expr::<bool>::bool_val(true)` only compiles for booleans. The type parameter flows through the tree and prevents mixing.How It Works in Rust
use std::marker::PhantomData;
// The phantom type T says "what this expression evaluates to"
struct Expr<T> {
inner: ExprInner, // actual data, untyped
_phantom: PhantomData<T>, // type tag โ zero bytes
}
// Only specific impl blocks exist:
impl Expr<i64> {
fn int(n: i64) -> Self { /* only callable for i64 */ }
fn add(a: Expr<i64>, b: Expr<i64>) -> Expr<i64> { /* enforces both i64 */ }
fn eval(&self) -> i64 { /* return type matches phantom */ }
}
impl Expr<bool> {
fn bool_val(b: bool) -> Self { /* only callable for bool */ }
fn eval(&self) -> bool { /* return type matches phantom */ }
}
// This compiles:
let e: Expr<i64> = Expr::add(Expr::int(3), Expr::int(4));
let result: i64 = e.eval(); // statically known: returns i64
// This does NOT compile โ types mismatch:
// Expr::add(Expr::int(3), Expr::bool_val(true));
// ^^^^^^^^^^^^^^^^^ expected Expr<i64>, found Expr<bool>
The key insight: `PhantomData<T>` lets you brand a struct with a type without storing any value of that type. Different `impl` blocks on different `Expr<T>` specializations enforce GADT-like rules.
What This Unlocks
- Type-safe ASTs for compilers, query builders, template engines โ where you need to know the output type of each node statically.
- Typed serialization/deserialization โ a schema type parameterized by its decoded value (`Schema<User>` decodes to `User`).
- Embedded DSLs where invalid programs are impossible to represent โ the host language's type system rejects nonsense programs at the DSL level.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| GADT syntax | `type _ expr = Int : int -> int expr` | `struct Expr<T>` + `PhantomData<T>` + separate `impl` blocks |
| Type refinement in match | Automatic per branch โ compiler knows `a = int` inside `Int` branch | No refinement; must use `unreachable!()` for provably-impossible arms |
| Enforcing constructor rules | Built into the type system via GADT constructor return types | Enforced by only exposing smart constructors (no direct struct construction) |
| Runtime overhead | None | None โ `PhantomData` is zero-sized |
| Exhaustiveness | Full GADT checking | Pattern matching on inner untyped enum; type safety at construction time only |
// Example 176: Introduction to GADTs
// Rust doesn't have native GADTs, but we can simulate them with
// PhantomData, traits, and sealed type-level markers.
use std::marker::PhantomData;
// === Approach 1: PhantomData + sealed traits to simulate GADTs ===
mod sealed {
pub trait ExprType {}
impl ExprType for i64 {}
impl ExprType for bool {}
}
#[derive(Debug)]
enum ExprInner {
Int(i64),
Bool(bool),
Add(Box<ExprInner>, Box<ExprInner>),
If(Box<ExprInner>, Box<ExprInner>, Box<ExprInner>),
}
#[derive(Debug)]
struct Expr<T: sealed::ExprType> {
inner: ExprInner,
_phantom: PhantomData<T>,
}
impl Expr<i64> {
fn int(n: i64) -> Self {
Expr { inner: ExprInner::Int(n), _phantom: PhantomData }
}
fn add(a: Expr<i64>, b: Expr<i64>) -> Self {
Expr { inner: ExprInner::Add(Box::new(a.inner), Box::new(b.inner)), _phantom: PhantomData }
}
fn eval(&self) -> i64 {
match &self.inner {
ExprInner::Int(n) => *n,
ExprInner::Add(a, b) => {
let a = Expr::<i64> { inner: *a.clone(), _phantom: PhantomData };
let b = Expr::<i64> { inner: *b.clone(), _phantom: PhantomData };
a.eval() + b.eval()
}
ExprInner::If(c, t, f) => {
let c = Expr::<bool> { inner: *c.clone(), _phantom: PhantomData };
let t = Expr::<i64> { inner: *t.clone(), _phantom: PhantomData };
let f = Expr::<i64> { inner: *f.clone(), _phantom: PhantomData };
if c.eval() { t.eval() } else { f.eval() }
}
_ => unreachable!(),
}
}
}
impl Expr<bool> {
fn bool_val(b: bool) -> Self {
Expr { inner: ExprInner::Bool(b), _phantom: PhantomData }
}
fn eval(&self) -> bool {
match &self.inner {
ExprInner::Bool(b) => *b,
_ => unreachable!(),
}
}
}
impl Clone for ExprInner {
fn clone(&self) -> Self {
match self {
ExprInner::Int(n) => ExprInner::Int(*n),
ExprInner::Bool(b) => ExprInner::Bool(*b),
ExprInner::Add(a, b) => ExprInner::Add(Box::new((**a).clone()), Box::new((**b).clone())),
ExprInner::If(a, b, c) => ExprInner::If(Box::new((**a).clone()), Box::new((**b).clone()), Box::new((**c).clone())),
}
}
}
fn if_expr(cond: Expr<bool>, then: Expr<i64>, else_: Expr<i64>) -> Expr<i64> {
Expr {
inner: ExprInner::If(Box::new(cond.inner), Box::new(then.inner), Box::new(else_.inner)),
_phantom: PhantomData,
}
}
// === Approach 2: Enum-based with trait for evaluation ===
trait Eval {
type Output;
fn eval(&self) -> Self::Output;
}
struct IntLit(i64);
struct BoolLit(bool);
struct AddExpr(Box<dyn Eval<Output = i64>>, Box<dyn Eval<Output = i64>>);
impl Eval for IntLit {
type Output = i64;
fn eval(&self) -> i64 { self.0 }
}
impl Eval for BoolLit {
type Output = bool;
fn eval(&self) -> bool { self.0 }
}
impl Eval for AddExpr {
type Output = i64;
fn eval(&self) -> i64 { self.0.eval() + self.1.eval() }
}
// === Approach 3: Type-safe heterogeneous list with tuples ===
// Rust's type system naturally supports this via nested tuples
// (42, ("hello", (true, ()))) is the HList equivalent
trait HList {}
impl HList for () {}
impl<H, T: HList> HList for (H, T) {}
trait Head {
type Item;
fn head(&self) -> &Self::Item;
}
impl<H, T: HList> Head for (H, T) {
type Item = H;
fn head(&self) -> &H { &self.0 }
}
fn main() {
// Approach 1
let e = Expr::int(42);
println!("Int 42 = {}", e.eval());
let sum = Expr::add(Expr::int(1), Expr::int(2));
println!("1 + 2 = {}", sum.eval());
let cond = if_expr(Expr::bool_val(true), Expr::int(10), Expr::int(20));
println!("if true then 10 else 20 = {}", cond.eval());
// Approach 2
let expr = AddExpr(Box::new(IntLit(10)), Box::new(IntLit(32)));
println!("10 + 32 = {}", expr.eval());
// Approach 3
let hlist = (42, ("hello", (true, ())));
println!("head = {}", hlist.head());
println!("โ All examples running");
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_phantom_gadt_int() {
assert_eq!(Expr::int(42).eval(), 42);
}
#[test]
fn test_phantom_gadt_add() {
assert_eq!(Expr::add(Expr::int(1), Expr::int(2)).eval(), 3);
}
#[test]
fn test_phantom_gadt_bool() {
assert_eq!(Expr::bool_val(true).eval(), true);
}
#[test]
fn test_phantom_gadt_if() {
let e = if_expr(Expr::bool_val(true), Expr::int(10), Expr::int(20));
assert_eq!(e.eval(), 10);
let e2 = if_expr(Expr::bool_val(false), Expr::int(10), Expr::int(20));
assert_eq!(e2.eval(), 20);
}
#[test]
fn test_trait_eval() {
assert_eq!(IntLit(5).eval(), 5);
assert_eq!(BoolLit(false).eval(), false);
assert_eq!(AddExpr(Box::new(IntLit(3)), Box::new(IntLit(4))).eval(), 7);
}
#[test]
fn test_hlist() {
let hlist = (42, ("hello", (true, ())));
assert_eq!(*hlist.head(), 42);
assert_eq!(*hlist.1.head(), "hello");
assert_eq!(*hlist.1 .1.head(), true);
}
}
(* Example 176: Introduction to GADTs *)
(* GADTs (Generalized Algebraic Data Types) allow constructors to return
different type instantiations of the same type *)
(* Approach 1: Basic GADT with type-safe expressions *)
type _ expr =
| Int : int -> int expr
| Bool : bool -> bool expr
| Add : int expr * int expr -> int expr
| If : bool expr * 'a expr * 'a expr -> 'a expr
let rec eval : type a. a expr -> a = function
| Int n -> n
| Bool b -> b
| Add (a, b) -> eval a + eval b
| If (cond, t, f) -> if eval cond then eval t else eval f
(* Approach 2: GADT for type-safe formatting *)
type _ fmt =
| Lit : string -> string fmt
| FInt : int fmt
| FStr : string fmt
| Cat : 'a fmt * 'b fmt -> ('a * 'b) fmt
let rec format_to_string : type a. a fmt -> a -> string = fun fmt v ->
match fmt with
| Lit s -> s
| FInt -> string_of_int v
| FStr -> v
| Cat (a, b) ->
let (va, vb) = v in
format_to_string a va ^ format_to_string b vb
(* Approach 3: GADT for type-safe heterogeneous lists *)
type _ hlist =
| HNil : unit hlist
| HCons : 'a * 'b hlist -> ('a * 'b) hlist
let example_list = HCons (42, HCons ("hello", HCons (true, HNil)))
let hd : type a b. (a * b) hlist -> a = function
| HCons (x, _) -> x
(* Tests *)
let () =
(* Test Approach 1 *)
assert (eval (Int 42) = 42);
assert (eval (Bool true) = true);
assert (eval (Add (Int 1, Int 2)) = 3);
assert (eval (If (Bool true, Int 10, Int 20)) = 10);
assert (eval (If (Bool false, Int 10, Int 20)) = 20);
(* Test Approach 2 *)
assert (format_to_string (Lit "hello") "hello" = "hello");
assert (format_to_string FInt 42 = "42");
assert (format_to_string FStr "world" = "world");
assert (format_to_string (Cat (Lit "n=", FInt)) ("n=", 42) = "n=42");
(* Test Approach 3 *)
assert (hd example_list = 42);
print_endline "โ All tests passed"
๐ Detailed Comparison
Comparison: Example 176 โ Introduction to GADTs
GADT Type Definition
OCaml
๐ช Show OCaml equivalent
type _ expr =
| Int : int -> int expr
| Bool : bool -> bool expr
| Add : int expr * int expr -> int expr
| If : bool expr * 'a expr * 'a expr -> 'a exprRust
use std::marker::PhantomData;
enum ExprInner {
Int(i64),
Bool(bool),
Add(Box<ExprInner>, Box<ExprInner>),
If(Box<ExprInner>, Box<ExprInner>, Box<ExprInner>),
}
struct Expr<T> {
inner: ExprInner,
_phantom: PhantomData<T>,
}Type-Safe Evaluation
OCaml
๐ช Show OCaml equivalent
let rec eval : type a. a expr -> a = function
| Int n -> n
| Bool b -> b
| Add (a, b) -> eval a + eval b
| If (cond, t, f) -> if eval cond then eval t else eval fRust
impl Expr<i64> {
fn eval(&self) -> i64 {
match &self.inner {
ExprInner::Int(n) => *n,
ExprInner::Add(a, b) => { /* reconstruct typed wrappers */ }
_ => unreachable!(),
}
}
}
impl Expr<bool> {
fn eval(&self) -> bool {
match &self.inner {
ExprInner::Bool(b) => *b,
_ => unreachable!(),
}
}
}Trait-Based Alternative (Rust)
Rust
trait Eval {
type Output;
fn eval(&self) -> Self::Output;
}
struct IntLit(i64);
impl Eval for IntLit {
type Output = i64;
fn eval(&self) -> i64 { self.0 }
}
struct AddExpr(Box<dyn Eval<Output = i64>>, Box<dyn Eval<Output = i64>>);
impl Eval for AddExpr {
type Output = i64;
fn eval(&self) -> i64 { self.0.eval() + self.1.eval() }
}Heterogeneous Lists
OCaml
๐ช Show OCaml equivalent
type _ hlist =
| HNil : unit hlist
| HCons : 'a * 'b hlist -> ('a * 'b) hlist
let hd : type a b. (a * b) hlist -> a = function
| HCons (x, _) -> xRust
trait HList {}
impl HList for () {}
impl<H, T: HList> HList for (H, T) {}
trait Head {
type Item;
fn head(&self) -> &Self::Item;
}
impl<H, T: HList> Head for (H, T) {
type Item = H;
fn head(&self) -> &H { &self.0 }
}
// Usage: (42, ("hello", (true, ())))