709 Fundamental

Unions in Rust

The Problem

This example covers one part of Rust's unsafe programming model: the union type, in which all fields share the same memory. Unions matter in systems programming (OS components, device drivers, game engines, and any code that must interact with C libraries or control memory layout precisely) because C APIs expose them and because they let the same bytes be viewed as different types. The compiler cannot know which field of a union currently holds a valid value, so reading a field requires unsafe. Rust's unsafe system is designed to confine that unsafety to small, auditable regions while the surrounding code stays safe.

🎯 Learning Outcomes

• The specific unsafe feature demonstrated: unions in Rust

• When this feature is necessary vs when safe alternatives exist

• How to use it correctly with appropriate SAFETY documentation

• The invariants that must be maintained for the operation to be sound

• Real-world contexts: embedded systems, OS kernels, C FFI, performance-critical code
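To make the "necessary vs safe alternative" outcome concrete, here is a minimal sketch that reinterprets an f32 as its raw bits in two ways: through a union, which needs unsafe, and through the standard-library method f32::to_bits, which does not. The FloatBits union and both function names are hypothetical and are not part of this example's source.

```rust
/// Hypothetical union used only for this illustration.
#[repr(C)]
union FloatBits {
    f: f32,
    bits: u32,
}

/// Reinterpret an `f32` as raw bits by writing one field and reading the other.
fn bits_via_union(x: f32) -> u32 {
    let u = FloatBits { f: x }; // constructing a union is safe
    // SAFETY: both fields are 4 bytes at offset 0 (`#[repr(C)]`), writing `f`
    // initialized all of them, and every 32-bit pattern is a valid `u32`.
    unsafe { u.bits }
}

/// The safe standard-library equivalent: no `unsafe` required.
fn bits_via_std(x: f32) -> u32 {
    x.to_bits()
}
```

Both functions return the same bits for any input, so in pure Rust code the safe version is the better choice; the union form is mainly of interest when the layout is dictated by C.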

Code Example

/// Idiomatic Rust: the compiler generates the tag and dispatch for you.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl ValueEnum {
    pub fn describe(&self) -> String {
        match self {
            ValueEnum::Int(n)   => format!("Int({n})"),
            ValueEnum::Float(f) => format!("Float({f})"),
            ValueEnum::Bool(b)  => format!("Bool({b})"),
        }
    }
}

(* OCaml: algebraic variants ARE safe tagged unions.
   The compiler tracks the discriminant and guarantees exhaustive matching. *)

type value =
  | Int   of int
  | Float of float
  | Bool  of bool

let describe (v : value) : string =
  match v with
  | Int   n -> Printf.sprintf "Int(%d)"   n
  | Float f -> Printf.sprintf "Float(%g)" f
  | Bool  b -> Printf.sprintf "Bool(%b)"  b

let size_of_value (v : value) : int =
  match v with
  | Int   _ -> 8
  | Float _ -> 8
  | Bool  _ -> 1

let () =
  let vals = [Int 42; Float 3.14; Bool true; Int (-7)] in
  List.iter (fun v ->
    Printf.printf "%s (size=%d)\n" (describe v) (size_of_value v)
  ) vals

Key Differences

Safety model: Rust requires an explicit unsafe block to read a union field; OCaml variants are always safe because the compiler tracks the constructor tag, so no unsafe regions are needed.

FFI approach: Rust declares C-compatible types directly with #[repr(C)] and calls through extern "C"; OCaml typically uses the ctypes library, which wraps C types in OCaml values.

Memory control: Rust allows complete control over memory layout (#[repr(C)], custom allocators); OCaml's GC manages memory layout automatically.

Auditability: Rust unsafe regions are syntactically visible and toolable; OCaml unsafe operations (Obj.magic, direct C calls) are also explicit but less common.

OCaml Approach

OCaml's GC and type system eliminate most of the need for these unsafe operations. The equivalent functionality typically uses:

• C FFI via the ctypes library for external function calls

• Bigarray for controlled raw memory access

• The GC for memory management (no manual allocators needed)

• Bytes.t for mutable byte sequences

OCaml programs rarely need operations equivalent to these Rust unsafe patterns.

Full Source

#![allow(clippy::all)]
//! 709 — Unions in Rust: C-style Tagged Unions
//!
//! Raw `union` + enum tag = safe tagged union.
//! This is conceptually what OCaml's algebraic data types give you,
//! except that OCaml manages the tag and dispatch for you. Here we write it explicitly.

// ---------------------------------------------------------------------------
// Raw union — all fields overlap at the same memory address.
// Reading a field requires `unsafe`; constructing the union and writing
// `Copy` fields is safe.
// ---------------------------------------------------------------------------

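/// Untagged union: all fields share the same memory location.
/// Reading a field other than the one last written reinterprets the stored
/// bytes; it is undefined behaviour if they are not a valid, fully
/// initialized value of that field's type.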
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

// ---------------------------------------------------------------------------
// Tag enum — tracks which field of the union is currently valid.
// ---------------------------------------------------------------------------

/// Discriminant tracking which field is active.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Int,
    Float,
    Bool,
}

// ---------------------------------------------------------------------------
// Safe tagged union — pairs the raw union with its discriminant.
// All unsafe access is confined to these methods.
// ---------------------------------------------------------------------------

/// Safe tagged union: an enum tag guards all reads of the raw union.
pub struct Value {
    tag: Tag,
    data: RawValue,
}

impl Value {
    /// Construct a `Value` holding an integer.
    pub fn int(n: i64) -> Self {
        Value {
            tag: Tag::Int,
            data: RawValue { int_val: n },
        }
    }

    /// Construct a `Value` holding a float.
    pub fn float(f: f64) -> Self {
        Value {
            tag: Tag::Float,
            data: RawValue { float_val: f },
        }
    }

    /// Construct a `Value` holding a boolean.
    pub fn bool(b: bool) -> Self {
        Value {
            tag: Tag::Bool,
            data: RawValue { bool_val: b as u8 },
        }
    }

    /// Return the integer if the tag is `Int`, otherwise `None`.
    pub fn as_int(&self) -> Option<i64> {
        if self.tag == Tag::Int {
            // SAFETY: we just checked the tag is Int, so int_val was the last
            // field written and its bits are valid for i64.
            Some(unsafe { self.data.int_val })
        } else {
            None
        }
    }

    /// Return the float if the tag is `Float`, otherwise `None`.
    pub fn as_float(&self) -> Option<f64> {
        if self.tag == Tag::Float {
            // SAFETY: tag is Float, so float_val is the active field.
            Some(unsafe { self.data.float_val })
        } else {
            None
        }
    }

    /// Return the bool if the tag is `Bool`, otherwise `None`.
    pub fn as_bool(&self) -> Option<bool> {
        if self.tag == Tag::Bool {
            // SAFETY: tag is Bool; u8 non-zero → true, zero → false.
            Some(unsafe { self.data.bool_val != 0 })
        } else {
            None
        }
    }

    /// The active tag for this value.
    pub fn tag(&self) -> Tag {
        self.tag
    }

    /// Human-readable description — mirrors the OCaml `describe` function.
    pub fn describe(&self) -> String {
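        // SAFETY (all arms): `tag` is only ever set by the constructors, together
        // with the matching union field, so each arm reads the field that was written.
        match self.tag {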
            Tag::Int => format!("Int({})", unsafe { self.data.int_val }),
            Tag::Float => format!("Float({})", unsafe { self.data.float_val }),
            Tag::Bool => format!("Bool({})", unsafe { self.data.bool_val != 0 }),
        }
    }

    /// Size in bytes of the stored value — mirrors OCaml `size_of_value`.
    pub fn size_of_stored(&self) -> usize {
        match self.tag {
            Tag::Int => 8,
            Tag::Float => 8,
            Tag::Bool => 1,
        }
    }
}

// ---------------------------------------------------------------------------
// Idiomatic Rust equivalent: just use an enum.
// In most Rust code you would never touch a raw union directly.
// ---------------------------------------------------------------------------

/// Idiomatic Rust: the compiler generates the tag and dispatch for you.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl ValueEnum {
    pub fn describe(&self) -> String {
        match self {
            ValueEnum::Int(n) => format!("Int({n})"),
            ValueEnum::Float(f) => format!("Float({f})"),
            ValueEnum::Bool(b) => format!("Bool({b})"),
        }
    }

    pub fn size_of_stored(&self) -> usize {
        match self {
            ValueEnum::Int(_) => 8,
            ValueEnum::Float(_) => 8,
            ValueEnum::Bool(_) => 1,
        }
    }
}

// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------

#[cfg(test)]
mod tests {
    use super::*;

    // --- Tagged-union (manual) tests ---

    #[test]
    fn test_int_value_round_trip() {
        let v = Value::int(42);
        assert_eq!(v.tag(), Tag::Int);
        assert_eq!(v.as_int(), Some(42));
        assert_eq!(v.as_float(), None);
        assert_eq!(v.as_bool(), None);
    }

    #[test]
    fn test_float_value_round_trip() {
        let v = Value::float(3.14);
        assert_eq!(v.tag(), Tag::Float);
        assert!(v.as_float().is_some());
        assert!((v.as_float().unwrap() - 3.14).abs() < f64::EPSILON);
        assert_eq!(v.as_int(), None);
        assert_eq!(v.as_bool(), None);
    }

    #[test]
    fn test_bool_value_round_trip() {
        let t = Value::bool(true);
        assert_eq!(t.tag(), Tag::Bool);
        assert_eq!(t.as_bool(), Some(true));

        let f = Value::bool(false);
        assert_eq!(f.as_bool(), Some(false));
        assert_eq!(f.as_int(), None);
    }

    #[test]
    fn test_negative_int() {
        let v = Value::int(-7);
        assert_eq!(v.as_int(), Some(-7));
        assert_eq!(v.describe(), "Int(-7)");
    }

    #[test]
    fn test_describe_and_size() {
        let vals = [Value::int(42), Value::float(3.14), Value::bool(true)];
        let descriptions: Vec<String> = vals.iter().map(|v| v.describe()).collect();
        assert_eq!(descriptions[0], "Int(42)");
        assert!(descriptions[1].starts_with("Float("));
        assert_eq!(descriptions[2], "Bool(true)");

        assert_eq!(vals[0].size_of_stored(), 8);
        assert_eq!(vals[1].size_of_stored(), 8);
        assert_eq!(vals[2].size_of_stored(), 1);
    }

    #[test]
    fn test_cross_field_isolation() {
        // Writing int then reading float must return None (tag guard prevents it).
        let v = Value::int(100);
        assert_eq!(v.as_float(), None);
        assert_eq!(v.as_bool(), None);
    }

    // --- Idiomatic enum tests ---

    #[test]
    fn test_enum_describe() {
        assert_eq!(ValueEnum::Int(42).describe(), "Int(42)");
        assert_eq!(ValueEnum::Bool(false).describe(), "Bool(false)");
    }

    #[test]
    fn test_enum_size_of_stored() {
        assert_eq!(ValueEnum::Int(0).size_of_stored(), 8);
        assert_eq!(ValueEnum::Float(0.0).size_of_stored(), 8);
        assert_eq!(ValueEnum::Bool(true).size_of_stored(), 1);
    }

    #[test]
    fn test_enum_equality() {
        assert_eq!(ValueEnum::Int(1), ValueEnum::Int(1));
        assert_ne!(ValueEnum::Int(1), ValueEnum::Int(2));
    }
}

(* OCaml: algebraic variants ARE safe tagged unions. The compiler
   tracks the discriminant and guarantees exhaustive matching. *)

type value =
  | Int   of int
  | Float of float
  | Bool  of bool

let describe (v : value) : string =
  match v with
  | Int   n -> Printf.sprintf "Int(%d)"   n
  | Float f -> Printf.sprintf "Float(%g)" f
  | Bool  b -> Printf.sprintf "Bool(%b)"  b

let size_of_value (v : value) : int =
  match v with
  | Int   _ -> 8
  | Float _ -> 8
  | Bool  _ -> 1

let () =
  let vals = [Int 42; Float 3.14; Bool true; Int (-7)] in
  List.iter (fun v ->
    Printf.printf "%s (size=%d)\n" (describe v) (size_of_value v)
  ) vals

Deep Comparison

OCaml vs Rust: Unions / Tagged Unions

Side-by-Side Code

OCaml

(* OCaml: algebraic variants ARE safe tagged unions.
   The compiler tracks the discriminant and guarantees exhaustive matching. *)

type value =
  | Int   of int
  | Float of float
  | Bool  of bool

let describe (v : value) : string =
  match v with
  | Int   n -> Printf.sprintf "Int(%d)"   n
  | Float f -> Printf.sprintf "Float(%g)" f
  | Bool  b -> Printf.sprintf "Bool(%b)"  b

let size_of_value (v : value) : int =
  match v with
  | Int   _ -> 8
  | Float _ -> 8
  | Bool  _ -> 1

let () =
  let vals = [Int 42; Float 3.14; Bool true; Int (-7)] in
  List.iter (fun v ->
    Printf.printf "%s (size=%d)\n" (describe v) (size_of_value v)
  ) vals

Rust — idiomatic enum (OCaml-equivalent)

/// Idiomatic Rust: the compiler generates the tag and dispatch for you.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl ValueEnum {
    pub fn describe(&self) -> String {
        match self {
            ValueEnum::Int(n)   => format!("Int({n})"),
            ValueEnum::Float(f) => format!("Float({f})"),
            ValueEnum::Bool(b)  => format!("Bool({b})"),
        }
    }
}

Rust — explicit tagged union (raw `union` + enum tag)

#[repr(C)]
union RawValue {
    int_val:   i64,
    float_val: f64,
    bool_val:  u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag { Int, Float, Bool }

pub struct Value {
    tag:  Tag,
    data: RawValue,
}

impl Value {
    pub fn int(n: i64) -> Self {
        Value { tag: Tag::Int, data: RawValue { int_val: n } }
    }

    pub fn as_int(&self) -> Option<i64> {
        if self.tag == Tag::Int {
            // SAFETY: tag confirmed, int_val is the active field.
            Some(unsafe { self.data.int_val })
        } else {
            None
        }
    }
}

Type Signatures

| Concept | OCaml | Rust (enum) | Rust (raw union) |
|---|---|---|---|
| Variant type | `type value = Int of int \| Float of float \| Bool of bool` | `enum ValueEnum { Int(i64), Float(f64), Bool(bool) }` | `union RawValue { int_val: i64, float_val: f64, bool_val: u8 }` |
| Accessor | pattern match | pattern match | `unsafe { union.int_val }` guarded by tag |
| Tag tracking | implicit (compiler) | implicit (compiler) | explicit `enum Tag` field |
| Safety | always safe | always safe | reads require `unsafe` |
| C-ABI compatible | no | no (with the default layout) | yes (with `#[repr(C)]`) |

Key Insights

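OCaml variants = tagged unions under the hood. A constructor that carries data is represented as a heap block whose header stores the constructor's tag, followed by the payload; constructors without data are stored as immediate integers. The compiler manages all of this invisibly; you only see safe pattern matching.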

Rust enum is the idiomatic equivalent. For almost all Rust code, enum is the right choice: the compiler handles the tag, guarantees exhaustive matching, and the code is always safe.
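A small sketch of what "the compiler handles the tag" means in practice, assuming the ValueEnum definition from this example. The helper names same_variant and enum_is_larger_than_payload are made up for illustration.

```rust
use std::mem::{discriminant, size_of};

#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

/// True when both values use the same variant, regardless of payload.
/// `std::mem::discriminant` exposes the compiler-managed tag safely.
pub fn same_variant(a: &ValueEnum, b: &ValueEnum) -> bool {
    discriminant(a) == discriminant(b)
}

/// Every bit pattern of the 8-byte payloads is meaningful, so the enum has to
/// store its tag somewhere else and is therefore larger than one payload.
/// The exact size is not guaranteed for the default representation.
pub fn enum_is_larger_than_payload() -> bool {
    size_of::<ValueEnum>() > size_of::<i64>()
}
```

The tag is never visible as a field; it can only be observed indirectly, through pattern matching or discriminant.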

Raw union exists for C interop. When you need a #[repr(C)] type that maps byte-for-byte to a C union definition, you use Rust's union. Every field read requires unsafe because the compiler cannot know which field is live; constructing a union and writing a Copy field are safe.
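The split between safe writes and unsafe reads can be sketched as follows, reusing the RawValue definition from the full source. The two function names are hypothetical.

```rust
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

/// Overwrite one field with another, then read back the field written last.
fn overwrite_and_read() -> f64 {
    let mut raw = RawValue { int_val: 7 }; // safe: construction
    raw.float_val = 2.5; // safe: writing a `Copy` field
    // SAFETY: `float_val` was the last field written, so all 8 bytes
    // hold a valid, initialized `f64`.
    unsafe { raw.float_val }
}

/// All fields overlap, so the union is as large as its largest field.
fn union_size() -> usize {
    std::mem::size_of::<RawValue>()
}
```

Only the final read needs an unsafe block; that is the single point where the programmer, rather than the compiler, vouches for which field is live.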

The safe-wrapper pattern. Pair the raw union with an enum discriminant in an outer struct and expose Option-returning methods. All unsafe stays inside these methods; callers never see it. This is the Rust analogue of what OCaml's runtime does automatically.
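One way to see the wrapper pattern end to end is a conversion from the manual tagged union into the idiomatic enum. The sketch below repeats the types from the full source in compact form; the to_enum method is a hypothetical addition, not part of the original code.

```rust
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Int,
    Float,
    Bool,
}

/// Fields are private, so `tag` and `data` can only be set together by the
/// constructors below. Every unsafe read relies on that invariant.
pub struct Value {
    tag: Tag,
    data: RawValue,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl Value {
    pub fn int(n: i64) -> Self {
        Value { tag: Tag::Int, data: RawValue { int_val: n } }
    }

    pub fn float(f: f64) -> Self {
        Value { tag: Tag::Float, data: RawValue { float_val: f } }
    }

    pub fn bool(b: bool) -> Self {
        Value { tag: Tag::Bool, data: RawValue { bool_val: u8::from(b) } }
    }

    /// Hypothetical helper: convert into the compiler-managed enum.
    pub fn to_enum(&self) -> ValueEnum {
        match self.tag {
            // SAFETY: `Tag::Int` is only set by `Value::int`, which wrote `int_val`.
            Tag::Int => ValueEnum::Int(unsafe { self.data.int_val }),
            // SAFETY: `Tag::Float` is only set by `Value::float`, which wrote `float_val`.
            Tag::Float => ValueEnum::Float(unsafe { self.data.float_val }),
            // SAFETY: `Tag::Bool` is only set by `Value::bool`, which wrote `bool_val`.
            Tag::Bool => ValueEnum::Bool(unsafe { self.data.bool_val != 0 }),
        }
    }
}
```

Callers see only safe constructors and a safe conversion; the soundness argument lives entirely inside the impl block, next to the private fields it depends on.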

Memory layout control. #[repr(C)] unions guarantee a specific layout, enabling zero-cost FFI with C libraries that use union fields — something OCaml variants cannot provide directly.
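The layout guarantee can be checked directly with std::mem. The sketch below assumes a C-side definition along the lines of struct { uint32_t tag; union { ... } data; }; the CValue struct and the two helper functions are hypothetical.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
pub union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

/// Hypothetical C-compatible pairing of a tag with the union.
#[repr(C)]
pub struct CValue {
    tag: u32,
    data: RawValue,
}

/// (size, alignment) of the union: the size of its largest field and the
/// strictest alignment among its fields.
pub fn raw_value_layout() -> (usize, usize) {
    (size_of::<RawValue>(), align_of::<RawValue>())
}

/// (size, alignment) of the tagged struct, including any padding
/// inserted between `tag` and `data`.
pub fn c_value_layout() -> (usize, usize) {
    (size_of::<CValue>(), align_of::<CValue>())
}
```

On x86-64 these return (8, 8) and (16, 8): four bytes of tag, four bytes of padding, then the eight-byte union, which is the layout a C compiler produces for the equivalent declaration on that target.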

When to Use Each Style

Use enum (idiomatic Rust) when: you are writing pure Rust and need a type-safe sum type. This is the default and the right choice in nearly all cases.

Use raw union when: you are writing FFI bindings that must match a C union layout exactly, or building low-level data structures (e.g., a JIT compiler's value representation) where you need to control every byte of memory and are prepared to manage the tag yourself.
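As a rough sketch of the FFI case, the code below defines a C-layout tagged value and a function with the C calling convention that consumes it. Everything here is an assumption made for illustration: the CPayload and CValue types, the TAG_* constants, and value_as_double are invented names, and the function is written in Rust (standing in for one a C library might export) so that the example runs on its own.

```rust
#[repr(C)]
pub union CPayload {
    pub int_val: i64,
    pub float_val: f64,
    pub bool_val: u8,
}

// The tag is a plain integer rather than a Rust enum: a value arriving from C
// may hold any number, and an out-of-range Rust enum would be undefined behaviour.
pub const TAG_INT: u32 = 0;
pub const TAG_FLOAT: u32 = 1;
pub const TAG_BOOL: u32 = 2;

#[repr(C)]
pub struct CValue {
    pub tag: u32,
    pub payload: CPayload,
}

/// Convert the tagged value to a double; unknown tags and null yield NaN.
///
/// # Safety
/// `v` must be null or point to an aligned, initialized `CValue` whose
/// `tag` matches the payload field that was last written.
pub unsafe extern "C" fn value_as_double(v: *const CValue) -> f64 {
    if v.is_null() {
        return f64::NAN;
    }
    // SAFETY: non-null was checked above; the caller guarantees validity.
    let v = unsafe { &*v };
    match v.tag {
        // SAFETY (each arm): the caller guarantees the tag matches the payload.
        TAG_INT => unsafe { v.payload.int_val as f64 },
        TAG_FLOAT => unsafe { v.payload.float_val },
        TAG_BOOL => {
            let b = unsafe { v.payload.bool_val };
            if b != 0 { 1.0 } else { 0.0 }
        }
        _ => f64::NAN,
    }
}
```

A caller would build a CValue such as CValue { tag: TAG_FLOAT, payload: CPayload { float_val: 1.5 } } and pass a pointer to it inside an unsafe block. In pure Rust code the same job is done by a match on ValueEnum with no unsafe at all, which is why the enum remains the default.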

Exercises

Minimize unsafe: Find the smallest possible unsafe region in the source and verify that all safe code is outside the unsafe block.

Safe alternative: Identify if a safe alternative exists for the demonstrated technique (e.g., bytemuck for transmute, CString for FFI strings) and implement it.

SAFETY documentation: Write a complete SAFETY comment for each unsafe block listing preconditions, invariants, and what would break if violated.
