๐Ÿฆ€ Functional Rust

368: Persistent Data Structures

Difficulty: 4 Level: Expert Preserve previous versions of a data structure after modification via structural sharing โ€” without copying the whole structure.

The Problem This Solves

You're building an editor with undo history, a version-controlled key-value store, or a functional language interpreter. Every modification creates a new "version" of the data. Copying the entire structure on every change is O(n) per operation โ€” prohibitively slow for large structures. Persistent data structures solve this with structural sharing. When you modify a node in a tree, you create a new path from root to the modified node โ€” but all other subtrees are shared (via reference counting) between the old and new versions. A persistent binary tree of n nodes produces a new version in O(log n) by creating O(log n) new nodes and reusing the other n - log n nodes unchanged. In functional languages like OCaml or Haskell, all data structures are persistent by default โ€” immutability is the norm. In Rust, mutation is the norm and persistence is opt-in via explicit `Rc` (single-threaded) or `Arc` (multi-threaded) shared ownership.

The Intuition

Imagine a git commit tree. Each commit shares history with its parent; branching doesn't copy all files. Persistent data structures work the same way: a new "version" shares unmodified subtrees with the previous version. The key tool in Rust is `Rc<T>` (or `Arc<T>` for thread safety). Cloning an `Rc` is O(1) โ€” it just increments a reference count. The underlying data isn't copied. When all `Rc` handles to a node go away, the node is freed. This gives you structural sharing automatically. The cost: every node access goes through a pointer (cache miss potential), and mutating shared data requires `Rc::make_mut` which may clone if the reference count > 1 โ€” the copy-on-write semantic.

How It Works in Rust

use std::rc::Rc;

// A persistent cons-list (the classic functional data structure)
// Each node is either empty or a value + shared tail
#[derive(Clone)]
enum List<T: Clone> {
 Nil,
 Cons(T, Rc<List<T>>),
}

impl<T: Clone> List<T> {
 fn empty() -> Rc<Self> { Rc::new(List::Nil) }

 // Prepend: O(1), shares the entire original list
 fn cons(head: T, tail: Rc<Self>) -> Rc<Self> {
     Rc::new(List::Cons(head, tail))
 }

 fn head(&self) -> Option<&T> {
     match self { List::Cons(v, _) => Some(v), List::Nil => None }
 }

 fn tail(&self) -> Option<Rc<Self>> {
     match self { List::Cons(_, t) => Some(Rc::clone(t)), List::Nil => None }
 }
}

// Two lists can share a common tail โ€” no copying
let shared_tail = List::cons(3, List::cons(4, List::empty()));
let list_a = List::cons(1, Rc::clone(&shared_tail)); // [1,3,4]
let list_b = List::cons(2, Rc::clone(&shared_tail)); // [2,3,4]
// shared_tail's memory is kept alive by both list_a and list_b

// Persistent binary tree (immutable BST)
#[derive(Clone)]
enum Tree<T: Clone + Ord> {
 Leaf,
 Node { val: T, left: Rc<Tree<T>>, right: Rc<Tree<T>> },
}

impl<T: Clone + Ord> Tree<T> {
 fn insert(&self, v: T) -> Rc<Tree<T>> {
     match self {
         Tree::Leaf => Rc::new(Tree::Node {
             val: v,
             left: Rc::new(Tree::Leaf),
             right: Rc::new(Tree::Leaf),
         }),
         Tree::Node { val, left, right } => {
             if v < *val {
                 // Only creates new nodes along the left path โ€” O(log n) new nodes
                 // The right subtree is shared unchanged
                 Rc::new(Tree::Node {
                     val: val.clone(),
                     left: left.insert(v),
                     right: Rc::clone(right), // shared, not copied
                 })
             } else {
                 Rc::new(Tree::Node {
                     val: val.clone(),
                     left: Rc::clone(left),  // shared, not copied
                     right: right.insert(v),
                 })
             }
         }
     }
 }
}

// Version history: each insert produces a new root, old roots still valid
let v0 = Rc::new(Tree::Leaf);
let v1 = v0.insert(5);
let v2 = v1.insert(3);
let v3 = v2.insert(7);
// v0, v1, v2, v3 all valid simultaneously; shared subtrees not duplicated

What This Unlocks

Key Differences

ConceptOCamlRust
Immutable by defaultyes โ€” all values are immutableno โ€” mutation is the default
Structural sharingautomatic (GC manages lifetime)explicit `Rc<T>` or `Arc<T>`
Persistent listbuilt-in `list` type`Rc<List<T>>` (custom)
Copy-on-writeN/A (immutable = always shared)`Rc::make_mut` clones if rc > 1
Thread-safe sharingGC-managed`Arc<T>` instead of `Rc<T>`
Memory reclaimgarbage collectorreference count drops to zero
use std::rc::Rc;

// Persistent linked list: structural sharing via Rc
#[derive(Clone)]
enum PList<T> {
    Nil,
    Cons(T, Rc<PList<T>>),
}

impl<T: Clone + std::fmt::Debug> PList<T> {
    fn nil() -> Rc<Self> { Rc::new(Self::Nil) }
    fn cons(head: T, tail: Rc<Self>) -> Rc<Self> { Rc::new(Self::Cons(head, tail)) }

    fn to_vec(list: &Rc<Self>) -> Vec<T> {
        let mut v = Vec::new();
        let mut cur = Rc::clone(list);
        loop {
            match cur.as_ref() {
                Self::Nil => break,
                Self::Cons(x, next) => { v.push(x.clone()); cur = Rc::clone(next); }
            }
        }
        v
    }
    fn len(list: &Rc<Self>) -> usize { Self::to_vec(list).len() }
}

// Persistent vector using path copying
#[derive(Clone, Debug)]
struct PVec<T: Clone> {
    data: Rc<Vec<T>>,
}

impl<T: Clone + std::fmt::Debug> PVec<T> {
    fn new() -> Self { Self { data: Rc::new(Vec::new()) } }
    fn push(&self, val: T) -> Self {
        let mut new_data = (*self.data).clone(); // copy-on-write
        new_data.push(val);
        Self { data: Rc::new(new_data) }
    }
    fn set(&self, i: usize, val: T) -> Self {
        let mut new_data = (*self.data).clone();
        new_data[i] = val;
        Self { data: Rc::new(new_data) }
    }
    fn get(&self, i: usize) -> Option<&T> { self.data.get(i) }
    fn len(&self) -> usize { self.data.len() }
}

fn main() {
    // Persistent list
    let list1 = PList::cons(3, PList::cons(2, PList::cons(1, PList::nil())));
    let list2 = PList::cons(4, Rc::clone(&list1)); // shares tail
    let list3 = if let PList::Cons(_, tail) = list1.as_ref() { Rc::clone(tail) } else { PList::nil() };

    println!("list1: {:?}", PList::to_vec(&list1));
    println!("list2: {:?}", PList::to_vec(&list2));
    println!("list3: {:?}", PList::to_vec(&list3));

    // Persistent vector
    let v0: PVec<i32> = PVec::new();
    let v1 = v0.push(1);
    let v2 = v1.push(2);
    let v3 = v2.set(0, 99); // v1 is unchanged
    println!("v1: {:?}", v1.data);
    println!("v2: {:?}", v2.data);
    println!("v3: {:?}", v3.data); // [99, 2]
    println!("v1 unchanged: {:?}", v1.data); // [1]
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test] fn persistent_list() {
        let l1 = PList::cons(2, PList::cons(1, PList::nil()));
        let l2 = PList::cons(3, Rc::clone(&l1));
        assert_eq!(PList::to_vec(&l1), vec![2,1]);
        assert_eq!(PList::to_vec(&l2), vec![3,2,1]);
        assert_eq!(PList::to_vec(&l1), vec![2,1]); // l1 unchanged
    }
    #[test] fn persistent_vec() {
        let v0: PVec<i32> = PVec::new();
        let v1 = v0.push(1); let v2 = v1.push(2);
        let v3 = v2.set(0, 99);
        assert_eq!(*v1.data, vec![1]);
        assert_eq!(*v3.data, vec![99,2]);
    }
}
(* OCaml: persistent list โ€” natural! *)

let lst1 = [1;2;3;4;5]
let lst2 = 0 :: lst1  (* shares tail with lst1 *)
let lst3 = List.tl lst1  (* shares almost all of lst1 *)

let () =
  Printf.printf "lst1: [%s]\n" (String.concat ";" (List.map string_of_int lst1));
  Printf.printf "lst2: [%s]\n" (String.concat ";" (List.map string_of_int lst2));
  Printf.printf "lst3: [%s]\n" (String.concat ";" (List.map string_of_int lst3))