368: Persistent Data Structures
Difficulty: 4 Level: Expert Preserve previous versions of a data structure after modification via structural sharing โ without copying the whole structure.The Problem This Solves
You're building an editor with undo history, a version-controlled key-value store, or a functional language interpreter. Every modification creates a new "version" of the data. Copying the entire structure on every change is O(n) per operation โ prohibitively slow for large structures. Persistent data structures solve this with structural sharing. When you modify a node in a tree, you create a new path from root to the modified node โ but all other subtrees are shared (via reference counting) between the old and new versions. A persistent binary tree of n nodes produces a new version in O(log n) by creating O(log n) new nodes and reusing the other n - log n nodes unchanged. In functional languages like OCaml or Haskell, all data structures are persistent by default โ immutability is the norm. In Rust, mutation is the norm and persistence is opt-in via explicit `Rc` (single-threaded) or `Arc` (multi-threaded) shared ownership.The Intuition
Imagine a git commit tree. Each commit shares history with its parent; branching doesn't copy all files. Persistent data structures work the same way: a new "version" shares unmodified subtrees with the previous version. The key tool in Rust is `Rc<T>` (or `Arc<T>` for thread safety). Cloning an `Rc` is O(1) โ it just increments a reference count. The underlying data isn't copied. When all `Rc` handles to a node go away, the node is freed. This gives you structural sharing automatically. The cost: every node access goes through a pointer (cache miss potential), and mutating shared data requires `Rc::make_mut` which may clone if the reference count > 1 โ the copy-on-write semantic.How It Works in Rust
use std::rc::Rc;
// A persistent cons-list (the classic functional data structure)
// Each node is either empty or a value + shared tail
#[derive(Clone)]
enum List<T: Clone> {
Nil,
Cons(T, Rc<List<T>>),
}
impl<T: Clone> List<T> {
fn empty() -> Rc<Self> { Rc::new(List::Nil) }
// Prepend: O(1), shares the entire original list
fn cons(head: T, tail: Rc<Self>) -> Rc<Self> {
Rc::new(List::Cons(head, tail))
}
fn head(&self) -> Option<&T> {
match self { List::Cons(v, _) => Some(v), List::Nil => None }
}
fn tail(&self) -> Option<Rc<Self>> {
match self { List::Cons(_, t) => Some(Rc::clone(t)), List::Nil => None }
}
}
// Two lists can share a common tail โ no copying
let shared_tail = List::cons(3, List::cons(4, List::empty()));
let list_a = List::cons(1, Rc::clone(&shared_tail)); // [1,3,4]
let list_b = List::cons(2, Rc::clone(&shared_tail)); // [2,3,4]
// shared_tail's memory is kept alive by both list_a and list_b
// Persistent binary tree (immutable BST)
#[derive(Clone)]
enum Tree<T: Clone + Ord> {
Leaf,
Node { val: T, left: Rc<Tree<T>>, right: Rc<Tree<T>> },
}
impl<T: Clone + Ord> Tree<T> {
fn insert(&self, v: T) -> Rc<Tree<T>> {
match self {
Tree::Leaf => Rc::new(Tree::Node {
val: v,
left: Rc::new(Tree::Leaf),
right: Rc::new(Tree::Leaf),
}),
Tree::Node { val, left, right } => {
if v < *val {
// Only creates new nodes along the left path โ O(log n) new nodes
// The right subtree is shared unchanged
Rc::new(Tree::Node {
val: val.clone(),
left: left.insert(v),
right: Rc::clone(right), // shared, not copied
})
} else {
Rc::new(Tree::Node {
val: val.clone(),
left: Rc::clone(left), // shared, not copied
right: right.insert(v),
})
}
}
}
}
}
// Version history: each insert produces a new root, old roots still valid
let v0 = Rc::new(Tree::Leaf);
let v1 = v0.insert(5);
let v2 = v1.insert(3);
let v3 = v2.insert(7);
// v0, v1, v2, v3 all valid simultaneously; shared subtrees not duplicated
What This Unlocks
- Undo/redo systems: keep a `Vec<Rc<State>>` as your undo stack; each state shares structure with its neighbors โ memory usage grows logarithmically, not linearly.
- Functional language runtimes: implement immutable collections (list, map, set) with O(log n) update and automatic memory management via `Rc`.
- Concurrent snapshots: use `Arc<T>` instead of `Rc<T>` to share structure across threads without locking โ readers hold their own version while writers produce new roots.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Immutable by default | yes โ all values are immutable | no โ mutation is the default |
| Structural sharing | automatic (GC manages lifetime) | explicit `Rc<T>` or `Arc<T>` |
| Persistent list | built-in `list` type | `Rc<List<T>>` (custom) |
| Copy-on-write | N/A (immutable = always shared) | `Rc::make_mut` clones if rc > 1 |
| Thread-safe sharing | GC-managed | `Arc<T>` instead of `Rc<T>` |
| Memory reclaim | garbage collector | reference count drops to zero |
use std::rc::Rc;
// Persistent linked list: structural sharing via Rc
#[derive(Clone)]
enum PList<T> {
Nil,
Cons(T, Rc<PList<T>>),
}
impl<T: Clone + std::fmt::Debug> PList<T> {
fn nil() -> Rc<Self> { Rc::new(Self::Nil) }
fn cons(head: T, tail: Rc<Self>) -> Rc<Self> { Rc::new(Self::Cons(head, tail)) }
fn to_vec(list: &Rc<Self>) -> Vec<T> {
let mut v = Vec::new();
let mut cur = Rc::clone(list);
loop {
match cur.as_ref() {
Self::Nil => break,
Self::Cons(x, next) => { v.push(x.clone()); cur = Rc::clone(next); }
}
}
v
}
fn len(list: &Rc<Self>) -> usize { Self::to_vec(list).len() }
}
// Persistent vector using path copying
#[derive(Clone, Debug)]
struct PVec<T: Clone> {
data: Rc<Vec<T>>,
}
impl<T: Clone + std::fmt::Debug> PVec<T> {
fn new() -> Self { Self { data: Rc::new(Vec::new()) } }
fn push(&self, val: T) -> Self {
let mut new_data = (*self.data).clone(); // copy-on-write
new_data.push(val);
Self { data: Rc::new(new_data) }
}
fn set(&self, i: usize, val: T) -> Self {
let mut new_data = (*self.data).clone();
new_data[i] = val;
Self { data: Rc::new(new_data) }
}
fn get(&self, i: usize) -> Option<&T> { self.data.get(i) }
fn len(&self) -> usize { self.data.len() }
}
fn main() {
// Persistent list
let list1 = PList::cons(3, PList::cons(2, PList::cons(1, PList::nil())));
let list2 = PList::cons(4, Rc::clone(&list1)); // shares tail
let list3 = if let PList::Cons(_, tail) = list1.as_ref() { Rc::clone(tail) } else { PList::nil() };
println!("list1: {:?}", PList::to_vec(&list1));
println!("list2: {:?}", PList::to_vec(&list2));
println!("list3: {:?}", PList::to_vec(&list3));
// Persistent vector
let v0: PVec<i32> = PVec::new();
let v1 = v0.push(1);
let v2 = v1.push(2);
let v3 = v2.set(0, 99); // v1 is unchanged
println!("v1: {:?}", v1.data);
println!("v2: {:?}", v2.data);
println!("v3: {:?}", v3.data); // [99, 2]
println!("v1 unchanged: {:?}", v1.data); // [1]
}
#[cfg(test)]
mod tests {
use super::*;
#[test] fn persistent_list() {
let l1 = PList::cons(2, PList::cons(1, PList::nil()));
let l2 = PList::cons(3, Rc::clone(&l1));
assert_eq!(PList::to_vec(&l1), vec![2,1]);
assert_eq!(PList::to_vec(&l2), vec![3,2,1]);
assert_eq!(PList::to_vec(&l1), vec![2,1]); // l1 unchanged
}
#[test] fn persistent_vec() {
let v0: PVec<i32> = PVec::new();
let v1 = v0.push(1); let v2 = v1.push(2);
let v3 = v2.set(0, 99);
assert_eq!(*v1.data, vec![1]);
assert_eq!(*v3.data, vec![99,2]);
}
}
(* OCaml: persistent list โ natural! *)
let lst1 = [1;2;3;4;5]
let lst2 = 0 :: lst1 (* shares tail with lst1 *)
let lst3 = List.tl lst1 (* shares almost all of lst1 *)
let () =
Printf.printf "lst1: [%s]\n" (String.concat ";" (List.map string_of_int lst1));
Printf.printf "lst2: [%s]\n" (String.concat ";" (List.map string_of_int lst2));
Printf.printf "lst3: [%s]\n" (String.concat ";" (List.map string_of_int lst3))