Trie-based Persistent Vector for O(log n) Random Access
Tutorial
The Problem
Purely functional data structures must never mutate in place — every update produces a new version while preserving the old one. Naive approaches copy the entire structure on each update, giving O(n) time. A trie-based persistent vector solves this by sharing unchanged subtrees between versions, achieving O(log n) reads and writes while remaining fully immutable. This structure underpins Clojure's persistent vector and any system requiring cheap snapshots or time-travel debugging.
🎯 Learning Outcomes
Arc in Rust enables safe structural sharing without a garbage collectorCode Example
#[derive(Debug, Clone, PartialEq)]
pub enum PVec<T> {
Nil,
One(T),
Two(Rc<PVec<T>>, Rc<PVec<T>>),
}
impl<T: Clone> PVec<T> {
pub fn set(&self, i: usize, v: T) -> Option<Self> {
match self {
PVec::Two(l, r) => {
let ls = l.size();
if i < ls {
// Clone only the path; share the unchanged subtree via Rc::clone.
l.set(i, v).map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
} else {
r.set(i - ls, v).map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
}
}
// ...
}
}
}Key Differences
Arc<T> with explicit reference counting and no GC pauses.Arc wrapping to opt into sharing.Arc has atomic overhead on clone/drop; OCaml's GC has periodic pause overhead — different latency trade-offs.Box or Arc to break recursive type size cycles.OCaml Approach
OCaml's garbage collector handles structural sharing automatically — the GC keeps shared nodes alive as long as any version references them. Persistent vectors are typically implemented as trees of arrays (type 'a t = Leaf of 'a array | Node of 'a t array). Because OCaml values are immutable by default, sharing is the natural result of returning old sub-nodes without copying.
Full Source
#![allow(dead_code)]
//! Trie-based Persistent Vector for O(log n) Random Access
//! See example.ml for OCaml reference
//!
//! A purely functional persistent vector backed by a balanced binary tree.
//! Structural sharing via `Rc` means `set` only clones the O(log n) path nodes.
use std::rc::Rc;
/// Persistent vector backed by a balanced binary tree.
///
/// Each `set` shares unchanged subtrees via `Rc`, so old versions remain fully
/// accessible after an update — purely functional update semantics.
#[derive(Debug, Clone, PartialEq)]
pub enum PVec<T> {
Nil,
One(T),
Two(Rc<PVec<T>>, Rc<PVec<T>>),
}
impl<T: Clone> PVec<T> {
/// Create an empty persistent vector.
pub fn empty() -> Self {
PVec::Nil
}
/// Number of elements in the vector (traverses the tree: O(n)).
pub fn size(&self) -> usize {
match self {
PVec::Nil => 0,
PVec::One(_) => 1,
PVec::Two(l, r) => l.size() + r.size(),
}
}
/// Get the element at index `i`. Returns `None` if out of bounds.
pub fn get(&self, i: usize) -> Option<&T> {
match self {
PVec::Nil => None,
PVec::One(x) => (i == 0).then_some(x),
PVec::Two(l, r) => {
let ls = l.size();
if i < ls {
l.get(i)
} else {
r.get(i - ls)
}
}
}
}
/// Return a new vector with the element at index `i` replaced by `v`.
/// Unchanged subtrees are shared with the original via `Rc` (structural sharing).
/// Returns `None` if `i` is out of bounds.
pub fn set(&self, i: usize, v: T) -> Option<Self> {
match self {
PVec::Nil => None,
PVec::One(_) => (i == 0).then(|| PVec::One(v)),
PVec::Two(l, r) => {
let ls = l.size();
if i < ls {
// Update left subtree; share right subtree as-is.
l.set(i, v)
.map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
} else {
// Update right subtree; share left subtree as-is.
r.set(i - ls, v)
.map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
}
}
}
}
/// Build a persistent vector from a slice by halving recursively.
/// Produces a balanced tree with O(log n) depth.
pub fn from_slice(items: &[T]) -> Self {
match items {
[] => PVec::Nil,
[x] => PVec::One(x.clone()),
_ => {
let mid = items.len() / 2;
PVec::Two(
Rc::new(Self::from_slice(&items[..mid])),
Rc::new(Self::from_slice(&items[mid..])),
)
}
}
}
}
/// Simple recursive version using `Box` — mirrors OCaml style directly.
/// No structural sharing: `set` deep-clones unchanged subtrees.
#[derive(Debug, Clone, PartialEq)]
pub enum PVecBox<T> {
Nil,
One(T),
Two(Box<PVecBox<T>>, Box<PVecBox<T>>),
}
impl<T: Clone> PVecBox<T> {
pub fn size(&self) -> usize {
match self {
PVecBox::Nil => 0,
PVecBox::One(_) => 1,
PVecBox::Two(l, r) => l.size() + r.size(),
}
}
pub fn get(&self, i: usize) -> Option<&T> {
match self {
PVecBox::Nil => None,
PVecBox::One(x) => (i == 0).then_some(x),
PVecBox::Two(l, r) => {
let ls = l.size();
if i < ls {
l.get(i)
} else {
r.get(i - ls)
}
}
}
}
pub fn set(&self, i: usize, v: T) -> Option<Self> {
match self {
PVecBox::Nil => None,
PVecBox::One(_) => (i == 0).then(|| PVecBox::One(v)),
PVecBox::Two(l, r) => {
let ls = l.size();
if i < ls {
l.set(i, v)
.map(|new_l| PVecBox::Two(Box::new(new_l), r.clone()))
} else {
r.set(i - ls, v)
.map(|new_r| PVecBox::Two(l.clone(), Box::new(new_r)))
}
}
}
}
pub fn from_slice(items: &[T]) -> Self {
match items {
[] => PVecBox::Nil,
[x] => PVecBox::One(x.clone()),
_ => {
let mid = items.len() / 2;
PVecBox::Two(
Box::new(Self::from_slice(&items[..mid])),
Box::new(Self::from_slice(&items[mid..])),
)
}
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_empty_vec() {
let v: PVec<i32> = PVec::empty();
assert_eq!(v.size(), 0);
assert_eq!(v.get(0), None);
assert_eq!(v.set(0, 42), None);
}
#[test]
fn test_single_element() {
let v = PVec::from_slice(&[42_i32]);
assert_eq!(v.size(), 1);
assert_eq!(v.get(0), Some(&42));
assert_eq!(v.get(1), None);
}
#[test]
fn test_multiple_get() {
let v = PVec::from_slice(&[10_i32, 20, 30, 40, 50]);
assert_eq!(v.size(), 5);
assert_eq!(v.get(0), Some(&10));
assert_eq!(v.get(2), Some(&30));
assert_eq!(v.get(4), Some(&50));
assert_eq!(v.get(5), None);
}
#[test]
fn test_set_returns_new_version_old_unchanged() {
let v = PVec::from_slice(&[10_i32, 20, 30, 40, 50]);
let v2 = v.set(2, 99).expect("index 2 is valid");
// New version has updated value.
assert_eq!(v2.get(2), Some(&99));
// Old version unchanged — persistence guarantees.
assert_eq!(v.get(2), Some(&30));
}
#[test]
fn test_set_out_of_bounds_returns_none() {
let v = PVec::from_slice(&[1_i32, 2, 3]);
assert_eq!(v.set(10, 99), None);
}
#[test]
fn test_multiple_independent_versions() {
let v0 = PVec::from_slice(&[1_i32, 2, 3]);
let v1 = v0.set(0, 10).unwrap();
let v2 = v0.set(1, 20).unwrap();
// v0 is unchanged.
assert_eq!(v0.get(0), Some(&1));
// v1 and v2 diverged from v0 independently.
assert_eq!(v1.get(0), Some(&10));
assert_eq!(v2.get(1), Some(&20));
assert_eq!(v1.get(1), Some(&2));
}
#[test]
fn test_box_version_matches_rc_version() {
let data = [10_i32, 20, 30, 40, 50];
let rc_v = PVec::from_slice(&data);
let box_v = PVecBox::from_slice(&data);
for i in 0..data.len() {
assert_eq!(rc_v.get(i), box_v.get(i));
}
}
#[test]
fn test_all_indices_accessible() {
let data: Vec<i32> = (0..8).collect();
let v = PVec::from_slice(&data);
assert_eq!(v.size(), 8);
for (i, &expected) in data.iter().enumerate() {
assert_eq!(v.get(i), Some(&expected));
}
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_empty_vec() {
let v: PVec<i32> = PVec::empty();
assert_eq!(v.size(), 0);
assert_eq!(v.get(0), None);
assert_eq!(v.set(0, 42), None);
}
#[test]
fn test_single_element() {
let v = PVec::from_slice(&[42_i32]);
assert_eq!(v.size(), 1);
assert_eq!(v.get(0), Some(&42));
assert_eq!(v.get(1), None);
}
#[test]
fn test_multiple_get() {
let v = PVec::from_slice(&[10_i32, 20, 30, 40, 50]);
assert_eq!(v.size(), 5);
assert_eq!(v.get(0), Some(&10));
assert_eq!(v.get(2), Some(&30));
assert_eq!(v.get(4), Some(&50));
assert_eq!(v.get(5), None);
}
#[test]
fn test_set_returns_new_version_old_unchanged() {
let v = PVec::from_slice(&[10_i32, 20, 30, 40, 50]);
let v2 = v.set(2, 99).expect("index 2 is valid");
// New version has updated value.
assert_eq!(v2.get(2), Some(&99));
// Old version unchanged — persistence guarantees.
assert_eq!(v.get(2), Some(&30));
}
#[test]
fn test_set_out_of_bounds_returns_none() {
let v = PVec::from_slice(&[1_i32, 2, 3]);
assert_eq!(v.set(10, 99), None);
}
#[test]
fn test_multiple_independent_versions() {
let v0 = PVec::from_slice(&[1_i32, 2, 3]);
let v1 = v0.set(0, 10).unwrap();
let v2 = v0.set(1, 20).unwrap();
// v0 is unchanged.
assert_eq!(v0.get(0), Some(&1));
// v1 and v2 diverged from v0 independently.
assert_eq!(v1.get(0), Some(&10));
assert_eq!(v2.get(1), Some(&20));
assert_eq!(v1.get(1), Some(&2));
}
#[test]
fn test_box_version_matches_rc_version() {
let data = [10_i32, 20, 30, 40, 50];
let rc_v = PVec::from_slice(&data);
let box_v = PVecBox::from_slice(&data);
for i in 0..data.len() {
assert_eq!(rc_v.get(i), box_v.get(i));
}
}
#[test]
fn test_all_indices_accessible() {
let data: Vec<i32> = (0..8).collect();
let v = PVec::from_slice(&data);
assert_eq!(v.size(), 8);
for (i, &expected) in data.iter().enumerate() {
assert_eq!(v.get(i), Some(&expected));
}
}
}
Deep Comparison
OCaml vs Rust: Trie-based Persistent Vector
Side-by-Side Code
OCaml
type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec
let rec size = function
| Nil -> 0 | One _ -> 1
| Two (l, r) -> size l + size r
let rec get i = function
| One x -> if i = 0 then x else failwith "index"
| Two (l, r) ->
let ls = size l in
if i < ls then get i l else get (i - ls) r
| Nil -> failwith "empty"
let rec set i v = function
| One _ -> if i = 0 then One v else failwith "index"
| Two (l, r) ->
let ls = size l in
if i < ls then Two (set i v l, r)
else Two (l, set (i - ls) v r)
| Nil -> failwith "empty"
Rust (idiomatic — Rc for structural sharing)
#[derive(Debug, Clone, PartialEq)]
pub enum PVec<T> {
Nil,
One(T),
Two(Rc<PVec<T>>, Rc<PVec<T>>),
}
impl<T: Clone> PVec<T> {
pub fn set(&self, i: usize, v: T) -> Option<Self> {
match self {
PVec::Two(l, r) => {
let ls = l.size();
if i < ls {
// Clone only the path; share the unchanged subtree via Rc::clone.
l.set(i, v).map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
} else {
r.set(i - ls, v).map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
}
}
// ...
}
}
}
Rust (functional/recursive — Box, no sharing)
#[derive(Debug, Clone, PartialEq)]
pub enum PVecBox<T> {
Nil,
One(T),
Two(Box<PVecBox<T>>, Box<PVecBox<T>>),
}
// set() must deep-clone unchanged subtrees — no structural sharing.
Type Signatures
| Concept | OCaml | Rust |
|---|---|---|
| Tree type | type 'a pvec = Nil \| One of 'a \| Two of 'a pvec * 'a pvec | enum PVec<T> { Nil, One(T), Two(Rc<PVec<T>>, Rc<PVec<T>>) } |
| Structural sharing | GC (automatic) | Rc<T> (reference counting) |
| set | returns new tree, shares old branches | returns Option<PVec<T>>, shares via Rc::clone |
| get | raises exception on out-of-bounds | returns Option<&T> |
| Recursive type | automatic | requires Rc<PVec<T>> or Box<PVecBox<T>> |
Key Insights
Rc<T> (reference counting) for the same effect — but with explicit Rc::clone and deterministic drop.set updates index i, only the O(log n) nodes on the path from root to leaf are re-allocated; unchanged subtrees are shared via Rc::clone (just increments a counter).Rc value — you cannot obtain a mutable reference to a shared node, so the persistence invariant cannot be violated even accidentally.Box<T> gives unique ownership (no sharing, must deep-clone on update); Rc<T> allows shared ownership (structural sharing, O(log n) updates). OCaml uses the same node for both via GC.failwith "index" for out-of-bounds; Rust returns Option<&T> or Option<Self>, making the caller handle the error at compile time.When to Use Each Style
**Use Rc-based PVec when:** you need genuine structural sharing for time-travel debugging, undo/redo stacks, or persistent data structure semantics in production.
**Use Box-based PVecBox when:** teaching the OCaml parallel without the complexity of reference counting, or in cases where versions are never shared simultaneously.
Exercises
get(index) on a simple binary trie (branching factor 2) and verify O(log n) node visits with a counter.set(index, value) returning a new root, sharing all unchanged subtrees via Arc::clone.push_back that grows the trie by one level when the current depth is exhausted.