1190 Advanced

Persistent Vector — Functional Array

Persistent Data StructuresTreesFunctional Patterns

Tutorial

The Problem

Implement a persistent (immutable) vector backed by a balanced binary tree. get reads by index; set returns a new version with one element replaced while the original remains unchanged.

🎯 Learning Outcomes

• How structural sharing with Rc enables persistent data structures without full copies

• Why Option<T> is idiomatic over panic!/failwith for out-of-bounds access

• How slice patterns ([], [x], _) replace OCaml list/variant pattern matching

• The difference between Box (exclusive ownership, deep clone on update) and Rc (shared ownership, O(log n) update cost)

🦀 The Rust Way

Rust uses an enum matching OCaml's ADT. For the idiomatic version, Rc<PVec<T>> provides reference-counted shared ownership so unchanged subtrees are genuinely shared (not copied) between versions. For the recursive version, Box<PVecRec<T>> gives simple tree ownership but requires cloning unchanged subtrees on each set. Both use Option<T> instead of panics.

Code Example

#[derive(Debug, Clone)]
pub enum PVec<T> {
    Nil,
    One(T),
    Two(Rc<PVec<T>>, Rc<PVec<T>>),
}

pub fn set(&self, i: usize, v: T) -> Option<Self> {
    match self {
        PVec::Nil => None,
        PVec::One(_) => (i == 0).then(|| PVec::One(v)),
        PVec::Two(l, r) => {
            let ls = l.size();
            if i < ls {
                l.set(i, v)
                    .map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
            } else {
                r.set(i - ls, v)
                    .map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
            }
        }
    }
}

type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec

let rec set i v = function
  | One _ -> if i = 0 then One v else failwith "index"
  | Two (l, r) ->
    let ls = size l in
    if i < ls then Two (set i v l, r)
    else Two (l, set (i - ls) v r)
  | Nil -> failwith "empty"

Key Differences

Memory management: OCaml GC automatically shares subtrees; Rust uses Rc for explicit reference-counted sharing or Box for exclusive ownership with cloning.

Error handling: OCaml failwith "index" raises an exception; Rust returns Option<T> — the caller decides how to handle out-of-bounds.

Type parameters: OCaml 'a is implicitly polymorphic; Rust requires explicit <T> with trait bounds (T: Clone for from_slice).

Pattern matching on slices: OCaml matches on list constructors [] | [x] | _; Rust uses slice patterns [] | [x] | _ on &[T] — same shape, different memory model.

OCaml Approach

OCaml uses an algebraic type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec. Since OCaml values are immutable by default, set naturally returns a new tree sharing the unchanged subtree — the GC handles lifetime. failwith signals out-of-bounds errors.

Full Source

#![allow(dead_code)]

use std::rc::Rc;

/// Persistent vector backed by a balanced binary tree.
///
/// `set` shares all unchanged subtrees via `Rc`, so old versions remain fully
/// accessible after an update — purely functional update semantics.
#[derive(Debug, Clone, PartialEq)]
pub enum PVec<T> {
    Nil,
    One(T),
    Two(Rc<PVec<T>>, Rc<PVec<T>>),
}

impl<T: Clone> PVec<T> {
    /// Construct an empty vector.
    pub fn empty() -> Self {
        PVec::Nil
    }

    /// Number of elements (O(n) — traverses the tree).
    pub fn size(&self) -> usize {
        match self {
            PVec::Nil => 0,
            PVec::One(_) => 1,
            PVec::Two(l, r) => l.size() + r.size(),
        }
    }

    /// Get element at index `i`. Returns `None` if out of bounds.
    pub fn get(&self, i: usize) -> Option<&T> {
        match self {
            PVec::Nil => None,
            PVec::One(x) => (i == 0).then_some(x),
            PVec::Two(l, r) => {
                let ls = l.size();
                if i < ls {
                    l.get(i)
                } else {
                    r.get(i - ls)
                }
            }
        }
    }

    /// Return a new vector with element at index `i` replaced by `v`.
    /// Unchanged subtrees are shared with the original via `Rc` (structural sharing).
    /// Returns `None` if `i` is out of bounds.
    pub fn set(&self, i: usize, v: T) -> Option<Self> {
        match self {
            PVec::Nil => None,
            PVec::One(_) => (i == 0).then(|| PVec::One(v)),
            PVec::Two(l, r) => {
                let ls = l.size();
                if i < ls {
                    l.set(i, v)
                        .map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
                } else {
                    r.set(i - ls, v)
                        .map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
                }
            }
        }
    }

    /// Build from a slice by halving recursively — produces a balanced tree.
    pub fn from_slice(items: &[T]) -> Self {
        match items {
            [] => PVec::Nil,
            [x] => PVec::One(x.clone()),
            _ => {
                let mid = items.len() / 2;
                PVec::Two(
                    Rc::new(Self::from_slice(&items[..mid])),
                    Rc::new(Self::from_slice(&items[mid..])),
                )
            }
        }
    }
}

/// Recursive implementation using `Box` — mirrors OCaml style directly.
/// No structural sharing: `set` deep-clones unchanged subtrees.
/// Simpler ownership model; illustrates the OCaml→Rust pattern without `Rc`.
#[derive(Debug, Clone, PartialEq)]
pub enum PVecRec<T> {
    Nil,
    One(T),
    Two(Box<PVecRec<T>>, Box<PVecRec<T>>),
}

impl<T: Clone> PVecRec<T> {
    pub fn size(&self) -> usize {
        match self {
            PVecRec::Nil => 0,
            PVecRec::One(_) => 1,
            PVecRec::Two(l, r) => l.size() + r.size(),
        }
    }

    pub fn get(&self, i: usize) -> Option<&T> {
        match self {
            PVecRec::Nil => None,
            PVecRec::One(x) => (i == 0).then_some(x),
            PVecRec::Two(l, r) => {
                let ls = l.size();
                if i < ls {
                    l.get(i)
                } else {
                    r.get(i - ls)
                }
            }
        }
    }

    pub fn set(&self, i: usize, v: T) -> Option<Self> {
        match self {
            PVecRec::Nil => None,
            PVecRec::One(_) => (i == 0).then(|| PVecRec::One(v)),
            PVecRec::Two(l, r) => {
                let ls = l.size();
                if i < ls {
                    l.set(i, v)
                        .map(|new_l| PVecRec::Two(Box::new(new_l), r.clone()))
                } else {
                    r.set(i - ls, v)
                        .map(|new_r| PVecRec::Two(l.clone(), Box::new(new_r)))
                }
            }
        }
    }

    pub fn from_slice(items: &[T]) -> Self {
        match items {
            [] => PVecRec::Nil,
            [x] => PVecRec::One(x.clone()),
            _ => {
                let mid = items.len() / 2;
                PVecRec::Two(
                    Box::new(Self::from_slice(&items[..mid])),
                    Box::new(Self::from_slice(&items[mid..])),
                )
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty() {
        let v: PVec<i32> = PVec::empty();
        assert_eq!(v.size(), 0);
        assert_eq!(v.get(0), None);
        assert_eq!(v.set(0, 42), None);
    }

    #[test]
    fn test_single() {
        let v = PVec::from_slice(&[42]);
        assert_eq!(v.size(), 1);
        assert_eq!(v.get(0), Some(&42));
        assert_eq!(v.get(1), None);
    }

    #[test]
    fn test_multiple_get() {
        let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
        assert_eq!(v.size(), 5);
        assert_eq!(v.get(0), Some(&10));
        assert_eq!(v.get(2), Some(&30));
        assert_eq!(v.get(4), Some(&50));
        assert_eq!(v.get(5), None);
    }

    #[test]
    fn test_set_returns_new_version() {
        let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
        let v2 = v.set(2, 99).expect("index 2 is valid");
        // New version has updated value
        assert_eq!(v2.get(2), Some(&99));
        // Old version unchanged — persistence!
        assert_eq!(v.get(2), Some(&30));
    }

    #[test]
    fn test_set_out_of_bounds() {
        let v = PVec::from_slice(&[1, 2, 3]);
        assert_eq!(v.set(10, 99), None);
    }

    #[test]
    fn test_multiple_sets_are_independent() {
        let v0 = PVec::from_slice(&[1, 2, 3]);
        let v1 = v0.set(0, 10).unwrap();
        let v2 = v0.set(1, 20).unwrap();
        // v0 unchanged
        assert_eq!(v0.get(0), Some(&1));
        assert_eq!(v0.get(1), Some(&2));
        // v1 and v2 diverged independently from v0
        assert_eq!(v1.get(0), Some(&10));
        assert_eq!(v1.get(1), Some(&2));
        assert_eq!(v2.get(0), Some(&1));
        assert_eq!(v2.get(1), Some(&20));
    }

    #[test]
    fn test_recursive_version_matches_behavior() {
        let v = PVecRec::from_slice(&[10, 20, 30, 40, 50]);
        assert_eq!(v.get(2), Some(&30));
        let v2 = v.set(2, 99).unwrap();
        assert_eq!(v2.get(2), Some(&99));
        assert_eq!(v.get(2), Some(&30));
    }

    #[test]
    fn test_from_slice_all_indices() {
        let data = vec![1, 2, 3, 4, 5, 6, 7, 8];
        let v = PVec::from_slice(&data);
        assert_eq!(v.size(), 8);
        for (i, &expected) in data.iter().enumerate() {
            assert_eq!(v.get(i), Some(&expected));
        }
    }
}

(* Simplified persistent vector using balanced binary tree *)
type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec

let rec size = function
  | Nil -> 0 | One _ -> 1
  | Two (l, r) -> size l + size r

let rec get i = function
  | One x -> if i = 0 then x else failwith "index"
  | Two (l, r) ->
    let ls = size l in
    if i < ls then get i l else get (i - ls) r
  | Nil -> failwith "empty"

let rec set i v = function
  | One _ -> if i = 0 then One v else failwith "index"
  | Two (l, r) ->
    let ls = size l in
    if i < ls then Two (set i v l, r)
    else Two (l, set (i - ls) v r)
  | Nil -> failwith "empty"

let of_list lst =
  let rec build = function
    | [] -> Nil | [x] -> One x
    | lst ->
      let n = List.length lst in
      let left = List.filteri (fun i _ -> i < n/2) lst in
      let right = List.filteri (fun i _ -> i >= n/2) lst in
      Two (build left, build right)
  in build lst

let () =
  let v = of_list [10;20;30;40;50] in
  assert (size v = 5);
  assert (get 0 v = 10);
  assert (get 2 v = 30);
  assert (get 4 v = 50);
  let v2 = set 2 99 v in
  (* Original unchanged — persistent semantics *)
  assert (get 2 v = 30);
  (* New version has updated element *)
  assert (get 2 v2 = 99);
  assert (of_list [] = Nil);
  print_endline "ok"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty() {
        let v: PVec<i32> = PVec::empty();
        assert_eq!(v.size(), 0);
        assert_eq!(v.get(0), None);
        assert_eq!(v.set(0, 42), None);
    }

    #[test]
    fn test_single() {
        let v = PVec::from_slice(&[42]);
        assert_eq!(v.size(), 1);
        assert_eq!(v.get(0), Some(&42));
        assert_eq!(v.get(1), None);
    }

    #[test]
    fn test_multiple_get() {
        let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
        assert_eq!(v.size(), 5);
        assert_eq!(v.get(0), Some(&10));
        assert_eq!(v.get(2), Some(&30));
        assert_eq!(v.get(4), Some(&50));
        assert_eq!(v.get(5), None);
    }

    #[test]
    fn test_set_returns_new_version() {
        let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
        let v2 = v.set(2, 99).expect("index 2 is valid");
        // New version has updated value
        assert_eq!(v2.get(2), Some(&99));
        // Old version unchanged — persistence!
        assert_eq!(v.get(2), Some(&30));
    }

    #[test]
    fn test_set_out_of_bounds() {
        let v = PVec::from_slice(&[1, 2, 3]);
        assert_eq!(v.set(10, 99), None);
    }

    #[test]
    fn test_multiple_sets_are_independent() {
        let v0 = PVec::from_slice(&[1, 2, 3]);
        let v1 = v0.set(0, 10).unwrap();
        let v2 = v0.set(1, 20).unwrap();
        // v0 unchanged
        assert_eq!(v0.get(0), Some(&1));
        assert_eq!(v0.get(1), Some(&2));
        // v1 and v2 diverged independently from v0
        assert_eq!(v1.get(0), Some(&10));
        assert_eq!(v1.get(1), Some(&2));
        assert_eq!(v2.get(0), Some(&1));
        assert_eq!(v2.get(1), Some(&20));
    }

    #[test]
    fn test_recursive_version_matches_behavior() {
        let v = PVecRec::from_slice(&[10, 20, 30, 40, 50]);
        assert_eq!(v.get(2), Some(&30));
        let v2 = v.set(2, 99).unwrap();
        assert_eq!(v2.get(2), Some(&99));
        assert_eq!(v.get(2), Some(&30));
    }

    #[test]
    fn test_from_slice_all_indices() {
        let data = vec![1, 2, 3, 4, 5, 6, 7, 8];
        let v = PVec::from_slice(&data);
        assert_eq!(v.size(), 8);
        for (i, &expected) in data.iter().enumerate() {
            assert_eq!(v.get(i), Some(&expected));
        }
    }
}

Deep Comparison

OCaml vs Rust: Persistent Vector — Functional Array

Side-by-Side Code

OCaml

type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec

let rec set i v = function
  | One _ -> if i = 0 then One v else failwith "index"
  | Two (l, r) ->
    let ls = size l in
    if i < ls then Two (set i v l, r)
    else Two (l, set (i - ls) v r)
  | Nil -> failwith "empty"

Rust (idiomatic — with `Rc` structural sharing)

#[derive(Debug, Clone)]
pub enum PVec<T> {
    Nil,
    One(T),
    Two(Rc<PVec<T>>, Rc<PVec<T>>),
}

pub fn set(&self, i: usize, v: T) -> Option<Self> {
    match self {
        PVec::Nil => None,
        PVec::One(_) => (i == 0).then(|| PVec::One(v)),
        PVec::Two(l, r) => {
            let ls = l.size();
            if i < ls {
                l.set(i, v)
                    .map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
            } else {
                r.set(i - ls, v)
                    .map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
            }
        }
    }
}

Rust (recursive — mirrors OCaml with `Box`)

#[derive(Debug, Clone)]
pub enum PVecRec<T> {
    Nil,
    One(T),
    Two(Box<PVecRec<T>>, Box<PVecRec<T>>),
}

pub fn set(&self, i: usize, v: T) -> Option<Self> {
    match self {
        PVecRec::Nil => None,
        PVecRec::One(_) => (i == 0).then(|| PVecRec::One(v)),
        PVecRec::Two(l, r) => {
            let ls = l.size();
            if i < ls {
                l.set(i, v)
                    .map(|new_l| PVecRec::Two(Box::new(new_l), r.clone()))
            } else {
                r.set(i - ls, v)
                    .map(|new_r| PVecRec::Two(l.clone(), Box::new(new_r)))
            }
        }
    }
}

Type Signatures

Concept	OCaml	Rust
ADT definition	`type 'a pvec = Nil \\| One of 'a \\| Two of 'a pvec * 'a pvec`	`enum PVec<T> { Nil, One(T), Two(Rc<PVec<T>>, Rc<PVec<T>>) }`
Get signature	`val get : int -> 'a pvec -> 'a` (raises)	`fn get(&self, i: usize) -> Option<&T>`
Set signature	`val set : int -> 'a -> 'a pvec -> 'a pvec` (raises)	`fn set(&self, i: usize, v: T) -> Option<Self>`
Shared subtree	implicit (GC)	`Rc::clone(r)` — O(1) ref-count bump
Error signal	`failwith "index"` (exception)	`None` (caller handles)

Key Insights

Structural sharing requires explicit smart pointers in Rust. OCaml's GC automatically shares any unchanged subtree; Rust needs Rc<T> (reference counting) to achieve the same. Without Rc, Box forces a deep clone of every unchanged subtree on each set, making the "persistence" expensive.

**Rc::clone is the idiomatic spelling.** Writing Rc::clone(r) instead of r.clone() makes it obvious that only the reference count is bumped — no heap allocation occurs. This is a critical performance distinction from cloning the contained value.

**Option<T> over panics makes callers composable.** OCaml's failwith tears down the call stack; Rust's Option lets callers chain with ?, .and_then(), or .map(). This is more idiomatic and avoids hidden control flow.

Slice patterns unify OCaml list and array matching. OCaml matches on linked-list constructors; Rust matches on &[T] slice patterns. Both look like [] | [x] | _ but Rust's operate on contiguous memory with O(1) length, making the mid = len / 2 split more efficient.

**T: Clone bound is explicit and minimal.** OCaml's 'a is unrestricted; Rust requires T: Clone only where cloning actually occurs (e.g., from_slice clones each element from the input slice). The set method works without cloning T itself — only the Rc wrapper is incremented.

When to Use Each Style

**Use idiomatic Rust (Rc) when:** you want true persistence with O(log n) updates and structural sharing between versions — e.g., implementing undo/redo, version history, or any functional data structure where multiple versions coexist.

**Use recursive Rust (Box) when:** you only need immutable tree traversal without multiple live versions, and simplicity of ownership is more important than sharing. The tree is functionally equivalent but each set clones the entire path.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust