ExamplesBy LevelBy TopicLearning Paths
1190 Advanced

Persistent Vector — Functional Array

Persistent Data StructuresTreesFunctional Patterns

Tutorial

The Problem

Implement a persistent (immutable) vector backed by a balanced binary tree. get reads by index; set returns a new version with one element replaced while the original remains unchanged.

🎯 Learning Outcomes

  • • How structural sharing with Rc enables persistent data structures without full copies
  • • Why Option<T> is idiomatic over panic!/failwith for out-of-bounds access
  • • How slice patterns ([], [x], _) replace OCaml list/variant pattern matching
  • • The difference between Box (exclusive ownership, deep clone on update) and Rc (shared ownership, O(log n) update cost)
  • 🦀 The Rust Way

    Rust uses an enum matching OCaml's ADT. For the idiomatic version, Rc<PVec<T>> provides reference-counted shared ownership so unchanged subtrees are genuinely shared (not copied) between versions. For the recursive version, Box<PVecRec<T>> gives simple tree ownership but requires cloning unchanged subtrees on each set. Both use Option<T> instead of panics.

    Code Example

    #[derive(Debug, Clone)]
    pub enum PVec<T> {
        Nil,
        One(T),
        Two(Rc<PVec<T>>, Rc<PVec<T>>),
    }
    
    pub fn set(&self, i: usize, v: T) -> Option<Self> {
        match self {
            PVec::Nil => None,
            PVec::One(_) => (i == 0).then(|| PVec::One(v)),
            PVec::Two(l, r) => {
                let ls = l.size();
                if i < ls {
                    l.set(i, v)
                        .map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
                } else {
                    r.set(i - ls, v)
                        .map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
                }
            }
        }
    }

    Key Differences

  • Memory management: OCaml GC automatically shares subtrees; Rust uses Rc for explicit reference-counted sharing or Box for exclusive ownership with cloning.
  • Error handling: OCaml failwith "index" raises an exception; Rust returns Option<T> — the caller decides how to handle out-of-bounds.
  • Type parameters: OCaml 'a is implicitly polymorphic; Rust requires explicit <T> with trait bounds (T: Clone for from_slice).
  • Pattern matching on slices: OCaml matches on list constructors [] | [x] | _; Rust uses slice patterns [] | [x] | _ on &[T] — same shape, different memory model.
  • OCaml Approach

    OCaml uses an algebraic type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec. Since OCaml values are immutable by default, set naturally returns a new tree sharing the unchanged subtree — the GC handles lifetime. failwith signals out-of-bounds errors.

    Full Source

    #![allow(dead_code)]
    
    use std::rc::Rc;
    
    /// Persistent vector backed by a balanced binary tree.
    ///
    /// `set` shares all unchanged subtrees via `Rc`, so old versions remain fully
    /// accessible after an update — purely functional update semantics.
    #[derive(Debug, Clone, PartialEq)]
    pub enum PVec<T> {
        Nil,
        One(T),
        Two(Rc<PVec<T>>, Rc<PVec<T>>),
    }
    
    impl<T: Clone> PVec<T> {
        /// Construct an empty vector.
        pub fn empty() -> Self {
            PVec::Nil
        }
    
        /// Number of elements (O(n) — traverses the tree).
        pub fn size(&self) -> usize {
            match self {
                PVec::Nil => 0,
                PVec::One(_) => 1,
                PVec::Two(l, r) => l.size() + r.size(),
            }
        }
    
        /// Get element at index `i`. Returns `None` if out of bounds.
        pub fn get(&self, i: usize) -> Option<&T> {
            match self {
                PVec::Nil => None,
                PVec::One(x) => (i == 0).then_some(x),
                PVec::Two(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        l.get(i)
                    } else {
                        r.get(i - ls)
                    }
                }
            }
        }
    
        /// Return a new vector with element at index `i` replaced by `v`.
        /// Unchanged subtrees are shared with the original via `Rc` (structural sharing).
        /// Returns `None` if `i` is out of bounds.
        pub fn set(&self, i: usize, v: T) -> Option<Self> {
            match self {
                PVec::Nil => None,
                PVec::One(_) => (i == 0).then(|| PVec::One(v)),
                PVec::Two(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        l.set(i, v)
                            .map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
                    } else {
                        r.set(i - ls, v)
                            .map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
                    }
                }
            }
        }
    
        /// Build from a slice by halving recursively — produces a balanced tree.
        pub fn from_slice(items: &[T]) -> Self {
            match items {
                [] => PVec::Nil,
                [x] => PVec::One(x.clone()),
                _ => {
                    let mid = items.len() / 2;
                    PVec::Two(
                        Rc::new(Self::from_slice(&items[..mid])),
                        Rc::new(Self::from_slice(&items[mid..])),
                    )
                }
            }
        }
    }
    
    /// Recursive implementation using `Box` — mirrors OCaml style directly.
    /// No structural sharing: `set` deep-clones unchanged subtrees.
    /// Simpler ownership model; illustrates the OCaml→Rust pattern without `Rc`.
    #[derive(Debug, Clone, PartialEq)]
    pub enum PVecRec<T> {
        Nil,
        One(T),
        Two(Box<PVecRec<T>>, Box<PVecRec<T>>),
    }
    
    impl<T: Clone> PVecRec<T> {
        pub fn size(&self) -> usize {
            match self {
                PVecRec::Nil => 0,
                PVecRec::One(_) => 1,
                PVecRec::Two(l, r) => l.size() + r.size(),
            }
        }
    
        pub fn get(&self, i: usize) -> Option<&T> {
            match self {
                PVecRec::Nil => None,
                PVecRec::One(x) => (i == 0).then_some(x),
                PVecRec::Two(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        l.get(i)
                    } else {
                        r.get(i - ls)
                    }
                }
            }
        }
    
        pub fn set(&self, i: usize, v: T) -> Option<Self> {
            match self {
                PVecRec::Nil => None,
                PVecRec::One(_) => (i == 0).then(|| PVecRec::One(v)),
                PVecRec::Two(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        l.set(i, v)
                            .map(|new_l| PVecRec::Two(Box::new(new_l), r.clone()))
                    } else {
                        r.set(i - ls, v)
                            .map(|new_r| PVecRec::Two(l.clone(), Box::new(new_r)))
                    }
                }
            }
        }
    
        pub fn from_slice(items: &[T]) -> Self {
            match items {
                [] => PVecRec::Nil,
                [x] => PVecRec::One(x.clone()),
                _ => {
                    let mid = items.len() / 2;
                    PVecRec::Two(
                        Box::new(Self::from_slice(&items[..mid])),
                        Box::new(Self::from_slice(&items[mid..])),
                    )
                }
            }
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_empty() {
            let v: PVec<i32> = PVec::empty();
            assert_eq!(v.size(), 0);
            assert_eq!(v.get(0), None);
            assert_eq!(v.set(0, 42), None);
        }
    
        #[test]
        fn test_single() {
            let v = PVec::from_slice(&[42]);
            assert_eq!(v.size(), 1);
            assert_eq!(v.get(0), Some(&42));
            assert_eq!(v.get(1), None);
        }
    
        #[test]
        fn test_multiple_get() {
            let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.size(), 5);
            assert_eq!(v.get(0), Some(&10));
            assert_eq!(v.get(2), Some(&30));
            assert_eq!(v.get(4), Some(&50));
            assert_eq!(v.get(5), None);
        }
    
        #[test]
        fn test_set_returns_new_version() {
            let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
            let v2 = v.set(2, 99).expect("index 2 is valid");
            // New version has updated value
            assert_eq!(v2.get(2), Some(&99));
            // Old version unchanged — persistence!
            assert_eq!(v.get(2), Some(&30));
        }
    
        #[test]
        fn test_set_out_of_bounds() {
            let v = PVec::from_slice(&[1, 2, 3]);
            assert_eq!(v.set(10, 99), None);
        }
    
        #[test]
        fn test_multiple_sets_are_independent() {
            let v0 = PVec::from_slice(&[1, 2, 3]);
            let v1 = v0.set(0, 10).unwrap();
            let v2 = v0.set(1, 20).unwrap();
            // v0 unchanged
            assert_eq!(v0.get(0), Some(&1));
            assert_eq!(v0.get(1), Some(&2));
            // v1 and v2 diverged independently from v0
            assert_eq!(v1.get(0), Some(&10));
            assert_eq!(v1.get(1), Some(&2));
            assert_eq!(v2.get(0), Some(&1));
            assert_eq!(v2.get(1), Some(&20));
        }
    
        #[test]
        fn test_recursive_version_matches_behavior() {
            let v = PVecRec::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.get(2), Some(&30));
            let v2 = v.set(2, 99).unwrap();
            assert_eq!(v2.get(2), Some(&99));
            assert_eq!(v.get(2), Some(&30));
        }
    
        #[test]
        fn test_from_slice_all_indices() {
            let data = vec![1, 2, 3, 4, 5, 6, 7, 8];
            let v = PVec::from_slice(&data);
            assert_eq!(v.size(), 8);
            for (i, &expected) in data.iter().enumerate() {
                assert_eq!(v.get(i), Some(&expected));
            }
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_empty() {
            let v: PVec<i32> = PVec::empty();
            assert_eq!(v.size(), 0);
            assert_eq!(v.get(0), None);
            assert_eq!(v.set(0, 42), None);
        }
    
        #[test]
        fn test_single() {
            let v = PVec::from_slice(&[42]);
            assert_eq!(v.size(), 1);
            assert_eq!(v.get(0), Some(&42));
            assert_eq!(v.get(1), None);
        }
    
        #[test]
        fn test_multiple_get() {
            let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.size(), 5);
            assert_eq!(v.get(0), Some(&10));
            assert_eq!(v.get(2), Some(&30));
            assert_eq!(v.get(4), Some(&50));
            assert_eq!(v.get(5), None);
        }
    
        #[test]
        fn test_set_returns_new_version() {
            let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
            let v2 = v.set(2, 99).expect("index 2 is valid");
            // New version has updated value
            assert_eq!(v2.get(2), Some(&99));
            // Old version unchanged — persistence!
            assert_eq!(v.get(2), Some(&30));
        }
    
        #[test]
        fn test_set_out_of_bounds() {
            let v = PVec::from_slice(&[1, 2, 3]);
            assert_eq!(v.set(10, 99), None);
        }
    
        #[test]
        fn test_multiple_sets_are_independent() {
            let v0 = PVec::from_slice(&[1, 2, 3]);
            let v1 = v0.set(0, 10).unwrap();
            let v2 = v0.set(1, 20).unwrap();
            // v0 unchanged
            assert_eq!(v0.get(0), Some(&1));
            assert_eq!(v0.get(1), Some(&2));
            // v1 and v2 diverged independently from v0
            assert_eq!(v1.get(0), Some(&10));
            assert_eq!(v1.get(1), Some(&2));
            assert_eq!(v2.get(0), Some(&1));
            assert_eq!(v2.get(1), Some(&20));
        }
    
        #[test]
        fn test_recursive_version_matches_behavior() {
            let v = PVecRec::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.get(2), Some(&30));
            let v2 = v.set(2, 99).unwrap();
            assert_eq!(v2.get(2), Some(&99));
            assert_eq!(v.get(2), Some(&30));
        }
    
        #[test]
        fn test_from_slice_all_indices() {
            let data = vec![1, 2, 3, 4, 5, 6, 7, 8];
            let v = PVec::from_slice(&data);
            assert_eq!(v.size(), 8);
            for (i, &expected) in data.iter().enumerate() {
                assert_eq!(v.get(i), Some(&expected));
            }
        }
    }

    Deep Comparison

    OCaml vs Rust: Persistent Vector — Functional Array

    Side-by-Side Code

    OCaml

    type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec
    
    let rec set i v = function
      | One _ -> if i = 0 then One v else failwith "index"
      | Two (l, r) ->
        let ls = size l in
        if i < ls then Two (set i v l, r)
        else Two (l, set (i - ls) v r)
      | Nil -> failwith "empty"
    

    Rust (idiomatic — with Rc structural sharing)

    #[derive(Debug, Clone)]
    pub enum PVec<T> {
        Nil,
        One(T),
        Two(Rc<PVec<T>>, Rc<PVec<T>>),
    }
    
    pub fn set(&self, i: usize, v: T) -> Option<Self> {
        match self {
            PVec::Nil => None,
            PVec::One(_) => (i == 0).then(|| PVec::One(v)),
            PVec::Two(l, r) => {
                let ls = l.size();
                if i < ls {
                    l.set(i, v)
                        .map(|new_l| PVec::Two(Rc::new(new_l), Rc::clone(r)))
                } else {
                    r.set(i - ls, v)
                        .map(|new_r| PVec::Two(Rc::clone(l), Rc::new(new_r)))
                }
            }
        }
    }
    

    Rust (recursive — mirrors OCaml with Box)

    #[derive(Debug, Clone)]
    pub enum PVecRec<T> {
        Nil,
        One(T),
        Two(Box<PVecRec<T>>, Box<PVecRec<T>>),
    }
    
    pub fn set(&self, i: usize, v: T) -> Option<Self> {
        match self {
            PVecRec::Nil => None,
            PVecRec::One(_) => (i == 0).then(|| PVecRec::One(v)),
            PVecRec::Two(l, r) => {
                let ls = l.size();
                if i < ls {
                    l.set(i, v)
                        .map(|new_l| PVecRec::Two(Box::new(new_l), r.clone()))
                } else {
                    r.set(i - ls, v)
                        .map(|new_r| PVecRec::Two(l.clone(), Box::new(new_r)))
                }
            }
        }
    }
    

    Type Signatures

    ConceptOCamlRust
    ADT definitiontype 'a pvec = Nil \| One of 'a \| Two of 'a pvec * 'a pvecenum PVec<T> { Nil, One(T), Two(Rc<PVec<T>>, Rc<PVec<T>>) }
    Get signatureval get : int -> 'a pvec -> 'a (raises)fn get(&self, i: usize) -> Option<&T>
    Set signatureval set : int -> 'a -> 'a pvec -> 'a pvec (raises)fn set(&self, i: usize, v: T) -> Option<Self>
    Shared subtreeimplicit (GC)Rc::clone(r) — O(1) ref-count bump
    Error signalfailwith "index" (exception)None (caller handles)

    Key Insights

  • Structural sharing requires explicit smart pointers in Rust. OCaml's GC automatically shares any unchanged subtree; Rust needs Rc<T> (reference counting) to achieve the same. Without Rc, Box forces a deep clone of every unchanged subtree on each set, making the "persistence" expensive.
  • **Rc::clone is the idiomatic spelling.** Writing Rc::clone(r) instead of r.clone() makes it obvious that only the reference count is bumped — no heap allocation occurs. This is a critical performance distinction from cloning the contained value.
  • **Option<T> over panics makes callers composable.** OCaml's failwith tears down the call stack; Rust's Option lets callers chain with ?, .and_then(), or .map(). This is more idiomatic and avoids hidden control flow.
  • Slice patterns unify OCaml list and array matching. OCaml matches on linked-list constructors; Rust matches on &[T] slice patterns. Both look like [] | [x] | _ but Rust's operate on contiguous memory with O(1) length, making the mid = len / 2 split more efficient.
  • **T: Clone bound is explicit and minimal.** OCaml's 'a is unrestricted; Rust requires T: Clone only where cloning actually occurs (e.g., from_slice clones each element from the input slice). The set method works without cloning T itself — only the Rc wrapper is incremented.
  • When to Use Each Style

    **Use idiomatic Rust (Rc) when:** you want true persistence with O(log n) updates and structural sharing between versions — e.g., implementing undo/redo, version history, or any functional data structure where multiple versions coexist.

    **Use recursive Rust (Box) when:** you only need immutable tree traversal without multiple live versions, and simplicity of ownership is more important than sharing. The tree is functionally equivalent but each set clones the entire path.

    Open Source Repos