ExamplesBy LevelBy TopicLearning Paths
102 Advanced

Persistent Vector

Data Structures

Tutorial

The Problem

Implement a persistent (immutable) vector backed by a balanced binary tree, where set returns a new version sharing all unchanged subtrees with the original.

🎯 Learning Outcomes

  • • How algebraic data types model recursive tree structures in Rust
  • Rc<T> enables structural sharing — unchanged subtrees are pointer-aliased, not copied
  • • Functional updates: set is O(log n) new allocations, not O(n) full copy
  • • The difference between Rc<T> (shared ownership, no sharing guarantees in OCaml) and OCaml's GC (automatic sharing via value identity)
  • 🦀 The Rust Way

    Rust requires explicit ownership management. Rc<PVec<T>> gives shared ownership so unchanged subtrees can be aliased across versions. Rc::clone in the set path only bumps a reference count (O(1)), while only the nodes on the path from root to the updated leaf are newly allocated (O(log n) total).

    Code Example

    use std::rc::Rc;
    
    #[derive(Debug, Clone)]
    pub enum PVec<T> {
        Nil,
        Leaf(T),
        Branch(Rc<PVec<T>>, Rc<PVec<T>>),
    }
    
    impl<T: Clone> PVec<T> {
        pub fn set(&self, i: usize, v: T) -> Option<Self> {
            match self {
                PVec::Nil => None,
                PVec::Leaf(_) => (i == 0).then(|| PVec::Leaf(v)),
                PVec::Branch(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        // Rc::clone(r) = pointer bump, not deep copy
                        l.set(i, v)
                            .map(|new_l| PVec::Branch(Rc::new(new_l), Rc::clone(r)))
                    } else {
                        r.set(i - ls, v)
                            .map(|new_r| PVec::Branch(Rc::clone(l), Rc::new(new_r)))
                    }
                }
            }
        }
    }

    Key Differences

  • Memory model: OCaml GC shares heap nodes automatically; Rust needs Rc<T> to express shared ownership explicitly.
  • Error handling: OCaml uses failwith (exceptions); Rust returns Option<T> — out-of-bounds is a value, not a control-flow exception.
  • Clone semantics: Rc::clone is a pointer bump (O(1)); cloning a Box<T> is a deep copy (O(subtree size)).
  • Type bounds: OCaml's parametric polymorphism is unconstrained; Rust's from_slice and set require T: Clone explicitly.
  • OCaml Approach

    OCaml uses a sum type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec. Since OCaml is garbage-collected, set naturally shares the unchanged branch — the GC handles aliasing automatically. The programmer writes purely functional code and the runtime provides structural sharing for free.

    Full Source

    #![allow(dead_code)]
    
    use std::rc::Rc;
    
    /// Persistent vector using a balanced binary tree with structural sharing via `Rc`.
    ///
    /// `set` returns a new `PVec` sharing all unchanged subtrees — O(log n) new allocations.
    /// The original vector is untouched, making this safe to use as an immutable value.
    #[derive(Debug, Clone)]
    pub enum PVec<T> {
        Nil,
        Leaf(T),
        Branch(Rc<PVec<T>>, Rc<PVec<T>>),
    }
    
    impl<T> PVec<T> {
        pub fn empty() -> Self {
            PVec::Nil
        }
    
        pub fn size(&self) -> usize {
            match self {
                PVec::Nil => 0,
                PVec::Leaf(_) => 1,
                PVec::Branch(l, r) => l.size() + r.size(),
            }
        }
    
        /// Returns a reference to element at index `i`, or `None` if out of bounds.
        pub fn get(&self, i: usize) -> Option<&T> {
            match self {
                PVec::Nil => None,
                PVec::Leaf(x) => (i == 0).then_some(x),
                PVec::Branch(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        l.get(i)
                    } else {
                        r.get(i - ls)
                    }
                }
            }
        }
    }
    
    impl<T: Clone> PVec<T> {
        /// Returns a new `PVec` with index `i` replaced by `v`.
        ///
        /// Shares unchanged subtrees via `Rc::clone` — only the path from root to the
        /// modified leaf is newly allocated (O(log n) nodes). The original is unmodified.
        pub fn set(&self, i: usize, v: T) -> Option<Self> {
            match self {
                PVec::Nil => None,
                PVec::Leaf(_) => (i == 0).then(|| PVec::Leaf(v)),
                PVec::Branch(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        l.set(i, v)
                            .map(|new_l| PVec::Branch(Rc::new(new_l), Rc::clone(r)))
                    } else {
                        r.set(i - ls, v)
                            .map(|new_r| PVec::Branch(Rc::clone(l), Rc::new(new_r)))
                    }
                }
            }
        }
    
        /// Build a balanced `PVec` from a slice in O(n) time.
        pub fn from_slice(items: &[T]) -> Self {
            match items {
                [] => PVec::Nil,
                [x] => PVec::Leaf(x.clone()),
                _ => {
                    let mid = items.len() / 2;
                    PVec::Branch(
                        Rc::new(Self::from_slice(&items[..mid])),
                        Rc::new(Self::from_slice(&items[mid..])),
                    )
                }
            }
        }
    
        /// Flatten the tree back to a `Vec<T>` in index order.
        pub fn to_vec(&self) -> Vec<T> {
            match self {
                PVec::Nil => vec![],
                PVec::Leaf(x) => vec![x.clone()],
                PVec::Branch(l, r) => l.to_vec().into_iter().chain(r.to_vec()).collect(),
            }
        }
    }
    
    impl<T: Clone> Default for PVec<T> {
        fn default() -> Self {
            PVec::empty()
        }
    }
    
    /// Box-based persistent vector — mirrors the OCaml style more directly.
    ///
    /// No structural sharing: each `set` allocates the entire path from root to leaf
    /// and clones the unchanged subtrees. Simpler, but O(n) per update in the worst case.
    #[derive(Debug, Clone)]
    pub enum PVecBox<T> {
        Nil,
        Leaf(T),
        Branch(Box<PVecBox<T>>, Box<PVecBox<T>>),
    }
    
    impl<T> PVecBox<T> {
        pub fn size(&self) -> usize {
            match self {
                PVecBox::Nil => 0,
                PVecBox::Leaf(_) => 1,
                PVecBox::Branch(l, r) => l.size() + r.size(),
            }
        }
    
        pub fn get(&self, i: usize) -> Option<&T> {
            match self {
                PVecBox::Nil => None,
                PVecBox::Leaf(x) => (i == 0).then_some(x),
                PVecBox::Branch(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        l.get(i)
                    } else {
                        r.get(i - ls)
                    }
                }
            }
        }
    }
    
    impl<T: Clone> PVecBox<T> {
        pub fn set(&self, i: usize, v: T) -> Option<Self> {
            match self {
                PVecBox::Nil => None,
                PVecBox::Leaf(_) => (i == 0).then(|| PVecBox::Leaf(v)),
                PVecBox::Branch(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        // Right subtree cloned wholesale (no sharing)
                        l.set(i, v)
                            .map(|new_l| PVecBox::Branch(Box::new(new_l), r.clone()))
                    } else {
                        r.set(i - ls, v)
                            .map(|new_r| PVecBox::Branch(l.clone(), Box::new(new_r)))
                    }
                }
            }
        }
    
        pub fn from_slice(items: &[T]) -> Self {
            match items {
                [] => PVecBox::Nil,
                [x] => PVecBox::Leaf(x.clone()),
                _ => {
                    let mid = items.len() / 2;
                    PVecBox::Branch(
                        Box::new(Self::from_slice(&items[..mid])),
                        Box::new(Self::from_slice(&items[mid..])),
                    )
                }
            }
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // --- PVec (Rc-based) tests ---
    
        #[test]
        fn test_empty_size_and_get() {
            let v: PVec<i32> = PVec::empty();
            assert_eq!(v.size(), 0);
            assert_eq!(v.get(0), None);
        }
    
        #[test]
        fn test_single_element() {
            let v = PVec::from_slice(&[42]);
            assert_eq!(v.size(), 1);
            assert_eq!(v.get(0), Some(&42));
            assert_eq!(v.get(1), None);
        }
    
        #[test]
        fn test_from_slice_and_get() {
            let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.size(), 5);
            assert_eq!(v.get(0), Some(&10));
            assert_eq!(v.get(2), Some(&30));
            assert_eq!(v.get(4), Some(&50));
            assert_eq!(v.get(5), None);
        }
    
        #[test]
        fn test_set_returns_new_version() {
            let v1 = PVec::from_slice(&[10, 20, 30, 40, 50]);
            let v2 = v1.set(2, 99).expect("valid index");
    
            // v2 has the update
            assert_eq!(v2.get(2), Some(&99));
            // v1 is unchanged (persistence)
            assert_eq!(v1.get(2), Some(&30));
            // Other positions preserved in v2
            assert_eq!(v2.get(0), Some(&10));
            assert_eq!(v2.get(4), Some(&50));
        }
    
        #[test]
        fn test_set_out_of_bounds_returns_none() {
            let v = PVec::from_slice(&[1, 2, 3]);
            assert!(v.set(3, 99).is_none());
            assert!(v.set(10, 0).is_none());
        }
    
        #[test]
        fn test_to_vec_round_trip() {
            let input = vec![1, 2, 3, 4, 5, 6, 7, 8];
            let pv = PVec::from_slice(&input);
            assert_eq!(pv.to_vec(), input);
        }
    
        #[test]
        fn test_multiple_persistent_versions() {
            let v0 = PVec::from_slice(&[1, 2, 3]);
            let v1 = v0.set(0, 10).unwrap();
            let v2 = v1.set(1, 20).unwrap();
            let v3 = v2.set(2, 30).unwrap();
    
            assert_eq!(v0.to_vec(), vec![1, 2, 3]);
            assert_eq!(v1.to_vec(), vec![10, 2, 3]);
            assert_eq!(v2.to_vec(), vec![10, 20, 3]);
            assert_eq!(v3.to_vec(), vec![10, 20, 30]);
        }
    
        // --- PVecBox (Box-based) tests ---
    
        #[test]
        fn test_box_empty() {
            let v: PVecBox<i32> = PVecBox::Nil;
            assert_eq!(v.size(), 0);
            assert_eq!(v.get(0), None);
        }
    
        #[test]
        fn test_box_from_slice_and_get() {
            let v = PVecBox::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.size(), 5);
            assert_eq!(v.get(2), Some(&30));
        }
    
        #[test]
        fn test_box_set_persistence() {
            let v1 = PVecBox::from_slice(&[10, 20, 30]);
            let v2 = v1.set(1, 99).expect("valid index");
            assert_eq!(v2.get(1), Some(&99));
            assert_eq!(v1.get(1), Some(&20)); // original unchanged
        }
    
        #[test]
        fn test_box_set_out_of_bounds() {
            let v = PVecBox::from_slice(&[1, 2]);
            assert!(v.set(5, 0).is_none());
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // --- PVec (Rc-based) tests ---
    
        #[test]
        fn test_empty_size_and_get() {
            let v: PVec<i32> = PVec::empty();
            assert_eq!(v.size(), 0);
            assert_eq!(v.get(0), None);
        }
    
        #[test]
        fn test_single_element() {
            let v = PVec::from_slice(&[42]);
            assert_eq!(v.size(), 1);
            assert_eq!(v.get(0), Some(&42));
            assert_eq!(v.get(1), None);
        }
    
        #[test]
        fn test_from_slice_and_get() {
            let v = PVec::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.size(), 5);
            assert_eq!(v.get(0), Some(&10));
            assert_eq!(v.get(2), Some(&30));
            assert_eq!(v.get(4), Some(&50));
            assert_eq!(v.get(5), None);
        }
    
        #[test]
        fn test_set_returns_new_version() {
            let v1 = PVec::from_slice(&[10, 20, 30, 40, 50]);
            let v2 = v1.set(2, 99).expect("valid index");
    
            // v2 has the update
            assert_eq!(v2.get(2), Some(&99));
            // v1 is unchanged (persistence)
            assert_eq!(v1.get(2), Some(&30));
            // Other positions preserved in v2
            assert_eq!(v2.get(0), Some(&10));
            assert_eq!(v2.get(4), Some(&50));
        }
    
        #[test]
        fn test_set_out_of_bounds_returns_none() {
            let v = PVec::from_slice(&[1, 2, 3]);
            assert!(v.set(3, 99).is_none());
            assert!(v.set(10, 0).is_none());
        }
    
        #[test]
        fn test_to_vec_round_trip() {
            let input = vec![1, 2, 3, 4, 5, 6, 7, 8];
            let pv = PVec::from_slice(&input);
            assert_eq!(pv.to_vec(), input);
        }
    
        #[test]
        fn test_multiple_persistent_versions() {
            let v0 = PVec::from_slice(&[1, 2, 3]);
            let v1 = v0.set(0, 10).unwrap();
            let v2 = v1.set(1, 20).unwrap();
            let v3 = v2.set(2, 30).unwrap();
    
            assert_eq!(v0.to_vec(), vec![1, 2, 3]);
            assert_eq!(v1.to_vec(), vec![10, 2, 3]);
            assert_eq!(v2.to_vec(), vec![10, 20, 3]);
            assert_eq!(v3.to_vec(), vec![10, 20, 30]);
        }
    
        // --- PVecBox (Box-based) tests ---
    
        #[test]
        fn test_box_empty() {
            let v: PVecBox<i32> = PVecBox::Nil;
            assert_eq!(v.size(), 0);
            assert_eq!(v.get(0), None);
        }
    
        #[test]
        fn test_box_from_slice_and_get() {
            let v = PVecBox::from_slice(&[10, 20, 30, 40, 50]);
            assert_eq!(v.size(), 5);
            assert_eq!(v.get(2), Some(&30));
        }
    
        #[test]
        fn test_box_set_persistence() {
            let v1 = PVecBox::from_slice(&[10, 20, 30]);
            let v2 = v1.set(1, 99).expect("valid index");
            assert_eq!(v2.get(1), Some(&99));
            assert_eq!(v1.get(1), Some(&20)); // original unchanged
        }
    
        #[test]
        fn test_box_set_out_of_bounds() {
            let v = PVecBox::from_slice(&[1, 2]);
            assert!(v.set(5, 0).is_none());
        }
    }

    Deep Comparison

    OCaml vs Rust: Persistent Vector

    Side-by-Side Code

    OCaml

    type 'a pvec = Nil | One of 'a | Two of 'a pvec * 'a pvec
    
    let rec get i = function
      | One x -> if i = 0 then x else failwith "index"
      | Two (l, r) ->
        let ls = size l in
        if i < ls then get i l else get (i - ls) r
      | Nil -> failwith "empty"
    
    let rec set i v = function
      | One _ -> if i = 0 then One v else failwith "index"
      | Two (l, r) ->
        let ls = size l in
        if i < ls then Two (set i v l, r)        (* right branch shared by GC *)
        else Two (l, set (i - ls) v r)           (* left branch shared by GC *)
      | Nil -> failwith "empty"
    

    Rust (idiomatic — explicit sharing with Rc)

    use std::rc::Rc;
    
    #[derive(Debug, Clone)]
    pub enum PVec<T> {
        Nil,
        Leaf(T),
        Branch(Rc<PVec<T>>, Rc<PVec<T>>),
    }
    
    impl<T: Clone> PVec<T> {
        pub fn set(&self, i: usize, v: T) -> Option<Self> {
            match self {
                PVec::Nil => None,
                PVec::Leaf(_) => (i == 0).then(|| PVec::Leaf(v)),
                PVec::Branch(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        // Rc::clone(r) = pointer bump, not deep copy
                        l.set(i, v)
                            .map(|new_l| PVec::Branch(Rc::new(new_l), Rc::clone(r)))
                    } else {
                        r.set(i - ls, v)
                            .map(|new_r| PVec::Branch(Rc::clone(l), Rc::new(new_r)))
                    }
                }
            }
        }
    }
    

    Rust (Box-based — mirrors OCaml pattern, no structural sharing)

    #[derive(Debug, Clone)]
    pub enum PVecBox<T> {
        Nil,
        Leaf(T),
        Branch(Box<PVecBox<T>>, Box<PVecBox<T>>),
    }
    
    impl<T: Clone> PVecBox<T> {
        pub fn set(&self, i: usize, v: T) -> Option<Self> {
            match self {
                PVecBox::Nil => None,
                PVecBox::Leaf(_) => (i == 0).then(|| PVecBox::Leaf(v)),
                PVecBox::Branch(l, r) => {
                    let ls = l.size();
                    if i < ls {
                        // r.clone() = deep copy of entire right subtree
                        l.set(i, v)
                            .map(|new_l| PVecBox::Branch(Box::new(new_l), r.clone()))
                    } else {
                        r.set(i - ls, v)
                            .map(|new_r| PVecBox::Branch(l.clone(), Box::new(new_r)))
                    }
                }
            }
        }
    }
    

    Type Signatures

    ConceptOCamlRust
    ADT definitiontype 'a pvec = Nil \| One of 'a \| Two of 'a pvec * 'a pvecenum PVec<T> { Nil, Leaf(T), Branch(Rc<PVec<T>>, Rc<PVec<T>>) }
    Get signatureval get : int -> 'a pvec -> 'afn get(&self, i: usize) -> Option<&T>
    Set signatureval set : int -> 'a -> 'a pvec -> 'a pvecfn set(&self, i: usize, v: T) -> Option<Self>
    Shared nodeimplicit (GC alias)Rc<PVec<T>>
    Error modefailwith "index" (exception)None (Option)

    Key Insights

  • GC vs explicit sharing: OCaml's GC handles structural sharing automatically — the runtime recognises that the unchanged branch in Two (set i v l, r) is aliased to the original r. In Rust, Rc<T> makes this explicit: Rc::clone(r) is a reference-count bump, not a value copy.
  • **Box<T> vs Rc<T> for trees:** Box<T> expresses unique ownership — updating a Box-tree requires deep-cloning all untouched subtrees. Rc<T> expresses shared ownership — unchanged subtrees can be aliased safely across versions. Choosing Rc is what makes this data structure genuinely persistent.
  • Error handling philosophy: OCaml raises exceptions for out-of-bounds access (failwith). Rust makes errors part of the type: Option<T> forces the caller to handle the absent case at compile time, not at runtime.
  • Clone bound propagation: OCaml's 'a pvec is polymorphic with no constraints — set works for any element type. Rust's set and from_slice require T: Clone explicitly because they copy values into new nodes. Operations that only traverse the tree (get, size) need no bound.
  • Recursive types: Both OCaml and Rust require indirection for recursive types (OCaml uses implicit heap boxing; Rust requires explicit Box<T> or Rc<T>). Without the pointer indirection, the compiler cannot determine the size of PVec<T> on the stack.
  • When to Use Each Style

    **Use Rc-based PVec when:** you need genuine persistence — multiple versions of a data structure coexisting, e.g. implementing undo history, a persistent map, or a CRDT.

    **Use Box-based PVecBox when:** you want the clearest structural analogy to OCaml for learning purposes, or you only ever need one version at a time and functional updates are infrequent.

    **Use Vec::clone() when:** you need "copy-on-write" semantics but don't care about structural sharing — simplest code, fine for small vectors or infrequent updates.

    Open Source Repos