๐Ÿฆ€ Functional Rust

361: Rope for Efficient String Operations

Difficulty: 4 Level: Expert A tree-based string representation enabling O(1) concatenation and O(log n) splits โ€” without copying.

The Problem This Solves

Strings are immutable byte arrays in most languages. Concatenating two strings means allocating a new buffer and copying both inputs โ€” `O(n)`. For small strings this is fine. For text editors handling megabytes of source code, this is catastrophic. Every keystroke, every undo, every paste would trigger a massive copy. Real text editors (Vim, VS Code, Emacs) use specialized data structures to avoid this. The rope is the classic solution. A rope represents a string as a binary tree of string slices. Concatenation just creates a new root node pointing to both trees โ€” no copying. Splitting walks the tree and restructures it โ€” `O(log n)`. Index lookup traverses down the tree by accumulated length โ€” also `O(log n)`. The tradeoff: random character access is slower than a flat string (`O(log n)` vs `O(1)`), and memory overhead is higher. But for workloads dominated by insertions, deletions, and concatenations of large strings, ropes win decisively.

The Intuition

Think of a rope as a linked list of string chunks, but organized as a balanced binary tree for efficient splitting and indexing. Each leaf holds a string slice. Each internal node holds the total length of its left subtree โ€” enough information to navigate to any character position in `O(log n)` steps. When you concatenate two ropes, you create a new root. When you split, you walk down and restructure. The key insight: you never modify existing nodes, so sharing subtrees between versions is safe โ€” this is how persistent text buffers work.

How It Works in Rust

#[derive(Clone)]
enum Rope {
 Leaf(String),
 Node {
     left: Box<Rope>,
     right: Box<Rope>,
     len: usize,  // total length for navigation
 },
}

impl Rope {
 fn concat(left: Rope, right: Rope) -> Rope {
     let len = left.len() + right.len();
     Rope::Node {
         left: Box::new(left),
         right: Box::new(right),
         len,
     }
 }

 fn len(&self) -> usize {
     match self {
         Rope::Leaf(s) => s.len(),
         Rope::Node { len, .. } => *len,
     }
 }
}
In practice, use the `ropey` crate for a production-quality implementation with Unicode support and rebalancing.

What This Unlocks

Key Differences

ConceptOCamlRust
String concat`^` operator, `O(n)` copy`+` operator (`O(n)`), or `Rope::concat` (`O(1)`)
Persistent stringImmutable strings, but still copies`Rope` with `Arc` subtree sharing
Text buffer`Buffer.t` (gap buffer in editors)`ropey::Rope` (tree-based)
Split/slice`String.sub`, `O(n)` copy`Rope::split` `O(log n)`, no copy
#[derive(Debug, Clone)]
enum Rope {
    Leaf(String),
    Node { left: Box<Rope>, right: Box<Rope>, length: usize },
}

impl Rope {
    fn leaf(s: impl Into<String>) -> Self { Self::Leaf(s.into()) }

    fn length(&self) -> usize {
        match self { Self::Leaf(s) => s.len(), Self::Node{length,..} => *length }
    }

    fn concat(left: Rope, right: Rope) -> Rope {
        if left.length() == 0 { return right; }
        if right.length() == 0 { return left; }
        let length = left.length() + right.length();
        Rope::Node { left: Box::new(left), right: Box::new(right), length }
    }

    fn to_string(&self) -> String {
        match self {
            Self::Leaf(s) => s.clone(),
            Self::Node{left,right,..} => left.to_string() + &right.to_string(),
        }
    }

    fn char_at(&self, idx: usize) -> Option<u8> {
        match self {
            Self::Leaf(s) => s.as_bytes().get(idx).copied(),
            Self::Node{left,right,..} => {
                let ll = left.length();
                if idx < ll { left.char_at(idx) } else { right.char_at(idx - ll) }
            }
        }
    }

    fn split_at(self, idx: usize) -> (Rope, Rope) {
        match self {
            Rope::Leaf(s) => {
                let (a,b) = s.split_at(idx.min(s.len()));
                (Rope::leaf(a), Rope::leaf(b))
            }
            Rope::Node{left,right,..} => {
                let ll = left.length();
                if idx <= ll {
                    let (la, lb) = left.split_at(idx);
                    (la, Rope::concat(lb, *right))
                } else {
                    let (ra, rb) = right.split_at(idx - ll);
                    (Rope::concat(*left, ra), rb)
                }
            }
        }
    }
}

fn main() {
    let r = Rope::concat(Rope::leaf("Hello, "), Rope::concat(Rope::leaf("World"), Rope::leaf("!")));
    println!("{}", r.to_string());
    println!("Length: {}", r.length());
    println!("Char at 7: {}", r.char_at(7).unwrap() as char);

    let (left, right) = r.clone().split_at(7);
    println!("Split: '{}' | '{}'", left.to_string(), right.to_string());
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test] fn concat_and_string() {
        let r = Rope::concat(Rope::leaf("foo"), Rope::leaf("bar"));
        assert_eq!(r.to_string(), "foobar");
        assert_eq!(r.length(), 6);
    }
    #[test] fn char_at() {
        let r = Rope::concat(Rope::leaf("ab"), Rope::leaf("cd"));
        assert_eq!(r.char_at(2), Some(b'c'));
    }
    #[test] fn split() {
        let r = Rope::concat(Rope::leaf("hello"), Rope::leaf(" world"));
        let (l,rr) = r.split_at(5);
        assert_eq!(l.to_string(), "hello");
        assert_eq!(rr.to_string(), " world");
    }
}
(* OCaml: simple rope implementation *)

type rope =
  | Leaf of string
  | Node of rope * rope * int  (* left, right, total_len *)

let length = function
  | Leaf s -> String.length s
  | Node (_,_,n) -> n

let make_node l r = Node (l, r, length l + length r)

let concat a b = match a, b with
  | Leaf "", _ -> b | _, Leaf "" -> a
  | _ -> make_node a b

let rec to_string = function
  | Leaf s -> s
  | Node (l,r,_) -> to_string l ^ to_string r

let rec index_at rope i =
  match rope with
  | Leaf s -> s.[i]
  | Node (l,_,_) when i < length l -> index_at l i
  | Node (l,r,_) -> index_at r (i - length l)

let () =
  let r = concat (Leaf "Hello, ") (concat (Leaf "World") (Leaf "!")) in
  Printf.printf "%s\n" (to_string r);
  Printf.printf "Length: %d\n" (length r);
  Printf.printf "Char at 7: %c\n" (index_at r 7)