361: Rope for Efficient String Operations
Difficulty: 4 Level: Expert A tree-based string representation enabling O(1) concatenation and O(log n) splits โ without copying.The Problem This Solves
Strings are immutable byte arrays in most languages. Concatenating two strings means allocating a new buffer and copying both inputs โ `O(n)`. For small strings this is fine. For text editors handling megabytes of source code, this is catastrophic. Every keystroke, every undo, every paste would trigger a massive copy. Real text editors (Vim, VS Code, Emacs) use specialized data structures to avoid this. The rope is the classic solution. A rope represents a string as a binary tree of string slices. Concatenation just creates a new root node pointing to both trees โ no copying. Splitting walks the tree and restructures it โ `O(log n)`. Index lookup traverses down the tree by accumulated length โ also `O(log n)`. The tradeoff: random character access is slower than a flat string (`O(log n)` vs `O(1)`), and memory overhead is higher. But for workloads dominated by insertions, deletions, and concatenations of large strings, ropes win decisively.The Intuition
Think of a rope as a linked list of string chunks, but organized as a balanced binary tree for efficient splitting and indexing. Each leaf holds a string slice. Each internal node holds the total length of its left subtree โ enough information to navigate to any character position in `O(log n)` steps. When you concatenate two ropes, you create a new root. When you split, you walk down and restructure. The key insight: you never modify existing nodes, so sharing subtrees between versions is safe โ this is how persistent text buffers work.How It Works in Rust
#[derive(Clone)]
enum Rope {
Leaf(String),
Node {
left: Box<Rope>,
right: Box<Rope>,
len: usize, // total length for navigation
},
}
impl Rope {
fn concat(left: Rope, right: Rope) -> Rope {
let len = left.len() + right.len();
Rope::Node {
left: Box::new(left),
right: Box::new(right),
len,
}
}
fn len(&self) -> usize {
match self {
Rope::Leaf(s) => s.len(),
Rope::Node { len, .. } => *len,
}
}
}
In practice, use the `ropey` crate for a production-quality implementation with Unicode support and rebalancing.
What This Unlocks
- Text editors โ efficient insert/delete at arbitrary positions without copying the whole buffer.
- Persistent strings โ share subtrees between versions; structural sharing for undo/redo.
- Diff engines โ split and merge large texts efficiently during patch application.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| String concat | `^` operator, `O(n)` copy | `+` operator (`O(n)`), or `Rope::concat` (`O(1)`) |
| Persistent string | Immutable strings, but still copies | `Rope` with `Arc` subtree sharing |
| Text buffer | `Buffer.t` (gap buffer in editors) | `ropey::Rope` (tree-based) |
| Split/slice | `String.sub`, `O(n)` copy | `Rope::split` `O(log n)`, no copy |
#[derive(Debug, Clone)]
enum Rope {
Leaf(String),
Node { left: Box<Rope>, right: Box<Rope>, length: usize },
}
impl Rope {
fn leaf(s: impl Into<String>) -> Self { Self::Leaf(s.into()) }
fn length(&self) -> usize {
match self { Self::Leaf(s) => s.len(), Self::Node{length,..} => *length }
}
fn concat(left: Rope, right: Rope) -> Rope {
if left.length() == 0 { return right; }
if right.length() == 0 { return left; }
let length = left.length() + right.length();
Rope::Node { left: Box::new(left), right: Box::new(right), length }
}
fn to_string(&self) -> String {
match self {
Self::Leaf(s) => s.clone(),
Self::Node{left,right,..} => left.to_string() + &right.to_string(),
}
}
fn char_at(&self, idx: usize) -> Option<u8> {
match self {
Self::Leaf(s) => s.as_bytes().get(idx).copied(),
Self::Node{left,right,..} => {
let ll = left.length();
if idx < ll { left.char_at(idx) } else { right.char_at(idx - ll) }
}
}
}
fn split_at(self, idx: usize) -> (Rope, Rope) {
match self {
Rope::Leaf(s) => {
let (a,b) = s.split_at(idx.min(s.len()));
(Rope::leaf(a), Rope::leaf(b))
}
Rope::Node{left,right,..} => {
let ll = left.length();
if idx <= ll {
let (la, lb) = left.split_at(idx);
(la, Rope::concat(lb, *right))
} else {
let (ra, rb) = right.split_at(idx - ll);
(Rope::concat(*left, ra), rb)
}
}
}
}
}
fn main() {
let r = Rope::concat(Rope::leaf("Hello, "), Rope::concat(Rope::leaf("World"), Rope::leaf("!")));
println!("{}", r.to_string());
println!("Length: {}", r.length());
println!("Char at 7: {}", r.char_at(7).unwrap() as char);
let (left, right) = r.clone().split_at(7);
println!("Split: '{}' | '{}'", left.to_string(), right.to_string());
}
#[cfg(test)]
mod tests {
use super::*;
#[test] fn concat_and_string() {
let r = Rope::concat(Rope::leaf("foo"), Rope::leaf("bar"));
assert_eq!(r.to_string(), "foobar");
assert_eq!(r.length(), 6);
}
#[test] fn char_at() {
let r = Rope::concat(Rope::leaf("ab"), Rope::leaf("cd"));
assert_eq!(r.char_at(2), Some(b'c'));
}
#[test] fn split() {
let r = Rope::concat(Rope::leaf("hello"), Rope::leaf(" world"));
let (l,rr) = r.split_at(5);
assert_eq!(l.to_string(), "hello");
assert_eq!(rr.to_string(), " world");
}
}
(* OCaml: simple rope implementation *)
type rope =
| Leaf of string
| Node of rope * rope * int (* left, right, total_len *)
let length = function
| Leaf s -> String.length s
| Node (_,_,n) -> n
let make_node l r = Node (l, r, length l + length r)
let concat a b = match a, b with
| Leaf "", _ -> b | _, Leaf "" -> a
| _ -> make_node a b
let rec to_string = function
| Leaf s -> s
| Node (l,r,_) -> to_string l ^ to_string r
let rec index_at rope i =
match rope with
| Leaf s -> s.[i]
| Node (l,_,_) when i < length l -> index_at l i
| Node (l,r,_) -> index_at r (i - length l)
let () =
let r = concat (Leaf "Hello, ") (concat (Leaf "World") (Leaf "!")) in
Printf.printf "%s\n" (to_string r);
Printf.printf "Length: %d\n" (length r);
Printf.printf "Char at 7: %c\n" (index_at r 7)