🦀 Functional Rust
🎬 Rust Ownership in 30 seconds: visual walkthrough of ownership, moves, and automatic memory management.
📝 Text version (for readers / accessibility)

• Each value in Rust has exactly one owner; when the owner goes out of scope, the value is dropped

• Assignment moves ownership by default; the original binding becomes invalid

• Borrowing (&T / &mut T) lets you reference data without taking ownership

• The compiler enforces: many shared references OR one mutable reference, never both

• No garbage collector needed; memory is freed deterministically at scope exit

556: Rental / Self-Referential Pattern

Difficulty: 4 Level: Intermediate-Advanced
Store parsed tokens alongside their source string, without self-referential pointers, using byte-span indices or shared ownership.

The Problem This Solves

Parsers face a tension: they want to return borrowed `&str` slices pointing into the input buffer (zero-copy), but they also want to own that input buffer so the caller doesn't need to keep the original alive. This is the classic "self-referential struct" problem: a struct that holds both data and a reference into that same data. The `rental` crate attempted to solve this with macros. The `ouroboros` crate does it more safely with proc-macros. But both add complexity and compile-time overhead. In most real code, the cleanest solution is the one demonstrated here: store byte spans instead of `&str` slices and reconstruct the slices on demand. The `Arc<String>` alternative shown in `SharedDocument` is useful when tokens must be shared across threads or outlive the document, but it gives up zero-copy: every token is copied into its own reference-counted allocation.
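To make the tension concrete, here is a minimal sketch of the purely borrowed, zero-copy approach. The names `tokenize` and `load_tokens` are hypothetical and not part of the original example. The slices are valid only while the caller keeps the input alive, which is the constraint the rest of this section works around.

```rust
/// Zero-copy tokenizer: every returned slice points into `input`.
/// (Hypothetical helper, for illustration only.)
fn tokenize(input: &str) -> Vec<&str> {
    input.split_whitespace().collect()
}

fn main() {
    // The caller owns the buffer and must keep it alive while the tokens are in use.
    let buffer = String::from("the quick brown fox");
    let tokens = tokenize(&buffer);
    assert_eq!(tokens, ["the", "quick", "brown", "fox"]);

    // Each token is a view into `buffer`, not a copy: its pointer lies inside the buffer.
    let start = buffer.as_ptr() as usize;
    let end = start + buffer.len();
    for t in &tokens {
        let p = t.as_ptr() as usize;
        assert!(p >= start && p < end);
    }

    // What the borrow checker rejects is returning the tokens without the buffer:
    //
    // fn load_tokens<'a>() -> Vec<&'a str> {
    //     let buffer = String::from("the quick brown fox");
    //     tokenize(&buffer) // rejected: `buffer` is dropped when the function returns
    // }
    println!("{} tokens borrowed from the buffer", tokens.len());
}
```

Bundling `buffer` and `tokens` into one struct does not help either, because the `tokens` field would have to borrow from a sibling field of the same value.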

The Intuition

The borrow checker forbids storing a `&str` that points into a field of the same struct: the struct would need to borrow from itself before it's finished being constructed. Byte indices solve this: `(usize, usize)` pairs are plain data with no lifetime, and `&self.source[s..e]` reconstructs the view at call time with the correct lifetime. Think of `ParsedDocument` as a database: the `source` field is the backing store, and `token_spans` is an index. Queries into the index (`get_token`, `tokens()`) produce `&str` results borrowed from `self`, not stored in `self`.
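The point that spans carry no lifetime can be sketched as follows. `Span`, `Doc`, and `build` are hypothetical names chosen for this illustration, and the constructor splits on single spaces only to stay short. Because the spans are offsets rather than pointers, the document can be returned by value and moved again without invalidating anything.

```rust
/// A span is plain data: two byte offsets, no lifetime parameter.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

struct Doc {
    source: String,
    spans: Vec<Span>,
}

impl Doc {
    /// Simplified constructor (assumption: tokens are separated by ' ' only).
    fn new(text: &str) -> Self {
        let mut spans = Vec::new();
        let mut offset = 0;
        for piece in text.split(' ') {
            if !piece.is_empty() {
                spans.push(Span { start: offset, end: offset + piece.len() });
            }
            offset += piece.len() + 1; // +1 for the separating space
        }
        Doc { source: text.to_string(), spans }
    }

    /// The normally elided lifetime written out: the result borrows from `self`.
    fn get<'a>(&'a self, i: usize) -> Option<&'a str> {
        let span = self.spans.get(i)?;
        // `str::get` returns None instead of panicking on an invalid range.
        self.source.get(span.start..span.end)
    }
}

/// Returning the document by value moves it; the offsets remain valid.
fn build() -> Doc {
    Doc::new("alpha beta gamma")
}

fn main() {
    let doc = build();
    let boxed = Box::new(doc); // move again, this time onto the heap
    assert_eq!(boxed.get(1), Some("beta"));
    assert_eq!(boxed.get(3), None);
    println!("{:?}", boxed.get(2));
}
```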

How It Works in Rust

Span-indexed document (`parse` stores `(start, end)` byte pairs):
struct ParsedDocument {
 source: String,
 token_spans: Vec<(usize, usize)>,
}

impl ParsedDocument {
 fn tokens(&self) -> impl Iterator<Item = &str> {
     self.token_spans.iter().map(|&(s, e)| &self.source[s..e])
 }
 fn get_token(&self, i: usize) -> Option<&str> {
     self.token_spans.get(i).map(|&(s, e)| &self.source[s..e])
 }
}
The returned `&str` values borrow from `self.source` through `&self`; the compiler sees this as a normal field borrow. No unsafe, no macros.
Building the index during parsing:
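// Excerpt from `ParsedDocument::parse`; `source` is the owned `String`.
let mut token_spans = Vec::new();
let mut in_word = false;
let mut word_start = 0;
for (i, b) in source.bytes().enumerate() {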
 let is_space = b == b' ' || b == b'\n' || b == b'\t';
 if !is_space && !in_word { word_start = i; in_word = true; }
 else if is_space && in_word { token_spans.push((word_start, i)); in_word = false; }
}
if in_word { token_spans.push((word_start, source.len())); }
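The loop above can be wrapped in a standalone function for experimentation. The name `word_spans` is hypothetical. Scanning bytes is safe for UTF-8 input here because the three ASCII separator bytes never occur inside a multi-byte sequence, so every recorded span starts and ends on a character boundary.

```rust
/// Returns (start, end) byte offsets of each run of non-separator bytes.
/// Only ' ', '\n', and '\t' count as separators, as in the loop above.
fn word_spans(source: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut in_word = false;
    let mut word_start = 0;

    for (i, b) in source.bytes().enumerate() {
        let is_space = matches!(b, b' ' | b'\n' | b'\t');
        if !is_space && !in_word {
            word_start = i;
            in_word = true;
        } else if is_space && in_word {
            spans.push((word_start, i));
            in_word = false;
        }
    }
    // Flush a word that runs to the end of the input.
    if in_word {
        spans.push((word_start, source.len()));
    }
    spans
}

fn main() {
    // Leading, trailing, and repeated separators produce no empty tokens.
    assert_eq!(word_spans("  hi \t there\n"), vec![(2, 4), (7, 12)]);
    assert!(word_spans("").is_empty());

    // Multi-byte characters: offsets are bytes, and slicing still succeeds.
    let text = "héllo wörld";
    let words: Vec<&str> = word_spans(text).iter().map(|&(s, e)| &text[s..e]).collect();
    assert_eq!(words, ["héllo", "wörld"]);
    println!("{:?}", words);
}
```

Note that `\r` and Unicode whitespace are treated as part of a word by this loop; `str::split_whitespace` would be the choice if those need to be separators.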
`Arc<String>` alternative, for when tokens need to outlive the parser:
struct SharedDocument {
 source: Arc<String>,
 tokens: Vec<Arc<String>>,  // copied substrings, each in its own reference-counted allocation
}
`Arc::clone` is cheap (an atomic increment), but building each token costs an allocation and a copy of its text. Use this when tokens must outlive `self` or be handed to other threads as owned values.
When to use `ouroboros`: if you genuinely need `&str` fields (e.g., for a zero-copy `serde` deserializer that borrows from a parsed buffer), `ouroboros` generates safe self-referential structs via proc-macros. Reach for it only when index-based approaches don't fit.
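Returning to the `Arc<String>` alternative, the following sketch shows what it buys: handles that remain usable after the document is dropped and that can move into another thread. `SharedDocument` is re-declared so the block is self-contained; the thread hand-off is an illustrative assumption and not part of the original example.

```rust
use std::sync::Arc;
use std::thread;

struct SharedDocument {
    source: Arc<String>,
    tokens: Vec<Arc<String>>,
}

impl SharedDocument {
    fn parse(text: &str) -> Self {
        let source = Arc::new(text.to_string());
        let tokens = text
            .split_whitespace()
            .map(|t| Arc::new(t.to_string())) // one allocation and copy per token
            .collect();
        SharedDocument { source, tokens }
    }
}

fn main() {
    let doc = SharedDocument::parse("hello world rust");

    // Cloning an Arc copies a pointer and bumps a counter; the string data is not copied.
    let token = Arc::clone(&doc.tokens[1]);
    let source = Arc::clone(&doc.source);
    assert!(Arc::ptr_eq(&token, &doc.tokens[1]));

    // The handles stay valid after the document itself is gone.
    drop(doc);

    // Both handles are owned values, so they can move into another thread.
    let handle = thread::spawn(move || format!("{} (from {:?})", token, source));
    let out = handle.join().unwrap();
    assert_eq!(out, "world (from \"hello world rust\")");
    println!("{}", out);
}
```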

What This Unlocks

Key Differences

| Concept | OCaml | Rust |
|---|---|---|
| Self-referential struct | Natural (GC, everything is a ref) | Forbidden without `Pin`/`unsafe`/`ouroboros` |
| Zero-copy sub-string | `String.sub` / `Bytes.sub` copy; zero-copy views need a library slice type | `&source[s..e]`: zero-copy, lifetime-checked |
| Shared ownership | Natural (GC shares freely) | `Arc<T>`: reference counted, `clone` is cheap |
| Span indexing pattern | Less necessary (no borrow checker) | Idiomatic workaround for self-referential structs |
//! # 556. Rental / Self-Referential Pattern
//! Owning source data alongside borrowed views while avoiding copies.

/// Pattern: store owned data + indices/metadata (avoids self-referential issues)
struct ParsedDocument {
    source: String,
    // Store positions instead of &str references (avoids self-referential)
    token_spans: Vec<(usize, usize)>, // (start, end) byte positions
}

impl ParsedDocument {
    fn parse(text: &str) -> Self {
        let source = text.to_string();
        let mut token_spans = Vec::new();
        let mut in_word = false;
        let mut word_start = 0;

        for (i, b) in source.bytes().enumerate() {
            let is_space = b == b' ' || b == b'\n' || b == b'\t';
            if !is_space && !in_word {
                word_start = i;
                in_word = true;
            } else if is_space && in_word {
                token_spans.push((word_start, i));
                in_word = false;
            }
        }
        if in_word {
            token_spans.push((word_start, source.len()));
        }

        ParsedDocument { source, token_spans }
    }

    fn tokens(&self) -> impl Iterator<Item = &str> {
        self.token_spans.iter().map(|&(s, e)| &self.source[s..e])
    }

    fn token_count(&self) -> usize { self.token_spans.len() }

    fn get_token(&self, i: usize) -> Option<&str> {
        self.token_spans.get(i).map(|&(s, e)| &self.source[s..e])
    }
}

use std::sync::Arc;

/// Alternative: share the source through `Arc<String>`.
struct SharedDocument {
    source: Arc<String>,
    tokens: Vec<Arc<String>>, // copied substrings, one allocation per token
}

impl SharedDocument {
    fn parse(text: &str) -> Self {
        let source = Arc::new(text.to_string());
        let tokens = text.split_whitespace()
            .map(|t| Arc::new(t.to_string()))
            .collect();
        SharedDocument { source, tokens }
    }
}

fn main() {
    let text = "the quick brown fox jumps over the lazy dog";
    let doc = ParsedDocument::parse(text);

    println!("Parsed '{}' ({} tokens):", text, doc.token_count());
    for (i, token) in doc.tokens().enumerate() {
        println!("  [{}] {:?}", i, token);
    }
    println!("token[3]: {:?}", doc.get_token(3));

    // SharedDocument
    let shared = SharedDocument::parse("hello world rust");
    println!("\nShared doc source: {}", shared.source);
    for t in &shared.tokens {
        print!("{} ", t);
    }
    println!();
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parsed_document() {
        let doc = ParsedDocument::parse("hello world");
        assert_eq!(doc.token_count(), 2);
        assert_eq!(doc.get_token(0), Some("hello"));
        assert_eq!(doc.get_token(1), Some("world"));
    }

    #[test]
    fn test_tokens_iter() {
        let doc = ParsedDocument::parse("a b c");
        let tokens: Vec<&str> = doc.tokens().collect();
        assert_eq!(tokens, ["a", "b", "c"]);
    }
}
(* Rental / self-referential pattern in OCaml *)
(* OCaml: the GC lets a record hold the source and its tokens freely.
   Note that String.split_on_char returns fresh copies, not views into the source. *)

type parsed = {
  source: string;
  tokens: string list;
  word_count: int;
}

let parse text =
  let tokens = String.split_on_char ' ' text in
  { source = text; tokens; word_count = List.length tokens }

let () =
  let p = parse "hello world rust programming" in
  Printf.printf "source: %s\n" p.source;
  Printf.printf "words: %d\n" p.word_count;
  List.iter (fun t -> Printf.printf "  token: %s\n" t) p.tokens