๐Ÿฆ€ Functional Rust

487: String Interning

Difficulty: 2 Level: Intermediate Cache identical strings so you store each unique value once and compare by pointer, not content.

The Problem This Solves

In compilers, parsers, and data pipelines you often encounter the same string thousands or millions of times: variable names, field names, log levels, city names. Allocating a fresh `String` for each occurrence wastes memory and makes equality checks O(n) in string length. String interning solves both problems. The intern table maps each unique string to a shared reference. The second time you see `"identifier"`, you get back the same pointer you got the first time. Equality check becomes pointer comparison โ€” O(1). Memory usage drops to one copy per unique string. This is how language runtimes store symbol tables, how databases store column names, and how many serialization formats handle repeated keys.

The Intuition

A coin minting machine. Every request for a "quarter" returns the same physical coin from the vault, not a freshly minted one. Anyone holding a quarter knows it's the same value as any other quarter โ€” they don't need to inspect the details, just compare the reference. The vault only grows when a truly new denomination is requested.

How It Works in Rust

1. Simple interner with `Arc<str>`:
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct Interner(Mutex<HashMap<String, Arc<str>>>);

impl Interner {
    fn intern(&self, s: &str) -> Arc<str> {
        let mut map = self.0.lock().unwrap();
        if let Some(interned) = map.get(s) {
            return interned.clone();
        }
        let arc: Arc<str> = s.into();
        map.insert(s.to_string(), arc.clone());
        arc
    }
}
2. Compare interned strings by pointer:
let a = interner.intern("hello");
let b = interner.intern("hello");
assert!(Arc::ptr_eq(&a, &b)); // same allocation
3. Single-threaded โ€” swap `Mutex` for `RefCell` in non-concurrent contexts. 4. Typed symbols โ€” wrap `Arc<str>` in a newtype for type safety:
#[derive(Clone, PartialEq, Eq, Hash)]
struct Symbol(Arc<str>);

What This Unlocks

Key Differences

ConceptOCamlRust
Intern mechanism`String.intern` (stdlib!)Manual `HashMap<String, Arc<str>>`
Reference type`string` (already internable)`Arc<str>` or newtype wrapper
Thread safetyGC-managed`Mutex` around intern table
Equality check`==` (may intern automatically)`Arc::ptr_eq` for pointer comparison
// 487. String interning pattern
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub struct Interner {
    table: HashMap<String, Arc<str>>,
}

impl Interner {
    pub fn new() -> Self { Interner { table: HashMap::new() } }
    pub fn intern(&mut self, s: &str) -> Arc<str> {
        if let Some(v) = self.table.get(s) { return Arc::clone(v); }
        let arc: Arc<str> = Arc::from(s);
        self.table.insert(s.to_string(), Arc::clone(&arc));
        arc
    }
    pub fn len(&self) -> usize { self.table.len() }
}

// Thread-safe interner
pub struct SyncInterner(Mutex<Interner>);
impl SyncInterner {
    pub fn new() -> Self { SyncInterner(Mutex::new(Interner::new())) }
    pub fn intern(&self, s: &str) -> Arc<str> { self.0.lock().unwrap().intern(s) }
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern("hello");
    let b = interner.intern("world");
    let c = interner.intern("hello"); // same as a

    println!("a='{}' b='{}' c='{}'", a, b, c);
    // O(1) equality: compare Arc pointers
    println!("a==c: {}", Arc::ptr_eq(&a, &c)); // true
    println!("a==b: {}", Arc::ptr_eq(&a, &b)); // false
    println!("table size: {}", interner.len()); // 2

    // Simulate many repeated strings
    let words = vec!["foo","bar","baz","foo","bar","foo","qux","baz"];
    let mut interner2 = Interner::new();
    let interned: Vec<Arc<str>> = words.iter().map(|&w| interner2.intern(w)).collect();
    println!("interned {} strings, {} unique", interned.len(), interner2.len());
    // O(1) equality check
    println!("words[0]==words[3]: {}", Arc::ptr_eq(&interned[0], &interned[3]));
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test] fn test_intern_same()    { let mut i=Interner::new(); let a=i.intern("hi"); let b=i.intern("hi"); assert!(Arc::ptr_eq(&a,&b)); }
    #[test] fn test_intern_diff()    { let mut i=Interner::new(); let a=i.intern("hi"); let b=i.intern("ho"); assert!(!Arc::ptr_eq(&a,&b)); }
    #[test] fn test_intern_size()    { let mut i=Interner::new(); i.intern("a"); i.intern("b"); i.intern("a"); assert_eq!(i.len(),2); }
}
(* 487. String interning โ€“ OCaml *)
module StringSet = Set.Make(String)

(* Simple interner: returns the canonical string *)
let table : (string, string) Hashtbl.t = Hashtbl.create 64

let intern s =
  match Hashtbl.find_opt table s with
  | Some v -> v
  | None -> Hashtbl.add table s s; s

let () =
  let a = intern "hello" in
  let b = intern "world" in
  let c = intern "hello" in
  Printf.printf "a=%s b=%s c=%s\n" a b c;
  (* In OCaml, physical equality (==) checks pointer *)
  Printf.printf "a==c: %b\n" (a == c);  (* true: same string *)
  Printf.printf "a==b: %b\n" (a == b);  (* false *)
  Printf.printf "intern table size: %d\n" (Hashtbl.length table)