🦀 Functional Rust

730: Small String Optimization

Difficulty: 3 Level: Advanced

Store strings up to 23 bytes inline in the enum variant; heap allocation only for longer strings.

The Problem This Solves

In most programs, the majority of strings are short: identifiers, labels, error codes, keys in hash maps, command names. Rust's `String` stores every non-empty string on the heap, even `"ok"`. For a hash map with millions of short-string keys, that means millions of heap allocations, each with allocator overhead, an extra pointer to follow, and a likely cache miss when the string's bytes are read.

The Small String Optimization (SSO) trades a slightly larger inline footprint for eliminating the heap allocation on short strings. Strings up to a threshold length (commonly 15 to 23 bytes) are stored directly in the enum variant, on the stack or inline in the containing struct. Only longer strings go to the heap. Most C++ `std::string` implementations, Rust's `compact_str` crate, and many production systems use this technique.

The beauty of Rust's enum is that SSO is expressible in entirely safe code: `Inline { buf: [u8; 23], len: u8 }` holds the string bytes directly, and the discriminant tracks which variant is active. The `match` at read time costs one branch on the discriminant, which is cheap and usually well predicted in hot loops.
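The claim that even a two-byte `String` owns a heap buffer can be observed through `capacity()`. The sketch below is only an illustration; `owns_heap_buffer` is a hypothetical helper, not part of the original example, and it relies on the fact that a `String` with non-zero capacity has an allocated buffer.

```rust
/// Hypothetical helper for this illustration: a `String` owns a heap buffer
/// exactly when its capacity is non-zero.
fn owns_heap_buffer(s: &String) -> bool {
    s.capacity() > 0
}

fn main() {
    // `String::new()` is documented not to allocate.
    let empty = String::new();
    // Two bytes of content still get their own heap buffer.
    let ok = String::from("ok");

    println!("empty: heap buffer = {}", owns_heap_buffer(&empty));
    println!(
        "\"ok\": heap buffer = {}, capacity = {}",
        owns_heap_buffer(&ok),
        ok.capacity()
    );
}
```

An SSO type removes that allocation for short strings by keeping the bytes inside the value itself.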

The Intuition

Normally, a Rust `String` is three words: pointer + length + capacity. For a 3-character string like `"yes"`, you pay three words of overhead plus a heap round-trip just to store 3 bytes. SSO says: if the string fits in the space we'd use for the pointer and its friends, just put the string there directly. No heap, no pointer, no allocation.

The 23-byte limit in this example is not arbitrary: on a 64-bit system, `size_of::<String>()` is 24 bytes. We use 23 bytes for characters and 1 byte for the length, so the inline payload is exactly 24 bytes. Note that the safe `SsoString` enum below also needs room for its discriminant, so the whole enum is typically 32 bytes on 64-bit targets rather than 24; crates such as `compact_str` reach a true 24 bytes by packing the variant tag into the payload itself. Either way, short strings never touch the heap.
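The size arithmetic is easy to check by asking the compiler. The sketch below uses two hypothetical probe enums (`Probe23` and `Probe22`, not part of the original example) that differ only in inline capacity. Because an enum's discriminant needs storage beyond a 24-byte payload, `Probe23` should report more than 24 bytes (32 on typical 64-bit targets), while `Probe22` leaves one byte free for the tag. Rust does not guarantee the layout of plain enums, so treat the printed numbers as observations rather than promises.

```rust
use std::mem::size_of;

// Hypothetical probe type: 23 bytes of text + 1 length byte = 24-byte payload.
#[allow(dead_code)]
enum Probe23 {
    Inline { buf: [u8; 23], len: u8 },
    Heap(Box<str>),
}

// Hypothetical probe type: 22 bytes of text + 1 length byte = 23-byte payload,
// leaving one byte that the compiler can use for the discriminant.
#[allow(dead_code)]
enum Probe22 {
    Inline { buf: [u8; 22], len: u8 },
    Heap(Box<str>),
}

fn main() {
    println!("String   : {} bytes", size_of::<String>());
    println!("Box<str> : {} bytes", size_of::<Box<str>>());
    println!("Probe23  : {} bytes", size_of::<Probe23>());
    println!("Probe22  : {} bytes", size_of::<Probe22>());
}
```

If matching the size of `String` matters more than the last inline byte, lowering the capacity by one is the simplest option that stays in safe code.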

How It Works in Rust

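const INLINE_CAP: usize = 23;

#[derive(Debug)]
enum SsoString {
    // Short strings: the bytes live directly inside the enum value.
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    // Longer strings: one heap allocation with no spare capacity.
    Heap(Box<str>),
}

impl SsoString {
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            // The cast cannot truncate: s.len() <= 23 on this branch.
            SsoString::Inline { buf, len: s.len() as u8 }
        } else {
            SsoString::Heap(s.into())
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            // `buf[..len]` is a whole `&str` copied in `new`, so it is valid
            // UTF-8 and the `unwrap` cannot fail. The validation itself still
            // runs on every call.
            SsoString::Inline { buf, len } =>
                std::str::from_utf8(&buf[..*len as usize]).unwrap(),
            SsoString::Heap(s) => s,
        }
    }

    pub fn len(&self) -> usize {
        match self {
            SsoString::Inline { len, .. } => *len as usize,
            SsoString::Heap(s) => s.len(),
        }
    }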
}

For production use, the `compact_str` crate implements a battle-tested SSO string that is 24 bytes on 64-bit targets (the same size as `String`) and provides `Display`, `Debug`, `Eq`, `Hash`, and `Deref<Target=str>`.
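A hand-rolled `SsoString` can gain similar ergonomics by implementing the standard traits and delegating to the underlying `str`. The following is a minimal sketch under that assumption, not a description of how `compact_str` is implemented; `count_words` is a hypothetical helper added only to show the type working as a `HashMap` key.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::ops::Deref;

const INLINE_CAP: usize = 23;

#[derive(Debug, Clone)]
enum SsoString {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    Heap(Box<str>),
}

impl SsoString {
    fn as_str(&self) -> &str {
        match self {
            SsoString::Inline { buf, len } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes were copied from a &str"),
            SsoString::Heap(s) => s,
        }
    }
}

impl From<&str> for SsoString {
    fn from(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline { buf, len: s.len() as u8 }
        } else {
            SsoString::Heap(s.into())
        }
    }
}

// Deref gives access to every `str` method (`len`, `starts_with`, ...).
impl Deref for SsoString {
    type Target = str;
    fn deref(&self) -> &str {
        self.as_str()
    }
}

// Equality and hashing go through the string contents, so they agree with
// `str` and `Borrow<str>` lookups stay consistent.
impl PartialEq for SsoString {
    fn eq(&self, other: &Self) -> bool {
        self.as_str() == other.as_str()
    }
}
impl Eq for SsoString {}

impl Hash for SsoString {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_str().hash(state)
    }
}

impl Borrow<str> for SsoString {
    fn borrow(&self) -> &str {
        self.as_str()
    }
}

/// Hypothetical helper: count whitespace-separated words using SSO keys.
fn count_words(text: &str) -> HashMap<SsoString, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(SsoString::from(word)).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words("ok err ok ok err timeout");
    // Lookup by plain &str works because of the Borrow<str> impl.
    println!("ok = {:?}", counts.get("ok"));
    println!("upper = {}", SsoString::from("ok").to_uppercase());
}
```

Hashing the contents instead of deriving `Hash` matters here: a derived impl would hash the unused tail of `buf` and the variant tag, which would break lookups by `&str`.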

What This Unlocks

Key Differences

| Concept | OCaml | Rust |
|---|---|---|
| String storage | Always heap-allocated `string` | `String` = heap; `&str` = borrowed slice |
| Custom SSO type | Variant or abstract type | `enum` with `Inline`/`Heap` variants |
| Inline bytes | Not expressible | `[u8; N]` inline in enum variant |
| Size control | No control | Capacity set by a `const`; check with `size_of`; `repr(C)` for explicit layout |
| Production SSO | Not in stdlib | `compact_str` crate |
| Length encoding | Not applicable | 1 byte in `len: u8` field |

/// 730: Small String Optimization
/// Stores ≤23 bytes inline; falls back to `Box<str>` for longer strings.

const INLINE_CAP: usize = 23;

/// An SSO string. The inline payload is 24 bytes (23 + 1); together with the
/// enum's discriminant the whole type is typically 32 bytes on 64-bit targets.
#[derive(Debug)]
enum SsoString {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    Heap(Box<str>),
}

impl SsoString {
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline { buf, len: s.len() as u8 }
        } else {
            SsoString::Heap(s.into())
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            SsoString::Inline { buf, len } => {
                std::str::from_utf8(&buf[..*len as usize]).unwrap()
            }
            SsoString::Heap(s) => s,
        }
    }

    pub fn len(&self) -> usize {
        match self {
            SsoString::Inline { len, .. } => *len as usize,
            SsoString::Heap(s) => s.len(),
        }
    }

    pub fn is_empty(&self) -> bool { self.len() == 0 }

    pub fn is_inline(&self) -> bool {
        matches!(self, SsoString::Inline { .. })
    }
}

impl std::fmt::Display for SsoString {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(self.as_str())
    }
}

fn main() {
    let cases = [
        "hi",
        "hello, world",
        "exactly23byteslong!!!xx",   // 23 bytes: stored inline
        "this string is definitely too long for inline storage",
    ];
    for s in &cases {
        let sso = SsoString::new(s);
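        println!(
            "len={:2} inline={} → \"{}\"",
            sso.len(),
            sso.is_inline(),
            // Byte-index slicing is fine here because every case is ASCII;
            // it would panic if byte 30 fell inside a multi-byte character.
            &sso.as_str()[..sso.len().min(30)],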
        );
    }
    println!("\nSsoString size: {} bytes", std::mem::size_of::<SsoString>());
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty_is_inline() {
        let s = SsoString::new("");
        assert!(s.is_inline());
        assert_eq!(s.len(), 0);
        assert_eq!(s.as_str(), "");
    }

    #[test]
    fn short_string_inline() {
        let s = SsoString::new("hello");
        assert!(s.is_inline());
        assert_eq!(s.as_str(), "hello");
    }

    #[test]
    fn boundary_23_bytes_is_inline() {
        let s23 = "a".repeat(INLINE_CAP);
        let sso = SsoString::new(&s23);
        assert!(sso.is_inline());
        assert_eq!(sso.as_str(), s23);
    }

    #[test]
    fn boundary_24_bytes_is_heap() {
        let s24 = "a".repeat(INLINE_CAP + 1);
        let sso = SsoString::new(&s24);
        assert!(!sso.is_inline());
        assert_eq!(sso.as_str(), s24);
    }

    #[test]
    fn long_string_heap() {
        let long = "this is a long string that exceeds the inline capacity";
        let sso = SsoString::new(long);
        assert!(!sso.is_inline());
        assert_eq!(sso.as_str(), long);
    }
}
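The listing above builds an `SsoString` once and never changes it. If appending is needed, the inline variant has to spill to the heap when the result no longer fits. The sketch below shows one way this might look. It is an assumption-laden variation, not part of the original example: `push_str` is a hypothetical method, and the `Heap` variant holds a growable `String` here instead of the `Box<str>` used above.

```rust
const INLINE_CAP: usize = 23;

#[derive(Debug)]
enum SsoString {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    // Assumption: a growable String, unlike the Box<str> in the main listing.
    Heap(String),
}

impl SsoString {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline { buf, len: s.len() as u8 }
        } else {
            SsoString::Heap(s.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            SsoString::Inline { buf, len } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes are built only from whole &str values"),
            SsoString::Heap(s) => s,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SsoString::Inline { .. })
    }

    /// Hypothetical method: append `extra`, moving to the heap if needed.
    fn push_str(&mut self, extra: &str) {
        match self {
            SsoString::Inline { buf, len } => {
                let old = *len as usize;
                let new_len = old + extra.len();
                if new_len <= INLINE_CAP {
                    // Still fits: copy the new bytes after the existing ones.
                    buf[old..new_len].copy_from_slice(extra.as_bytes());
                    *len = new_len as u8;
                } else {
                    // Too long: build a heap string and switch variants.
                    let mut spilled = String::with_capacity(new_len);
                    spilled.push_str(
                        std::str::from_utf8(&buf[..old])
                            .expect("inline bytes are built only from whole &str values"),
                    );
                    spilled.push_str(extra);
                    *self = SsoString::Heap(spilled);
                }
            }
            SsoString::Heap(s) => s.push_str(extra),
        }
    }
}

fn main() {
    let mut s = SsoString::new("hello");
    s.push_str(", world");
    println!("{:?} inline={}", s.as_str(), s.is_inline());
    s.push_str(" and then some more");
    println!("{:?} inline={}", s.as_str(), s.is_inline());
}
```

Appending whole `&str` values keeps the inline buffer valid UTF-8, which is why `as_str` can keep using the checked conversion without ever failing.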
(* 730: Small String Optimization. OCaml doesn't have true SSO,
   but we can model the concept with a variant type. *)

(* Simulate SSO: inline up to 15 bytes, otherwise heap string *)
type sso_string =
  | Inline of bytes   (* short: stored in OCaml boxed bytes (not true SSO, but conceptual) *)
  | Heap of string    (* long: regular OCaml string *)

let sso_threshold = 15

let sso_of_string s =
  if String.length s <= sso_threshold
  then Inline (Bytes.of_string s)
  else Heap s

let sso_to_string = function
  | Inline b -> Bytes.to_string b
  | Heap s   -> s

let sso_len = function
  | Inline b -> Bytes.length b
  | Heap s   -> String.length s

let is_inline = function
  | Inline _ -> true
  | Heap _   -> false

let () =
  let short = sso_of_string "hello" in
  let long  = sso_of_string "this is a very long string indeed" in
  Printf.printf "Short '%s' inline=%b len=%d\n"
    (sso_to_string short) (is_inline short) (sso_len short);
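  (* Truncate explicitly with String.sub; a precision on %s is not a
     reliable way to cut a string in OCaml's Printf. The input is longer
     than 20 characters, so the slice is in bounds. *)
  Printf.printf "Long  '%s...' inline=%b len=%d\n"
    (String.sub (sso_to_string long) 0 20) (is_inline long) (sso_len long)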