🦀 Functional Rust

768: Zero-Copy Deserialisation with Lifetime Tricks

Difficulty: 4 · Level: Advanced

Deserialize structured data into a struct whose string fields borrow directly from the input buffer, with no heap allocation for string data.

The Problem This Solves

Every time you deserialize a JSON or CSV record into owned `String` fields, you pay for one allocation and one copy per string field. When you parse millions of records in a hot loop, that per-field cost dominates. Zero-copy deserialization eliminates it: the parsed struct holds `&str` slices that point directly into the original input. The same idea underlies `serde`'s `Deserialize<'de>` design, where the `'de` lifetime ties the struct's borrowed fields to the lifetime of the deserialization input. Implementing it by hand, without `serde`, makes the signature `impl<'de> Deserialize<'de> for MyType` intuitive rather than mysterious. The tradeoff is that the input buffer must outlive the parsed struct. When a record has to outlive its buffer, convert it to owned values with `PersonOwned::from(view)`.
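The following minimal sketch makes the allocation-versus-borrow difference concrete. The helper names `value_owned`, `value_borrowed`, and `points_into` are hypothetical and do not come from the original. Both functions extract the value of a `key=value` pair, but only the owned version copies bytes into a new heap `String`. The pointer check makes that copy visible.

```rust
/// Hypothetical helper: true if `part` lies entirely inside the bytes of `buf`.
fn points_into(buf: &str, part: &str) -> bool {
    let start = buf.as_ptr() as usize;
    let end = start + buf.len();
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= end
}

/// Owned style: copies the value into a freshly allocated `String`.
fn value_owned(line: &str) -> Option<String> {
    line.split_once('=').map(|(_, v)| v.to_string())
}

/// Zero-copy style: returns a slice of `line` itself (pointer + length, no copy).
fn value_borrowed(line: &str) -> Option<&str> {
    line.split_once('=').map(|(_, v)| v)
}
```

Both results compare equal as text. Only the borrowed one still points into `line`, which is why it cannot outlive `line`.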

The Intuition

A `PersonView<'de>` is a view into one specific input buffer. The `'de` lifetime parameter says "I was carved out of a buffer that stays alive for at least `'de`." Each `&'de str` field is just a pointer and a length into that buffer, so building the struct copies no string data. `parse_view` returns `PersonView<'_>`, where `'_` is inferred from the input: the returned struct cannot outlive the string passed to `parse_view`, and the borrow checker enforces this at compile time.
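You can check the "pointer and length" description directly. This sketch uses a hypothetical `span_of` helper that recovers the byte range a field occupies inside its input by comparing addresses. It also asserts at compile time that a `&str` is exactly two machine words.

```rust
use std::ops::Range;

// A `&str` is a (pointer, length) pair: two machine words, checked at compile time.
const _: () = assert!(std::mem::size_of::<&str>() == 2 * std::mem::size_of::<usize>());

/// Hypothetical helper: the byte range `field` occupies inside `input`,
/// or `None` if `field` was not sliced out of `input`.
fn span_of(input: &str, field: &str) -> Option<Range<usize>> {
    let base = input.as_ptr() as usize;
    let start = field.as_ptr() as usize;
    if start >= base && start + field.len() <= base + input.len() {
        let off = start - base;
        Some(off..off + field.len())
    } else {
        None
    }
}
```

For `"name=Alice|age=30"`, the slice `"30"` taken from the input maps back to bytes `15..17`. An equal but separately allocated `String` maps to nothing.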

How It Works in Rust

The zero-copy struct, whose fields borrow from `'de`:
#[derive(Debug)]
pub struct PersonView<'de> {
 pub name:    &'de str,
 pub age_raw: &'de str,
 pub city:    Option<&'de str>,
}
No `String`, no allocation: every field is a slice of the input.

The parser returns borrows from the input:
pub fn parse_view(input: &str) -> Result<PersonView<'_>, ParseError> {
 fn find_field<'a>(input: &'a str, key: &str) -> Option<&'a str> {
     for part in input.split('|') {
         // split_once compares the key in place; no `format!` temporary is allocated
         if let Some((k, v)) = part.split_once('=') {
             if k == key {
                 return Some(v);  // slice of input, not a copy
             }
         }
     }
     None
 }
 // ...
}
`'_` in the return type is shorthand for "borrows from `input`". With a single reference parameter, lifetime elision gives the output that parameter's lifetime.

The explicit-lifetime version spells out `'de` in full:
pub fn deserialize_person<'de>(input: &'de str) -> Result<PersonView<'de>, ParseError> {
 parse_view(input)
}
Input and output share the same lifetime `'de`. The view can be used only while the buffer behind `input` is still alive and not mutably borrowed, and the compiler rejects any later use.

Batch parsing produces multiple views from one buffer:
pub fn parse_many(input: &str) -> Vec<PersonView<'_>> {
 input.lines()
      .filter(|l| !l.is_empty())
      .filter_map(|line| parse_view(line).ok())
      .collect()
}
All views borrow from the same input buffer. The only heap allocation is the `Vec`'s backing storage, which may grow as records are collected, and no string data is copied.

Converting to owned when needed:
impl<'de> From<PersonView<'de>> for PersonOwned {
 fn from(v: PersonView<'de>) -> Self {
     PersonOwned {
         name: v.name.to_string(),  // allocate only when leaving the zero-copy context
         age: v.age().unwrap_or(0),
         city: v.city.map(|s| s.to_string()),
     }
 }
}
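Every view borrows from its buffer, so a function that creates the buffer locally cannot return a view of it. `buffer` is dropped when the function returns, so returning `view` itself would fail to compile. The sketch below converts at that boundary instead. `NameView`, `parse_name`, and `load_name` are hypothetical, and the `format!` call stands in for reading a record from a file or socket.

```rust
#[derive(Debug)]
struct NameView<'de> {
    name: &'de str,
}

/// Zero-copy: the returned view borrows from `input`.
fn parse_name(input: &str) -> Option<NameView<'_>> {
    input
        .split('|')
        .find_map(|part| part.strip_prefix("name="))
        .map(|name| NameView { name })
}

/// The buffer lives only inside this function, so the borrowed view
/// is converted to an owned `String` before the buffer is dropped.
fn load_name(record_id: u32) -> Option<String> {
    // Hypothetical stand-in for I/O that produces a record buffer.
    let buffer = format!("id={record_id}|name=Alice|age=30");
    let view = parse_name(&buffer)?;
    Some(view.name.to_owned()) // the single allocation, at the crossing point
}
```

Inside `load_name` the parsing itself stays zero-copy. The only copy is the one string that has to survive the function.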

What This Unlocks

Key Differences

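| Concept | OCaml | Rust |
|---|---|---|
| Zero-copy string | Offset/length record over the source string (as in the OCaml example below) or bigstring libraries; `Bytes.sub` copies | `&'de str`: the lifetime tracks the input buffer |
| Lifetime on struct | N/A (the GC keeps the buffer alive) | `struct Foo<'de>`: the struct borrows from `'de` |
| `serde`-style `'de` | `ppx_deriving`, `jsonaf` (produce owned GC strings) | `impl<'de> Deserialize<'de>`: borrows from the input |
| Convert view to owned | `String.sub src off len` | `.to_string()`: explicit allocation at the crossing point |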
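`Cow<'de, str>` sits between the borrowed and owned columns. It borrows when the input bytes can be used as-is and allocates only when they must be transformed, for example to unescape a value. This is similar in spirit to the way `serde` users often declare string fields as `Cow<'de, str>` with `#[serde(borrow)]`. The format in this article has no escapes, so the rule below (a backslash escapes the next character) is a hypothetical assumption for illustration. A real escaped format would also need an escape-aware field splitter.

```rust
use std::borrow::Cow;

/// Hypothetical escape rule: a backslash escapes the following character.
/// Borrows when the raw value contains no backslash; allocates only to unescape.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        return Cow::Borrowed(raw); // fast path: still zero-copy
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // Keep the escaped character; a trailing lone backslash is kept as-is.
            out.push(chars.next().unwrap_or('\\'));
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```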
// 768. Zero-Copy Deserialisation with Lifetime Tricks
// Borrows &'de str from input: no heap allocation for string fields

// ── Zero-copy record: fields borrow from input ────────────────────────────────

/// 'de = "deserialize" lifetime: the input buffer must outlive this struct
#[derive(Debug)]
pub struct PersonView<'de> {
    pub name: &'de str,
    pub age_raw: &'de str,   // raw string, parse lazily
    pub city: Option<&'de str>,
}

impl<'de> PersonView<'de> {
    pub fn age(&self) -> Option<u32> {
        self.age_raw.parse().ok()
    }
}

// ── Simple zero-copy parser ───────────────────────────────────────────────────

#[derive(Debug)]
pub struct ParseError(String);

/// Parse "name=Alice|age=30|city=Berlin" without allocating on the success path:
/// every field is a slice of `input`
pub fn parse_view(input: &str) -> Result<PersonView<'_>, ParseError> {
    // Returns &str slices that borrow from `input`
    fn find_field<'a>(input: &'a str, key: &str) -> Option<&'a str> {
        for part in input.split('|') {
            // Compare the key in place instead of building a "key=" String with format!
            if let Some((k, v)) = part.split_once('=') {
                if k == key {
                    return Some(v);
                }
            }
        }
        None
    }

    let name = find_field(input, "name")
        .ok_or_else(|| ParseError("missing 'name'".into()))?;
    let age_raw = find_field(input, "age")
        .ok_or_else(|| ParseError("missing 'age'".into()))?;
    let city = find_field(input, "city");

    Ok(PersonView { name, age_raw, city })
}

// ── Owned version (comparison) ────────────────────────────────────────────────

#[derive(Debug)]
pub struct PersonOwned {
    pub name: String,
    pub age: u32,
    pub city: Option<String>,
}

impl<'de> From<PersonView<'de>> for PersonOwned {
    fn from(v: PersonView<'de>) -> Self {
        PersonOwned {
            name: v.name.to_string(),
            age: v.age().unwrap_or(0),
            city: v.city.map(|s| s.to_string()),
        }
    }
}

// ── Lifetime demo: showing 'de in action ──────────────────────────────────────

/// This function signature shows how 'de ties input lifetime to output lifetime
pub fn deserialize_person<'de>(input: &'de str) -> Result<PersonView<'de>, ParseError> {
    parse_view(input)
}

// ── Batch zero-copy parsing of many records ───────────────────────────────────

pub fn parse_many(input: &str) -> Vec<PersonView<'_>> {
    input.lines()
         .filter(|l| !l.is_empty())
         .filter_map(|line| parse_view(line).ok())
         .collect()
}

fn main() {
    let input = "name=Alice|age=30|city=Amsterdam";
    let view = parse_view(input).expect("parse failed");
    println!("Name: {}", view.name);          // &str pointing into `input`
    println!("Age : {:?}", view.age());
    println!("City: {:?}", view.city);

    // Zero-copy batch
    let records = "name=Bob|age=25\nname=Carol|age=35|city=Berlin\nname=Dave|age=40";
    let views = parse_many(records);
    println!("\nBatch ({} records):", views.len());
    for v in &views {
        println!("  {}: age={}", v.name, v.age_raw);
    }

    // Convert to owned when needed
    let owned: Vec<PersonOwned> = views.into_iter().map(PersonOwned::from).collect();
    println!("\nOwned: {:?}", owned.iter().map(|o| &o.name).collect::<Vec<_>>());
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn zero_copy_name_borrows_input() {
        let input = String::from("name=Umur|age=33");
        let view = parse_view(&input).unwrap();
        // view.name is a slice of `input`: no allocation
        assert_eq!(view.name, "Umur");
        assert_eq!(view.age(), Some(33));
    }

    #[test]
    fn optional_city() {
        let input = "name=Eve|age=28|city=Paris";
        let view = parse_view(input).unwrap();
        assert_eq!(view.city, Some("Paris"));
    }

    #[test]
    fn missing_city_is_none() {
        let view = parse_view("name=X|age=1").unwrap();
        assert!(view.city.is_none());
    }

    #[test]
    fn missing_field_errors() {
        assert!(parse_view("age=30").is_err());
    }
}
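The views borrow from the input, so the same slices can serve as keys in other data structures without copying. The sketch below assumes the goal is grouping names by city. `names_by_city` is hypothetical, and it parses inline instead of calling `parse_view` so that the block stands alone. The map allocates its own table and `Vec`s, but every key and value is a `&str` into `input`.

```rust
use std::collections::HashMap;

/// Groups names by city; keys and values are slices of `input`, not copies.
fn names_by_city(input: &str) -> HashMap<&str, Vec<&str>> {
    let mut groups: HashMap<&str, Vec<&str>> = HashMap::new();
    for line in input.lines().filter(|l| !l.is_empty()) {
        let mut name = None;
        let mut city = None;
        for part in line.split('|') {
            match part.split_once('=') {
                Some(("name", v)) => name = Some(v),
                Some(("city", v)) => city = Some(v),
                _ => {} // other fields (e.g. age) are ignored here
            }
        }
        // Records without a city are skipped in this sketch.
        if let (Some(n), Some(c)) = (name, city) {
            groups.entry(c).or_default().push(n);
        }
    }
    groups
}
```

Because the elided return lifetime ties the map to `input`, the compiler rejects any attempt to keep the map after the input buffer is dropped.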
(* Zero-copy style in OCaml.
   OCaml strings are immutable, and String.sub / Bytes.sub_string always copy,
   so we simulate zero-copy with (source, offset, length) views. *)

type string_view = { src: string; off: int; len: int }

let view_to_string sv = String.sub sv.src sv.off sv.len

(* A "zero-copy" record โ€” fields are views into the input buffer *)
type person_view = {
  name: string_view;
  age_str: string_view;  (* raw string, parse lazily *)
}

(* Parse "name=Alice|age=30" without copying field contents *)
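let parse_view (s: string) : person_view option =
  (* true if [p] occurs in [s] at offset [off]; compares chars in place, no copy *)
  let has_prefix_at s off p =
    let n = String.length p in
    let rec go i = i >= n || (s.[off + i] = p.[i] && go (i + 1)) in
    off + n <= String.length s && go 0
  in
  let find_char s start c =
    let rec go i = if i >= String.length s then None
                   else if s.[i] = c then Some i
                   else go (i+1)
    in go start
  in
  (* find first '|' *)
  match find_char s 0 '|' with
  | None -> None
  | Some pipe ->
    (* name field: "name=Alice" -> from offset 5 up to the pipe *)
    let name_off = 5 in  (* skip "name=" *)
    let name_len = pipe - name_off in
    (* age field: skip "age=" after the pipe *)
    let age_off  = pipe + 1 + 4 in
    let age_len  = String.length s - age_off in
    (* reject input whose keys are not exactly "name=" and "age=" *)
    if not (has_prefix_at s 0 "name=" && has_prefix_at s (pipe + 1) "age=")
       || name_len <= 0 || age_len <= 0 then None
    else Some {
      name    = { src = s; off = name_off; len = name_len };
      age_str = { src = s; off = age_off;  len = age_len  };
    }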

let () =
  let input = "name=Alice|age=30" in
  match parse_view input with
  | None -> Printf.printf "parse failed\n"
  | Some pv ->
    Printf.printf "Name (view): %s\n" (view_to_string pv.name);
    Printf.printf "Age  (view): %s\n" (view_to_string pv.age_str);
    Printf.printf "Age  (int) : %d\n"
      (int_of_string (view_to_string pv.age_str))
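The OCaml version stores offsets and lengths rather than borrowed pointers. The same design works in Rust and trades the lifetime parameter for an owned buffer. The hypothetical `PersonSpans` below keeps its `String` plus byte ranges, so you can move or store it anywhere. The costs are one owned buffer per record and a bounds-checked slice on each access. This is a sketch of an alternative, not the approach used in the Rust listing above.

```rust
use std::ops::Range;

/// Owns its buffer and stores byte ranges instead of `&str` borrows,
/// like the OCaml `string_view` record; no lifetime parameter needed.
struct PersonSpans {
    buf: String,
    name: Range<usize>,
    age: Range<usize>,
}

impl PersonSpans {
    fn parse(buf: String) -> Option<PersonSpans> {
        let mut name = None;
        let mut age = None;
        let mut start = 0; // byte offset of the current part within `buf`
        for part in buf.split('|') {
            if let Some(v) = part.strip_prefix("name=") {
                let off = start + "name=".len();
                name = Some(off..off + v.len());
            } else if let Some(v) = part.strip_prefix("age=") {
                let off = start + "age=".len();
                age = Some(off..off + v.len());
            }
            start += part.len() + 1; // skip this part and its '|' separator
        }
        let (name, age) = (name?, age?);
        Some(PersonSpans { buf, name, age })
    }

    /// Resolve the stored range against the owned buffer (bounds-checked).
    fn name(&self) -> &str {
        &self.buf[self.name.clone()]
    }

    fn age(&self) -> Option<u32> {
        self.buf[self.age.clone()].parse().ok()
    }
}
```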