🦀 Functional Rust

767: Versioned Serialization with Migration

Difficulty: 4 Level: Advanced

Handle schema evolution: read old data with new code by versioning your wire format and building a migration chain.

The Problem This Solves

Your data format changes over time: you add fields, rename them, change types. Old data still exists on disk, in databases, in message queues. New code must be able to read old data without breaking. This is schema evolution, and it's one of the hardest problems in production systems.

The naive solution (add nullable fields to one struct and handle all versions in one place) leads to increasingly messy code with many `if version >= 2` branches. The clean solution is a version-tagged format with a migration chain: each version is a separate type, `From` conversions migrate between adjacent versions, and a `VersionedUser` enum acts as a parsing discriminant. Reading always deserializes into the appropriate versioned type, then migrates to the current version in one `.into_current()` call.

This pattern is used in database migration frameworks, Avro/Protobuf schema registries, event sourcing systems, and any long-lived storage format. The `serde` equivalent uses `#[serde(default)]` and `Option<T>` for additive changes, and explicit version tags for breaking changes.
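To make the contrast concrete, here is a rough sketch of what the naive single-struct approach tends to look like. The `NaiveUser` type and the `summary` function are hypothetical and exist only for illustration; the fallback rules mirror the defaults used in the migration chain in this example.

```rust
// Hypothetical "naive" layout: one struct covers every version,
// so each field added after V1 has to be optional.
#[derive(Debug, Clone, PartialEq)]
struct NaiveUser {
    version: u32,
    name: String,
    age: u32,
    email: Option<String>, // present from version 2 onward
    active: Option<bool>,  // present from version 3 onward
}

// Every consumer repeats the version checks and the fallback rules.
fn summary(u: &NaiveUser) -> String {
    let email = if u.version >= 2 {
        u.email.clone().unwrap_or_default()
    } else {
        format!("{}@example.com", u.name.to_lowercase())
    };
    let active = if u.version >= 3 { u.active.unwrap_or(true) } else { true };
    format!("{} ({}) <{}> active={}", u.name, u.age, email, active)
}
```

Each new version adds another branch to every function like `summary`, which is the growth the versioned-type approach is meant to avoid.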

The Intuition

Version the serialized format with a `version=N` field. Parse into a `VersionedUser::V1`, `V2`, or `V3` enum variant based on that tag. Implement `From<V1> for V2`, `From<V2> for V3`, and `From<V1> for V3` (transitive chain). A single method `.into_current()` converts any version to the latest by following the chain. New fields get sensible defaults in `From` implementations.

How It Works in Rust

// Each version is its own type: no shared mutable fields
struct UserV1 { name: String, age: u32 }
struct UserV2 { name: String, age: u32, email: String }    // added field
struct UserV3 { name: String, age: u32, email: String, active: bool }  // added field

// Migration: From<older> for newer with sensible defaults
impl From<UserV1> for UserV2 {
 fn from(u: UserV1) -> Self {
     UserV2 {
         email: format!("{}@example.com", u.name.to_lowercase()),
         name: u.name,
         age: u.age,
     }
 }
}

impl From<UserV2> for UserV3 {
 fn from(u: UserV2) -> Self {
     UserV3 { active: true, name: u.name, age: u.age, email: u.email }
 }
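}

// Transitive step used by into_current(): V1 -> V2 -> V3
impl From<UserV1> for UserV3 {
 fn from(u: UserV1) -> Self {
     UserV3::from(UserV2::from(u))
 }
}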

// Enum as parsing discriminant
enum VersionedUser { V1(UserV1), V2(UserV2), V3(UserV3) }

impl VersionedUser {
 fn into_current(self) -> UserV3 {
     match self {
         VersionedUser::V1(u) => UserV3::from(u),  // two migrations
         VersionedUser::V2(u) => UserV3::from(u),  // one migration
         VersionedUser::V3(u) => u,                // already current
     }
 }
}

// Deserializer reads version tag, dispatches to correct parser
fn deserialize(s: &str) -> Result<VersionedUser, DeError> {
 let map = fields(s);
     // .copied() turns Option<&&str> into Option<&str> so the string literals below match
     match map.get("version").copied() {
     Some("1") => Ok(VersionedUser::V1(parse_v1(&map)?)),
     Some("2") => Ok(VersionedUser::V2(parse_v2(&map)?)),
     Some("3") => Ok(VersionedUser::V3(parse_v3(&map)?)),
     Some(v)   => Err(DeError::UnsupportedVersion(v.to_string())),
     None      => Err(DeError::MissingField("version")),
 }
}
The `into_current()` pattern keeps migration logic in `From` implementations, where it is easy to test, composable, and free of if/else chains.
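Because each step is an ordinary `From` impl, a single migration can be checked on its own, and the composed path can be checked against the individual steps. The following is a minimal sketch that repeats the struct shapes from above so it stands alone; the helper name `migrate_v1` is hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
struct UserV1 { name: String, age: u32 }

#[derive(Debug, Clone, PartialEq)]
struct UserV2 { name: String, age: u32, email: String }

#[derive(Debug, Clone, PartialEq)]
struct UserV3 { name: String, age: u32, email: String, active: bool }

// Step 1: V1 -> V2, synthesizing a default email from the name.
impl From<UserV1> for UserV2 {
    fn from(u: UserV1) -> Self {
        UserV2 {
            email: format!("{}@example.com", u.name.to_lowercase()),
            name: u.name,
            age: u.age,
        }
    }
}

// Step 2: V2 -> V3, migrated records default to active.
impl From<UserV2> for UserV3 {
    fn from(u: UserV2) -> Self {
        UserV3 { active: true, name: u.name, age: u.age, email: u.email }
    }
}

// Hypothetical helper: the V1 -> V3 path is just the two steps composed.
fn migrate_v1(u: UserV1) -> UserV3 {
    UserV3::from(UserV2::from(u))
}
```

A test can then assert on one step (for example, that `UserV2::from` produces the expected email) without touching the parser at all.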

What This Unlocks

Key Differences

| Concept | OCaml | Rust |
|---|---|---|
| Version enum | Variant per version | `enum VersionedUser { V1(...), V2(...), V3(...) }` |
| Migration chain | Function composition | `From<V1> for V2` + `From<V2> for V3` |
| Default field values | `Option.value ~default:` | `#[serde(default)]` or explicit `From` defaults |
| Unknown version | Exception | `Err(UnsupportedVersion(v.to_string()))`, a typed error |

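The "typed error" row can be illustrated with a small sketch: because the unknown-version case is its own variant, a caller can branch on it instead of inspecting a message string. The `read_version` and `describe` functions below are hypothetical helpers written for this illustration, and the `DeError` enum is a trimmed-down copy of the one in the full source.

```rust
#[derive(Debug, PartialEq)]
enum DeError {
    MissingField(&'static str),
    UnsupportedVersion(String),
}

// Hypothetical helper: extract and validate only the version tag
// from a `key=value|key=value` record.
fn read_version(s: &str) -> Result<u32, DeError> {
    let tag = s
        .split('|')
        .filter_map(|p| p.split_once('='))
        .find(|(k, _)| *k == "version")
        .map(|(_, v)| v)
        .ok_or(DeError::MissingField("version"))?;
    match tag {
        "1" => Ok(1),
        "2" => Ok(2),
        "3" => Ok(3),
        other => Err(DeError::UnsupportedVersion(other.to_string())),
    }
}

// Callers can treat "too new" differently from "corrupt".
fn describe(r: &Result<u32, DeError>) -> String {
    match r {
        Ok(v) => format!("version {v}"),
        Err(DeError::UnsupportedVersion(v)) => format!("cannot read version {v}"),
        Err(DeError::MissingField(f)) => format!("corrupt record: no {f} field"),
    }
}
```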
// 767. Versioned Serialization with Migration
// Version tag in payload, migration chain

#[derive(Debug, PartialEq)]
pub struct UserV1 {
    pub name: String,
    pub age: u32,
}

#[derive(Debug, PartialEq, Clone)]
pub struct UserV2 {
    pub name: String,
    pub age: u32,
    pub email: String,    // new in V2
}

#[derive(Debug, PartialEq, Clone)]
pub struct UserV3 {
    pub name: String,
    pub age: u32,
    pub email: String,
    pub active: bool,     // new in V3
}

// ── Migration chain ──────────────────────────────────────────────

impl From<UserV1> for UserV2 {
    fn from(u: UserV1) -> Self {
        UserV2 {
            email: format!("{}@example.com", u.name.to_lowercase().replace(' ', ".")),
            name: u.name,
            age: u.age,
        }
    }
}

impl From<UserV2> for UserV3 {
    fn from(u: UserV2) -> Self {
        UserV3 {
            name: u.name,
            age: u.age,
            email: u.email,
            active: true, // sensible default for migrated records
        }
    }
}

impl From<UserV1> for UserV3 {
    fn from(u: UserV1) -> Self {
        UserV3::from(UserV2::from(u))
    }
}

// ── Versioned enum ───────────────────────────────────────────────

#[derive(Debug)]
pub enum VersionedUser {
    V1(UserV1),
    V2(UserV2),
    V3(UserV3),
}

impl VersionedUser {
    /// Always get the latest version (migrate if needed)
    pub fn into_current(self) -> UserV3 {
        match self {
            VersionedUser::V1(u) => UserV3::from(u),
            VersionedUser::V2(u) => UserV3::from(u),
            VersionedUser::V3(u) => u,
        }
    }
}

// ── Serialization ────────────────────────────────────────────────

pub fn serialize_v3(u: &UserV3) -> String {
    format!("version=3|name={}|age={}|email={}|active={}", u.name, u.age, u.email, u.active)
}
pub fn serialize_v1(u: &UserV1) -> String {
    format!("version=1|name={}|age={}", u.name, u.age)
}

fn fields(s: &str) -> std::collections::HashMap<&str, &str> {
    s.split('|').filter_map(|p| p.split_once('=')).collect()
}

// ── Deserialization ──────────────────────────────────────────────

#[derive(Debug)]
pub enum DeError { MissingField(&'static str), UnsupportedVersion(String), ParseError(String) }

pub fn deserialize(s: &str) -> Result<VersionedUser, DeError> {
    let map = fields(s);
    match *map.get("version").ok_or(DeError::MissingField("version"))? {
        "1" => {
            let name = map.get("name").ok_or(DeError::MissingField("name"))?.to_string();
            let age  = map.get("age").ok_or(DeError::MissingField("age"))?
                .parse().map_err(|e: std::num::ParseIntError| DeError::ParseError(e.to_string()))?;
            Ok(VersionedUser::V1(UserV1 { name, age }))
        }
        "2" => {
            let name  = map.get("name").ok_or(DeError::MissingField("name"))?.to_string();
            let age   = map.get("age").ok_or(DeError::MissingField("age"))?
                .parse().map_err(|e: std::num::ParseIntError| DeError::ParseError(e.to_string()))?;
            let email = map.get("email").ok_or(DeError::MissingField("email"))?.to_string();
            Ok(VersionedUser::V2(UserV2 { name, age, email }))
        }
        "3" => {
            let name   = map.get("name").ok_or(DeError::MissingField("name"))?.to_string();
            let age    = map.get("age").ok_or(DeError::MissingField("age"))?
                .parse().map_err(|e: std::num::ParseIntError| DeError::ParseError(e.to_string()))?;
            let email  = map.get("email").ok_or(DeError::MissingField("email"))?.to_string();
            let active = map.get("active").map(|v| *v == "true").unwrap_or(true);
            Ok(VersionedUser::V3(UserV3 { name, age, email, active }))
        }
        v => Err(DeError::UnsupportedVersion(v.to_string())),
    }
}

fn main() {
    // Simulate reading old V1 data and migrating to V3
    let old = UserV1 { name: "Alice".into(), age: 30 };
    let wire = serialize_v1(&old);
    println!("Old wire: {wire}");

    let versioned = deserialize(&wire).expect("decode failed");
    let current = versioned.into_current();
    println!("Migrated to V3: {current:?}");

    // Current format round-trip
    let wire3 = serialize_v3(&current);
    println!("V3 wire: {wire3}");
    let back = deserialize(&wire3).expect("v3 decode").into_current();
    println!("V3 round-trip: {back:?}");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn v1_migrates_to_v3() {
        let u1 = UserV1 { name: "Bob".into(), age: 25 };
        let wire = serialize_v1(&u1);
        let v3 = deserialize(&wire).unwrap().into_current();
        assert_eq!(v3.name, "Bob");
        assert_eq!(v3.age, 25);
        assert!(v3.email.contains("bob"));
        assert!(v3.active);
    }

    #[test]
    fn unknown_version_errors() {
        let result = deserialize("version=99|name=X|age=1");
        assert!(matches!(result, Err(DeError::UnsupportedVersion(_))));
    }

    #[test]
    fn v3_round_trip() {
        let u = UserV3 { name: "Carol".into(), age: 35, email: "c@test.com".into(), active: false };
        let wire = serialize_v3(&u);
        let back = deserialize(&wire).unwrap().into_current();
        assert_eq!(back, u);
    }
}
(* Versioned serialization with migration in OCaml *)

(* ── V1 schema: name + age *)
type user_v1 = { name: string; age: int }

(* ── V2 schema: name + age + email (new field) *)
type user_v2 = { name: string; age: int; email: string }

(* ── Migration: V1 → V2 *)
let migrate_v1_to_v2 (u: user_v1) : user_v2 =
  { name  = u.name;
    age   = u.age;
    email = u.name ^ "@example.com" }  (* synthesized default *)

(* ── Versioned union *)
type versioned_user =
  | V1User of user_v1
  | V2User of user_v2

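(* ── Serialize *)
let serialize_v2 u =
  Printf.sprintf "version=2|name=%s|age=%d|email=%s" u.name u.age u.email

(* The annotation is needed: without it, u.name resolves to the most
   recently defined record type, user_v2. *)
let serialize_v1 (u : user_v1) =
  Printf.sprintf "version=1|name=%s|age=%d" u.name u.age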

(* ── Deserialize with migration *)
let field pairs key =
  match List.assoc_opt key pairs with
  | Some v -> Ok v
  | None   -> Error ("missing field: " ^ key)

let parse_pairs s =
  String.split_on_char '|' s
  |> List.filter_map (fun p ->
    match String.split_on_char '=' p with
    | [k; v] -> Some (k, v)
    | _ -> None)

let deserialize s =
  let pairs = parse_pairs s in
  match field pairs "version" with
  | Error e -> Error e
  | Ok "1" ->
    (match field pairs "name", field pairs "age" with
     | Ok name, Ok age_s ->
       (try
         let u1 = V1User { name; age = int_of_string age_s } in
         Ok u1
        with Failure e -> Error e)
     | Error e, _ | _, Error e -> Error e)
  | Ok "2" ->
    (match field pairs "name", field pairs "age", field pairs "email" with
     | Ok name, Ok age_s, Ok email ->
       (try Ok (V2User { name; age = int_of_string age_s; email })
        with Failure e -> Error e)
     | Error e, _, _ | _, Error e, _ | _, _, Error e -> Error e)
  | Ok v -> Error ("unsupported version: " ^ v)

(* Normalize to V2 (migrating if needed) *)
let to_v2 = function
  | V1User u1 -> migrate_v1_to_v2 u1
  | V2User u2 -> u2

let () =
  (* Write in old format, read as new *)
  let old_data = serialize_v1 { name = "Alice"; age = 30 } in
  Printf.printf "Old wire: %s\n" old_data;
  match deserialize old_data with
  | Ok v ->
    let u2 = to_v2 v in
    Printf.printf "Migrated: name=%s age=%d email=%s\n" u2.name u2.age u2.email
  | Error e -> Printf.printf "Error: %s\n" e