719 Fundamental

Struct of Arrays (SoA) Pattern

Functional Programming

Tutorial

The Problem

Modern CPUs execute instructions faster than memory can supply data. The bottleneck in data-intensive loops is not computation but cache misses: loading a 64-byte cache line that contains only 4 bytes of useful data wastes 94% of bandwidth. The Array of Structs (AoS) layout—the default in most languages—interleaves every field of every record, forcing the CPU to load fields it does not need. Struct of Arrays (SoA) separates each field into its own contiguous array, enabling full cache-line utilization when iterating over a single field.

The pattern matters whenever a hot loop reads or writes only a subset of fields: physics simulations reading positions to compute forces, renderers reading vertex positions to cull geometry, databases scanning a single column, or game engines querying entity health bars. In all these cases AoS pays a memory bandwidth tax proportional to the number of unused fields.

🎯 Learning Outcomes

• Understand how CPU cache lines and spatial locality affect measured throughput

• Implement both AoS and SoA layouts and benchmark the difference

• Use #[repr(C)] and field alignment to reason about memory layouts

• Apply the SoA pattern in iterators that remain safe and ergonomic

• Recognize when SoA hurts (random access, frequent full-struct reads)

Code Example

#[derive(Clone, Debug, Default, PartialEq)]
pub struct ParticleAoS { pub x: f32, pub y: f32, pub z: f32, pub mass: f32 }

pub fn sum_x_aos(particles: &[ParticleAoS]) -> f32 {
    particles.iter().map(|p| p.x).sum()
}

type particle = { x: float; y: float; z: float; mass: float }

let sum_x_aos particles =
  Array.fold_left (fun acc p -> acc +. p.x) 0.0 particles

Key Differences

Aspect	Rust	OCaml
Default float layout	Unboxed, inline	Boxed (GC header per value)
Float-only struct	Contiguous, SIMD-ready	Flat array optimization applies
Mixed struct	Interleaved, inline fields	Each non-float field boxed
SoA ergonomics	`Vec<f32>` per field, safe	`Bigarray` or manual arrays
Auto-vectorization	Common with contiguous slices	Rare without Bigarray

Rust's value semantics mean ParticleSoA fields are guaranteed contiguous without annotation. OCaml needs explicit Bigarray.Array1 for numeric-intensive SoA work.

OCaml Approach

OCaml's GC uses a uniform boxed representation: every float inside a record is heap-allocated with a header word unless the record contains only floats, in which case OCaml applies a float-array optimization. For mixed-field structs, AoS is unavoidable without manual Bigarray usage:

(* Unboxed float arrays — cache-friendly for float-only data *)
type particle_soa = {
  x    : float array;
  y    : float array;
  mass : float array;
}

let update_positions soa dt =
  Array.iteri (fun i _ ->
    soa.x.(i) <- soa.x.(i) +. soa.vx.(i) *. dt
  ) soa.x

OCaml 5 introduces unboxed types (#float, #int) that reduce boxing overhead but full SoA still requires manual field separation.

Full Source

#![allow(clippy::all)]
//! # 719: Struct of Arrays (SoA) vs Array of Structs (AoS)
//!
//! Demonstrates how data layout affects cache efficiency.
//! AoS: `Vec<Particle>` — fields interleaved, bad for single-field iteration.
//! SoA: `{xs, ys, zs, masses}` — each field is contiguous, cache-friendly.

// ── Array of Structures (AoS) ─────────────────────────────────────────────────

/// Classic OOP layout: one struct per element.
/// Memory layout: `[x0,y0,z0,mass0, x1,y1,z1,mass1, ...]`.
/// When iterating only `x`, 75 % of each cache line is wasted.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ParticleAoS {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub mass: f32,
}

/// Sum all x-coordinates using AoS.
/// Each element is 16 bytes; only 4 bytes (`x`) are used per iteration.
pub fn sum_x_aos(particles: &[ParticleAoS]) -> f32 {
    particles.iter().map(|p| p.x).sum()
}

/// Apply gravity to y-coordinates using AoS.
/// Even though only `y` changes, the entire struct is loaded per element.
pub fn apply_gravity_aos(particles: &mut [ParticleAoS], dt: f32) {
    particles.iter_mut().for_each(|p| p.y -= 9.81 * p.mass * dt);
}

// ── Structure of Arrays (SoA) ─────────────────────────────────────────────────

/// Cache-friendly layout: each field is a separate contiguous `Vec`.
/// Memory layout: `[x0,x1,...,xN]` then `[y0,y1,...,yN]` etc.
/// Iterating over `xs` accesses only the data you need.
#[derive(Debug, Default)]
pub struct ParticlesSoA {
    pub xs: Vec<f32>,
    pub ys: Vec<f32>,
    pub zs: Vec<f32>,
    pub masses: Vec<f32>,
}

impl ParticlesSoA {
    /// Build a SoA collection from an iterator of `(x, y, z, mass)` tuples.
    pub fn from_tuples(iter: impl IntoIterator<Item = (f32, f32, f32, f32)>) -> Self {
        let (xs, ys, zs, masses) = iter.into_iter().fold(
            (vec![], vec![], vec![], vec![]),
            |(mut xs, mut ys, mut zs, mut ms), (x, y, z, m)| {
                xs.push(x);
                ys.push(y);
                zs.push(z);
                ms.push(m);
                (xs, ys, zs, ms)
            },
        );
        Self { xs, ys, zs, masses }
    }

    /// Number of particles.
    pub fn len(&self) -> usize {
        self.xs.len()
    }

    /// Returns true when there are no particles.
    pub fn is_empty(&self) -> bool {
        self.xs.is_empty()
    }
}

/// Sum all x-coordinates using SoA.
/// Touches **only** `xs` — every byte in every cache line is useful.
pub fn sum_x_soa(soa: &ParticlesSoA) -> f32 {
    soa.xs.iter().sum()
}

/// Apply gravity to y-coordinates using SoA.
/// Only `ys` and `masses` are accessed — two contiguous passes, zero waste.
pub fn apply_gravity_soa(soa: &mut ParticlesSoA, dt: f32) {
    soa.ys
        .iter_mut()
        .zip(soa.masses.iter())
        .for_each(|(y, &m)| *y -= 9.81 * m * dt);
}

// ── Conversion helpers ────────────────────────────────────────────────────────

/// Convert AoS to SoA (useful when you collect from an API that returns AoS).
pub fn aos_to_soa(particles: &[ParticleAoS]) -> ParticlesSoA {
    ParticlesSoA {
        xs: particles.iter().map(|p| p.x).collect(),
        ys: particles.iter().map(|p| p.y).collect(),
        zs: particles.iter().map(|p| p.z).collect(),
        masses: particles.iter().map(|p| p.mass).collect(),
    }
}

/// Convert SoA back to AoS (e.g. to pass one particle to an external API).
pub fn soa_to_aos(soa: &ParticlesSoA) -> Vec<ParticleAoS> {
    soa.xs
        .iter()
        .zip(soa.ys.iter())
        .zip(soa.zs.iter())
        .zip(soa.masses.iter())
        .map(|(((x, y), z), m)| ParticleAoS {
            x: *x,
            y: *y,
            z: *z,
            mass: *m,
        })
        .collect()
}

// ── Tests ─────────────────────────────────────────────────────────────────────

#[cfg(test)]
mod tests {
    use super::*;

    fn sample_aos() -> Vec<ParticleAoS> {
        vec![
            ParticleAoS {
                x: 1.0,
                y: 2.0,
                z: 3.0,
                mass: 0.5,
            },
            ParticleAoS {
                x: 4.0,
                y: 5.0,
                z: 6.0,
                mass: 1.0,
            },
            ParticleAoS {
                x: 7.0,
                y: 8.0,
                z: 9.0,
                mass: 2.0,
            },
        ]
    }

    fn sample_soa() -> ParticlesSoA {
        ParticlesSoA::from_tuples([
            (1.0, 2.0, 3.0, 0.5),
            (4.0, 5.0, 6.0, 1.0),
            (7.0, 8.0, 9.0, 2.0),
        ])
    }

    #[test]
    fn test_sum_x_aos_and_soa_agree() {
        let aos = sample_aos();
        let soa = sample_soa();
        let expected = 1.0_f32 + 4.0 + 7.0;
        assert!((sum_x_aos(&aos) - expected).abs() < f32::EPSILON);
        assert!((sum_x_soa(&soa) - expected).abs() < f32::EPSILON);
    }

    #[test]
    fn test_sum_x_empty() {
        assert_eq!(sum_x_aos(&[]), 0.0_f32);
        assert_eq!(sum_x_soa(&ParticlesSoA::default()), 0.0_f32);
    }

    #[test]
    fn test_apply_gravity_aos() {
        let mut particles = sample_aos();
        // dt = 0 → y unchanged
        apply_gravity_aos(&mut particles, 0.0);
        assert!((particles[0].y - 2.0).abs() < f32::EPSILON);
        // dt = 1, mass = 0.5 → y -= 9.81 * 0.5 * 1 = 4.905
        apply_gravity_aos(&mut particles, 1.0);
        let expected_y0 = 2.0 - 9.81 * 0.5 * 1.0_f32;
        assert!((particles[0].y - expected_y0).abs() < 1e-5);
    }

    #[test]
    fn test_apply_gravity_soa() {
        let mut soa = sample_soa();
        apply_gravity_soa(&mut soa, 0.0);
        assert!((soa.ys[0] - 2.0).abs() < f32::EPSILON);
        apply_gravity_soa(&mut soa, 1.0);
        let expected_y0 = 2.0 - 9.81 * 0.5 * 1.0_f32;
        assert!((soa.ys[0] - expected_y0).abs() < 1e-5);
    }

    #[test]
    fn test_aos_to_soa_roundtrip() {
        let aos = sample_aos();
        let soa = aos_to_soa(&aos);
        let back = soa_to_aos(&soa);
        assert_eq!(aos, back);
    }

    #[test]
    fn test_soa_len_and_is_empty() {
        let soa = sample_soa();
        assert_eq!(soa.len(), 3);
        assert!(!soa.is_empty());
        assert!(ParticlesSoA::default().is_empty());
    }

    #[test]
    fn test_gravity_soa_and_aos_agree() {
        let mut aos = sample_aos();
        let mut soa = sample_soa();
        let dt = 0.016_f32;
        apply_gravity_aos(&mut aos, dt);
        apply_gravity_soa(&mut soa, dt);
        for (i, p) in aos.iter().enumerate() {
            assert!((p.y - soa.ys[i]).abs() < 1e-5, "y[{i}] mismatch");
        }
    }
}

(* OCaml: SoA vs AoS for cache efficiency *)

(* --- Array of Structures (AoS) --- *)
type particle = {
  x    : float;
  y    : float;
  z    : float;
  mass : float;
}

(* Sum just the x-coordinates: must traverse all fields even though we skip y/z/mass *)
let sum_x_aos particles =
  Array.fold_left (fun acc p -> acc +. p.x) 0.0 particles

(* --- Structure of Arrays (SoA) --- *)
type particles_soa = {
  xs    : float array;
  ys    : float array;
  zs    : float array;
  masses: float array;
}

(* Sum just x: touches ONLY xs — contiguous in memory *)
let sum_x_soa soa =
  Array.fold_left (+.) 0.0 soa.xs

(* Gravity update: needs x, y, mass — still more cache-friendly than AoS *)
let apply_gravity_soa soa dt =
  Array.iteri (fun i m ->
    soa.ys.(i) <- soa.ys.(i) -. 9.81 *. m *. dt
  ) soa.masses

let make_aos n =
  Array.init n (fun i ->
    { x = float_of_int i; y = float_of_int (i * 2);
      z = 0.0; mass = 1.0 })

let make_soa n =
  { xs     = Array.init n float_of_int;
    ys     = Array.init n (fun i -> float_of_int (i * 2));
    zs     = Array.make n 0.0;
    masses = Array.make n 1.0 }

let time_it label f =
  let t0 = Sys.time () in
  let result = f () in
  let t1 = Sys.time () in
  Printf.printf "%s: result=%.2f  time=%.6fs\n" label result (t1 -. t0)

let () =
  let n = 1_000_000 in
  let aos = make_aos n in
  let soa = make_soa n in

  time_it "AoS sum_x" (fun () -> sum_x_aos aos);
  time_it "SoA sum_x" (fun () -> sum_x_soa soa);

  (* OCaml note: records are heap-allocated objects with GC overhead;
     floats in float arrays are unboxed (OCaml optimises float arrays).
     So float array SoA in OCaml is actually already cache-friendly! *)
  Printf.printf "Note: OCaml float arrays are unboxed — SoA advantage is native here.\n"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    fn sample_aos() -> Vec<ParticleAoS> {
        vec![
            ParticleAoS {
                x: 1.0,
                y: 2.0,
                z: 3.0,
                mass: 0.5,
            },
            ParticleAoS {
                x: 4.0,
                y: 5.0,
                z: 6.0,
                mass: 1.0,
            },
            ParticleAoS {
                x: 7.0,
                y: 8.0,
                z: 9.0,
                mass: 2.0,
            },
        ]
    }

    fn sample_soa() -> ParticlesSoA {
        ParticlesSoA::from_tuples([
            (1.0, 2.0, 3.0, 0.5),
            (4.0, 5.0, 6.0, 1.0),
            (7.0, 8.0, 9.0, 2.0),
        ])
    }

    #[test]
    fn test_sum_x_aos_and_soa_agree() {
        let aos = sample_aos();
        let soa = sample_soa();
        let expected = 1.0_f32 + 4.0 + 7.0;
        assert!((sum_x_aos(&aos) - expected).abs() < f32::EPSILON);
        assert!((sum_x_soa(&soa) - expected).abs() < f32::EPSILON);
    }

    #[test]
    fn test_sum_x_empty() {
        assert_eq!(sum_x_aos(&[]), 0.0_f32);
        assert_eq!(sum_x_soa(&ParticlesSoA::default()), 0.0_f32);
    }

    #[test]
    fn test_apply_gravity_aos() {
        let mut particles = sample_aos();
        // dt = 0 → y unchanged
        apply_gravity_aos(&mut particles, 0.0);
        assert!((particles[0].y - 2.0).abs() < f32::EPSILON);
        // dt = 1, mass = 0.5 → y -= 9.81 * 0.5 * 1 = 4.905
        apply_gravity_aos(&mut particles, 1.0);
        let expected_y0 = 2.0 - 9.81 * 0.5 * 1.0_f32;
        assert!((particles[0].y - expected_y0).abs() < 1e-5);
    }

    #[test]
    fn test_apply_gravity_soa() {
        let mut soa = sample_soa();
        apply_gravity_soa(&mut soa, 0.0);
        assert!((soa.ys[0] - 2.0).abs() < f32::EPSILON);
        apply_gravity_soa(&mut soa, 1.0);
        let expected_y0 = 2.0 - 9.81 * 0.5 * 1.0_f32;
        assert!((soa.ys[0] - expected_y0).abs() < 1e-5);
    }

    #[test]
    fn test_aos_to_soa_roundtrip() {
        let aos = sample_aos();
        let soa = aos_to_soa(&aos);
        let back = soa_to_aos(&soa);
        assert_eq!(aos, back);
    }

    #[test]
    fn test_soa_len_and_is_empty() {
        let soa = sample_soa();
        assert_eq!(soa.len(), 3);
        assert!(!soa.is_empty());
        assert!(ParticlesSoA::default().is_empty());
    }

    #[test]
    fn test_gravity_soa_and_aos_agree() {
        let mut aos = sample_aos();
        let mut soa = sample_soa();
        let dt = 0.016_f32;
        apply_gravity_aos(&mut aos, dt);
        apply_gravity_soa(&mut soa, dt);
        for (i, p) in aos.iter().enumerate() {
            assert!((p.y - soa.ys[i]).abs() < 1e-5, "y[{i}] mismatch");
        }
    }
}

Deep Comparison

OCaml vs Rust: Struct of Arrays vs Array of Structs

Side-by-Side Code

OCaml (AoS — sum x-coordinates)

type particle = { x: float; y: float; z: float; mass: float }

let sum_x_aos particles =
  Array.fold_left (fun acc p -> acc +. p.x) 0.0 particles

OCaml (SoA — sum x-coordinates)

type particles_soa = { xs: float array; ys: float array; zs: float array; masses: float array }

let sum_x_soa soa =
  Array.fold_left (+.) 0.0 soa.xs

Rust (AoS — idiomatic)

#[derive(Clone, Debug, Default, PartialEq)]
pub struct ParticleAoS { pub x: f32, pub y: f32, pub z: f32, pub mass: f32 }

pub fn sum_x_aos(particles: &[ParticleAoS]) -> f32 {
    particles.iter().map(|p| p.x).sum()
}

Rust (SoA — idiomatic)

#[derive(Debug, Default)]
pub struct ParticlesSoA {
    pub xs: Vec<f32>, pub ys: Vec<f32>, pub zs: Vec<f32>, pub masses: Vec<f32>,
}

pub fn sum_x_soa(soa: &ParticlesSoA) -> f32 {
    soa.xs.iter().sum()
}

Rust (gravity update — SoA with zip)

pub fn apply_gravity_soa(soa: &mut ParticlesSoA, dt: f32) {
    soa.ys
        .iter_mut()
        .zip(soa.masses.iter())
        .for_each(|(y, &m)| *y -= 9.81 * m * dt);
}

Type Signatures

Concept	OCaml	Rust
AoS element	`type particle = { x: float; ... }`	`struct ParticleAoS { x: f32, ... }`
SoA container	`type particles_soa = { xs: float array; ... }`	`struct ParticlesSoA { xs: Vec<f32>, ... }`
Sum function	`val sum_x_soa : particles_soa -> float`	`fn sum_x_soa(soa: &ParticlesSoA) -> f32`
Mutation	`Array.iteri` + index	`.iter_mut().zip(...)`
Borrowing	N/A (GC)	`&[T]` for read, `&mut [T]` for write

Memory Layout

AoS: [x0|y0|z0|m0|x1|y1|z1|m1|x2|y2|z2|m2|...]
      ^^^^                                       ← only this used when summing x
      cache line brings y0,z0,m0 for free — wasted

SoA: [x0|x1|x2|...xN] [y0|y1|...yN] [z0|...] [m0|...]
     ^^^^^^^^^^^^^^^^^                          ← entire array used when summing x
      cache line is 100% useful

Key Insights

Cache line utilisation: AoS loads an entire struct (16 bytes here) per iteration but uses only 4 bytes (x); SoA accesses a tightly packed f32 array — 16 values per 64-byte cache line vs. 4 in AoS. This is a 4× difference in effective cache bandwidth for single-field workloads.

Iterator composition: Rust's .iter_mut().zip(other.iter()) lets the gravity update operate on two SoA columns simultaneously without allocating — the same pattern OCaml expresses with Array.iteri. Both compile to a simple sequential loop over contiguous memory.

Ownership encoding: Rust encodes the AoS→SoA boundary at the type level. A function taking &[ParticleAoS] signals AoS semantics; one taking &ParticlesSoA signals SoA semantics. OCaml uses structural types but relies on programmer discipline for the same distinction.

Mutable aliasing safety: apply_gravity_soa takes &mut ParticlesSoA and simultaneously borrows ys mutably and masses immutably via zip. Rust enforces at compile time that no aliasing occurs — OCaml's GC permits this but provides no static guarantee.

Layout control: Rust gives precise control over struct layout via #[repr(C)], padding, and alignment. OCaml floats are boxed by default (except in float-only records), so AoS in OCaml already suffers an additional indirection overhead that Rust avoids entirely.

When to Use Each Style

Use AoS when: entities are usually processed whole (e.g. serialising a single particle to JSON, passing one particle to a physics callback), or the collection is small enough to fit in L1 cache regardless of layout.

Use SoA when: hot loops process one or two fields across thousands of entities (physics simulations, particle systems, SIMD vectorisation). The layout change alone routinely yields 2–10× throughput improvements without algorithmic changes. SIMD intrinsics require SoA because they load a vector register from contiguous memory of a single type.

Exercises

Benchmark ParticleAoS vs ParticleSoA position update with 1 million particles

using criterion. Measure with and without compiler auto-vectorization (-C target-cpu=native).

Implement a ParticleSoAIter that yields (x, y, z, vx, vy, vz, mass) tuples by

zipping the seven slices, maintaining borrowing safety.

Extend ParticleSoA to support sorting particles by mass while keeping all arrays

synchronized (use an index array or a sort-and-permute approach).

Implement an "AoSoA" (Array of Struct of Arrays) layout using [f32; 8] lanes per

field group to combine SIMD width with cache locality.

Profile a real workload (ray-sphere intersection, N-body gravity) to identify which

fields are hot and restructure accordingly.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust