729: Avoid Allocations
Difficulty: 3 Level: Advanced Keep hot paths allocation-free by reusing buffers, using iterators, and choosing stack-friendly types.The Problem This Solves
Every heap allocation has overhead: the allocator must find a free block, maintain metadata, potentially zero memory, and eventually free it. For code that runs millions of times โ a web server handling requests, a game loop processing entities, a parser tokenising input โ repeated small allocations add measurable latency and unpredictable GC pauses (in GC languages) or fragmentation (in manual-memory languages). Rust gives you precise control: you can format a string into a pre-allocated buffer, process a sequence through iterator chains without creating intermediate `Vec`s, reuse a `Vec` by clearing it instead of dropping it, and choose stack-allocated types (`[u8; 64]`, `SmallVec`, `ArrayVec`) for small collections that never need to grow large. None of this requires `unsafe` โ it's disciplined use of Rust's standard APIs.The Intuition
The key question for any hot path: "Does this expression allocate?" A few patterns that always allocate: `String::new()`, `to_string()`, `format!()`, `Vec::new()`, `.collect::<Vec<_>>()`, `.to_owned()`, `Box::new()`, `.clone()` on non-Copy types. A few patterns that never allocate: iterator chains (`.map()`, `.filter()`, `.sum()`), `&str` and `&[u8]` borrows, `write!` into an existing `String`, `.clear()` on a `Vec` or `String`. The goal is not to eliminate all allocation โ that's premature optimisation. The goal is to identify the hot path (profiler-measured), understand which allocations are inside it, and eliminate the unnecessary ones by reusing, borrowing, or using stack-local types.How It Works in Rust
// โโ Technique 1: Write into a pre-allocated buffer โโโโโโโโโโโโโโโโโโโโโโโโ
fn format_record_into(buf: &mut String, name: &str, score: u32) {
buf.clear(); // resets content, keeps heap capacity โ no allocation
buf.push_str(name);
buf.push(':');
// Integer without temporary String:
let mut tmp = [0u8; 20];
buf.push_str(u32_to_str(score, &mut tmp));
}
// Caller allocates once:
let mut buf = String::with_capacity(64);
for record in records {
format_record_into(&mut buf, record.name, record.score);
send(&buf); // borrow โ no clone
}
// โโ Technique 2: Iterator chains โ zero intermediate collections โโโโโโโโโโ
fn sum_squares(n: u64) -> u64 {
(0..n).map(|i| i * i).sum() // no Vec โ lazy computation, one pass
}
fn hot_filter_sum(data: &[i32]) -> i32 {
data.iter()
.filter(|&&x| x > 0)
.map(|&x| x * 2)
.sum() // single pass, zero intermediate allocations
}
// โโ Technique 3: Reuse a Vec by clearing, not dropping โโโโโโโโโโโโโโโโโโโ
struct Pipeline { scratch: Vec<i32> }
impl Pipeline {
fn new() -> Self { Pipeline { scratch: Vec::with_capacity(1024) } }
fn process(&mut self, input: &[i32]) -> &[i32] {
self.scratch.clear(); // O(1) length reset, capacity preserved
for &x in input {
if x.rem_euclid(2) == 0 { self.scratch.push(x * 3); }
}
&self.scratch
}
}
// โโ Technique 4: Borrow instead of clone โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
fn print_info(name: &str) { println!("{}", name); } // &str, not String
Measure with `cargo flamegraph`, `heaptrack`, or `valgrind --tool=massif` before optimising. Allocation costs are workload-dependent โ what's bottleneck in one context is noise in another.
What This Unlocks
- Web server request handling โ pre-allocate response buffers per connection, reuse across requests; eliminates allocation on every HTTP response header write.
- Embedded and no-std โ stack-only computation is mandatory when there is no heap; iterator chains and stack buffers are the only option.
- Latency-sensitive systems โ trading systems, audio processing, and real-time games require predictable latency; heap allocation introduces jitter from the allocator.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Buffer reuse | `Buffer.clear buf` | `buf.clear()` on `Vec<u8>` or `String` |
| Iterator without intermediate | `Seq` (lazy) | `Iterator` chains โ lazy, zero-allocation |
| Stack buffer for integer | `string_of_int n` (allocates) | Manual `[u8; 20]` + write-in-place |
| Clone vs borrow | Pattern matching usually copies | `&str` / `&[T]` borrows โ explicit no-copy |
| Allocation profiling | `spacetime` profiler | `heaptrack`, `cargo-instruments`, `valgrind massif` |
/// 729: Avoid Allocations โ Hot path techniques in std-only Rust
// โโ Technique 1: Stack-allocated scratch buffer โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
/// Write a formatted record into a pre-allocated `String` buffer.
/// Caller owns the buffer and can reuse it across many calls.
fn format_record_into(buf: &mut String, name: &str, score: u32) {
buf.clear(); // reset without freeing heap memory
buf.push_str(name);
buf.push(':');
// itoa-style: write integer without allocating a temporary String
let mut tmp = [0u8; 20];
let digits = u32_to_str(score, &mut tmp);
buf.push_str(digits);
}
/// Format u32 into a stack buffer; return the filled slice.
fn u32_to_str(mut n: u32, buf: &mut [u8; 20]) -> &str {
if n == 0 {
buf[19] = b'0';
return std::str::from_utf8(&buf[19..]).unwrap();
}
let mut pos = 20usize;
while n > 0 {
pos -= 1;
buf[pos] = b'0' + (n % 10) as u8;
n /= 10;
}
std::str::from_utf8(&buf[pos..]).unwrap()
}
// โโ Technique 2: Iterator chains โ zero intermediate allocations โโโโโโโโโโโโโโ
fn sum_squares(n: u64) -> u64 {
(0..n).map(|i| i * i).sum() // no Vec created; purely lazy
}
fn hot_filter_sum(data: &[i32]) -> i32 {
data.iter()
.filter(|&&x| x > 0)
.map(|&x| x * 2)
.sum() // single pass, zero allocs
}
// โโ Technique 3: Reuse a Vec by clearing, not dropping โโโโโโโโโโโโโโโโโโโโโโโ
struct Pipeline {
scratch: Vec<i32>,
}
impl Pipeline {
fn new() -> Self {
Pipeline { scratch: Vec::with_capacity(1024) }
}
fn process(&mut self, input: &[i32]) -> &[i32] {
self.scratch.clear(); // keeps allocated capacity
for &x in input {
if x.rem_euclid(2) == 0 {
self.scratch.push(x * 3);
}
}
&self.scratch
}
}
// โโ Technique 4: Fixed-size stack array as scratch โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
fn count_words_no_alloc(s: &str) -> usize {
// Count words without allocating โ just scan bytes
let bytes = s.as_bytes();
let mut in_word = false;
let mut count = 0usize;
for &b in bytes {
match (in_word, b == b' ' || b == b'\t' || b == b'\n') {
(false, false) => { in_word = true; count += 1; }
(true, true) => { in_word = false; }
_ => {}
}
}
count
}
fn main() {
// Buffer reuse
let mut buf = String::with_capacity(64);
for (name, score) in [("Alice", 42u32), ("Bob", 1337), ("Carol", 0)] {
format_record_into(&mut buf, name, score);
println!("{}", buf);
}
// Iterator chains
println!("Sum of squares 0..10 = {}", sum_squares(10));
let data = [-3i32, 1, 4, -1, 5, 9, -2, 6];
println!("Filter-sum = {}", hot_filter_sum(&data));
// Vec reuse
let mut pipeline = Pipeline::new();
let r = pipeline.process(&[1, 2, 3, 4, 5, 6]);
println!("Pipeline: {:?}", r);
let r2 = pipeline.process(&[10, 11, 12]);
println!("Pipeline reuse: {:?}", r2);
// No-alloc word count
println!("Words: {}", count_words_no_alloc("hello world foo bar"));
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_format_record_reuse() {
let mut buf = String::with_capacity(32);
format_record_into(&mut buf, "Alice", 99);
assert_eq!(buf, "Alice:99");
format_record_into(&mut buf, "Bob", 0);
assert_eq!(buf, "Bob:0");
}
#[test]
fn test_sum_squares() {
assert_eq!(sum_squares(0), 0);
assert_eq!(sum_squares(4), 0 + 1 + 4 + 9); // 14
}
#[test]
fn test_hot_filter_sum() {
assert_eq!(hot_filter_sum(&[1, -2, 3, -4]), (1 + 3) * 2);
assert_eq!(hot_filter_sum(&[]), 0);
}
#[test]
fn test_pipeline_reuse() {
let mut p = Pipeline::new();
let r1 = p.process(&[2, 3, 4]);
assert_eq!(r1, &[6, 12]); // evens * 3
let r2 = p.process(&[10]);
assert_eq!(r2, &[30]);
}
#[test]
fn test_word_count() {
assert_eq!(count_words_no_alloc(""), 0);
assert_eq!(count_words_no_alloc("hello"), 1);
assert_eq!(count_words_no_alloc("hello world"), 2);
assert_eq!(count_words_no_alloc(" a b c "), 3);
}
}
(* 729: Avoid Allocations โ OCaml perspective
OCaml's GC handles heap allocation, but we can still minimize allocation
pressure by reusing buffers and using sequences (lazy lists). *)
(* Reusable buffer pattern โ avoids allocating a new string each call *)
let buf = Buffer.create 64
let format_record_into_buf name score =
Buffer.clear buf;
Buffer.add_string buf name;
Buffer.add_char buf ':';
Buffer.add_string buf (string_of_int score);
Buffer.contents buf (* still allocates the result string, but buf is reused *)
(* Sequence (lazy) โ no intermediate list allocation *)
let sum_squares_seq n =
Seq.init n (fun i -> i * i)
|> Seq.fold_left (+) 0
(* Array (stack-ish in OCaml sense โ one alloc, reused) *)
let process_with_fixed_buf data =
let scratch = Array.make 256 0 in
List.iteri (fun i x ->
if i < 256 then scratch.(i) <- x * 2
) data;
Array.sub scratch 0 (min (List.length data) 256)
(* String splitting without extra allocation โ find positions *)
let count_commas s =
String.fold_left (fun acc c -> if c = ',' then acc + 1 else acc) 0 s
let () =
Printf.printf "Record: %s\n" (format_record_into_buf "Alice" 42);
Printf.printf "Sum of squares 0..9: %d\n" (sum_squares_seq 10);
Printf.printf "Commas in 'a,b,c,d': %d\n" (count_commas "a,b,c,d");
let result = process_with_fixed_buf [1;2;3;4;5] in
Printf.printf "Processed: [%s]\n"
(Array.to_list result |> List.map string_of_int |> String.concat ";")