893-group-by-iter — Group By Iterator
Tutorial
The Problem
Run-length encoding, log analysis, and data aggregation all need to group consecutive equal elements or elements sharing a common key. SQL's GROUP BY aggregates all rows matching a key regardless of position. A consecutive group-by collapses adjacent equal elements — used in run-length encoding, detecting intervals of the same event type, and aggregating time-series data where consecutive observations belong to the same period. Haskell's Data.List.groupBy and OCaml's own group pattern both handle consecutive grouping. Rust's standard library has .chunk_by() (1.77+); for earlier versions or custom key logic, it must be implemented manually.
🎯 Learning Outcomes
(key, group) pairsHashMap for non-consecutive grouping (all elements sharing a key)Itertools::group_by (consecutive) and HashMap::entry (global)Code Example
pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
let mut groups: Vec<Vec<T>> = Vec::new();
let mut iter = data.iter();
let Some(first) = iter.next() else { return groups; };
let mut current = vec![first.clone()];
for item in iter {
if *item == *current[0] {
current.push(item.clone());
} else {
groups.push(current);
current = vec![item.clone()];
}
}
groups.push(current);
groups
}Key Differences
HashMap in Rust or Hashtbl in OCaml.chunk_by was added in 1.77; before that, it required the itertools crate or manual implementation; OCaml requires a library.Vec<Vec<T>> or Vec<(K, Vec<T>)>; OCaml produces 'a list list or association lists.OCaml Approach
OCaml implements group-by recursively: let rec group_by eq = function | [] -> [] | x :: rest -> let (same, others) = List.partition (eq x) rest in (x :: same) :: group_by eq others. This is O(n²) for the partition; the efficient consecutive version uses pattern matching on the head. For consecutive grouping: let rec group_consecutive = function | [] -> [] | [x] -> [[x]] | x :: y :: rest when x = y -> ... . Libraries like Base.List.group_by and Core.List.group provide this.
Full Source
//! 893: Group By Iter — grouping consecutive elements of a slice.
//!
//! The OCaml original hand-rolls each traversal with `List.fold_left` and
//! `List.rev`. In Rust the same operations collapse to iterator chains built
//! around [`Iterator::fold`] and [`Iterator::flat_map`]. No external crates
//! are used; everything relies on the standard library.
//!
//! Four operations are exposed, each mirroring a helper from the OCaml file:
//!
//! - [`group_consecutive`] — split into runs of equal elements.
//! - [`group_by_key`] — split into runs that share a derived key.
//! - [`run_length_encode`] / [`run_length_decode`] — classic RLE round-trip.
//! - [`dedup_consecutive`] — collapse each run to a single representative.
//!
//! All functions are generic over the element type and return owned
//! [`Vec`]-based structures so callers can further process the result with
//! any iterator combinator.
/// Split a slice into runs of consecutive equal elements.
///
/// Mirrors OCaml's `group_consecutive`. The input `[1;1;2;2;2;3;1;1]`
/// becomes `[[1;1]; [2;2;2]; [3]; [1;1]]`.
///
/// # Examples
///
/// ```
/// use example_893_group_by_iter::group_consecutive;
/// assert_eq!(
/// group_consecutive(&[1, 1, 2, 3, 3]),
/// vec![vec![1, 1], vec![2], vec![3, 3]],
/// );
/// ```
pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
group_by_key(data, |x| x.clone())
.into_iter()
.map(|(_, group)| group)
.collect()
}
/// Split a slice into runs whose elements share the same derived key.
///
/// Mirrors OCaml's `group_by_key`. The key function is applied once per
/// element; consecutive elements producing equal keys land in the same
/// group, and each group is paired with its common key in the output.
///
/// # Examples
///
/// ```
/// use example_893_group_by_iter::group_by_key;
/// let groups = group_by_key(&[2, 4, 1, 3, 6], |x| x % 2);
/// assert_eq!(groups[0], (0, vec![2, 4]));
/// assert_eq!(groups[1], (1, vec![1, 3]));
/// assert_eq!(groups[2], (0, vec![6]));
/// ```
pub fn group_by_key<T, K, F>(data: &[T], key: F) -> Vec<(K, Vec<T>)>
where
T: Clone,
K: PartialEq,
F: Fn(&T) -> K,
{
let mut iter = data.iter();
let Some(first) = iter.next() else {
return Vec::new();
};
let init = (Vec::<(K, Vec<T>)>::new(), key(first), vec![first.clone()]);
let (mut groups, current_key, current) = iter.fold(init, |(mut groups, k, mut current), x| {
let new_key = key(x);
if new_key == k {
current.push(x.clone());
(groups, k, current)
} else {
groups.push((k, current));
(groups, new_key, vec![x.clone()])
}
});
groups.push((current_key, current));
groups
}
/// Run-length encode a slice as `(value, count)` pairs.
///
/// Mirrors OCaml's `run_length_encode`. Empty input yields an empty vector.
///
/// # Examples
///
/// ```
/// use example_893_group_by_iter::run_length_encode;
/// assert_eq!(
/// run_length_encode(&[1, 1, 1, 2, 2, 3]),
/// vec![(1, 3), (2, 2), (3, 1)],
/// );
/// ```
pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
group_consecutive(data)
.into_iter()
.map(|group| {
let count = group.len();
let value = group
.into_iter()
.next()
.expect("group_consecutive never returns empty groups");
(value, count)
})
.collect()
}
/// Inverse of [`run_length_encode`]: expand `(value, count)` pairs back into
/// a flat vector.
///
/// Mirrors OCaml's `run_length_decode`, implemented here with
/// [`Iterator::flat_map`] and [`std::iter::repeat_n`].
///
/// # Examples
///
/// ```
/// use example_893_group_by_iter::run_length_decode;
/// assert_eq!(
/// run_length_decode(&[(1, 3), (2, 2), (3, 1)]),
/// vec![1, 1, 1, 2, 2, 3],
/// );
/// ```
pub fn run_length_decode<T: Clone>(encoded: &[(T, usize)]) -> Vec<T> {
encoded
.iter()
.flat_map(|(value, count)| std::iter::repeat_n(value.clone(), *count))
.collect()
}
/// Collapse each run of consecutive equal elements to a single copy.
///
/// Mirrors OCaml's `dedup_consecutive`. Unlike `HashSet`-based deduplication,
/// this preserves order and only removes *adjacent* duplicates, so
/// `[1, 1, 2, 2, 1]` becomes `[1, 2, 1]`.
///
/// # Examples
///
/// ```
/// use example_893_group_by_iter::dedup_consecutive;
/// assert_eq!(dedup_consecutive(&[1, 1, 2, 2, 3, 3, 1, 1]), vec![1, 2, 3, 1]);
/// ```
pub fn dedup_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<T> {
group_consecutive(data)
.into_iter()
.filter_map(|group| group.into_iter().next())
.collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn group_consecutive_matches_ocaml_example() {
let input = [1, 1, 2, 2, 2, 3, 1, 1];
let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
assert_eq!(group_consecutive(&input), expected);
}
#[test]
fn group_consecutive_empty_returns_empty() {
let result: Vec<Vec<i32>> = group_consecutive::<i32>(&[]);
assert!(result.is_empty());
}
#[test]
fn group_consecutive_single_element() {
assert_eq!(group_consecutive(&[1]), vec![vec![1]]);
}
#[test]
fn group_consecutive_all_same() {
assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
}
#[test]
fn group_consecutive_all_distinct() {
assert_eq!(
group_consecutive(&[1, 2, 3]),
vec![vec![1], vec![2], vec![3]]
);
}
#[test]
fn group_consecutive_works_on_strings() {
let input = ["a", "a", "b", "c", "c"];
let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
assert_eq!(group_consecutive(&input), expected);
}
#[test]
fn group_by_key_parity_matches_ocaml_example() {
let groups = group_by_key(&[2, 4, 1, 3, 6, 8], |x| x % 2);
assert_eq!(groups.len(), 3);
assert_eq!(groups[0], (0, vec![2, 4]));
assert_eq!(groups[1], (1, vec![1, 3]));
assert_eq!(groups[2], (0, vec![6, 8]));
}
#[test]
fn group_by_key_empty_returns_empty() {
let groups = group_by_key::<i32, bool, _>(&[], |x| *x > 0);
assert!(groups.is_empty());
}
#[test]
fn group_by_key_sign_changes() {
let input = [1, 2, -1, -2, 3];
let groups = group_by_key(&input, |x| *x > 0);
assert_eq!(
groups,
vec![(true, vec![1, 2]), (false, vec![-1, -2]), (true, vec![3])]
);
}
#[test]
fn run_length_encode_matches_ocaml_example() {
let input = [1, 1, 1, 2, 2, 3, 3, 3, 3];
let expected = vec![(1, 3), (2, 2), (3, 4)];
assert_eq!(run_length_encode(&input), expected);
}
#[test]
fn run_length_encode_empty_returns_empty() {
let result: Vec<(char, usize)> = run_length_encode::<char>(&[]);
assert!(result.is_empty());
}
#[test]
fn run_length_decode_matches_ocaml_example() {
let encoded = [(1, 3), (2, 2), (3, 4)];
let expected = vec![1, 1, 1, 2, 2, 3, 3, 3, 3];
assert_eq!(run_length_decode(&encoded), expected);
}
#[test]
fn run_length_round_trip() {
let input = vec!['a', 'a', 'b', 'c', 'c', 'c', 'a'];
assert_eq!(run_length_decode(&run_length_encode(&input)), input);
}
#[test]
fn run_length_decode_skips_zero_counts() {
let encoded = [(1, 0), (2, 2), (3, 0)];
assert_eq!(run_length_decode(&encoded), vec![2, 2]);
}
#[test]
fn dedup_consecutive_matches_ocaml_example() {
assert_eq!(
dedup_consecutive(&[1, 1, 2, 2, 3, 3, 1, 1]),
vec![1, 2, 3, 1]
);
}
#[test]
fn dedup_consecutive_preserves_non_adjacent_duplicates() {
assert_eq!(dedup_consecutive(&[1, 2, 1, 2, 1]), vec![1, 2, 1, 2, 1]);
}
#[test]
fn dedup_consecutive_empty_returns_empty() {
let result: Vec<i32> = dedup_consecutive::<i32>(&[]);
assert!(result.is_empty());
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn group_consecutive_matches_ocaml_example() {
let input = [1, 1, 2, 2, 2, 3, 1, 1];
let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
assert_eq!(group_consecutive(&input), expected);
}
#[test]
fn group_consecutive_empty_returns_empty() {
let result: Vec<Vec<i32>> = group_consecutive::<i32>(&[]);
assert!(result.is_empty());
}
#[test]
fn group_consecutive_single_element() {
assert_eq!(group_consecutive(&[1]), vec![vec![1]]);
}
#[test]
fn group_consecutive_all_same() {
assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
}
#[test]
fn group_consecutive_all_distinct() {
assert_eq!(
group_consecutive(&[1, 2, 3]),
vec![vec![1], vec![2], vec![3]]
);
}
#[test]
fn group_consecutive_works_on_strings() {
let input = ["a", "a", "b", "c", "c"];
let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
assert_eq!(group_consecutive(&input), expected);
}
#[test]
fn group_by_key_parity_matches_ocaml_example() {
let groups = group_by_key(&[2, 4, 1, 3, 6, 8], |x| x % 2);
assert_eq!(groups.len(), 3);
assert_eq!(groups[0], (0, vec![2, 4]));
assert_eq!(groups[1], (1, vec![1, 3]));
assert_eq!(groups[2], (0, vec![6, 8]));
}
#[test]
fn group_by_key_empty_returns_empty() {
let groups = group_by_key::<i32, bool, _>(&[], |x| *x > 0);
assert!(groups.is_empty());
}
#[test]
fn group_by_key_sign_changes() {
let input = [1, 2, -1, -2, 3];
let groups = group_by_key(&input, |x| *x > 0);
assert_eq!(
groups,
vec![(true, vec![1, 2]), (false, vec![-1, -2]), (true, vec![3])]
);
}
#[test]
fn run_length_encode_matches_ocaml_example() {
let input = [1, 1, 1, 2, 2, 3, 3, 3, 3];
let expected = vec![(1, 3), (2, 2), (3, 4)];
assert_eq!(run_length_encode(&input), expected);
}
#[test]
fn run_length_encode_empty_returns_empty() {
let result: Vec<(char, usize)> = run_length_encode::<char>(&[]);
assert!(result.is_empty());
}
#[test]
fn run_length_decode_matches_ocaml_example() {
let encoded = [(1, 3), (2, 2), (3, 4)];
let expected = vec![1, 1, 1, 2, 2, 3, 3, 3, 3];
assert_eq!(run_length_decode(&encoded), expected);
}
#[test]
fn run_length_round_trip() {
let input = vec!['a', 'a', 'b', 'c', 'c', 'c', 'a'];
assert_eq!(run_length_decode(&run_length_encode(&input)), input);
}
#[test]
fn run_length_decode_skips_zero_counts() {
let encoded = [(1, 0), (2, 2), (3, 0)];
assert_eq!(run_length_decode(&encoded), vec![2, 2]);
}
#[test]
fn dedup_consecutive_matches_ocaml_example() {
assert_eq!(
dedup_consecutive(&[1, 1, 2, 2, 3, 3, 1, 1]),
vec![1, 2, 3, 1]
);
}
#[test]
fn dedup_consecutive_preserves_non_adjacent_duplicates() {
assert_eq!(dedup_consecutive(&[1, 2, 1, 2, 1]), vec![1, 2, 1, 2, 1]);
}
#[test]
fn dedup_consecutive_empty_returns_empty() {
let result: Vec<i32> = dedup_consecutive::<i32>(&[]);
assert!(result.is_empty());
}
}
Deep Comparison
OCaml vs Rust: Group By Iterator
Side-by-Side Code
OCaml
(* Group consecutive equal elements *)
let group_consecutive lst =
match lst with
| [] -> []
| first :: rest ->
let groups, current = List.fold_left (fun (groups, current) x ->
match current with
| [] -> (groups, [x])
| hd :: _ when hd = x -> (groups, x :: current)
| _ -> (List.rev current :: groups, [x])
) ([], [first]) rest in
List.rev (List.rev current :: groups)
(* Group by key function *)
let group_by_key key lst =
match lst with
| [] -> []
| first :: rest ->
let groups, current_key, current = List.fold_left (fun (groups, k, current) x ->
let new_key = key x in
if new_key = k then (groups, k, x :: current)
else (List.rev current :: groups, new_key, [x])
) ([], key first, [first]) rest in
List.rev (List.rev current :: groups)
Rust (idiomatic)
pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
let mut groups: Vec<Vec<T>> = Vec::new();
let mut iter = data.iter();
let Some(first) = iter.next() else { return groups; };
let mut current = vec![first.clone()];
for item in iter {
if *item == *current[0] {
current.push(item.clone());
} else {
groups.push(current);
current = vec![item.clone()];
}
}
groups.push(current);
groups
}
Rust (functional/recursive — run-length encoding via iterator composition)
pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
group_consecutive(data)
.into_iter()
.map(|g| {
let len = g.len();
(g.into_iter().next().unwrap(), len)
})
.collect()
}
Type Signatures
| Concept | OCaml | Rust |
|---|---|---|
| Group consecutive | 'a list -> 'a list list | fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> |
| Group by key | ('a -> 'b) -> 'a list -> 'a list list | fn group_by_key<T: Clone, K: PartialEq>(data: &[T], key: impl Fn(&T) -> K) -> Vec<(K, Vec<T>)> |
| Run-length encode | 'a list -> ('a * int) list | fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> |
| List type | 'a list | &[T] (borrowed slice) |
| Grouped result | 'a list list | Vec<Vec<T>> |
Key Insights
List.fold_left with an immutable accumulator tuple (groups, current), building the result purely functionally. Rust naturally reaches for mutable Vec accumulators inside a for loop — equally idiomatic and avoids repeated allocation overhead.current with hd :: _ when hd = x, which is elegant list deconstruction. Rust compares via current[0] on a Vec, trading structural elegance for direct, bounds-clear indexing.Fn(&T) -> K in Rust, 'a -> 'b in OCaml). Rust's trait bounds make the constraint explicit — K: PartialEq — while OCaml infers equality capability automatically via polymorphic =.Clone:** Rust requires T: Clone to copy values into groups from a borrowed slice. OCaml's garbage collector and persistent lists make this cost invisible — values are always shared. The Clone bound is the honest cost of building owned sub-collections from a borrowed input.run_length_encode naturally composes group_consecutive with .map().collect(), mirroring functional pipeline style. OCaml achieves the same with List.map over the grouped result — both idioms express the same dataflow clearly.When to Use Each Style
Use idiomatic Rust (mutable accumulator) when: performance matters and you want to avoid repeated intermediate allocations; the grouping logic is straightforward and clarity trumps minimalism.
Use iterator composition when: you want to build higher-level operations from simpler ones (e.g., run_length_encode built on top of group_consecutive), keeping each function small, pure, and independently testable.
Exercises
run_length_encode<T: Eq + Clone>(data: &[T]) -> Vec<(T, usize)> using group_consecutive.group_by_all<T: Clone, K: Eq + Hash>(data: &[T], key: impl Fn(&T) -> K) -> HashMap<K, Vec<T>> for non-consecutive grouping.group_by_ranges(data: &[i32], range_size: i32) -> Vec<(i32, Vec<i32>)> that groups numbers into buckets of size range_size.