ExamplesBy LevelBy TopicLearning Paths
893 Intermediate

893-group-by-iter — Group By Iterator

Functional Programming

Tutorial

The Problem

Run-length encoding, log analysis, and data aggregation all need to group consecutive equal elements or elements sharing a common key. SQL's GROUP BY aggregates all rows matching a key regardless of position. A consecutive group-by collapses adjacent equal elements — used in run-length encoding, detecting intervals of the same event type, and aggregating time-series data where consecutive observations belong to the same period. Haskell's Data.List.groupBy and OCaml's own group pattern both handle consecutive grouping. Rust's standard library has .chunk_by() (1.77+); for earlier versions or custom key logic, it must be implemented manually.

🎯 Learning Outcomes

  • • Implement group-consecutive using a peekable iterator pattern
  • • Implement group-by-key that produces (key, group) pairs
  • • Use HashMap for non-consecutive grouping (all elements sharing a key)
  • • Compare consecutive grouping (run-length style) with global grouping (SQL style)
  • • Recognize the difference between Itertools::group_by (consecutive) and HashMap::entry (global)
  • Code Example

    pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
        let mut groups: Vec<Vec<T>> = Vec::new();
        let mut iter = data.iter();
        let Some(first) = iter.next() else { return groups; };
        let mut current = vec![first.clone()];
    
        for item in iter {
            if *item == *current[0] {
                current.push(item.clone());
            } else {
                groups.push(current);
                current = vec![item.clone()];
            }
        }
        groups.push(current);
        groups
    }

    Key Differences

  • Consecutive vs global: Both languages have consecutive grouping (run-length style); global grouping requires a HashMap in Rust or Hashtbl in OCaml.
  • Standard library gap: Rust's chunk_by was added in 1.77; before that, it required the itertools crate or manual implementation; OCaml requires a library.
  • Output structure: Rust produces Vec<Vec<T>> or Vec<(K, Vec<T>)>; OCaml produces 'a list list or association lists.
  • Peekable vs recursive: Rust uses a peekable loop for efficiency; OCaml uses recursive pattern matching.
  • OCaml Approach

    OCaml implements group-by recursively: let rec group_by eq = function | [] -> [] | x :: rest -> let (same, others) = List.partition (eq x) rest in (x :: same) :: group_by eq others. This is O(n²) for the partition; the efficient consecutive version uses pattern matching on the head. For consecutive grouping: let rec group_consecutive = function | [] -> [] | [x] -> [[x]] | x :: y :: rest when x = y -> ... . Libraries like Base.List.group_by and Core.List.group provide this.

    Full Source

    //! 893: Group By Iter — grouping consecutive elements of a slice.
    //!
    //! The OCaml original hand-rolls each traversal with `List.fold_left` and
    //! `List.rev`. In Rust the same operations collapse to iterator chains built
    //! around [`Iterator::fold`] and [`Iterator::flat_map`]. No external crates
    //! are used; everything relies on the standard library.
    //!
    //! Four operations are exposed, each mirroring a helper from the OCaml file:
    //!
    //! - [`group_consecutive`] — split into runs of equal elements.
    //! - [`group_by_key`] — split into runs that share a derived key.
    //! - [`run_length_encode`] / [`run_length_decode`] — classic RLE round-trip.
    //! - [`dedup_consecutive`] — collapse each run to a single representative.
    //!
    //! All functions are generic over the element type and return owned
    //! [`Vec`]-based structures so callers can further process the result with
    //! any iterator combinator.
    
    /// Split a slice into runs of consecutive equal elements.
    ///
    /// Mirrors OCaml's `group_consecutive`. The input `[1;1;2;2;2;3;1;1]`
    /// becomes `[[1;1]; [2;2;2]; [3]; [1;1]]`.
    ///
    /// # Examples
    ///
    /// ```
    /// use example_893_group_by_iter::group_consecutive;
    /// assert_eq!(
    ///     group_consecutive(&[1, 1, 2, 3, 3]),
    ///     vec![vec![1, 1], vec![2], vec![3, 3]],
    /// );
    /// ```
    pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
        group_by_key(data, |x| x.clone())
            .into_iter()
            .map(|(_, group)| group)
            .collect()
    }
    
    /// Split a slice into runs whose elements share the same derived key.
    ///
    /// Mirrors OCaml's `group_by_key`. The key function is applied once per
    /// element; consecutive elements producing equal keys land in the same
    /// group, and each group is paired with its common key in the output.
    ///
    /// # Examples
    ///
    /// ```
    /// use example_893_group_by_iter::group_by_key;
    /// let groups = group_by_key(&[2, 4, 1, 3, 6], |x| x % 2);
    /// assert_eq!(groups[0], (0, vec![2, 4]));
    /// assert_eq!(groups[1], (1, vec![1, 3]));
    /// assert_eq!(groups[2], (0, vec![6]));
    /// ```
    pub fn group_by_key<T, K, F>(data: &[T], key: F) -> Vec<(K, Vec<T>)>
    where
        T: Clone,
        K: PartialEq,
        F: Fn(&T) -> K,
    {
        let mut iter = data.iter();
        let Some(first) = iter.next() else {
            return Vec::new();
        };
    
        let init = (Vec::<(K, Vec<T>)>::new(), key(first), vec![first.clone()]);
        let (mut groups, current_key, current) = iter.fold(init, |(mut groups, k, mut current), x| {
            let new_key = key(x);
            if new_key == k {
                current.push(x.clone());
                (groups, k, current)
            } else {
                groups.push((k, current));
                (groups, new_key, vec![x.clone()])
            }
        });
        groups.push((current_key, current));
        groups
    }
    
    /// Run-length encode a slice as `(value, count)` pairs.
    ///
    /// Mirrors OCaml's `run_length_encode`. Empty input yields an empty vector.
    ///
    /// # Examples
    ///
    /// ```
    /// use example_893_group_by_iter::run_length_encode;
    /// assert_eq!(
    ///     run_length_encode(&[1, 1, 1, 2, 2, 3]),
    ///     vec![(1, 3), (2, 2), (3, 1)],
    /// );
    /// ```
    pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
        group_consecutive(data)
            .into_iter()
            .map(|group| {
                let count = group.len();
                let value = group
                    .into_iter()
                    .next()
                    .expect("group_consecutive never returns empty groups");
                (value, count)
            })
            .collect()
    }
    
    /// Inverse of [`run_length_encode`]: expand `(value, count)` pairs back into
    /// a flat vector.
    ///
    /// Mirrors OCaml's `run_length_decode`, implemented here with
    /// [`Iterator::flat_map`] and [`std::iter::repeat_n`].
    ///
    /// # Examples
    ///
    /// ```
    /// use example_893_group_by_iter::run_length_decode;
    /// assert_eq!(
    ///     run_length_decode(&[(1, 3), (2, 2), (3, 1)]),
    ///     vec![1, 1, 1, 2, 2, 3],
    /// );
    /// ```
    pub fn run_length_decode<T: Clone>(encoded: &[(T, usize)]) -> Vec<T> {
        encoded
            .iter()
            .flat_map(|(value, count)| std::iter::repeat_n(value.clone(), *count))
            .collect()
    }
    
    /// Collapse each run of consecutive equal elements to a single copy.
    ///
    /// Mirrors OCaml's `dedup_consecutive`. Unlike `HashSet`-based deduplication,
    /// this preserves order and only removes *adjacent* duplicates, so
    /// `[1, 1, 2, 2, 1]` becomes `[1, 2, 1]`.
    ///
    /// # Examples
    ///
    /// ```
    /// use example_893_group_by_iter::dedup_consecutive;
    /// assert_eq!(dedup_consecutive(&[1, 1, 2, 2, 3, 3, 1, 1]), vec![1, 2, 3, 1]);
    /// ```
    pub fn dedup_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<T> {
        group_consecutive(data)
            .into_iter()
            .filter_map(|group| group.into_iter().next())
            .collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn group_consecutive_matches_ocaml_example() {
            let input = [1, 1, 2, 2, 2, 3, 1, 1];
            let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        #[test]
        fn group_consecutive_empty_returns_empty() {
            let result: Vec<Vec<i32>> = group_consecutive::<i32>(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn group_consecutive_single_element() {
            assert_eq!(group_consecutive(&[1]), vec![vec![1]]);
        }
    
        #[test]
        fn group_consecutive_all_same() {
            assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
        }
    
        #[test]
        fn group_consecutive_all_distinct() {
            assert_eq!(
                group_consecutive(&[1, 2, 3]),
                vec![vec![1], vec![2], vec![3]]
            );
        }
    
        #[test]
        fn group_consecutive_works_on_strings() {
            let input = ["a", "a", "b", "c", "c"];
            let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        #[test]
        fn group_by_key_parity_matches_ocaml_example() {
            let groups = group_by_key(&[2, 4, 1, 3, 6, 8], |x| x % 2);
            assert_eq!(groups.len(), 3);
            assert_eq!(groups[0], (0, vec![2, 4]));
            assert_eq!(groups[1], (1, vec![1, 3]));
            assert_eq!(groups[2], (0, vec![6, 8]));
        }
    
        #[test]
        fn group_by_key_empty_returns_empty() {
            let groups = group_by_key::<i32, bool, _>(&[], |x| *x > 0);
            assert!(groups.is_empty());
        }
    
        #[test]
        fn group_by_key_sign_changes() {
            let input = [1, 2, -1, -2, 3];
            let groups = group_by_key(&input, |x| *x > 0);
            assert_eq!(
                groups,
                vec![(true, vec![1, 2]), (false, vec![-1, -2]), (true, vec![3])]
            );
        }
    
        #[test]
        fn run_length_encode_matches_ocaml_example() {
            let input = [1, 1, 1, 2, 2, 3, 3, 3, 3];
            let expected = vec![(1, 3), (2, 2), (3, 4)];
            assert_eq!(run_length_encode(&input), expected);
        }
    
        #[test]
        fn run_length_encode_empty_returns_empty() {
            let result: Vec<(char, usize)> = run_length_encode::<char>(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn run_length_decode_matches_ocaml_example() {
            let encoded = [(1, 3), (2, 2), (3, 4)];
            let expected = vec![1, 1, 1, 2, 2, 3, 3, 3, 3];
            assert_eq!(run_length_decode(&encoded), expected);
        }
    
        #[test]
        fn run_length_round_trip() {
            let input = vec!['a', 'a', 'b', 'c', 'c', 'c', 'a'];
            assert_eq!(run_length_decode(&run_length_encode(&input)), input);
        }
    
        #[test]
        fn run_length_decode_skips_zero_counts() {
            let encoded = [(1, 0), (2, 2), (3, 0)];
            assert_eq!(run_length_decode(&encoded), vec![2, 2]);
        }
    
        #[test]
        fn dedup_consecutive_matches_ocaml_example() {
            assert_eq!(
                dedup_consecutive(&[1, 1, 2, 2, 3, 3, 1, 1]),
                vec![1, 2, 3, 1]
            );
        }
    
        #[test]
        fn dedup_consecutive_preserves_non_adjacent_duplicates() {
            assert_eq!(dedup_consecutive(&[1, 2, 1, 2, 1]), vec![1, 2, 1, 2, 1]);
        }
    
        #[test]
        fn dedup_consecutive_empty_returns_empty() {
            let result: Vec<i32> = dedup_consecutive::<i32>(&[]);
            assert!(result.is_empty());
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn group_consecutive_matches_ocaml_example() {
            let input = [1, 1, 2, 2, 2, 3, 1, 1];
            let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        #[test]
        fn group_consecutive_empty_returns_empty() {
            let result: Vec<Vec<i32>> = group_consecutive::<i32>(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn group_consecutive_single_element() {
            assert_eq!(group_consecutive(&[1]), vec![vec![1]]);
        }
    
        #[test]
        fn group_consecutive_all_same() {
            assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
        }
    
        #[test]
        fn group_consecutive_all_distinct() {
            assert_eq!(
                group_consecutive(&[1, 2, 3]),
                vec![vec![1], vec![2], vec![3]]
            );
        }
    
        #[test]
        fn group_consecutive_works_on_strings() {
            let input = ["a", "a", "b", "c", "c"];
            let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        #[test]
        fn group_by_key_parity_matches_ocaml_example() {
            let groups = group_by_key(&[2, 4, 1, 3, 6, 8], |x| x % 2);
            assert_eq!(groups.len(), 3);
            assert_eq!(groups[0], (0, vec![2, 4]));
            assert_eq!(groups[1], (1, vec![1, 3]));
            assert_eq!(groups[2], (0, vec![6, 8]));
        }
    
        #[test]
        fn group_by_key_empty_returns_empty() {
            let groups = group_by_key::<i32, bool, _>(&[], |x| *x > 0);
            assert!(groups.is_empty());
        }
    
        #[test]
        fn group_by_key_sign_changes() {
            let input = [1, 2, -1, -2, 3];
            let groups = group_by_key(&input, |x| *x > 0);
            assert_eq!(
                groups,
                vec![(true, vec![1, 2]), (false, vec![-1, -2]), (true, vec![3])]
            );
        }
    
        #[test]
        fn run_length_encode_matches_ocaml_example() {
            let input = [1, 1, 1, 2, 2, 3, 3, 3, 3];
            let expected = vec![(1, 3), (2, 2), (3, 4)];
            assert_eq!(run_length_encode(&input), expected);
        }
    
        #[test]
        fn run_length_encode_empty_returns_empty() {
            let result: Vec<(char, usize)> = run_length_encode::<char>(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn run_length_decode_matches_ocaml_example() {
            let encoded = [(1, 3), (2, 2), (3, 4)];
            let expected = vec![1, 1, 1, 2, 2, 3, 3, 3, 3];
            assert_eq!(run_length_decode(&encoded), expected);
        }
    
        #[test]
        fn run_length_round_trip() {
            let input = vec!['a', 'a', 'b', 'c', 'c', 'c', 'a'];
            assert_eq!(run_length_decode(&run_length_encode(&input)), input);
        }
    
        #[test]
        fn run_length_decode_skips_zero_counts() {
            let encoded = [(1, 0), (2, 2), (3, 0)];
            assert_eq!(run_length_decode(&encoded), vec![2, 2]);
        }
    
        #[test]
        fn dedup_consecutive_matches_ocaml_example() {
            assert_eq!(
                dedup_consecutive(&[1, 1, 2, 2, 3, 3, 1, 1]),
                vec![1, 2, 3, 1]
            );
        }
    
        #[test]
        fn dedup_consecutive_preserves_non_adjacent_duplicates() {
            assert_eq!(dedup_consecutive(&[1, 2, 1, 2, 1]), vec![1, 2, 1, 2, 1]);
        }
    
        #[test]
        fn dedup_consecutive_empty_returns_empty() {
            let result: Vec<i32> = dedup_consecutive::<i32>(&[]);
            assert!(result.is_empty());
        }
    }

    Deep Comparison

    OCaml vs Rust: Group By Iterator

    Side-by-Side Code

    OCaml

    (* Group consecutive equal elements *)
    let group_consecutive lst =
      match lst with
      | [] -> []
      | first :: rest ->
        let groups, current = List.fold_left (fun (groups, current) x ->
          match current with
          | [] -> (groups, [x])
          | hd :: _ when hd = x -> (groups, x :: current)
          | _ -> (List.rev current :: groups, [x])
        ) ([], [first]) rest in
        List.rev (List.rev current :: groups)
    
    (* Group by key function *)
    let group_by_key key lst =
      match lst with
      | [] -> []
      | first :: rest ->
        let groups, current_key, current = List.fold_left (fun (groups, k, current) x ->
          let new_key = key x in
          if new_key = k then (groups, k, x :: current)
          else (List.rev current :: groups, new_key, [x])
        ) ([], key first, [first]) rest in
        List.rev (List.rev current :: groups)
    

    Rust (idiomatic)

    pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
        let mut groups: Vec<Vec<T>> = Vec::new();
        let mut iter = data.iter();
        let Some(first) = iter.next() else { return groups; };
        let mut current = vec![first.clone()];
    
        for item in iter {
            if *item == *current[0] {
                current.push(item.clone());
            } else {
                groups.push(current);
                current = vec![item.clone()];
            }
        }
        groups.push(current);
        groups
    }
    

    Rust (functional/recursive — run-length encoding via iterator composition)

    pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
        group_consecutive(data)
            .into_iter()
            .map(|g| {
                let len = g.len();
                (g.into_iter().next().unwrap(), len)
            })
            .collect()
    }
    

    Type Signatures

    ConceptOCamlRust
    Group consecutive'a list -> 'a list listfn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>>
    Group by key('a -> 'b) -> 'a list -> 'a list listfn group_by_key<T: Clone, K: PartialEq>(data: &[T], key: impl Fn(&T) -> K) -> Vec<(K, Vec<T>)>
    Run-length encode'a list -> ('a * int) listfn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)>
    List type'a list&[T] (borrowed slice)
    Grouped result'a list listVec<Vec<T>>

    Key Insights

  • Fold vs imperative accumulation: OCaml uses List.fold_left with an immutable accumulator tuple (groups, current), building the result purely functionally. Rust naturally reaches for mutable Vec accumulators inside a for loop — equally idiomatic and avoids repeated allocation overhead.
  • Structural pattern matching vs. index comparison: OCaml matches on the head of current with hd :: _ when hd = x, which is elegant list deconstruction. Rust compares via current[0] on a Vec, trading structural elegance for direct, bounds-clear indexing.
  • Key function as higher-order argument: Both languages accept a key function (Fn(&T) -> K in Rust, 'a -> 'b in OCaml). Rust's trait bounds make the constraint explicit — K: PartialEq — while OCaml infers equality capability automatically via polymorphic =.
  • **Ownership and Clone:** Rust requires T: Clone to copy values into groups from a borrowed slice. OCaml's garbage collector and persistent lists make this cost invisible — values are always shared. The Clone bound is the honest cost of building owned sub-collections from a borrowed input.
  • Iterator composition: Rust's run_length_encode naturally composes group_consecutive with .map().collect(), mirroring functional pipeline style. OCaml achieves the same with List.map over the grouped result — both idioms express the same dataflow clearly.
  • When to Use Each Style

    Use idiomatic Rust (mutable accumulator) when: performance matters and you want to avoid repeated intermediate allocations; the grouping logic is straightforward and clarity trumps minimalism.

    Use iterator composition when: you want to build higher-level operations from simpler ones (e.g., run_length_encode built on top of group_consecutive), keeping each function small, pure, and independently testable.

    Exercises

  • Implement run_length_encode<T: Eq + Clone>(data: &[T]) -> Vec<(T, usize)> using group_consecutive.
  • Write group_by_all<T: Clone, K: Eq + Hash>(data: &[T], key: impl Fn(&T) -> K) -> HashMap<K, Vec<T>> for non-consecutive grouping.
  • Implement group_by_ranges(data: &[i32], range_size: i32) -> Vec<(i32, Vec<i32>)> that groups numbers into buckets of size range_size.
  • Open Source Repos