๐Ÿฆ€ Functional Rust

073: Topological Sort

Difficulty: 3 Level: Advanced Order nodes in a directed acyclic graph (DAG) so every dependency comes before the thing that depends on it โ€” the algorithm behind build systems, package managers, and task schedulers.

The Problem This Solves

You have tasks where some must complete before others can start: compile `utils` before `main`, install `tokio` before `axum`, run database migrations before starting the server. The dependency graph might have complex chains and fan-outs, but it must have no cycles (a cycle means task A requires B which requires A โ€” impossible to satisfy). Topological sort produces a linear ordering where every edge points forward: if there's an edge from A to B (A depends on B), then B appears before A in the result. Without it, you'd need to manually calculate the right order โ€” tedious and error-prone for large graphs. Two classic algorithms: DFS-based (post-order reversal) and Kahn's algorithm (BFS with in-degree counting). Rust's `HashMap` and `HashSet` make both natural to implement, though the explicit ownership of mutable state makes the imperative style more visible than OCaml's purely functional approach.

The Intuition

In OCaml's purely functional style, visited state and the accumulation list are threaded through recursive calls โ€” nothing is mutated in place. In Rust, you use `HashSet` for `visited` and `Vec` for `order` with mutable references โ€” explicit about who owns and who mutates. The algorithm is the same; the ownership discipline is different. DFS post-order: visit all descendants of a node before adding it to the result. Reverse the result. Kahn's: find all nodes with no incoming edges, remove them (decrement neighbors' in-degrees), repeat.

How It Works in Rust

// DFS-based: post-order traversal, then reverse
pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
 let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
 let mut all_nodes = HashSet::new();
 for &(from, to) in edges {
     all_nodes.insert(from); all_nodes.insert(to);
     adj.entry(from).or_default().push(to);
 }

 let mut visited = HashSet::new();
 let mut order = Vec::new();

 fn visit<'a>(node: &'a str, adj: &HashMap<&'a str, Vec<&'a str>>,
              visited: &mut HashSet<&'a str>, order: &mut Vec<String>) {
     if visited.contains(node) { return; }
     visited.insert(node);
     for &neighbor in adj.get(node).into_iter().flatten() {
         visit(neighbor, adj, visited, order);
     }
     order.push(node.to_string()); // post-order: add AFTER all descendants
 }

 for node in all_nodes { visit(node, &adj, &mut visited, &mut order); }
 order.reverse(); // reverse post-order = topological order
 order
}
// Kahn's algorithm: BFS with in-degree counting โ€” also detects cycles (leftover nodes)
pub fn topo_sort_kahn(edges: &[(&str, &str)]) -> Vec<String> {
 // Count incoming edges for each node
 let mut in_degree: HashMap<&str, usize> = HashMap::new();
 // ... start with zero-in-degree nodes, process and decrement neighbors
}
The DFS approach is recursive and elegant but can stack-overflow on very deep graphs. Kahn's is iterative, naturally detects cycles (if `result.len() != total_nodes`, there's a cycle), and is often preferred in production.

What This Unlocks

Key Differences

ConceptOCamlRust
Visited stateThreaded through recursion as parameterMutable `HashSet` passed by `&mut` reference
Result accumulationAccumulated in return value or parameterMutable `Vec` passed by `&mut` reference
Adjacency representationAssociation list or `Hashtbl``HashMap<&str, Vec<&str>>`
Recursive helperLocal `let rec` inside the functionNested `fn` (no closures for mutual recursion)
Cycle detectionExtra state in recursive functionKahn's: check `result.len() != node_count`
/// # Topological Sort โ€” DAG Ordering
///
/// Order nodes in a directed acyclic graph so every edge goes from earlier to later.
/// Uses DFS-based algorithm.

use std::collections::{HashMap, HashSet};

/// DFS-based topological sort.
/// Returns nodes in topological order (dependencies first).
pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
    // Collect all nodes
    let mut all_nodes = HashSet::new();
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        all_nodes.insert(from);
        all_nodes.insert(to);
        adj.entry(from).or_default().push(to);
    }

    let mut visited = HashSet::new();
    let mut order = Vec::new();

    fn visit<'a>(
        node: &'a str,
        adj: &HashMap<&'a str, Vec<&'a str>>,
        visited: &mut HashSet<&'a str>,
        order: &mut Vec<String>,
    ) {
        if visited.contains(node) {
            return;
        }
        visited.insert(node);
        if let Some(neighbors) = adj.get(node) {
            for &neighbor in neighbors {
                visit(neighbor, adj, visited, order);
            }
        }
        // Post-order: add after all descendants visited
        order.push(node.to_string());
    }

    // Visit all nodes (handles disconnected components)
    let mut sorted_nodes: Vec<&str> = all_nodes.into_iter().collect();
    sorted_nodes.sort(); // deterministic ordering
    for node in sorted_nodes {
        visit(node, &adj, &mut visited, &mut order);
    }

    order.reverse(); // reverse post-order = topological order
    order
}

/// Kahn's algorithm (BFS-based) โ€” alternative approach using in-degree counting.
pub fn topo_sort_kahn(edges: &[(&str, &str)]) -> Vec<String> {
    let mut all_nodes = HashSet::new();
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    let mut in_degree: HashMap<&str, usize> = HashMap::new();

    for &(from, to) in edges {
        all_nodes.insert(from);
        all_nodes.insert(to);
        adj.entry(from).or_default().push(to);
        *in_degree.entry(to).or_insert(0) += 1;
        in_degree.entry(from).or_insert(0);
    }

    // Start with nodes that have no incoming edges
    let mut queue: Vec<&str> = in_degree
        .iter()
        .filter(|(_, &deg)| deg == 0)
        .map(|(&node, _)| node)
        .collect();
    queue.sort(); // deterministic
    let mut result = Vec::new();

    while let Some(node) = queue.first().copied() {
        queue.remove(0);
        result.push(node.to_string());
        if let Some(neighbors) = adj.get(node) {
            for &neighbor in neighbors {
                let deg = in_degree.get_mut(neighbor).unwrap();
                *deg -= 1;
                if *deg == 0 {
                    queue.push(neighbor);
                    queue.sort();
                }
            }
        }
    }

    result
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic() {
        let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
        let order = topo_sort(&edges);
        // a must come before b and c; b and c before d; d before e
        let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
        assert!(pos("a") < pos("b"));
        assert!(pos("a") < pos("c"));
        assert!(pos("b") < pos("d"));
        assert!(pos("d") < pos("e"));
    }

    #[test]
    fn test_kahn() {
        let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
        let order = topo_sort_kahn(&edges);
        let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
        assert!(pos("a") < pos("b"));
        assert!(pos("d") < pos("e"));
    }

    #[test]
    fn test_empty() {
        let order = topo_sort(&[]);
        assert!(order.is_empty());
    }

    #[test]
    fn test_single_edge() {
        let order = topo_sort(&[("a", "b")]);
        assert_eq!(order, vec!["a", "b"]);
    }

    #[test]
    fn test_linear_chain() {
        let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
        let order = topo_sort(&edges);
        assert_eq!(order, vec!["a", "b", "c", "d"]);
    }
}

fn main() {
    println!("{:?}", pos("a") < pos("b"));
    println!("{:?}", pos("a") < pos("c"));
    println!("{:?}", pos("b") < pos("d"));
}
(* Topological Sort โ€” DAG Ordering *)

module SS = Set.Make(String)

let topo_sort edges =
  let neighbors node =
    List.filter_map (fun (a, b) -> if a = node then Some b else None) edges in
  let all_nodes = List.fold_left (fun s (a, b) -> SS.add a (SS.add b s)) SS.empty edges in
  let rec visit node (visited, order) =
    if SS.mem node visited then (visited, order)
    else
      let visited = SS.add node visited in
      let visited, order = List.fold_left (fun acc n ->
        visit n acc) (visited, order) (neighbors node) in
      (visited, node :: order)
  in
  let _, order = SS.fold (fun node acc -> visit node acc) all_nodes (SS.empty, []) in
  order

let () =
  let edges = [("a","b");("a","c");("b","d");("c","d");("d","e")] in
  List.iter (Printf.printf "%s ") (topo_sort edges);
  print_newline ()

๐Ÿ“Š Detailed Comparison

Topological Sort โ€” OCaml vs Rust Comparison

Core Insight

Topological sort reveals how each language handles graph traversal state. OCaml threads immutable `(visited_set, result_list)` pairs through recursive calls โ€” purely functional but verbose. Rust passes mutable references to `HashSet` and `Vec` โ€” more explicit about mutation but closer to the imperative algorithm.

OCaml Approach

Uses `Set.Make(String)` for the visited set. The `visit` function takes and returns `(visited, order)` tuples โ€” no mutation, but every recursive call creates new tuples. `SS.fold` iterates over all nodes to handle disconnected components. The functional style ensures referential transparency.

Rust Approach

Two algorithms: (1) DFS with `&mut HashSet` and `&mut Vec` passed to recursive `visit` โ€” clear ownership of mutable state. (2) Kahn's BFS algorithm using in-degree counting. The Rust version explicitly allocates `HashMap` for adjacency lists upfront, making the graph structure clear.

Comparison Table

AspectOCamlRust
MemoryImmutable sets (tree nodes)HashSet + Vec (hash table + array)
Null safetyN/AN/A
ErrorsNo cycle detectionNo cycle detection (add if needed)
IterationRecursive with tuple threadingRecursive with `&mut` references
StateFunctional (pass-and-return)Mutable references (borrow checker)

Things Rust Learners Should Notice

1. `&mut` in recursive functions โ€” passing mutable refs down the call stack is idiomatic for graph algorithms 2. `HashMap::entry().or_default()` โ€” builds adjacency lists cleanly 3. Lifetime annotations `'a` โ€” needed when storing references to input data in the visited set 4. Two algorithms โ€” DFS (post-order reverse) vs Kahn's (BFS with in-degree) โ€” same result, different trade-offs 5. Deterministic output โ€” sorting nodes before iteration ensures reproducible results in tests

Further Reading

  • [Topological sort (Wikipedia)](https://en.wikipedia.org/wiki/Topological_sorting)
  • [petgraph crate](https://docs.rs/petgraph/) โ€” production-quality graph algorithms for Rust