๐Ÿฆ€ Functional Rust

496: String Diff / Edit Distance

Difficulty: 2 Level: Intermediate Compare two strings to find how many edits separate them โ€” and identify the closest match in a candidate list.

The Problem This Solves

When you accept user input, you constantly face near-misses: a typo in a command name, a misspelled identifier, a search query that's almost-but-not-quite right. Without a distance metric you're stuck with exact matching, which means any typo silently fails. Levenshtein edit distance gives you a number: how many single-character insertions, deletions, or substitutions turn one string into another. With that number you can rank candidates, suggest corrections ("did you mean...?"), and build fuzzy search. This is the same algorithm behind spell-checkers, DNA sequence alignment, and `git diff`. Beyond finding the single nearest match, character-by-character comparison lets you highlight where strings differ โ€” useful for test failure output, diffs, and logging.

The Intuition

Imagine filling in a grid where rows are characters of the first string and columns are characters of the second. Each cell stores the minimum edits needed to transform the prefix ending at that row into the prefix ending at that column. If the characters match, carry the diagonal value forward for free. If they don't, take the cheapest of three options: insert, delete, or substitute (each costs 1). The bottom-right corner is your answer. For "nearest match", just run Levenshtein against every candidate and take the minimum.

How It Works in Rust

Step 1 โ€” Collect chars to avoid byte-indexing into UTF-8:
let sv: Vec<char> = s.chars().collect();
let tv: Vec<char> = t.chars().collect();
Step 2 โ€” Fill the DP table with classic mutable 2D vec:
let mut dp = vec![vec![0usize; n+1]; m+1];
for i in 0..=m { dp[i][0] = i; }
for j in 0..=n { dp[0][j] = j; }
for i in 1..=m {
 for j in 1..=n {
     dp[i][j] = if sv[i-1] == tv[j-1] {
         dp[i-1][j-1]
     } else {
         1 + dp[i-1][j].min(dp[i][j-1]).min(dp[i-1][j-1])
     };
 }
}
Step 3 โ€” Find the closest candidate using `min_by_key`:
fn closest<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
 candidates.iter()
     .min_by_key(|&&c| levenshtein(query, c))
     .copied()
}
The lifetime `'a` ensures the returned `&str` borrows from `candidates`, not from `query`. Step 4 โ€” Simple char diff with `zip`:
let diff: String = a.chars().zip(b.chars())
 .map(|(ac, bc)| if ac == bc { ac } else { '*' })
 .collect();

What This Unlocks

Key Differences

ConceptOCamlRust
2D DP table`Array.make_matrix``vec![vec![...]]`
Char iteration`String.to_seq` / `Bytes.get``.chars().collect::<Vec<char>>()`
Min of three`min a (min b c)``.min(b).min(c)` chained
Return borrowed strGC handles lifetimesexplicit `'a` on return type
// 496. String diff / edit distance
fn levenshtein(s: &str, t: &str) -> usize {
    let sv: Vec<char> = s.chars().collect();
    let tv: Vec<char> = t.chars().collect();
    let (m, n) = (sv.len(), tv.len());
    let mut dp = vec![vec![0usize; n+1]; m+1];
    for i in 0..=m { dp[i][0] = i; }
    for j in 0..=n { dp[0][j] = j; }
    for i in 1..=m { for j in 1..=n {
        dp[i][j] = if sv[i-1]==tv[j-1] { dp[i-1][j-1] }
                   else { 1 + dp[i-1][j].min(dp[i][j-1]).min(dp[i-1][j-1]) };
    }}
    dp[m][n]
}

fn closest<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
    candidates.iter().min_by_key(|&&c| levenshtein(query, c)).copied()
}

fn starts_with_ignore_case(s: &str, prefix: &str) -> bool {
    s.len() >= prefix.len() && s[..prefix.len()].eq_ignore_ascii_case(prefix)
}

fn main() {
    let pairs = [("kitten","sitting"),("hello","hello"),("abc","xyz"),("","abc"),("rust","bust")];
    for (a,b) in pairs { println!("dist('{}','{}') = {}", a, b, levenshtein(a, b)); }

    let words = &["rust","bust","just","trust","gust"];
    println!("closest to 'rast': {:?}", closest("rast", words));
    println!("closest to 'jost': {:?}", closest("jost", words));

    // Simple diff: highlight changed chars
    let a = "hello world";
    let b = "hello rust!";
    let same: String = a.chars().zip(b.chars()).map(|(ac,bc)| if ac==bc { ac } else { '*' }).collect();
    println!("diff: {}", same);
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test] fn test_kitten()  { assert_eq!(levenshtein("kitten","sitting"),3); }
    #[test] fn test_same()    { assert_eq!(levenshtein("abc","abc"),0); }
    #[test] fn test_empty()   { assert_eq!(levenshtein("","abc"),3); assert_eq!(levenshtein("abc",""),3); }
    #[test] fn test_closest() { assert_eq!(closest("rast",&["rust","bust","just"]),Some("rust")); }
}
(* 496. Edit distance โ€“ OCaml *)
let levenshtein s t =
  let m=String.length s and n=String.length t in
  let d=Array.init (m+1) (fun i -> Array.init (n+1) (fun j ->
    if i=0 then j else if j=0 then i else 0)) in
  for i=1 to m do for j=1 to n do
    d.(i).(j) <-
      if s.[i-1]=t.[j-1] then d.(i-1).(j-1)
      else 1 + min d.(i-1).(j) (min d.(i).(j-1) d.(i-1).(j-1))
  done done;
  d.(m).(n)

let () =
  let pairs = [("kitten","sitting");("hello","hello");("abc","xyz");("","abc")] in
  List.iter (fun (a,b) ->
    Printf.printf "dist('%s','%s')=%d\n" a b (levenshtein a b)
  ) pairs