496: String Diff / Edit Distance
Difficulty: 2 Level: Intermediate Compare two strings to find how many edits separate them โ and identify the closest match in a candidate list.The Problem This Solves
When you accept user input, you constantly face near-misses: a typo in a command name, a misspelled identifier, a search query that's almost-but-not-quite right. Without a distance metric you're stuck with exact matching, which means any typo silently fails. Levenshtein edit distance gives you a number: how many single-character insertions, deletions, or substitutions turn one string into another. With that number you can rank candidates, suggest corrections ("did you mean...?"), and build fuzzy search. This is the same algorithm behind spell-checkers, DNA sequence alignment, and `git diff`. Beyond finding the single nearest match, character-by-character comparison lets you highlight where strings differ โ useful for test failure output, diffs, and logging.The Intuition
Imagine filling in a grid where rows are characters of the first string and columns are characters of the second. Each cell stores the minimum edits needed to transform the prefix ending at that row into the prefix ending at that column. If the characters match, carry the diagonal value forward for free. If they don't, take the cheapest of three options: insert, delete, or substitute (each costs 1). The bottom-right corner is your answer. For "nearest match", just run Levenshtein against every candidate and take the minimum.How It Works in Rust
Step 1 โ Collect chars to avoid byte-indexing into UTF-8:let sv: Vec<char> = s.chars().collect();
let tv: Vec<char> = t.chars().collect();
Step 2 โ Fill the DP table with classic mutable 2D vec:
let mut dp = vec![vec![0usize; n+1]; m+1];
for i in 0..=m { dp[i][0] = i; }
for j in 0..=n { dp[0][j] = j; }
for i in 1..=m {
for j in 1..=n {
dp[i][j] = if sv[i-1] == tv[j-1] {
dp[i-1][j-1]
} else {
1 + dp[i-1][j].min(dp[i][j-1]).min(dp[i-1][j-1])
};
}
}
Step 3 โ Find the closest candidate using `min_by_key`:
fn closest<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
candidates.iter()
.min_by_key(|&&c| levenshtein(query, c))
.copied()
}
The lifetime `'a` ensures the returned `&str` borrows from `candidates`, not from `query`.
Step 4 โ Simple char diff with `zip`:
let diff: String = a.chars().zip(b.chars())
.map(|(ac, bc)| if ac == bc { ac } else { '*' })
.collect();
What This Unlocks
- Autocomplete and fuzzy search โ rank suggestions by edit distance to the query.
- Error messages โ "unknown command 'buid', did you mean 'build'?" in any CLI.
- Diff visualization โ character-level highlighting without pulling in a diff library.
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| 2D DP table | `Array.make_matrix` | `vec![vec![...]]` |
| Char iteration | `String.to_seq` / `Bytes.get` | `.chars().collect::<Vec<char>>()` |
| Min of three | `min a (min b c)` | `.min(b).min(c)` chained |
| Return borrowed str | GC handles lifetimes | explicit `'a` on return type |
// 496. String diff / edit distance
fn levenshtein(s: &str, t: &str) -> usize {
let sv: Vec<char> = s.chars().collect();
let tv: Vec<char> = t.chars().collect();
let (m, n) = (sv.len(), tv.len());
let mut dp = vec![vec![0usize; n+1]; m+1];
for i in 0..=m { dp[i][0] = i; }
for j in 0..=n { dp[0][j] = j; }
for i in 1..=m { for j in 1..=n {
dp[i][j] = if sv[i-1]==tv[j-1] { dp[i-1][j-1] }
else { 1 + dp[i-1][j].min(dp[i][j-1]).min(dp[i-1][j-1]) };
}}
dp[m][n]
}
fn closest<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
candidates.iter().min_by_key(|&&c| levenshtein(query, c)).copied()
}
fn starts_with_ignore_case(s: &str, prefix: &str) -> bool {
s.len() >= prefix.len() && s[..prefix.len()].eq_ignore_ascii_case(prefix)
}
fn main() {
let pairs = [("kitten","sitting"),("hello","hello"),("abc","xyz"),("","abc"),("rust","bust")];
for (a,b) in pairs { println!("dist('{}','{}') = {}", a, b, levenshtein(a, b)); }
let words = &["rust","bust","just","trust","gust"];
println!("closest to 'rast': {:?}", closest("rast", words));
println!("closest to 'jost': {:?}", closest("jost", words));
// Simple diff: highlight changed chars
let a = "hello world";
let b = "hello rust!";
let same: String = a.chars().zip(b.chars()).map(|(ac,bc)| if ac==bc { ac } else { '*' }).collect();
println!("diff: {}", same);
}
#[cfg(test)]
mod tests {
use super::*;
#[test] fn test_kitten() { assert_eq!(levenshtein("kitten","sitting"),3); }
#[test] fn test_same() { assert_eq!(levenshtein("abc","abc"),0); }
#[test] fn test_empty() { assert_eq!(levenshtein("","abc"),3); assert_eq!(levenshtein("abc",""),3); }
#[test] fn test_closest() { assert_eq!(closest("rast",&["rust","bust","just"]),Some("rust")); }
}
(* 496. Edit distance โ OCaml *)
let levenshtein s t =
let m=String.length s and n=String.length t in
let d=Array.init (m+1) (fun i -> Array.init (n+1) (fun j ->
if i=0 then j else if j=0 then i else 0)) in
for i=1 to m do for j=1 to n do
d.(i).(j) <-
if s.[i-1]=t.[j-1] then d.(i-1).(j-1)
else 1 + min d.(i-1).(j) (min d.(i).(j-1) d.(i-1).(j-1))
done done;
d.(m).(n)
let () =
let pairs = [("kitten","sitting");("hello","hello");("abc","xyz");("","abc")] in
List.iter (fun (a,b) ->
Printf.printf "dist('%s','%s')=%d\n" a b (levenshtein a b)
) pairs