🦀 Functional Rust

816: Rabin-Karp Rolling Hash Search

Difficulty: 4 Level: Advanced Hash-based string search: slide a window in O(1) per step by subtracting the old character and adding the new one — enabling efficient multi-pattern search.

The Problem This Solves

KMP and BMH excel at single-pattern search. But what if you need to search for thousands of patterns simultaneously — virus signatures in a file, a dictionary of forbidden words, or plagiarism detection across documents? Running KMP once per pattern costs O(patterns × text), which is unacceptable. Rabin-Karp solves this by reducing each pattern to a hash value. Computing a hash for every m-length window in the text naïvely is O(n×m), but with a rolling hash — where you subtract the outgoing character's contribution and add the incoming one — each slide costs O(1). This gives O(n + m) expected time for single-pattern and O(n + k×m) for k patterns (or O(n + k) with a hash set). The practical application is document fingerprinting: Rabin fingerprinting (a variant) powers Google's copy detection, plagiarism checkers, and the rsync rolling checksum for efficient file synchronization.

The Intuition

The hash is a polynomial: `H = s[0]×base^(m-1) + s[1]×base^(m-2) + … + s[m-1]`. To slide one position right: subtract `s[i]×base^(m-1)` (the leftmost character's contribution), multiply by `base` to shift everything left, then add `s[i+m]`. All operations are mod a large prime to keep values in u64 range. When hashes match, verify with a direct comparison to handle collisions. Expected O(n + m) because collisions are rare with a good prime. OCaml uses the same polynomial; Rust adds explicit `u64` wrapping and `+ PRIME` to avoid underflow in subtraction before taking mod.

How It Works in Rust

const BASE:  u64 = 256;
const PRIME: u64 = 1_000_000_007;

fn rabin_karp(text: &str, pattern: &str) -> Vec<usize> {
 let (t, p) = (text.as_bytes(), pattern.as_bytes());
 let (n, m) = (t.len(), p.len());
 if m == 0 || m > n { return vec![]; }

 // Precompute base^(m-1) mod PRIME — the weight of the leftmost character
 let mut pow = 1u64;
 for _ in 0..m - 1 { pow = pow * BASE % PRIME; }

 // Initial hashes for pattern and first window
 let (mut hash_p, mut hash_t) = (0u64, 0u64);
 for i in 0..m {
     hash_p = (hash_p * BASE + p[i] as u64) % PRIME;
     hash_t = (hash_t * BASE + t[i] as u64) % PRIME;
 }

 let mut matches = Vec::new();
 for i in 0..=n - m {
     // Hash match → verify to eliminate collisions
     if hash_t == hash_p && &t[i..i + m] == p {
         matches.push(i);
     }
     if i < n - m {
         // Rolling update: remove t[i], add t[i+m]
         // +PRIME before subtracting prevents underflow in modular arithmetic
         hash_t = (hash_t + PRIME - t[i] as u64 * pow % PRIME) % PRIME;
         hash_t = (hash_t * BASE + t[i + m] as u64) % PRIME;
     }
 }
 matches
}
The `+ PRIME` before subtraction is the idiomatic Rust pattern for modular subtraction with unsigned integers — avoids wrapping panics in debug mode.

What This Unlocks

Key Differences

ConceptOCamlRust
Polynomial hash`let h = h base + c mod p`Same; explicit `u64` types
Modular subtraction`(h - oldpow + prime) mod prime``(h + PRIME - old * pow % PRIME) % PRIME`
Integer overflow`Int64` for 64-bit safety`u64` arithmetic; `u128` not needed here
Sliding windowManual loop with ref varsIdiomatic `for i in 0..=n-m`
Collision check`String.sub` comparison`&t[i..i+m] == p` slice equality
// Rabin-Karp Rolling Hash Search — O(n+m) expected

const BASE:  u64 = 256;
const PRIME: u64 = 1_000_000_007;

fn rabin_karp(text: &str, pattern: &str) -> Vec<usize> {
    let t = text.as_bytes();
    let p = pattern.as_bytes();
    let (n, m) = (t.len(), p.len());
    if m == 0 || m > n { return vec![]; }

    // Compute base^(m-1) mod prime
    let mut pow = 1u64;
    for _ in 0..m - 1 { pow = pow * BASE % PRIME; }

    // Initial hashes
    let mut hash_p = 0u64;
    let mut hash_t = 0u64;
    for i in 0..m {
        hash_p = (hash_p * BASE + p[i] as u64) % PRIME;
        hash_t = (hash_t * BASE + t[i] as u64) % PRIME;
    }

    let mut matches = Vec::new();
    for i in 0..=n - m {
        if hash_t == hash_p && &t[i..i + m] == p {
            matches.push(i);
        }
        if i < n - m {
            hash_t = (hash_t + PRIME - t[i] as u64 * pow % PRIME) % PRIME;
            hash_t = (hash_t * BASE + t[i + m] as u64) % PRIME;
        }
    }
    matches
}

fn main() {
    let text = "ABCDABABCDABCDAB";
    let pat  = "ABCD";
    println!("Text:    {text:?}");
    println!("Pattern: {pat:?}");
    println!("Matches: {:?}", rabin_karp(text, pat));

    println!("\"aab\" in \"aaabaaabaa\": {:?}", rabin_karp("aaabaaabaa", "aab"));
    println!("No match: {:?}", rabin_karp("abcdef", "xyz"));
}
(* Rabin-Karp Rolling Hash — O(n+m) expected *)

let base  = 256
let prime = 1_000_000_007

let rabin_karp text pattern =
  let n = String.length text in
  let m = String.length pattern in
  if m = 0 then []
  else if m > n then []
  else begin
    (* Compute base^(m-1) mod prime *)
    let pow = ref 1 in
    for _ = 1 to m - 1 do
      pow := (!pow * base) mod prime
    done;

    (* Compute initial hashes *)
    let hash_p = ref 0 in
    let hash_t = ref 0 in
    for i = 0 to m - 1 do
      hash_p := (!hash_p * base + Char.code pattern.[i]) mod prime;
      hash_t := (!hash_t * base + Char.code text.[i])    mod prime
    done;

    let matches = ref [] in
    for i = 0 to n - m do
      if !hash_t = !hash_p then begin
        (* Verify (avoid false positives) *)
        let ok = ref true in
        for j = 0 to m - 1 do
          if text.[i+j] <> pattern.[j] then ok := false
        done;
        if !ok then matches := i :: !matches
      end;
      (* Roll hash *)
      if i < n - m then begin
        hash_t := ((!hash_t - Char.code text.[i] * !pow mod prime + prime) * base
                   + Char.code text.[i + m]) mod prime
      end
    done;
    List.rev !matches
  end

let () =
  let text = "ABCDABABCDABCDAB" in
  let pat  = "ABCD" in
  Printf.printf "Text:    %S\n" text;
  Printf.printf "Pattern: %S\n" pat;
  Printf.printf "Matches: [%s]\n"
    (String.concat "; " (List.map string_of_int (rabin_karp text pat)));

  let m2 = rabin_karp "aaabaaabaa" "aab" in
  Printf.printf "\"aab\" in \"aaabaaabaa\": [%s]\n"
    (String.concat "; " (List.map string_of_int m2))