794 Intermediate

794-word-break-dp — Word Break

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "794-word-break-dp — Word Break" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Given a string and a dictionary of words, determine if the string can be segmented into a sequence of dictionary words. Key difference from OCaml: 1. **HashSet vs Hashtbl**: Rust uses `HashSet<&str>` for O(1) lookup; OCaml's `Hashtbl.mem` provides the same.

Tutorial

The Problem

Given a string and a dictionary of words, determine if the string can be segmented into a sequence of dictionary words. This is used in natural language processing (Chinese word segmentation — Chinese text has no spaces), search engine query parsing, and code tokenization. The DP approach avoids the exponential backtracking of a naive recursive search by caching whether each prefix is breakable.

🎯 Learning Outcomes

• Model word break as dp[i] = whether s[:i] can be formed from dictionary words

• Apply the recurrence: dp[i] = any j: dp[j] && s[j..i] in dict

• Use a HashSet for O(1) word lookups vs. O(k) linear scan

• Understand the O(n²) time and O(n) space complexity

• Extend to count all possible segmentations (exponential output)

Code Example

#![allow(clippy::all)]
//! # Word Break

use std::collections::HashSet;

pub fn word_break(s: &str, dict: &[&str]) -> bool {
    let words: HashSet<&str> = dict.iter().copied().collect();
    let n = s.len();
    let mut dp = vec![false; n + 1];
    dp[0] = true;
    for i in 1..=n {
        for j in 0..i {
            if dp[j] && words.contains(&s[j..i]) {
                dp[i] = true;
                break;
            }
        }
    }
    dp[n]
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_word_break() {
        assert!(word_break("leetcode", &["leet", "code"]));
    }
    #[test]
    fn test_no_break() {
        assert!(!word_break(
            "catsandog",
            &["cats", "dog", "sand", "and", "cat"]
        ));
    }
}

(* Word Break — dictionary DP O(n²) *)
module SSet = Set.Make(String)

let word_break s dict =
  let dict_set = List.fold_left (fun acc w -> SSet.add w acc) SSet.empty dict in
  let n  = String.length s in
  let dp = Array.make (n + 1) false in
  dp.(0) <- true;
  for i = 1 to n do
    let j = ref 0 in
    while !j < i && not dp.(i) do
      if dp.(!j) && SSet.mem (String.sub s !j (i - !j)) dict_set then
        dp.(i) <- true;
      incr j
    done
  done;
  dp.(n)

let word_break_all s dict =
  let dict_set = List.fold_left (fun acc w -> SSet.add w acc) SSet.empty dict in
  let n  = String.length s in
  (* prev.(i) = list of j such that dp[j]=true and s[j..i] in dict *)
  let dp   = Array.make (n + 1) false in
  let prev = Array.make (n + 1) [] in
  dp.(0) <- true;
  for i = 1 to n do
    for j = 0 to i - 1 do
      if dp.(j) && SSet.mem (String.sub s j (i - j)) dict_set then begin
        dp.(i) <- true;
        prev.(i) <- j :: prev.(i)
      end
    done
  done;
  (* Collect all sentences *)
  let rec collect i =
    if i = 0 then [[]]
    else
      List.concat_map (fun j ->
        let word = String.sub s j (i - j) in
        List.map (fun rest -> rest @ [word]) (collect j)
      ) prev.(i)
  in
  if dp.(n) then collect n else []

let () =
  let dict = ["leet"; "code"; "leets"; "code"; "cats"; "and"; "sand"; "dog"; "cat"] in
  Printf.printf "'leetcode'    -> %b\n" (word_break "leetcode" dict);
  Printf.printf "'catsanddog' -> %b\n" (word_break "catsanddog" dict);
  Printf.printf "'catsanddogx'-> %b\n" (word_break "catsanddogx" dict);
  let sentences = word_break_all "catsanddog" ["cat";"cats";"and";"sand";"dog"] in
  Printf.printf "All parses of 'catsanddog':\n";
  List.iter (fun ws -> Printf.printf "  %s\n" (String.concat " " ws)) sentences

Key Differences

HashSet vs Hashtbl: Rust uses HashSet<&str> for O(1) lookup; OCaml's Hashtbl.mem provides the same.

Slice comparison: Rust's s[j..i] string slice is a &str; OCaml's String.sub creates a new string allocation on each check — a minor performance difference.

Early termination: Rust's break exits the inner loop once dp[i] is true; OCaml uses a boolean flag or exception-based early exit.

Segmentation: The all-segmentations variant uses DFS with memoization; both languages implement it similarly.

OCaml Approach

OCaml implements with Array.make (n+1) false and uses Hashtbl for the dictionary. The inner loop uses try with Hashtbl.find or Hashtbl.mem. Functional style uses Array.init + List.exists over the dictionary. The words list can be sorted by length to prune impossible substrings early. OCaml's Re library provides regex-based word segmentation as an alternative approach.

Full Source

#![allow(clippy::all)]
//! # Word Break

use std::collections::HashSet;

pub fn word_break(s: &str, dict: &[&str]) -> bool {
    let words: HashSet<&str> = dict.iter().copied().collect();
    let n = s.len();
    let mut dp = vec![false; n + 1];
    dp[0] = true;
    for i in 1..=n {
        for j in 0..i {
            if dp[j] && words.contains(&s[j..i]) {
                dp[i] = true;
                break;
            }
        }
    }
    dp[n]
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_word_break() {
        assert!(word_break("leetcode", &["leet", "code"]));
    }
    #[test]
    fn test_no_break() {
        assert!(!word_break(
            "catsandog",
            &["cats", "dog", "sand", "and", "cat"]
        ));
    }
}

(* Word Break — dictionary DP O(n²) *)
module SSet = Set.Make(String)

let word_break s dict =
  let dict_set = List.fold_left (fun acc w -> SSet.add w acc) SSet.empty dict in
  let n  = String.length s in
  let dp = Array.make (n + 1) false in
  dp.(0) <- true;
  for i = 1 to n do
    let j = ref 0 in
    while !j < i && not dp.(i) do
      if dp.(!j) && SSet.mem (String.sub s !j (i - !j)) dict_set then
        dp.(i) <- true;
      incr j
    done
  done;
  dp.(n)

let word_break_all s dict =
  let dict_set = List.fold_left (fun acc w -> SSet.add w acc) SSet.empty dict in
  let n  = String.length s in
  (* prev.(i) = list of j such that dp[j]=true and s[j..i] in dict *)
  let dp   = Array.make (n + 1) false in
  let prev = Array.make (n + 1) [] in
  dp.(0) <- true;
  for i = 1 to n do
    for j = 0 to i - 1 do
      if dp.(j) && SSet.mem (String.sub s j (i - j)) dict_set then begin
        dp.(i) <- true;
        prev.(i) <- j :: prev.(i)
      end
    done
  done;
  (* Collect all sentences *)
  let rec collect i =
    if i = 0 then [[]]
    else
      List.concat_map (fun j ->
        let word = String.sub s j (i - j) in
        List.map (fun rest -> rest @ [word]) (collect j)
      ) prev.(i)
  in
  if dp.(n) then collect n else []

let () =
  let dict = ["leet"; "code"; "leets"; "code"; "cats"; "and"; "sand"; "dog"; "cat"] in
  Printf.printf "'leetcode'    -> %b\n" (word_break "leetcode" dict);
  Printf.printf "'catsanddog' -> %b\n" (word_break "catsanddog" dict);
  Printf.printf "'catsanddogx'-> %b\n" (word_break "catsanddogx" dict);
  let sentences = word_break_all "catsanddog" ["cat";"cats";"and";"sand";"dog"] in
  Printf.printf "All parses of 'catsanddog':\n";
  List.iter (fun ws -> Printf.printf "  %s\n" (String.concat " " ws)) sentences

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_word_break() {
        assert!(word_break("leetcode", &["leet", "code"]));
    }
    #[test]
    fn test_no_break() {
        assert!(!word_break(
            "catsandog",
            &["cats", "dog", "sand", "and", "cat"]
        ));
    }
}

Deep Comparison

OCaml vs Rust: Word Break Dp

Overview

See the example.rs and example.ml files for detailed implementations.

Key Differences

Aspect	OCaml	Rust
Type system	Hindley-Milner	Ownership + traits
Memory	GC	Zero-cost abstractions
Mutability	Explicit ref	mut keyword
Error handling	Option/Result	Result<T, E>

See README.md for detailed comparison.

Exercises

Implement word_break_all(s, dict) -> Vec<String> that returns all valid space-separated segmentations (e.g., ["cat sand dog", "cats and dog"]).

Add word length bounds to the inner loop: only check substrings of length between min_word_len and max_word_len in the dictionary, reducing the inner loop iterations.

Implement a streaming version that breaks a text input into words on the fly, maintaining the DP state as new characters arrive.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust