499: Escaping and Unescaping Strings
Difficulty: 2 Level: Intermediate
Safely encode special characters for HTML, JSON, and display, and reverse the process.
The Problem This Solves
Every format that embeds strings has special characters that need escaping. HTML can't contain a raw `<`; it must be written as `&lt;`. JSON can't contain an unescaped `"`; it must be written as `\"`. Log output can't contain unescaped newlines if you want one-line-per-entry formatting.
Python has `html.escape()` and `json.dumps()`. JavaScript has `encodeURIComponent()` and `JSON.stringify()`. These are usually library functions. Rust's standard library doesn't include HTML or JSON escaping (those live in crates), but the pattern of implementing escaping is worth knowing because it appears in parsers, serializers, and any code that bridges format boundaries.
Implementing your own escaper in Rust also teaches an important pattern: character-by-character transformation with `flat_map` on `chars()`, producing multiple output characters per input character. The `unescape_control` function shows how to build stateful unescaping with a `Peekable` iterator.
The Intuition
Escaping is "replace this char with a safe representation." Unescaping is the reverse. Both are string transformations.
For escaping: iterate over the chars, match on the special ones, and emit their replacements. The `flat_map` approach is idiomatic: each input char produces 0 or more output chars, which are collected into a new `String`.
For unescaping: you need to look ahead, because `\n` is two chars that represent one. Use a `Peekable` iterator or a `while let` loop that calls `.next()` explicitly when you see the escape character.
Rust's built-in `{:?}` format already escapes strings for debug output. If you just need to print a string in a way that shows all special characters visibly, `println!("{:?}", s)` does it with no extra code.
How It Works in Rust
// HTML escaping with flat_map: each char maps to 0 or more output chars
fn escape_html(s: &str) -> String {
    s.chars().flat_map(|c| match c {
        '<' => "&lt;".chars().collect::<Vec<_>>(),
        '>' => "&gt;".chars().collect(),
        '&' => "&amp;".chars().collect(),
        '"' => "&quot;".chars().collect(),
        '\'' => "&#39;".chars().collect(),
        c => vec![c],
    }).collect()
}
// HTML unescaping, the simple way: replace known sequences.
// `&amp;` is decoded last so that escaped text such as `&amp;lt;` becomes `&lt;`, not `<`.
fn unescape_html(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&#39;", "'")
        .replace("&amp;", "&")
}
let html = "<div class=\"hello\">Hello & World!</div>";
let escaped = escape_html(html);
// "&lt;div class=&quot;hello&quot;&gt;Hello &amp; World!&lt;/div&gt;"
assert_eq!(unescape_html(&escaped), html); // roundtrip OK
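Chained `replace` calls are sensitive to their order, because the output of one replacement is scanned again by the next. A single left-to-right pass avoids that. The following is a minimal sketch of the idea, not a full HTML entity decoder: `unescape_html_single_pass` and the `ENTITIES` table are names assumed here for illustration, and only the five entities used in this section are recognized.

```rust
/// The five entities used in this section, paired with the char they decode to.
/// (Assumed table for illustration; real HTML defines many more entities.)
const ENTITIES: [(&str, char); 5] = [
    ("&lt;", '<'),
    ("&gt;", '>'),
    ("&amp;", '&'),
    ("&quot;", '"'),
    ("&#39;", '\''),
];

/// Decode the entities above in one left-to-right pass.
/// Each `&` in the input is examined once, so decoded output is never re-scanned.
fn unescape_html_single_pass(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(pos) = rest.find('&') {
        // Copy everything before the ampersand unchanged.
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        // Try each known entity at the current position.
        match ENTITIES.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Unknown entity or bare `&`: keep it literally and move on.
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

With this version, the input `&amp;lt;` decodes to the text `&lt;` regardless of how the table is ordered.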
// Control character escaping, with a peekable iterator for unescaping
fn escape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            c => out.push(c),
        }
    }
    out
}
fn unescape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut iter = s.chars().peekable();
    while let Some(c) = iter.next() {
        if c == '\\' {
            match iter.next() { // consume the NEXT char to interpret the escape
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('r') => out.push('\r'), // mirrors the '\r' arm in escape_control
                Some('\\') => out.push('\\'),
                Some('"') => out.push('"'),
                Some(other) => { out.push('\\'); out.push(other); } // unknown escape: keep as-is
                None => out.push('\\'), // trailing backslash: keep as-is
            }
        } else {
            out.push(c);
        }
    }
    out
}
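The unescaper above is lenient: an unknown escape or a trailing backslash is copied through unchanged. When the input comes from an untrusted source, it may be preferable to reject such input instead. The sketch below shows one way that could look; `UnescapeError` and `unescape_control_strict` are hypothetical names introduced for this example.

```rust
/// Error type for the strict unescaper (hypothetical, introduced for this sketch).
#[derive(Debug, PartialEq)]
enum UnescapeError {
    /// A backslash followed by a character that is not a known escape.
    UnknownEscape(char),
    /// A backslash at the very end of the input.
    TrailingBackslash,
}

/// Same escape set as the lenient version, but malformed input is an error.
fn unescape_control_strict(s: &str) -> Result<String, UnescapeError> {
    let mut out = String::with_capacity(s.len());
    let mut iter = s.chars();
    while let Some(c) = iter.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash was seen: the next char decides what it stands for.
        match iter.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('\\') => out.push('\\'),
            Some('"') => out.push('"'),
            Some(other) => return Err(UnescapeError::UnknownEscape(other)),
            None => return Err(UnescapeError::TrailingBackslash),
        }
    }
    Ok(out)
}
```

Returning a `Result` lets the caller decide whether to fall back to the lenient behavior or report the malformed input.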
// Built-in: debug format escapes automatically
let s = "hello\nworld\ttab";
println!("{:?}", s); // prints "hello\nworld\ttab" with the escapes visible
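The `{:?}` format wraps its output in double quotes. When the escaped text is needed as a `String` without those quotes, the standard library's `str::escape_debug` and `str::escape_default` methods are one option. The wrapper names `visible` and `visible_ascii` below are made up for this sketch.

```rust
/// Escaped form of `s` without the surrounding quotes that `{:?}` adds.
fn visible(s: &str) -> String {
    s.escape_debug().to_string()
}

/// ASCII-only escaped form: non-ASCII chars are written as `\u{...}` escapes.
fn visible_ascii(s: &str) -> String {
    s.escape_default().to_string()
}
```

Both methods return an iterator of chars that also implements `Display`, which is why `to_string()` works here. These escapes follow Rust's own literal syntax, so they are suited to diagnostics rather than to producing HTML or JSON.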
What This Unlocks
- HTML generation: escape user input before inserting it into HTML to prevent XSS.
- Log formatting: escape newlines in log values to maintain a one-log-per-line format.
- Custom serialization: implement the escape/unescape loop pattern for any format (CSV, TOML, protocol buffers).
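As an illustration of the custom serialization case, the same loop can be applied to CSV. The sketch below assumes RFC 4180-style quoting (a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded quotes are doubled). The function names `csv_escape_field` and `csv_unescape_field` are hypothetical, and the code handles one already-isolated field, not a whole record.

```rust
/// Quote a single CSV field, assuming RFC 4180-style rules.
fn csv_escape_field(field: &str) -> String {
    let needs_quotes = field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));
    if !needs_quotes {
        return field.to_string();
    }
    let mut out = String::with_capacity(field.len() + 2);
    out.push('"');
    for c in field.chars() {
        if c == '"' {
            out.push_str("\"\""); // a quote is escaped by doubling it
        } else {
            out.push(c);
        }
    }
    out.push('"');
    out
}

/// Reverse of `csv_escape_field` for one already-isolated field.
fn csv_unescape_field(field: &str) -> String {
    // Unquoted fields are returned unchanged.
    let inner = match field.strip_prefix('"').and_then(|f| f.strip_suffix('"')) {
        Some(inner) => inner,
        None => return field.to_string(),
    };
    let mut out = String::with_capacity(inner.len());
    let mut iter = inner.chars().peekable();
    while let Some(c) = iter.next() {
        // Inside quotes, `""` stands for one literal quote: skip the second one.
        if c == '"' && iter.peek() == Some(&'"') {
            iter.next();
        }
        out.push(c);
    }
    out
}
```

Here `peek()` is what decides whether a quote is the first half of a doubled pair, which is the kind of lookahead a `Peekable` iterator is meant for.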
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| HTML escape | Manual Buffer loop | Manual with `flat_map` (or `html-escape` crate) |
| Control char escape | Manual Buffer loop | Manual with `push_str` per char |
| Char-to-multi-char | `Buffer.add_string buf "..."` | `flat_map`: expand each char |
| Stateful unescaping | Manual `ref` loop | `Peekable` iterator + `iter.next()` |
| Debug string repr | `Printf.printf "%S"` (or `String.escaped`) | `{:?}`: built-in, shows escapes |
| JSON escaping | No std support | `serde_json` crate |
// 499. Escaping and unescaping strings
fn escape_html(s: &str) -> String {
    s.chars().flat_map(|c| match c {
        '<' => "&lt;".chars().collect::<Vec<_>>(),
        '>' => "&gt;".chars().collect(),
        '&' => "&amp;".chars().collect(),
        '"' => "&quot;".chars().collect(),
        '\'' => "&#39;".chars().collect(),
        c => vec![c],
    }).collect()
}
// `&amp;` is decoded last so already-decoded text is not decoded a second time.
fn unescape_html(s: &str) -> String {
    s.replace("&lt;", "<").replace("&gt;", ">")
        .replace("&quot;", "\"").replace("&#39;", "'").replace("&amp;", "&")
}
fn escape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            c => out.push(c),
        }
    }
    out
}
fn unescape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut iter = s.chars().peekable();
    while let Some(c) = iter.next() {
        if c == '\\' {
            match iter.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('r') => out.push('\r'),
                Some('\\') => out.push('\\'),
                Some('"') => out.push('"'),
                Some(c) => { out.push('\\'); out.push(c); }
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}
#[cfg(test)]
mod tests {
    use super::*;
    #[test] fn test_html_escape() { assert_eq!(escape_html("<b>hi</b>"), "&lt;b&gt;hi&lt;/b&gt;"); }
    #[test] fn test_html_unescape() { assert_eq!(unescape_html("&lt;b&gt;"), "<b>"); }
    #[test] fn test_roundtrip_html() { let s = "<div>&</div>"; assert_eq!(unescape_html(&escape_html(s)), s); }
    #[test] fn test_control_esc() { assert_eq!(escape_control("a\nb"), "a\\nb"); }
    #[test] fn test_control_unesc() { assert_eq!(unescape_control("a\\nb"), "a\nb"); }
}
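The `flat_map` version in the listing above allocates a small `Vec` for every character and always builds a new `String`, even when the input contains nothing to escape. A possible variation, sketched below, pushes directly into one buffer and returns a `Cow<str>` so that clean input is borrowed rather than copied. The name `escape_html_cow` is an assumption for this example, and `&#39;` is used as the apostrophe entity.

```rust
use std::borrow::Cow;

/// Escape HTML, borrowing the input when nothing needs escaping.
fn escape_html_cow(s: &str) -> Cow<'_, str> {
    // Fast path: no special chars means no allocation at all.
    let first = match s.find(|c: char| matches!(c, '<' | '>' | '&' | '"' | '\'')) {
        Some(i) => i,
        None => return Cow::Borrowed(s),
    };
    let mut out = String::with_capacity(s.len() + 8);
    out.push_str(&s[..first]); // the prefix is known to be clean
    for c in s[first..].chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            c => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

Callers that need an owned `String` can call `.into_owned()` on the result; callers that only print or compare it can use the `Cow` directly.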
(* 499. String escaping in OCaml *)
let escape_html s =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> match c with
    | '<' -> Buffer.add_string buf "&lt;"
    | '>' -> Buffer.add_string buf "&gt;"
    | '&' -> Buffer.add_string buf "&amp;"
    | '"' -> Buffer.add_string buf "&quot;"
    | '\'' -> Buffer.add_string buf "&#39;"
    | c -> Buffer.add_char buf c
  ) s;
  Buffer.contents buf
let escape_backslash s =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> match c with
    | '\n' -> Buffer.add_string buf "\\n"
    | '\t' -> Buffer.add_string buf "\\t"
    | '\\' -> Buffer.add_string buf "\\\\"
    | c -> Buffer.add_char buf c
  ) s;
  Buffer.contents buf
let () =
  let html = "<div class=\"hello\">Hello & World!</div>" in
  Printf.printf "%s\n" (escape_html html);
  let raw = "line1\nline2\ttab\\slash" in
  Printf.printf "%s\n" (escape_backslash raw)