492: OsStr and OsString
Difficulty: 1 Level: Intermediate
Platform-native strings that may not be UTF-8: the correct types for file paths and environment variables.
The Problem This Solves
File paths and environment variables are not UTF-8 strings on every operating system. On Windows, paths are sequences of 16-bit code units that are interpreted as UTF-16 but may contain unpaired surrogates, so they are not always valid Unicode. On Unix, paths are arbitrary byte sequences: only `/` and the NUL byte are special, so a file name made of arbitrary bytes is legal. If you try to turn such a path into a `String`, the conversion fails: `OsString::into_string` returns `Err`, `std::env::var` returns `Err(VarError::NotUnicode)`, and `std::env::args` panics. That's correct behavior, but it means a program that insists on `String` can't handle every valid path on the OS it runs on. `OsStr` and `OsString` are Rust's bridge types: they hold the OS-native representation losslessly, and you pass them to filesystem APIs. When you need to display or manipulate them as text, you convert explicitly and decide what to do if they're not valid UTF-8.
The Intuition
Think of a post box that accepts any parcel the postal service can deliver, not just parcels you've opened and verified. `OsString` holds whatever the OS hands you. `String` is what you get after you've unwrapped it and confirmed the contents are text. You convert at the boundary, deliberately.
How It Works in Rust
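Before the individual steps, here is the failure mode in miniature. This is a Unix-only sketch: it relies on `std::os::unix::ffi::OsStrExt` to build an `OsStr` from raw bytes, and `classify` is a hypothetical helper name, not a std API. It shows that a non-UTF-8 name survives byte-for-byte as an `OsStr`, while any conversion to text must either fail or go lossy.

```rust
// Unix-only sketch: OsStrExt (std::os::unix::ffi) gives byte-level access to OsStr.
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

/// Hypothetical helper: Ok(text) if a raw Unix file name is UTF-8,
/// otherwise Err(lossy text with U+FFFD substituted).
fn classify(raw: &[u8]) -> Result<String, String> {
    let name: &OsStr = OsStr::from_bytes(raw);
    match name.to_str() {
        // Valid UTF-8: a &str view is available.
        Some(s) => Ok(s.to_owned()),
        // Not UTF-8: still a legal file name, but only a lossy String exists.
        None => Err(name.to_string_lossy().into_owned()),
    }
}

fn main() {
    // 0xE9 is 'é' in Latin-1; as a lone byte it is not valid UTF-8.
    let latin1: &[u8] = b"caf\xE9.txt";
    let name = OsStr::from_bytes(latin1);

    // No information is lost: the OsStr holds exactly the original bytes.
    assert_eq!(name.as_bytes(), latin1);
    // into_string() on the owned form refuses it (and hands the OsString back in Err).
    assert!(name.to_os_string().into_string().is_err());

    println!("{:?}", classify(b"notes.txt")); // Ok("notes.txt")
    println!("{:?}", classify(latin1));
}
```

The design choice is to keep the `OsStr` as the source of truth and treat both `String` results as derived views.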
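1. Receiving from the OS: `std::env` and `std::fs` hand back `OsString` values (for example `env::var_os`, `env::args_os`, and `fs::DirEntry::file_name`), and a `Path` exposes its contents as `&OsStr`:
```rust
use std::ffi::{OsStr, OsString};
let path: &OsStr = std::path::Path::new("/tmp/data").as_os_str();
let var: OsString = std::env::var_os("PATH").unwrap(); // panics if PATH is unset
```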
2. Converting to `&str` may fail, so `to_str` returns `Option<&str>`:
```rust
if let Some(s) = path.to_str() {
    println!("utf-8 path: {}", s);
}
```
3. Lossy conversion always succeeds, replacing anything that isn't valid Unicode with `U+FFFD` (the replacement character):
```rust
let display = var.to_string_lossy(); // Cow<str>
```
4. Building paths: use `Path` and `PathBuf`, which wrap `OsStr` and `OsString`:
```rust
let mut p = std::path::PathBuf::from("/home/user");
p.push("documents");    // /home/user/documents
p.set_extension("txt"); // /home/user/documents.txt
```
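5. Passing to C APIs: convert to `CStr`/`CString` for FFI. Going through `path.to_str()` would reject perfectly valid non-UTF-8 paths, so on Unix pass the raw bytes instead:
```rust
use std::ffi::CString;
use std::os::unix::ffi::OsStrExt; // Unix-only: byte access to OsStr
let c = CString::new(path.as_bytes()).unwrap(); // errs only on an interior NUL byte
```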
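The FFI step as a fuller Unix-only sketch: bytes travel from `OsStr` to `CString` and back to `OsString` without ever passing through `str`. `to_c` and `from_c` are hypothetical helper names. On Windows the native form is 16-bit, so you would need a different route (for example `std::os::windows::ffi::OsStrExt::encode_wide`), which this sketch does not cover.

```rust
// Unix-only sketch: OsStrExt / OsStringExt expose the raw bytes on Unix targets.
use std::ffi::{CStr, CString, NulError, OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper: OsStr -> CString with no UTF-8 round-trip.
/// Fails only if the name contains an interior NUL byte.
fn to_c(name: &OsStr) -> Result<CString, NulError> {
    CString::new(name.as_bytes())
}

/// Hypothetical helper: CStr (e.g. returned by a C API) -> OsString, byte-for-byte.
fn from_c(c: &CStr) -> OsString {
    OsString::from_vec(c.to_bytes().to_vec())
}

fn main() {
    let name = OsStr::from_bytes(b"caf\xE9.txt"); // not valid UTF-8
    let c = to_c(name).expect("no interior NUL");

    // The bytes survive the trip to C and back unchanged.
    assert_eq!(from_c(&c).as_os_str(), name);

    // An interior NUL cannot be represented as a C string.
    assert!(to_c(OsStr::from_bytes(b"bad\0name")).is_err());
    println!("round-trip ok");
}
```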
What This Unlocks
- Portability: code that handles `OsStr` correctly works on every platform, including with file names that aren't valid UTF-8.
- Correct CLI tools: argument parsers, file walkers, and env-var readers that don't silently drop valid inputs.
- FFI safety: understanding the `OsStr`-to-`CStr` pipeline is essential for calling C filesystem APIs.
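For the CLI point, a minimal sketch of an argument handler that never forces UTF-8. `parse_paths` is a hypothetical helper, and treating every argument as a path is a simplification; a real parser would also handle flags.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical mini-parser: every argument after the program name is a path.
/// Taking OsString means non-UTF-8 arguments pass through untouched.
fn parse_paths<I>(args: I) -> Vec<PathBuf>
where
    I: IntoIterator<Item = OsString>,
{
    args.into_iter().skip(1).map(PathBuf::from).collect()
}

fn main() {
    // args_os() yields OsString; args() would panic on a non-UTF-8 argument.
    for path in parse_paths(std::env::args_os()) {
        // display() renders lossily for output; the PathBuf itself is unchanged.
        println!("{}", path.display());
    }
}
```

Accepting any `IntoIterator<Item = OsString>` keeps the parser testable without touching the real process arguments.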
Key Differences
| Concept | OCaml | Rust |
|---|---|---|
| Native string | `string` (bytes, no encoding) | `OsStr` (platform-native) |
| Filesystem paths | `string` | `Path` / `PathBuf` (wraps `OsStr`) |
| UTF-8 check | Manual | `OsStr::to_str()` returns `Option<&str>` |
| Lossy display | Manual | `to_string_lossy()` returns `Cow<str>` |
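As a sketch of the `Path`/`PathBuf` row in the table: an `OsString` from the environment can flow straight into a `PathBuf` without any UTF-8 check, and text conversion happens only when printing. `config_path`, the `.config/app.toml` layout, and the `"."` fallback are illustrative assumptions, not a convention from this page.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: build a config path from a raw environment value.
/// The OsString moves into the PathBuf untouched; no UTF-8 check happens.
fn config_path(home: OsString) -> PathBuf {
    let mut p = PathBuf::from(home);
    p.push(".config"); // illustrative layout, not a standard
    p.push("app.toml");
    p
}

fn main() {
    // Fall back to "." if HOME is unset (an arbitrary choice for this sketch).
    let home = std::env::var_os("HOME").unwrap_or_else(|| OsString::from("."));
    let path = config_path(home);
    // Only at the display boundary is the path rendered as (lossy) text.
    println!("config: {}", path.display());
}
```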
```rust
// 492. OsStr and OsString
use std::env;
use std::ffi::{OsStr, OsString};
use std::path::Path;

fn main() {
    // OsStr from &str (always works: every &str is valid UTF-8)
    let os: &OsStr = OsStr::new("hello.txt");
    println!("os_str: {:?}", os);
    println!("to_str: {:?}", os.to_str()); // Some("hello.txt")

    // OsString: the owned counterpart
    let mut owned = OsString::from("hello");
    owned.push(" world");
    println!("owned: {:?}", owned);

    // Path components come back as &OsStr
    let path = Path::new("/home/user/file.txt");
    let ext: &OsStr = path.extension().unwrap();
    println!("extension as OsStr: {:?}", ext);
    println!("extension as &str: {:?}", ext.to_str());

    // env::var_os returns Option<OsString> (the value may be non-UTF-8 on Unix)
    match env::var_os("HOME") {
        Some(v) => println!("HOME={}", v.to_string_lossy()),
        None => println!("HOME not set"),
    }

    // env::var returns Err(VarError::NotUnicode) rather than lossy data
    match env::var("PATH") {
        // Take whole chars: byte-slicing at 50 could split a multi-byte char and panic
        Ok(v) => println!("PATH starts: {}", v.chars().take(50).collect::<String>()),
        Err(e) => println!("PATH: {}", e),
    }

    // to_string_lossy replaces invalid data with U+FFFD (a no-op here: the input is valid)
    let os2 = OsString::from("valid utf-8");
    let lossy = os2.to_string_lossy();
    println!("lossy: {}", lossy);
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_osstr_roundtrip() {
        let s = "hello";
        let os = OsStr::new(s);
        assert_eq!(os.to_str(), Some(s));
    }

    #[test]
    fn test_path_ext() {
        let p = Path::new("f.rs");
        assert_eq!(p.extension(), Some(OsStr::new("rs")));
    }

    #[test]
    fn test_os_string() {
        let s = OsString::from("hi");
        assert_eq!(s.to_string_lossy(), "hi");
    }
}
```
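The program above never touches `std::fs`. As a hedged extension, here is how a directory listing keeps names as `OsString` until the moment they are printed. The helper `entry_names` and the scratch directory name `osstr_demo_492` are illustrative choices, not part of the original example.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: list the entry names in `dir` as OsString,
/// so non-UTF-8 names survive; sorted for deterministic output.
fn entry_names(dir: &Path) -> io::Result<Vec<OsString>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        names.push(entry?.file_name()); // DirEntry::file_name() -> OsString
    }
    names.sort();
    Ok(names)
}

fn main() -> io::Result<()> {
    // Scratch directory under the system temp dir (the name is arbitrary).
    let dir = std::env::temp_dir().join("osstr_demo_492");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("a.txt"), b"")?;
    fs::write(dir.join("b.rs"), b"")?;

    for name in entry_names(&dir)? {
        // Convert to text only at the display boundary.
        match name.to_str() {
            Some(s) => println!("utf-8:     {s}"),
            None => println!("non-utf-8: {}", name.to_string_lossy()),
        }
    }
    fs::remove_dir_all(&dir)
}
```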
```ocaml
(* 492. OsStr: OCaml note *)
(* OCaml's string is a byte sequence with no enforced encoding, so on Unix a
   non-UTF-8 file name is just another string *)
(* Demonstrate with the Sys and Filename modules *)
let () =
  Printf.printf "os: %s\n" Sys.os_type;
  Printf.printf "cwd: %s\n" (Sys.getcwd ());
  (match Sys.getenv_opt "HOME" with
   | Some h -> Printf.printf "HOME=%s\n" h
   | None -> print_string "HOME not set\n");
  (* Filename operations *)
  let f = "/tmp/test.txt" in
  Printf.printf "basename: %s\n" (Filename.basename f);
  Printf.printf "dirname: %s\n" (Filename.dirname f);
  Printf.printf "check_suffix .txt: %b\n" (Filename.check_suffix f ".txt")
```
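For comparison with the OCaml `Filename` calls, a rough Rust counterpart built on `Path`. `basename`, `dirname`, and `has_ext` are hypothetical helper names chosen to mirror the OCaml functions; the comments flag where the semantics differ, notably that `extension()` excludes the dot while `check_suffix` is a plain string-suffix test.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Rough analogue of Filename.basename: the last component as &OsStr.
/// Returns None for paths such as "/" or ones ending in "..".
fn basename(p: &Path) -> Option<&OsStr> {
    p.file_name()
}

/// Rough analogue of Filename.dirname. Returns None for a root path;
/// note that Path::new("foo").parent() is Some(""), not ".".
fn dirname(p: &Path) -> Option<&Path> {
    p.parent()
}

/// Stand-in for `Filename.check_suffix f ".txt"`: extension() looks only at
/// the last component and excludes the dot, so pass "txt", not ".txt".
fn has_ext(p: &Path, ext: &str) -> bool {
    p.extension() == Some(OsStr::new(ext))
}

fn main() {
    let f = Path::new("/tmp/test.txt");
    if let Some(name) = basename(f) {
        println!("basename: {}", name.to_string_lossy()); // test.txt
    }
    if let Some(dir) = dirname(f) {
        println!("dirname: {}", dir.display()); // /tmp
    }
    println!("has txt extension: {}", has_ext(f, "txt")); // true
}
```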