🦀 Functional Rust

492: OsStr and OsString

Difficulty: 1 Level: Intermediate Platform-native strings that may not be UTF-8: the correct types for file paths and environment variables.

The Problem This Solves

File paths and environment variables are not guaranteed to be UTF-8 on any major operating system. On Windows, paths are sequences of 16-bit code units that are usually UTF-16 but can contain unpaired surrogates, which are not valid Unicode. On Unix, paths are arbitrary byte sequences in which only `/` and the null byte are special, so a file name containing bytes that are not valid UTF-8 is legal.

A `String` must be valid UTF-8, so it cannot hold such a path. Depending on the API, the conversion either returns an error (`std::env::var`, `OsString::into_string`) or panics (`std::env::args`). That's correct behavior, but it means a program built on `String` alone can't handle every valid path on the OS it's running on.

`OsStr` and `OsString` are Rust's bridge types: they hold the OS-native representation losslessly. You pass them to filesystem APIs. When you need to display or manipulate them as text, you convert explicitly and decide what to do if they're not valid UTF-8.
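
To make the failure mode concrete, here is a minimal sketch that builds a name containing one invalid byte and checks how each conversion reacts. It assumes a Unix target, because it relies on the Unix-only `OsStringExt::from_vec` extension; `non_utf8_name` is a hypothetical helper, not part of the standard library.

```rust
use std::ffi::OsString;

/// Hypothetical helper: builds an OS string that is NOT valid UTF-8.
/// Unix-only, because `from_vec` comes from the Unix extension trait.
#[cfg(unix)]
fn non_utf8_name() -> OsString {
    use std::os::unix::ffi::OsStringExt;
    // "fo", then the lone continuation byte 0x80, then "o".
    OsString::from_vec(vec![b'f', b'o', 0x80, b'o'])
}

#[cfg(unix)]
fn main() {
    let name = non_utf8_name();

    // The strict view refuses: the bytes are not UTF-8.
    assert!(name.to_str().is_none());

    // The lossy view always works and substitutes U+FFFD for the bad byte.
    assert_eq!(name.to_string_lossy(), "fo\u{FFFD}o");

    // `into_string` hands the original value back on failure, so nothing is lost.
    let returned = name.clone().into_string().unwrap_err();
    assert_eq!(returned, name);

    println!("{:?}", name);
}

#[cfg(not(unix))]
fn main() {
    // The sketch above is Unix-specific; nothing to demonstrate elsewhere.
}
```

The point of the last assertion is that a failed conversion does not destroy the data: the `OsString` is still available to pass to filesystem APIs.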

The Intuition

Think of a post box that accepts any parcel the postal service can deliver, not just parcels you've opened and verified. `OsString` holds whatever the OS hands you. `String` is what you get after you've unwrapped it and confirmed the contents are text. You convert at the boundary, deliberately.
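
One way to picture "converting at the boundary" is a function that accepts the raw OS value and returns either a typed result or a precise reason for refusing it. The sketch below is only an illustration: `parse_port`, `PortError`, and the `APP_PORT` variable name are all hypothetical.

```rust
use std::ffi::OsStr;

/// Hypothetical error type: why the raw value was rejected.
#[derive(Debug, PartialEq)]
enum PortError {
    NotUnicode,
    NotANumber,
}

/// The boundary: raw OS data goes in, a checked value comes out.
fn parse_port(raw: &OsStr) -> Result<u16, PortError> {
    // Step 1: confirm the contents are text at all.
    let text = raw.to_str().ok_or(PortError::NotUnicode)?;
    // Step 2: only then interpret the text.
    text.trim().parse::<u16>().map_err(|_| PortError::NotANumber)
}

fn main() {
    assert_eq!(parse_port(OsStr::new("8080")), Ok(8080));
    assert_eq!(parse_port(OsStr::new("eighty")), Err(PortError::NotANumber));

    // With a real environment variable (the name is an assumption):
    if let Some(raw) = std::env::var_os("APP_PORT") {
        println!("{:?}", parse_port(&raw));
    }
}
```

Everything past `parse_port` works with a plain `u16`; only the boundary function needs to know that the input might not be text.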

How It Works in Rust

1. Receiving from the OS. `std::env` and `std::fs` return `OsStr`/`OsString`:
use std::ffi::{OsStr, OsString};

let path: &OsStr = std::path::Path::new("/tmp/data").as_os_str();
let var: OsString = std::env::var_os("PATH").unwrap();
2. Convert to `&str` (may fail):
if let Some(s) = path.to_str() {
    println!("utf-8 path: {}", s);
}
3. Lossy conversion. Always succeeds, replaces invalid sequences with `U+FFFD`:
let display = var.to_string_lossy(); // Cow<str>
4. Building paths. Use `Path` and `PathBuf`, which wrap `OsStr`/`OsString`:
let mut p = std::path::PathBuf::from("/home/user");
p.push("documents");
p.set_extension("txt");
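5. Passing to C APIs. Convert to `CStr`/`CString` for FFI; on Unix, go through the raw bytes so non-UTF-8 paths survive:
use std::ffi::CString;
use std::os::unix::ffi::OsStrExt; // Unix only: exposes the underlying bytes
let c = CString::new(path.as_bytes()).unwrap(); // fails only on an interior NUL byte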
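
The steps above can be combined into a small sketch that edits a file name without ever converting it to `String`, so a non-UTF-8 name would pass through unchanged. `backup_path` is a hypothetical helper, and the `.bak` suffix and `/tmp` directory are arbitrary choices for illustration.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Hypothetical helper: appends ".bak" to a file name and places it in `dir`.
/// The name stays an OS string the whole time; no UTF-8 check is needed.
fn backup_path(dir: &Path, name: &OsStr) -> PathBuf {
    let mut file: OsString = name.to_os_string();
    file.push(".bak"); // OsString::push accepts &str as well as &OsStr
    dir.join(file)
}

fn main() {
    let p = backup_path(Path::new("/tmp"), OsStr::new("data.txt"));
    assert_eq!(p, PathBuf::from("/tmp/data.txt.bak"));

    // file_name() hands the last component back as &OsStr.
    assert_eq!(p.file_name(), Some(OsStr::new("data.txt.bak")));

    // set_extension replaces only the final extension.
    let mut q = p.clone();
    q.set_extension("old");
    assert_eq!(q, PathBuf::from("/tmp/data.txt.old"));

    // Convert to text only at the point of display.
    println!("{}", p.display());
}
```

Going through `to_str().unwrap()` to do the same edit would work for ordinary names but panic on the rare non-UTF-8 one, which is the case these types exist to handle.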

What This Unlocks

Key Differences

| Concept | OCaml | Rust |
|---|---|---|
| Native string | `string` (bytes, no encoding) | `OsStr` (platform-native) |
| Filesystem paths | `string` | `Path` / `PathBuf` (wraps `OsStr`) |
| UTF-8 check | Manual | `OsStr::to_str()` → `Option<&str>` |
| Lossy display | Manual | `to_string_lossy()` → `Cow<str>` |
// 492. OsStr and OsString
use std::ffi::{OsStr, OsString};
use std::path::Path;
use std::env;

fn main() {
    // OsStr from &str (always works for valid UTF-8)
    let os: &OsStr = OsStr::new("hello.txt");
    println!("os_str: {:?}", os);
    println!("to_str: {:?}", os.to_str()); // Some("hello.txt")

    // OsString: the owned counterpart of OsStr
    let mut owned = OsString::from("hello");
    owned.push(" world");
    println!("owned: {:?}", owned);

    // Path yields OsStr
    let path = Path::new("/home/user/file.txt");
    let ext: &OsStr = path.extension().unwrap();
    println!("extension as OsStr: {:?}", ext);
    println!("extension as &str:  {:?}", ext.to_str());

    // Environment variables return OsString (may be non-UTF-8 on Unix)
    match env::var_os("HOME") {
        Some(v) => println!("HOME={}", v.to_string_lossy()),
        None    => println!("HOME not set"),
    }

    match env::var("PATH") {
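        // Take chars, not bytes: slicing at byte 50 could split a multi-byte character and panic
        Ok(v)  => println!("PATH starts: {}", v.chars().take(50).collect::<String>()),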
        Err(e) => println!("PATH: {}", e),
    }

    // to_string_lossy: replaces non-UTF-8 bytes with U+FFFD
    let os2 = OsString::from("valid utf-8");
    let lossy = os2.to_string_lossy();
    println!("lossy: {}", lossy);
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test] fn test_osstr_roundtrip() { let s="hello"; let os=OsStr::new(s); assert_eq!(os.to_str(),Some(s)); }
    #[test] fn test_path_ext()        { let p=Path::new("f.rs"); assert_eq!(p.extension(),Some(OsStr::new("rs"))); }
    #[test] fn test_os_string()       { let s=OsString::from("hi"); assert_eq!(s.to_string_lossy(),"hi"); }
}
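
The `env::var("PATH")` match in the listing above prints the error without distinguishing its cause. As a rough sketch, the two variants of `std::env::VarError` can be handled separately; `describe` is a hypothetical helper, and the variable names are placeholders.

```rust
use std::env::{self, VarError};
use std::ffi::OsString;

/// Hypothetical helper: turns the result of `env::var` into a log line.
fn describe(name: &str, result: Result<String, VarError>) -> String {
    match result {
        Ok(value) => format!("{}={}", name, value),
        // The variable does not exist at all.
        Err(VarError::NotPresent) => format!("{} is not set", name),
        // The variable exists, but its value is not valid Unicode.
        // The raw OsString is carried inside the error, so it is still usable.
        Err(VarError::NotUnicode(raw)) => {
            format!("{} is not valid Unicode: {}", name, raw.to_string_lossy())
        }
    }
}

fn main() {
    // Real lookup; the output depends on the environment.
    println!("{}", describe("HOME", env::var("HOME")));

    // Constructed errors, so the behavior is visible on any machine.
    let bad = Err(VarError::NotUnicode(OsString::from("raw")));
    assert_eq!(describe("X", bad), "X is not valid Unicode: raw");
    assert_eq!(describe("X", Err(VarError::NotPresent)), "X is not set");
}
```

When the value itself matters more than the reason for failure, `env::var_os` is the simpler choice, since it never rejects a value for its encoding.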
(* 492. OsStr - OCaml note *)
(* OCaml strings are plain byte sequences with no enforced encoding, so non-UTF-8 file names pass through unchanged *)
(* Demonstrate with the Sys and Filename modules *)
let () =
  Printf.printf "os: %s\n" Sys.os_type;
  Printf.printf "cwd: %s\n" (Sys.getcwd ());
  (match Sys.getenv_opt "HOME" with
   | Some h -> Printf.printf "HOME=%s\n" h
   | None -> print_string "HOME not set\n");
  (* Filename operations *)
  let f = "/tmp/test.txt" in
  Printf.printf "basename: %s\n" (Filename.basename f);
  Printf.printf "dirname:  %s\n" (Filename.dirname f);
  Printf.printf "check_suffix .txt: %b\n" (Filename.check_suffix f ".txt")
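
For comparison, the `Filename` operations in the OCaml note have rough counterparts on Rust's `Path`, each returning `OsStr` or `Path` values rather than strings. This is a sketch of the correspondence, not an exact translation: `has_extension` is a hypothetical helper, and unlike `Filename.check_suffix` it compares the extension without the leading dot.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: true if the path's final extension equals `ext`
/// (written without the dot, e.g. "txt").
fn has_extension(path: &Path, ext: &str) -> bool {
    path.extension() == Some(OsStr::new(ext))
}

fn main() {
    let f = Path::new("/tmp/test.txt");

    // Counterpart of Filename.basename
    assert_eq!(f.file_name(), Some(OsStr::new("test.txt")));

    // Counterpart of Filename.dirname
    assert_eq!(f.parent(), Some(Path::new("/tmp")));

    // Rough counterpart of Filename.check_suffix f ".txt"
    assert!(has_extension(f, "txt"));

    // A file merely named "txt" has no extension.
    assert!(!has_extension(Path::new("/tmp/txt"), "txt"));
}
```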