164 Advanced

Number Parser

Functional Programming

Tutorial

The Problem

Floating-point numbers in text formats (JSON, CSV, scientific data) require parsing an optional sign, integer digits, an optional decimal point with fractional digits, and an optional exponent (as in 1.5e-10). Each component is optional or required only in specific combinations. This example builds a full floating-point parser using combinators, demonstrating how complex lexical rules reduce to a composition of simple rules, each of which is clear and testable on its own.

🎯 Learning Outcomes

  • Build a complete floating-point parser with sign, integral, fractional, and exponent parts
  • Learn how opt and many1 combine to handle optional and required components
  • Understand the string-then-convert pattern: collect the number string, then call str::parse
  • See how combinator parsers map directly to BNF grammar rules

Code Example

    fn float_string<'a>() -> Parser<'a, &'a str> {
        Box::new(|input: &'a str| {
            let bytes = input.as_bytes();
            let mut pos = 0;
            if pos < bytes.len() && (bytes[pos] == b'+' || bytes[pos] == b'-') { pos += 1; }
            while pos < bytes.len() && bytes[pos].is_ascii_digit() { pos += 1; }
            // ... decimal, exponent ...
            Ok((&input[..pos], &input[pos..]))
        })
    }
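
The snippet above elides the decimal and exponent steps and does not show the Parser type. A minimal sketch of how the whole scanner might look follows. The ParseResult and Parser aliases are assumptions (the excerpt does not define them), the float helper is a hypothetical name for the string-then-convert step, and leaving a trailing "." or "e" unconsumed when no digits follow is one possible design choice rather than the tutorial's confirmed behaviour.

```rust
// Assumed definitions: the excerpt uses `Parser` without showing it.
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Scan `[sign] digits ['.' digits] [('e'|'E') [sign] digits]`
/// and return the matched slice plus the remaining input.
fn float_string<'a>() -> Parser<'a, &'a str> {
    Box::new(|input: &'a str| -> ParseResult<'a, &'a str> {
        let bytes = input.as_bytes();
        let mut pos = 0;
        // Optional sign.
        if pos < bytes.len() && (bytes[pos] == b'+' || bytes[pos] == b'-') {
            pos += 1;
        }
        // Required integer digits.
        let int_start = pos;
        while pos < bytes.len() && bytes[pos].is_ascii_digit() {
            pos += 1;
        }
        if pos == int_start {
            return Err(format!("expected digit at byte {pos}"));
        }
        // Optional fraction: consume '.' only if at least one digit follows.
        if pos + 1 < bytes.len() && bytes[pos] == b'.' && bytes[pos + 1].is_ascii_digit() {
            pos += 1;
            while pos < bytes.len() && bytes[pos].is_ascii_digit() {
                pos += 1;
            }
        }
        // Optional exponent: look ahead, and commit only if digits follow
        // the marker and its optional sign.
        if pos < bytes.len() && (bytes[pos] == b'e' || bytes[pos] == b'E') {
            let mut look = pos + 1;
            if look < bytes.len() && (bytes[look] == b'+' || bytes[look] == b'-') {
                look += 1;
            }
            let exp_start = look;
            while look < bytes.len() && bytes[look].is_ascii_digit() {
                look += 1;
            }
            if look > exp_start {
                pos = look;
            }
        }
        Ok((&input[..pos], &input[pos..]))
    })
}

/// String-then-convert (hypothetical helper): scan first, then let
/// `str::parse` build the `f64` from the matched slice.
fn float<'a>() -> Parser<'a, f64> {
    let scan = float_string();
    Box::new(move |input: &'a str| -> ParseResult<'a, f64> {
        let (text, rest) = scan(input)?;
        let value = text.parse::<f64>().map_err(|e| e.to_string())?;
        Ok((value, rest))
    })
}
```

With these definitions, float()("3.14,") yields the value 3.14 with "," left over, and float_string()("1e") matches only "1" and leaves the dangling "e" for whatever parser runs next.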

    Key Differences

  • Precision shortcut: OCaml's take_while1 + float_of_string is concise but permissive; Rust's combinator parser is strict but verbose.
  • Exception vs. Result: OCaml's float_of_string raises Failure on invalid input; Rust's str::parse::<f64>() returns Result, propagated via ?.
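  • Buffer efficiency: OCaml's take_while1 works directly on the buffer. A Rust parser built from character-level combinators collects a Vec<char> before converting, while the float_string scanner shown here returns a slice of the input and allocates nothing.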
  • Locale: neither conversion is locale-aware; both expect . as the decimal separator, so locale-specific formats (such as a decimal comma) require additional handling.

    OCaml Approach

    Angstrom provides a direct approach:

    let number =
      take_while1 (fun c -> Char.is_digit c || c = '.' || c = 'e' || c = 'E'
                            || c = '+' || c = '-')
      >>| float_of_string
    

    This is a common shortcut, but take_while1 accepts any run of those characters, including malformed input such as "1.2.3", which float_of_string then rejects by raising an exception. A stricter combinator parser follows the BNF more closely.
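
    For contrast, here is a minimal Rust sketch of the stricter style, in which each grammar rule becomes one combinator. The combinator names satisfy, many1, and opt, the type aliases, and the parse_number wrapper are all assumptions made for illustration; the tutorial's own combinator definitions are not shown in this excerpt. The sketch covers only sign, integer, and fraction, and leaves the exponent out.

```rust
// Assumed definitions, modelled on the tutorial's vocabulary.
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Match one character for which `pred` holds.
fn satisfy<'a>(pred: fn(char) -> bool, what: &'static str) -> Parser<'a, char> {
    Box::new(move |input: &'a str| -> ParseResult<'a, char> {
        match input.chars().next() {
            Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
            _ => Err(format!("expected {what}")),
        }
    })
}

/// One or more repetitions of `p`.
fn many1<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Vec<T>> {
    Box::new(move |input: &'a str| -> ParseResult<'a, Vec<T>> {
        let (first, mut rest) = p(input)?;
        let mut items = vec![first];
        while let Ok((item, next)) = p(rest) {
            items.push(item);
            rest = next;
        }
        Ok((items, rest))
    })
}

/// Zero or one occurrence of `p`; never fails.
fn opt<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Option<T>> {
    Box::new(move |input: &'a str| -> ParseResult<'a, Option<T>> {
        match p(input) {
            Ok((value, rest)) => Ok((Some(value), rest)),
            Err(_) => Ok((None, input)),
        }
    })
}

/// number ::= sign? digit+ ("." digit+)?
fn number<'a>() -> Parser<'a, f64> {
    let sign = opt(satisfy(|c: char| c == '+' || c == '-', "sign"));
    let int_part = many1(satisfy(|c: char| c.is_ascii_digit(), "digit"));
    let dot = satisfy(|c: char| c == '.', "'.'");
    let frac_part = many1(satisfy(|c: char| c.is_ascii_digit(), "digit"));
    Box::new(move |input: &'a str| -> ParseResult<'a, f64> {
        let (_, rest) = sign(input)?;
        let (_, rest) = int_part(rest)?;
        // Commit to the fraction only if '.' is followed by digits.
        let rest = match dot(rest) {
            Ok((_, after_dot)) => match frac_part(after_dot) {
                Ok((_, after_frac)) => after_frac,
                Err(_) => rest,
            },
            Err(_) => rest,
        };
        // String-then-convert: `rest` is a suffix of `input`.
        let text = &input[..input.len() - rest.len()];
        let value = text.parse::<f64>().map_err(|e| e.to_string())?;
        Ok((value, rest))
    })
}

/// Require the whole input to be a number.
fn parse_number(input: &str) -> Result<f64, String> {
    match number()(input)? {
        (value, "") => Ok(value),
        (_, rest) => Err(format!("unexpected trailing input: {rest}")),
    }
}
```

    Because the grammar allows at most one fractional part, parse_number("1.2.3") fails on the leftover ".3" instead of handing a malformed string to the conversion function.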

    Full Source

    //! # Number Parser
    //!
    //! Parse integers and floats from `&str` with validation and error handling.
    //!
    //! Four approaches mirror the OCaml source:
    //!   * [`parse_int_safe`] — delegate to the standard library's `str::parse`.
    //!   * [`parse_int_custom`] — scan digits ourselves, rejecting any non-digit.
    //!   * [`parse_int_with_sign`] — extend the custom scanner with an optional `+`/`-` prefix.
    //!   * [`parse_float_safe`] — standard-library float parsing, same shape as the integer version.
    //!
    //! Each function returns `Result<T, String>` so callers can distinguish success from
    //! a malformed input and get a human-readable message back.
    
    /// Parse a decimal integer, with an optional leading sign, using the standard library.
    ///
    /// Returns `Err` with a human-readable message if the input is not a valid `i64`.
    pub fn parse_int_safe(s: &str) -> Result<i64, String> {
        s.parse::<i64>()
            .map_err(|_| format!("Not a valid integer: {s}"))
    }
    
    /// Parse an unsigned decimal integer by scanning each character.
    ///
    /// Any non-digit (including a leading sign or a trailing letter) is rejected.
    /// An empty input is also rejected, matching the OCaml version.
    /// Input whose value overflows `i64` is rejected as well.
    pub fn parse_int_custom(s: &str) -> Result<i64, String> {
        if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
            return Err(format!("Invalid characters: {s}"));
        }
        s.bytes()
            .try_fold(0i64, |acc, b| {
                acc.checked_mul(10)?.checked_add(i64::from(b - b'0'))
            })
            // Every byte is a digit at this point, so `None` can only mean overflow.
            .ok_or_else(|| format!("Integer out of range: {s}"))
    }
    
    /// Parse a decimal integer with an optional leading `+` or `-`.
    pub fn parse_int_with_sign(s: &str) -> Result<i64, String> {
        let (sign, digits) = match s.as_bytes().first() {
            Some(b'-') => (-1, &s[1..]),
            Some(b'+') => (1, &s[1..]),
            _ => (1, s),
        };
    
        if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
            return Err(if sign == -1 {
                format!("Invalid negative number: {s}")
            } else {
                format!("Invalid positive number: {s}")
            });
        }
    
        parse_int_custom(digits).map(|n| sign * n).map_err(|_| {
            if sign == -1 {
                format!("Invalid negative number: {s}")
            } else {
                format!("Invalid positive number: {s}")
            }
        })
    }
    
    /// Parse a floating-point number using the standard library.
    pub fn parse_float_safe(s: &str) -> Result<f64, String> {
        s.parse::<f64>()
            .map_err(|_| format!("Not a valid float: {s}"))
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn int_safe_accepts_valid_input() {
            assert_eq!(parse_int_safe("42"), Ok(42));
            assert_eq!(parse_int_safe("-17"), Ok(-17));
            assert_eq!(parse_int_safe("0"), Ok(0));
        }
    
        #[test]
        fn int_safe_rejects_invalid_input() {
            assert_eq!(
                parse_int_safe("abc"),
                Err("Not a valid integer: abc".to_string())
            );
            assert_eq!(parse_int_safe(""), Err("Not a valid integer: ".to_string()));
            assert!(parse_int_safe("1.5").is_err());
        }
    
        #[test]
        fn int_custom_accepts_only_digits() {
            assert_eq!(parse_int_custom("123"), Ok(123));
            assert_eq!(parse_int_custom("0"), Ok(0));
        }
    
        #[test]
        fn int_custom_rejects_non_digits() {
            assert_eq!(
                parse_int_custom("12a3"),
                Err("Invalid characters: 12a3".to_string())
            );
            assert_eq!(
                parse_int_custom(""),
                Err("Invalid characters: ".to_string())
            );
            assert!(
                parse_int_custom("-5").is_err(),
                "sign belongs to parse_int_with_sign"
            );
        }
    
        #[test]
        fn int_with_sign_handles_prefixes() {
            assert_eq!(parse_int_with_sign("+5"), Ok(5));
            assert_eq!(parse_int_with_sign("-5"), Ok(-5));
            assert_eq!(parse_int_with_sign("5"), Ok(5));
        }
    
        #[test]
        fn int_with_sign_reports_direction() {
            assert_eq!(
                parse_int_with_sign("-abc"),
                Err("Invalid negative number: -abc".to_string())
            );
            assert_eq!(
                parse_int_with_sign("+abc"),
                Err("Invalid positive number: +abc".to_string())
            );
            assert_eq!(
                parse_int_with_sign("abc"),
                Err("Invalid positive number: abc".to_string())
            );
        }
    
        #[test]
        fn float_safe_accepts_valid_input() {
            assert_eq!(parse_float_safe("2.5"), Ok(2.5));
            assert_eq!(parse_float_safe("-2.0"), Ok(-2.0));
            assert_eq!(parse_float_safe("1e10"), Ok(1e10));
        }
    
        #[test]
        fn float_safe_rejects_invalid_input() {
            assert_eq!(
                parse_float_safe("abc"),
                Err("Not a valid float: abc".to_string())
            );
        }
    
        #[test]
        fn int_custom_detects_overflow() {
            // 2^63 = 9223372036854775808, one past i64::MAX.
            assert!(parse_int_custom("9223372036854775808").is_err());
        }
    }

    Deep Comparison

    Comparison: Example 164 — Number Parser

    Imperative scanner

    OCaml:

    let float_string : string parser = fun input ->
      let buf = Buffer.create 16 in
      let pos = ref 0 in
      let len = String.length input in
      if !pos < len && (input.[!pos] = '+' || input.[!pos] = '-') then begin
        Buffer.add_char buf input.[!pos]; incr pos end;
      while !pos < len && is_digit input.[!pos] do
        Buffer.add_char buf input.[!pos]; incr pos done;
      (* ... decimal, exponent ... *)
      Ok (Buffer.contents buf, String.sub input !pos (len - !pos))
    

    Rust:

    fn float_string<'a>() -> Parser<'a, &'a str> {
        Box::new(|input: &'a str| {
            let bytes = input.as_bytes();
            let mut pos = 0;
            if pos < bytes.len() && (bytes[pos] == b'+' || bytes[pos] == b'-') { pos += 1; }
            while pos < bytes.len() && bytes[pos].is_ascii_digit() { pos += 1; }
            // ... decimal, exponent ...
            Ok((&input[..pos], &input[pos..]))
        })
    }
    

    String to float conversion

    OCaml:

    float_of_string "3.14"  (* 3.14 *)
    

    Rust:

    "3.14".parse::<f64>()  // Ok(3.14)
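
    Where float_of_string raises an exception, the Rust call hands back a Result that the caller has to deal with. A minimal sketch of propagating it with ? follows; sum_fields is a hypothetical helper invented for this illustration, not part of the tutorial's source.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper: add two numeric text fields, returning the
/// first parse error (if any) to the caller via `?`.
fn sum_fields(a: &str, b: &str) -> Result<f64, ParseFloatError> {
    let x = a.trim().parse::<f64>()?;
    let y = b.trim().parse::<f64>()?;
    Ok(x + y)
}
```

    A malformed field such as "1.2.3" makes sum_fields return Err without panicking, so the caller decides whether to report, skip, or substitute a default.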
    

    Exercises

  • Add exponent parsing: "1.5e-10", "2.0E+3" should parse correctly.
  • Implement a strict JSON number parser that rejects leading zeros ("01" is invalid in JSON).
  • Write a parser for rational numbers of the form "3/4" that returns the pair of integers (3, 4).