Dear R Core Team and R-devel Community,
I hope this message finds you well. I am writing to propose an enhancement to
the `as.character()` function in R's base package to address an inconsistency
with `as.numeric()` when handling high-precision floating-point numbers. This
issue has practical implications for code reliability, especially in scientific
computing and data analysis, and I believe a small adjustment could align the
behavior more closely with modern user expectations and R's evolving use cases.
Problem Description
The current behavior of `as.character()` and `as.numeric()` leads to logical
inconsistencies when converting high-precision decimal strings. For example,
consider the string `"7.999999999999999111822"` (22 significant digits):
- `as.numeric("7.999999999999999111822")` converts this to a double-precision
floating-point number (per IEEE 754), which is stored as approximately
`7.9999999999999991118` (verifiable with `print(x, digits = 20)`). The
difference from 8 (`8 - x ≈ 8.88178e-16`) equals `2^-50`, i.e.
`4 * .Machine$double.eps`, which is the spacing between adjacent doubles just
below 8. So `x` is the largest double below 8 and is distinct from `8.0`.
- However, `as.character(as.numeric("7.999999999999999111822"))` returns `"8"`,
simplifying the value and losing the small difference. This leads to a
mismatch: `x < 8` is `TRUE`, but `as.numeric(as.character(x)) == 8` is also
`TRUE`.
This inconsistency arises because `as.numeric()` preserves the precision of the
IEEE 754 double (up to ~15-17 decimal digits), while `as.character()` represents
doubles to 15 significant digits for readability. The stored value differs from
8 only from the 16th significant digit onward, so rounding it to 15 digits
yields exactly `"8"`.
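The arithmetic above can be checked directly. The following is a minimal
sketch, assuming an IEEE 754 platform. Here `x` is constructed as `8 - 2^-50`
instead of being parsed, so the checks do not depend on how a given platform
converts decimal strings; the result of `as.character(x)` may depend on the R
version, so no output is asserted for it.

```r
# The double described above: the largest double below 8.
x <- 8 - 2^-50

# Parsing the decimal string should give the same double, although this
# depends on the platform's string-to-double conversion.
parsed <- as.numeric("7.999999999999999111822")
identical(parsed, x)

(8 - x) / .Machine$double.eps   # gap to 8 in units of machine epsilon: 4
x < 8                           # TRUE: the stored value is below 8

as.character(x)                 # default character conversion
sprintf("%.17g", x)             # "7.9999999999999991" (17 significant digits)
```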
Proposed Solution
I suggest either of the following enhancements to improve consistency:
1. Swap the Functionality of `format()` and `as.character()`:
   - Redefine `as.character(x)` to inherit `format()`'s behavior,
     providing a default precision (e.g., `digits = 17`) to match the effective
     decimal precision of double-precision floats. This would output
     `"7.9999999999999991"` (17 significant digits) for the example above.
   - Redefine `format(x)` to inherit `as.character()`'s current
     behavior, serving as a utility for concise, human-readable output
     (e.g., `"8"`).
   - Naming would then align with intent: `as.character()` for type
     conversion with precision, `format()` for formatting adjustments.
2. Add a `digits` Parameter to `as.character()`:
   - Extend `as.character()` to accept a `digits` argument
     (defaulting to `NULL` for current behavior, or e.g., `17` for precision
     matching). Example:

         x <- as.numeric("7.999999999999999111822")
         as.character(x, digits = 17)  # "7.9999999999999991" (proposed)
         as.character(x)               # "8" (current default)

   - This would allow users to opt for precise conversion while
     preserving backward compatibility.
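Option 2 can be approximated today with a small wrapper. The following is only
a sketch of the intended semantics: `as_character_digits()` is a hypothetical
helper name, not part of base R, and it assumes that C-style `%g` formatting via
`sprintf()` is an acceptable stand-in for whatever formatting a real patch
would use.

```r
# Hypothetical helper (not in base R): emulates the proposed
# as.character(x, digits = ) for double vectors.
as_character_digits <- function(x, digits = NULL) {
  stopifnot(is.double(x))
  if (is.null(digits)) {
    return(as.character(x))              # unchanged default behaviour
  }
  fmt <- sprintf("%%.%dg", as.integer(digits))  # e.g. "%.17g"
  out <- sprintf(fmt, x)
  out[is.na(x) & !is.nan(x)] <- NA_character_   # keep NA as NA, not "NA"
  out
}

x <- 8 - 2^-50                           # the stored double from the example
as_character_digits(x, digits = 17)      # "7.9999999999999991"
as_character_digits(x)                   # same as as.character(x)
```

Keeping `digits = NULL` as the default means existing calls behave exactly as
before, which is the backward-compatibility property Option 2 relies on.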
Rationale
- Consistency: `as.numeric()` and `as.character()` are similarly named base
functions, suggesting they should follow analogous precision rules. The current
discrepancy violates the expectation of round-trip consistency (numeric →
string → numeric): converting a double to character and back can yield a
different double.
- Modern Use Cases: With R's growing use in scientific computing and data
science, high-precision handling is increasingly critical. The proposed change
aligns R with tools like Python, where `str(float(s))` returns the shortest
string that round-trips to the same double (`'7.999999999999999'` for the
example above), and NumPy.
- User Experience: Explicit control via `digits` or a redefined
`as.character()` would reduce confusion, especially for users relying on type
conversion for logical operations.
Use Case
Consider a data validation script:
    s1 <- "7.999999999999999111822"
    x <- as.numeric(s1)
    if (x < 8) print("Less than 8")  # TRUE, correct
    if (as.numeric(as.character(x)) == 8) print("Equal to 8")  # TRUE, inconsistent

The second condition is also `TRUE` because `as.character(x)` simplifies the
value to `"8"`, so the script reports both "Less than 8" and "Equal to 8" for
the same number. With the proposed change (e.g., `as.character(x, digits = 17)`),
the second comparison would be `FALSE`, consistent with the stored value
(`< 8`).
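For comparison, the same validation can be run on character output with 15 and
with 17 significant digits. This is a sketch under stated assumptions:
`sprintf("%.15g", x)` stands in for a 15-significant-digit conversion,
`sprintf("%.17g", x)` stands in for the proposed `digits = 17` behavior, and
`check()` is a hypothetical helper introduced only for this illustration.

```r
x <- 8 - 2^-50                  # the stored double from the example

s15 <- sprintf("%.15g", x)      # 15 significant digits: "8"
s17 <- sprintf("%.17g", x)      # 17 significant digits: "7.9999999999999991"

# Hypothetical validation step applied to the character form.
check <- function(s) {
  if (as.numeric(s) < 8) "Less than 8" else "Not less than 8"
}

check(s15)                      # "Not less than 8"
check(s17)                      # "Less than 8"
```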
Implementation Considerations
- Backward Compatibility: Option 2 (adding `digits`) is less disruptive,
allowing existing code to use the default `as.character()` behavior. Option 1
requires a transition period or deprecation notice.
- Performance: High-precision formatting may have minor overhead, but this is
negligible for modern hardware.
- Documentation: Clear guidance on the new `digits` parameter or redefined
roles would be essential.
Next Steps
I would be happy to assist with testing or drafting a patch if this proposal
gains traction. Please let me know your thoughts or any additional
considerations. This issue was identified with the help of Grok (xAI), and I
believe community feedback could refine the approach.
Thank you for your time and the incredible work on R!
Best regards
龙华
[email protected]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel