kosiew commented on code in PR #20268:
URL: https://github.com/apache/datafusion/pull/20268#discussion_r3153507212
##########
datafusion/spark/src/function/string/format_string.rs:
##########
@@ -2332,6 +2381,24 @@ impl FloatBits for f64 {
}
}
+/// Inserts thousands separators (`,`) into the integer part of a numeric
string.
+/// For example, `"1234567.89"` becomes `"1,234,567.89"`.
+fn insert_thousands_separator(number: &str) -> String {
Review Comment:
`insert_thousands_separator` is a really nice addition. It feels like
something we could reuse more broadly. I noticed `format_unsigned` still has
its own inline grouping logic for decimal integers. Maybe that call site could
switch over to this helper as well? It would reduce duplication and keep
grouping behavior consistent across numeric formatters.
##########
datafusion/spark/src/function/string/format_string.rs:
##########
@@ -2385,4 +2454,225 @@ mod tests {
Ok(())
}
+
+ #[test]
+ fn test_insert_thousands_separator() {
+ assert_eq!(insert_thousands_separator("1234567.89"), "1,234,567.89");
+ assert_eq!(insert_thousands_separator("123.45"), "123.45");
+ assert_eq!(insert_thousands_separator("1234"), "1,234");
+ assert_eq!(insert_thousands_separator("12"), "12");
+ assert_eq!(insert_thousands_separator("0.5"), "0.5");
+ assert_eq!(
+ insert_thousands_separator("1234567890.1234"),
+ "1,234,567,890.1234"
+ );
+ assert_eq!(insert_thousands_separator("1000"), "1,000");
+ assert_eq!(insert_thousands_separator("100"), "100");
+ }
+
+ #[test]
+ fn test_grouping_separator_float() -> Result<()> {
Review Comment:
Would it be worth adding a couple more coverage cases around width and sign
interactions with the new flag? Something like `%0,10.2f` and `%(,10.2f`, plus
the decimal equivalents. The main grouping and scientific cases are well
covered, but padding and sign ordering are often where formatter behavior can
get a bit tricky.
##########
datafusion/spark/src/function/string/format_string.rs:
##########
@@ -1690,6 +1690,23 @@ impl ConversionSpecifier {
}
fn format_float(&self, writer: &mut String, value: f64) -> Result<()> {
+ // Java/Spark reject the grouping separator flag with scientific
notation
Review Comment:
This new ',' + scientific notation validation looks good, but I noticed the
same logic now exists in both `format_float` and `format_decimal`. Would it
make sense to pull this into a small helper on `ConversionSpecifier`? That way
the Spark or Java compatibility rule lives in one place and it reduces the risk
of the float and decimal paths drifting over time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]