jhorstmann commented on code in PR #9619:
URL: https://github.com/apache/arrow-rs/pull/9619#discussion_r3008791709
##########
parquet/src/column/writer/encoder.rs:
##########
@@ -325,24 +341,35 @@ impl<T: DataType> ColumnValueEncoder for
ColumnValueEncoderImpl<T> {
}
}
+// Get min and max values for all values in `iter`.
+//
+// For floating point we need to compare NaN values until we encounter a
non-NaN
+// value which then becomes the new min/max. After this, only non-NaN values
are
+// evaluated. If all values are NaN, then the min/max NaNs as determined by
+// IEEE 754 total order are returned.
Review Comment:
Agree, this method is on the hot path. I had a look at optimizing it, but
could not get the compiler to generate nice auto-vectorized code for
nan-handling yet. I think we can try optimizations in a followup, it would be
more important to get the semantics correct first and make sure there are tests
for edge cases.
<del>
In that regard, about this requirement
> If all values are NaN, then the min/max NaNs as determined by
// IEEE 754 total order are returned.
Does the current code correctly distinguish different NaN payloads according
to their sign and bit patterns?
</del>
(Solved, github was hiding the changes to `compare_greater` in `mod.rs`)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]