jackye1995 commented on code in PR #9717: URL: https://github.com/apache/iceberg/pull/9717#discussion_r1496277335
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3324,6 +3348,184 @@ components:
       type: integer
       format: int64
 
+    BooleanTypeValue:
+      type: boolean
+
+    IntegerTypeValue:
+      type: integer
+
+    LongTypeValue:
+      type: integer
+      format: int64
+
+    FloatTypeValue:

Review Comment:
I did some experiments, and I think the current SingleValueParser implementation has some problems. The Jackson writeNumber(float/double) method just writes the string form of the number by calling the [Float/Double.toString()](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#toString-double-) method internally. The linked Javadoc explains how the string conversion is done.

The conversion fundamentally has 2 problems:

(1) Data that is too big or too small (outside the 10^-3 to 10^7 range) is written in scientific notation, and the JSON representation will be a string rather than a number. For example, `10^20` is written as `"1.0E20"`.

(2) The result is lossy, because it is the nearest approximation to the true value. This is not serializing the float/double to the exact decimal representation as we discussed above. For example, there is a very small chance that the value `3.0` is sometimes serialized as `2.99999999999999`, and when deserialized back it is probably still the 3.0 double value, but sometimes it will be just 2.99999999999999.

This becomes a correctness issue for use cases like row-level filtering, where a user can define a filter against a double like `a < 3` and that can produce unexpected results. We actually saw this exact issue in the past in LakeFormation row-level filtering with Athena, so I suggest we be very cautious here.

In general, I think achieving the true decimal representation will actually be more space and computationally intensive than just storing the binary representation.
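As a small illustration of the notation switch in point (1), here is a minimal sketch that calls `Double.toString` directly. The class and method names (`ToStringNotationDemo`, `render`) are hypothetical and exist only for this example; it does not go through Jackson or SingleValueParser, so it only shows the text that `Double.toString` itself produces.

```java
// Hypothetical demo class; Double.toString is the only real JDK API used.
class ToStringNotationDemo {

    // Returns the text that Double.toString produces for the given value.
    static String render(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        // Inside the 10^-3 to 10^7 range: plain decimal notation.
        System.out.println(render(1234.5));  // prints 1234.5
        // Above 10^7: computerized scientific notation.
        System.out.println(render(1.0e20));  // prints 1.0E20
        // Below 10^-3: also scientific notation.
        System.out.println(render(0.0001));  // prints 1.0E-4
    }
}
```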
We can easily store the binary form of a double by using [Double.doubleToRawLongBits](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#doubleToRawLongBits-double-) to write a long value in the serialized form, and deserialize it back using the reverse `longBitsToDouble` method. I think we should consider using this approach.
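A minimal sketch of the round trip proposed above, assuming the serialized form simply carries the `long` returned by `Double.doubleToRawLongBits`. The class and method names (`RawBitsCodec`, `encode`, `decode`) are hypothetical and not part of Iceberg or the REST spec; only the two `Double` methods are real JDK APIs.

```java
// Hypothetical helper; not an Iceberg class.
class RawBitsCodec {

    // Encodes a double as its raw IEEE 754 bit pattern.
    static long encode(double value) {
        return Double.doubleToRawLongBits(value);
    }

    // Decodes a bit pattern produced by encode back into a double.
    static double decode(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double original = 3.0;
        long bits = encode(original);
        double restored = decode(bits);

        // 3.0 is 1.5 * 2^1, so the bit pattern is 0x4008000000000000.
        System.out.println(Long.toHexString(bits));  // prints 4008000000000000
        System.out.println(original == restored);    // prints true
    }
}
```

One trade-off of this design is that the serialized value is no longer human-readable: a reader of the JSON sees a long such as `4613937818241073152` rather than `3.0`.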