jackye1995 commented on code in PR #9717:
URL: https://github.com/apache/iceberg/pull/9717#discussion_r1496277335


##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3324,6 +3348,184 @@ components:
           type: integer
           format: int64
 
+    BooleanTypeValue:
+      type: boolean
+
+    IntegerTypeValue:
+      type: integer
+
+    LongTypeValue:
+      type: integer
+      format: int64
+
+    FloatTypeValue:

Review Comment:
   I did some experiments, and I think the current `SingleValueParser` 
implementation has some problems.
   
   The Jackson `writeNumber(float/double)` method just writes the string form 
of the number by calling the 
[Float/Double.toString()](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#toString-double-)
 method internally. The linked Javadoc explains how the string conversion is 
done.
   
   The conversion fundamentally has two problems. (1) Data that is too big or 
too small (outside the 10^-3 to 10^7 range) is written in scientific notation, 
and the JSON representation will be a string rather than a number. For example, 
`10^20` is written as `"1.0E20"`. (2) The result is lossy, because it is the 
nearest approximation to the true value. This does not serialize the 
float/double to the exact decimal representation as we discussed above.
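   
   To illustrate point (1), here is a minimal sketch of where 
`Double.toString` switches to scientific notation. The class name 
`DoubleToStringDemo` and its `render` helper are hypothetical, and the sketch 
calls the JDK directly rather than going through Jackson, so it only shows the 
notation switch described in the Javadoc:

```java
// Hypothetical demo class; only Double.toString is a real JDK API here.
final class DoubleToStringDemo {
  private DoubleToStringDemo() {}

  // Returns the text Double.toString produces for the given value.
  static String render(double value) {
    return Double.toString(value);
  }

  public static void main(String[] args) {
    // Magnitudes in [10^-3, 10^7) are printed in plain decimal form.
    System.out.println(render(0.001));  // 0.001
    System.out.println(render(100.0));  // 100.0

    // Magnitudes outside that range use "computerized scientific notation".
    System.out.println(render(1.0e7));  // 1.0E7
    System.out.println(render(1.0e20)); // 1.0E20
    System.out.println(render(1.0e-4)); // 1.0E-4
  }
}
```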
   
   For example, there is a very small chance that the value `3.0` is 
serialized as `2.99999999999999`. When deserialized, it will probably still be 
the 3.0 double value, but sometimes it will be just 2.99999999999999. This 
becomes a correctness issue for use cases like row-level filtering, where a 
user can define a filter against a double like `a < 3` and that can produce 
unexpected results. We actually saw this exact issue in the past in 
LakeFormation row-level filtering with Athena, so I suggest we be very cautious 
here.
   
   In general, I think achieving the true decimal representation will 
actually be more space- and computation-intensive than just storing the 
binary representation. We can easily store the binary form of a double by using 
[Double.doubleToRawLongBits](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#doubleToRawLongBits-double-)
 to store a long value in the serialized form, and deserialize it back using 
the reverse `longBitsToDouble` method. I think we should consider using this 
approach.
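
   A rough sketch of the round trip I have in mind is below. The class 
`DoubleBitsCodec` and its `encode`/`decode` methods are hypothetical names for 
illustration, not part of `SingleValueParser`; only 
`Double.doubleToRawLongBits` and `Double.longBitsToDouble` are real JDK 
methods:

```java
// Hypothetical helper for illustration; not part of Iceberg.
final class DoubleBitsCodec {
  private DoubleBitsCodec() {}

  // Encodes a double as the long that holds its raw IEEE 754 bit pattern.
  static long encode(double value) {
    return Double.doubleToRawLongBits(value);
  }

  // Rebuilds the double from a previously encoded bit pattern.
  static double decode(long bits) {
    return Double.longBitsToDouble(bits);
  }

  public static void main(String[] args) {
    double[] samples = {3.0, 0.1, 1.0e20, -0.0, Double.MIN_VALUE};
    for (double original : samples) {
      long bits = encode(original);
      double restored = decode(bits);
      // Compare bit patterns rather than using ==, so -0.0 and 0.0 differ.
      boolean exact = Double.doubleToRawLongBits(restored) == bits;
      System.out.println(original + " -> " + bits + " -> " + restored
          + " (exact=" + exact + ")");
    }
  }
}
```

   The serialized long is then a plain JSON integer, so no decimal formatting 
is involved on either side.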



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

