rdblue commented on code in PR #9008:
URL: https://github.com/apache/iceberg/pull/9008#discussion_r1520384546


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -595,11 +613,17 @@ private static String sanitizeString(CharSequence value, long now, int today) {
         Literal<Integer> date = Literal.of(value).to(Types.DateType.get());
         return sanitizeDate(date.value(), today);
       } else if (TIMESTAMP.matcher(value).matches()) {
-        Literal<Long> ts = Literal.of(value).to(Types.TimestampType.withoutZone());
-        return sanitizeTimestamp(ts.value(), now);
+        Literal<Long> ts = Literal.of(value).to(Types.TimestampType.microsWithoutZone());
+        return sanitizeTimestamp(Types.TimestampType.Unit.MICROS, ts.value(), now);
+      } else if (TIMESTAMPNS.matcher(value).matches()) {
+        Literal<Long> ts = Literal.of(value).to(Types.TimestampType.nanosWithoutZone());
+        return sanitizeTimestamp(Types.TimestampType.Unit.NANOS, ts.value(), now);
       } else if (TIMESTAMPTZ.matcher(value).matches()) {
-        Literal<Long> ts = Literal.of(value).to(Types.TimestampType.withZone());
-        return sanitizeTimestamp(ts.value(), now);
+        Literal<Long> ts = Literal.of(value).to(Types.TimestampType.microsWithZone());
+        return sanitizeTimestamp(Types.TimestampType.Unit.MICROS, ts.value(), now);
+      } else if (TIMESTAMPTZNS.matcher(value).matches()) {
+        Literal<Long> ts = Literal.of(value).to(Types.TimestampType.nanosWithZone());
+        return sanitizeTimestamp(Types.TimestampType.Unit.NANOS, ts.value(), now);

Review Comment:
   I don't quite understand the need for this change. The `TIMESTAMP` matcher
already accepts up to 9 fractional-second digits and converts the value to
micros, discarding the sub-microsecond digits. This PR now distinguishes between
`TIMESTAMP` (max precision 6) and `TIMESTAMPNS` (precision 7 to 9), but in the
end both kinds of value are parsed and the nanosecond component is discarded
using `nanosToMicros`.
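   To make the point concrete, here is a minimal sketch that uses plain `java.time` instead of Iceberg's `Literal` conversions (an assumption made only for illustration); `toEpochMicros` is a hypothetical helper, not part of `ExpressionUtil`. It shows that a 9-digit and a 6-digit fractional value collapse to the same microsecond timestamp once the extra digits are truncated:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TruncateToMicros {
  // Hypothetical helper (not Iceberg API): parses an ISO-8601 local timestamp with
  // 0 to 9 fractional digits and returns microseconds since 1970-01-01T00:00 UTC.
  // getNano() is never negative, so integer division floors away the digits past
  // the sixth fractional place.
  static long toEpochMicros(String value) {
    LocalDateTime ts = LocalDateTime.parse(value);
    return ts.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + ts.getNano() / 1_000;
  }

  public static void main(String[] args) {
    long fromNanos = toEpochMicros("2024-03-11T10:15:30.123456789");
    long fromMicros = toEpochMicros("2024-03-11T10:15:30.123456");
    // Both inputs end up as the same microsecond value.
    System.out.println(fromNanos == fromMicros); // prints "true"
  }
}
```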
   
   Why go to this trouble? Can't these values be parsed into a microsecond 
timestamp and then sanitized without making changes?
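   A rough sketch of that single-path alternative, assuming one hypothetical pattern that accepts 0 to 9 fractional digits and a simplified hour-bucket sanitizer. The names `TIMESTAMP`, `toEpochMicros`, `sanitizeTimestamp`, and the `(hash)` fallback here are illustrative stand-ins; the real `ExpressionUtil` patterns and output strings may differ:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

public class SingleMicrosPath {
  // Hypothetical pattern: one matcher for 0 to 9 fractional digits, so no
  // separate nanosecond branch is needed.
  static final Pattern TIMESTAMP =
      Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}(:\\d{2}(\\.\\d{1,9})?)?");

  // Parse to epoch microseconds (UTC); digits past the sixth fractional place are dropped.
  static long toEpochMicros(String value) {
    LocalDateTime ts = LocalDateTime.parse(value);
    return ts.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + ts.getNano() / 1_000;
  }

  // Hypothetical, simplified sanitizer that only ever sees microseconds,
  // so it needs no unit parameter.
  static String sanitizeTimestamp(long micros, long nowMicros) {
    long hours = TimeUnit.MICROSECONDS.toHours(Math.abs(nowMicros - micros));
    String direction = nowMicros >= micros ? "ago" : "from-now";
    return "(timestamp-" + hours + "-hours-" + direction + ")";
  }

  static String sanitize(String value, long nowMicros) {
    if (TIMESTAMP.matcher(value).matches()) {
      return sanitizeTimestamp(toEpochMicros(value), nowMicros);
    }
    return "(hash)"; // hypothetical fallback for non-timestamp strings
  }

  public static void main(String[] args) {
    long now = toEpochMicros("2024-03-11T12:00:00");
    System.out.println(sanitize("2024-03-11T10:15:30.123456789", now));
    System.out.println(sanitize("2024-03-11T10:15:30.123456", now));
  }
}
```

   With this shape, nanosecond-precision strings never need their own branch or a unit argument to `sanitizeTimestamp`; the only cost is dropping the sub-microsecond digits, which the PR's `nanosToMicros` call does anyway.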



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

