rdblue commented on code in PR #9008:
URL: https://github.com/apache/iceberg/pull/9008#discussion_r1558261406


##########
api/src/main/java/org/apache/iceberg/transforms/Bucket.java:
##########
@@ -54,6 +54,7 @@ static <T, B extends Bucket<T> & SerializableFunction<T, 
Integer>> B get(
         return (B) new BucketInteger(numBuckets);
       case TIME:
       case TIMESTAMP:
+      case TIMESTAMP_NANO:

Review Comment:
   I think this is going to be a problem. We will very likely allow type 
promotion from `timestamp` to `timestamp_ns`. If a table were bucketed by a 
timestamp column (which is allowed by the spec) then existing partition values 
would no longer be correct. For other type promotion cases, the spec aligns the 
hash function (see Appendix A). For example, `bucket(int)` is defined as `i -> 
bucket((long) i)` so that the values match.
   
   I don't think that we can do that in this case because the hash function is 
already defined for the lower-precision type (micros). We could potentially map 
1,000 values in nanos to the same bucket. That doesn't seem like a good way to 
distribute data, but we may need to compromise to avoid a problem here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to