Fokko commented on code in PR #79: URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1423535271
########## crates/iceberg/src/spec/transform.rs: ########## @@ -126,17 +126,20 @@ pub enum Transform { impl Transform { /// Get the return type of transform given the input type. /// Returns `None` if it can't be transformed. - pub fn result_type(&self, input_type: &Type) -> Option<Type> { + pub fn result_type(&self, input_type: &Type) -> Result<Type> { Review Comment: I think there is a difference, let me explain! In Iceberg it is possible to have your custom transform, for example if you want to do something very specific, or you want to influence the distribution when it comes to partitioning. It can be that the transform is not available in the client, and then we cast it to an `UnknownTransform`: https://github.com/apache/iceberg-python/blob/8c8abb5c4c258e32941110a9ce0938e1328290b3/pyiceberg/transforms.py#L688 What this essentially means is that we don't know how it is transformed, and we cannot apply speedups. For example, `bucket[16](34) = 3` then we know in which bucket the data is that we're looking for. But if there is some custom partitioning then we just have to look through all the buckets. To this specific case of the type, in PyIceberg we return a `StringType()`, that's borrowed from Java where you just cast it to a string. I don't think that's used anywhere since we don't call this on the `UnknownTransform`. The Type cannot be None, so I think having `Result` is the right thing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org