Re: [PR] feat: Expression system. [iceberg-rust]

via GitHub Tue, 26 Dec 2023 00:46:51 -0800


Fokko commented on code in PR #132:
URL: https://github.com/apache/iceberg-rust/pull/132#discussion_r1436335985



##########
crates/iceberg/src/expr/mod.rs:
##########
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This module contains expressions used in apache iceberg.
+
+mod bound;
+pub use bound::*;
+mod unbound;
+pub use unbound::*;
+
+/// Operators used in expressions.
+#[allow(missing_docs)]
+pub enum Operator {

Review Comment:
   Sorry for the short review last time, let me give a bit more context here. I 
hope it helps to understand my concern.
   
   The main challenge with using statistics to prune data files is that they are 
truncated. This is the case for both Iceberg statistics and Parquet statistics. 
By default, Iceberg applies a `truncate[16]` to keep manifest file sizes within 
reasonable bounds without losing the ability to prune string columns (strings 
are the main case, but this could apply to [other types 
as well](https://iceberg.apache.org/spec/#truncate-transform-details)).
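   
   To make the truncation point concrete, here is a minimal sketch of what a 
truncated lower bound means for pruning. Both `truncate_chars` and 
`may_contain_eq` are hypothetical names for illustration, not part of this PR or 
of any existing Iceberg API, and the sketch deliberately ignores upper bounds 
and nulls.

```rust
/// Hypothetical helper: keep at most `width` Unicode code points of `value`.
/// Slicing at a `char_indices` boundary guarantees the result is valid UTF-8.
fn truncate_chars(value: &str, width: usize) -> &str {
    match value.char_indices().nth(width) {
        // `byte_idx` is the byte offset of the first code point to drop.
        Some((byte_idx, _)) => &value[..byte_idx],
        // The value is already short enough, keep it unchanged.
        None => value,
    }
}

/// Hypothetical check for `column == literal` against a stored lower bound
/// that may have been truncated. Returns `true` when the file might still
/// contain a match and therefore must be read.
fn may_contain_eq(truncated_lower: &str, literal: &str) -> bool {
    // A truncated lower bound is a prefix of the true minimum, so it sorts
    // at or before it. The file can only be skipped when the literal sorts
    // strictly below the stored bound.
    literal >= truncated_lower
}
```

   For example, `truncate_chars("iceberg", 3)` yields `"ice"`, and 
`may_contain_eq("ice", "apple")` is `false`, so that file can be skipped. With a 
literal on one side this stays simple; with a second column on the other side, 
both sides are only approximations.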
   
   If we start pruning based on two columns, we need a big test suite to make 
sure that everything works correctly (there are a lot of edge cases, such as 
UTF-8 overflow). I also don't think that comparing two columns is the most 
commonly used scenario, so I'd rather postpone it for now.
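   
   As an illustration of the kind of overflow edge case meant here, below is a 
rough sketch of truncating an upper bound: after cutting the string, the last 
code point has to be incremented so the result still sorts at or above the 
original value. `next_char` and `truncate_upper` are hypothetical helpers 
written for this example under those assumptions, not the actual implementation 
in any Iceberg library.

```rust
/// Hypothetical helper: the next Unicode scalar value after `c`, if any.
fn next_char(c: char) -> Option<char> {
    match c {
        // Nothing sorts after the largest code point: this is the overflow.
        char::MAX => None,
        // U+D800..=U+DFFF are surrogates and not valid `char`s, skip them.
        '\u{D7FF}' => Some('\u{E000}'),
        _ => char::from_u32(c as u32 + 1),
    }
}

/// Hypothetical helper: truncate `value` to at most `width` code points and
/// return a string that still sorts at or above `value`. Returns `None` when
/// no such bound exists, in which case the file cannot be pruned from above.
fn truncate_upper(value: &str, width: usize) -> Option<String> {
    if value.chars().count() <= width {
        // Nothing is cut off, so the value is its own upper bound.
        return Some(value.to_string());
    }
    let mut prefix: Vec<char> = value.chars().take(width).collect();
    // Increment the last code point; on overflow drop it and retry with the
    // previous one, which yields a shorter but still valid upper bound.
    while let Some(last) = prefix.pop() {
        if let Some(next) = next_char(last) {
            prefix.push(next);
            return Some(prefix.into_iter().collect());
        }
    }
    // Every code point in the prefix overflowed (or the width was zero).
    None
}
```

   With these definitions, `truncate_upper("abcdef", 3)` gives `Some("abd")`, 
while a prefix made up entirely of `char::MAX` gives `None`. Each of these 
branches needs its own test, and a column-to-column comparison multiplies them.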
   
   What I think is going to be asked for in the foreseeable future is 
applying transforms, such as `cast(created_at AS date) == '2021-01-01'`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

