omerhadari commented on issue #933:
URL: https://github.com/apache/iceberg-rust/issues/933#issuecomment-2639525990

   Thank you @liurenjie1024 for the elaboration! Is this an issue in the Java 
implementation as well, or does it have a way to express functions?
   
   Copying a comment from my PR because it may make more sense to discuss it in 
the issue. Note the point about how `CAST` expressions are handled: I think 
this bug is more worrying because it can potentially cause incorrect 
query results, not just slower runtimes.
   
   Regarding your suggested alternative calculation, this is actually what I 
did on my side to work around the issue, but I didn't want to implement it here 
because I'm new to the project and did not know if it is too workaround-y.
   
   Here is my comment from the PR itself:
   
   I wanted to ask: is there a way to express functions within Iceberg 
predicates? Is this even desired? The reason this could be beneficial is that 
sometimes you need access to the column value, and with it you could perform much 
better manifest elimination. A few examples I have in this context:
   
   * `TO_DATE` essentially converts the column to a timestamp and then truncates 
it to the day. I cannot easily do that in the context of generating the 
predicate.
   * `TO_TIMESTAMP` accepts a format for strings, but I see no way to pass the 
format inside the predicate and use it correctly.
   
   This also reveals what I think is a bug. In `datafusion` (as well as many 
other engines), when you cast, for example, a string to a `DATE`, the value is 
truncated to the day. Currently, in the conversion function, the inner expression 
is simply extracted from the `Cast`, so the cast itself is dropped.
   
   See here:
   <img width="225" alt="image" src="https://github.com/user-attachments/assets/e96403c7-95f7-4990-89b5-596aee758027" />
   
   If I understand correctly, this could cause wrong results. For example, 
the query 
   `SELECT * FROM table WHERE date_col > CAST('2025-01-01T00:10:00' AS DATE)`
   
   would result in the predicate `date_col > '2025-01-01T00:10:00'`, which will 
filter out data files where `2025-01-01T00:00:00 < date_col <= 
2025-01-01T00:10:00` even though they are supposed to be included.
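   
   To make the mismatch concrete, here is a minimal, self-contained sketch. It is 
not the actual conversion code in iceberg-rust or datafusion. It assumes 
timestamps are modeled as microseconds since the Unix epoch in UTC, and all 
function names (`truncate_to_day`, `matches_with_cast`, 
`matches_cast_stripped`) are hypothetical, chosen only for illustration.

```rust
// Hypothetical model: timestamps are microseconds since the Unix epoch (UTC).
const MICROS_PER_MINUTE: i64 = 60_000_000;
const MICROS_PER_DAY: i64 = 24 * 60 * MICROS_PER_MINUTE;

/// Mimics casting a timestamp literal to `DATE`: drop the time-of-day part.
fn truncate_to_day(ts_micros: i64) -> i64 {
    // div_euclid rounds toward negative infinity, so pre-1970 values also
    // truncate to the start of their day.
    ts_micros.div_euclid(MICROS_PER_DAY) * MICROS_PER_DAY
}

/// The filter as written in the query: `col > CAST(literal AS DATE)`.
fn matches_with_cast(col: i64, literal: i64) -> bool {
    col > truncate_to_day(literal)
}

/// The filter obtained when the cast is stripped: `col > literal`.
fn matches_cast_stripped(col: i64, literal: i64) -> bool {
    col > literal
}

fn main() {
    // 2025-01-01T00:00:00 UTC is day 20089 since the epoch.
    let midnight = 20_089 * MICROS_PER_DAY;
    let literal = midnight + 10 * MICROS_PER_MINUTE; // 2025-01-01T00:10:00
    let row = midnight + 5 * MICROS_PER_MINUTE; // 2025-01-01T00:05:00

    // The query semantics keep the row; the stripped predicate drops it.
    println!("with cast:     {}", matches_with_cast(row, literal));
    println!("cast stripped: {}", matches_cast_stripped(row, literal));
}
```

   One possible direction, under the same assumptions, would be to evaluate the 
cast on the literal side first (as `truncate_to_day` does here) and only then 
build the predicate, rather than discarding the `Cast` node.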
   
   I would appreciate some guidance on how to tackle this issue of propagating 
more information. I don't think it makes sense in the scope of this PR, but 
maybe I am missing something basic.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

