omerhadari commented on issue #933: URL: https://github.com/apache/iceberg-rust/issues/933#issuecomment-2639525990
Thank you @liurenjie1024 for the elaboration! Is this an issue in the Java implementation as well, or does it have a way to express functions?

I'm copying a comment from my PR because it probably makes more sense to discuss it in the issue. Note the point about how `CAST` expressions are handled: I think that bug is more worrying because it can cause incorrect query results, not just slower runtimes.

Regarding your suggested alternative calculation, this is actually what I did on my side to work around the issue, but I didn't want to implement it here because I'm new to the project and didn't know whether it would be too workaround-y.

Here is my comment from the PR itself:

I wanted to ask: is there a way to express functions within iceberg predicates? Is this even desired? It could be beneficial because sometimes you need access to the column value, and with it you could perform much better manifest elimination. A few examples I have in this context:

* `TO_DATE` essentially converts the column to a timestamp and then truncates it to the day. I cannot easily do that while generating the predicate.
* `TO_TIMESTAMP` accepts a format for strings, but I see no way to pass the format inside the predicate and use it correctly.

This also reveals what I think is a bug. In `datafusion` (as well as many other engines), casting a string to a `DATE`, for example, truncates it to the day. Currently, the conversion function simply extracts the inner expression from the `Cast`.
See here:

<img width="225" alt="image" src="https://github.com/user-attachments/assets/e96403c7-95f7-4990-89b5-596aee758027" />

If I understand correctly, this could cause wrong results. For example, the query `SELECT * FROM table WHERE date_col > CAST('2025-01-01T00:10:00' AS DATE)` would result in the predicate `date_col > '2025-01-01T00:10:00'`, which will filter out data files where `2025-01-01T00:00:00 < date_col < 2025-01-01T00:10:00` even though they are supposed to be included.

I would appreciate some guidance on how to tackle this issue of propagating more information. I don't think it makes sense in the scope of this PR, but maybe I am missing something basic.
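
To make the pruning concern concrete, here is a minimal sketch in plain Rust. It is not the iceberg-rust or DataFusion API: `truncate_to_day`, `may_match_greater_than`, and `pruning_outcomes` are hypothetical names, and timestamps are assumed to be seconds since the Unix epoch in UTC. It compares the bound obtained by dropping the `Cast` with the bound obtained by applying it to the literal first.

```rust
// Hypothetical model, not the iceberg-rust or DataFusion API:
// timestamps are plain seconds since the Unix epoch, in UTC.
const SECS_PER_DAY: i64 = 86_400;

/// Models the semantics of `CAST(ts AS DATE)`: drop the time-of-day part.
fn truncate_to_day(ts_secs: i64) -> i64 {
    ts_secs.div_euclid(SECS_PER_DAY) * SECS_PER_DAY
}

/// Min/max-style pruning for `col > bound`: a file can contain matching
/// rows only if its largest value exceeds the bound.
fn may_match_greater_than(file_max: i64, bound: i64) -> bool {
    file_max > bound
}

/// Runs the scenario from the comment and returns
/// (file kept when the cast is dropped, file kept when the cast is applied).
fn pruning_outcomes() -> (bool, bool) {
    let midnight = 20_089 * SECS_PER_DAY; // 2025-01-01T00:00:00Z
    let literal = midnight + 10 * 60; // '2025-01-01T00:10:00'
    let file_max = midnight + 5 * 60; // file's largest value: 00:05:00

    let cast_dropped = may_match_greater_than(file_max, literal);
    let cast_applied = may_match_greater_than(file_max, truncate_to_day(literal));
    (cast_dropped, cast_applied)
}
```

Under these assumptions `pruning_outcomes()` returns `(false, true)`: a file whose largest value is `00:05:00` is pruned when the cast is dropped, but kept when the literal is truncated to the day before the predicate is built.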