huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761680652
@atifiu File statistics are no longer accurate once a query has filters, so they can't be used in that case. For example, suppose you have a table with a single column `col int`, where the max of `col` is 100 and the min is 0, so the statistics file is
```
max  min
100  0
```
For `SELECT MAX(col) FROM table`, we can check the statistics file and simply return 100. But for `SELECT MAX(col) FROM table WHERE col < 70`, we can't use the statistics file any more. We only know that `MAX(col)` is smaller than 70, but we have no idea what the actual value is, so we have to compute it.
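To make the point concrete, here is a minimal Python sketch of the decision described in this comment. It is only an illustration and not Iceberg's actual code: `STATS`, `rows`, `max_from_stats`, and `max_col` are hypothetical names, and the row values are made up to be consistent with a min of 0 and a max of 100.

```python
from typing import Optional

# Hypothetical stand-in for the per-column file statistics (not Iceberg's API).
STATS = {"col": {"min": 0, "max": 100}}

# Made-up rows that are consistent with the statistics above.
rows = [0, 15, 42, 63, 100]


def max_from_stats(column: str, has_filter: bool) -> Optional[int]:
    """Return MAX(column) from statistics, or None if the data must be scanned."""
    if has_filter:
        # The stored max describes all rows, not just the rows that pass the filter.
        return None
    return STATS[column]["max"]


def max_col(upper_bound: Optional[int] = None) -> int:
    """MAX(col), optionally restricted to rows with col < upper_bound."""
    answer = max_from_stats("col", has_filter=upper_bound is not None)
    if answer is not None:
        return answer  # answered from statistics alone, no scan
    # Fall back to computing the aggregate over the filtered rows.
    return max(v for v in rows if v < upper_bound)


print(max_col())    # SELECT MAX(col) FROM table
print(max_col(70))  # SELECT MAX(col) FROM table WHERE col < 70
```

Without a filter the sketch answers from the statistics; with `col < 70` the statistics only bound the result, so it has to scan the rows to find the actual maximum.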