nateagr opened a new issue, #12428: URL: https://github.com/apache/iceberg/issues/12428
### Query engine

_No response_

### Question

Hello!

After migrating some of our Parquet tables (in Hive) to Iceberg (still Parquet), I've noticed that reading the new Iceberg tables with Spark is much slower (at least 4x) than reading the original Parquet tables. While trying to understand this slowdown, it looks like Iceberg doesn't push the predicates down to the Parquet reader. I've written a unit test that reads one of our new Iceberg tables with Spark, and I always see the NoOp row group filter in the Parquet reader. However, when reading one of our original Parquet tables, I see a row group filter that actually filters row groups based on statistics, dictionaries, etc.

Is my understanding correct? If yes, I've read many times that Iceberg supports predicate pushdown, so when is it done? After reading the Parquet files?
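For reference, a minimal sketch of the kind of check I'm running; the catalog, database, table, and column names below are placeholders, not the real ones from our setup:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a Hive-backed Iceberg catalog registered as a
// Spark SQL catalog. All names here are placeholders.
val spark = SparkSession.builder()
  .appName("iceberg-pushdown-check")
  .config("spark.sql.catalog.iceberg_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.iceberg_catalog.type", "hive")
  .getOrCreate()

// Read the Iceberg table with a selective predicate.
val df = spark.table("iceberg_catalog.db.my_table")
  .filter("event_date = '2024-01-01'")

// Inspect the physical plan: if the predicate reaches the data source, it
// should appear on the scan node; otherwise only a Spark-side Filter node
// sits above the scan.
df.explain(true)

// Trigger the actual read so the Parquet row group filtering (or lack of it)
// can be observed, e.g. via logs or a debugger in the Parquet reader.
df.count()
```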