mudit-97 opened a new pull request, #9479: URL: https://github.com/apache/iceberg/pull/9479
Hi all,

We are planning to adopt Iceberg as the table format in our data platform. As part of that, we benchmarked Hudi and Iceberg against plain Parquet-based DataFrame reads in Spark. We found that Iceberg does not push filters down to the Parquet layer; it relies on row-group-based filtering, which is Iceberg's own approach. We saw that the amount of data read is higher in that case. We also tried pushing the filters down to the Parquet layer and saw benefits in some queries.

We would like the community's comments on this PR: does this change make sense, or should it be avoided so that Iceberg relies on row-group filtering only?
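To make the distinction raised in the pull request description concrete, here is a minimal toy sketch in Python. It does not use Iceberg, Parquet, or Spark APIs; `RowGroup`, `prune_row_groups`, and the two reader functions are hypothetical names invented for illustration, and the model counts surfaced rows only, not bytes read from storage. It assumes row-group filtering means skipping groups whose min/max range cannot match the predicate, and that pushing the filter further also drops non-matching records inside the surviving groups.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group holding one integer column."""
    values: List[int]

    @property
    def min(self) -> int:
        return min(self.values)

    @property
    def max(self) -> int:
        return max(self.values)


def prune_row_groups(groups: List[RowGroup], lo: int, hi: int) -> List[RowGroup]:
    """Keep only the groups whose [min, max] range overlaps the predicate [lo, hi]."""
    return [g for g in groups if g.max >= lo and g.min <= hi]


def rows_with_row_group_filtering_only(groups: List[RowGroup], lo: int, hi: int) -> List[int]:
    """Every row of each surviving group is returned; the engine filters later."""
    return [v for g in prune_row_groups(groups, lo, hi) for v in g.values]


def rows_with_record_level_filter(groups: List[RowGroup], lo: int, hi: int) -> List[int]:
    """Surviving groups are additionally filtered record by record."""
    return [
        v
        for g in prune_row_groups(groups, lo, hi)
        for v in g.values
        if lo <= v <= hi
    ]


groups = [RowGroup([1, 5, 9]), RowGroup([10, 50, 90]), RowGroup([100, 500, 900])]
coarse = rows_with_row_group_filtering_only(groups, 40, 60)
fine = rows_with_record_level_filter(groups, 40, 60)
print(len(coarse), len(fine))
```

In this toy data, both paths skip the first and last groups, but the row-group-only path still returns every row of the middle group. Whether that difference translates into a measurable gain in a real workload depends on the data layout and the query, which is what the benchmarking described above was meant to check.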