[PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

via GitHub Mon, 15 Jan 2024 23:56:03 -0800


mudit-97 opened a new pull request, #9479:
URL: https://github.com/apache/iceberg/pull/9479


   Hi all,
   
   We are planning to onboard Iceberg as the table format in our data
platform. To that end, we benchmarked Hudi and Iceberg against vanilla
Parquet-based DataFrame reads in Spark, and found that Iceberg does not push
filters down to the Parquet layer. Instead, it relies on its own
Iceberg-specific row-group filtering.
   
   However, we observed that more data is read with that approach, so we
tried also pushing the filters down to the Parquet layer, and saw benefits in
some queries.
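To illustrate why row-group-only filtering can read more data, here is a toy Python model. It is not Iceberg or parquet-mr code, and every name in it is hypothetical. It assumes that the finer-grained filtering gained by pushing the predicate into the Parquet reader is page-level min/max skipping (as with Parquet column indexes). The PR does not say which Parquet-level mechanism produces its savings, so treat that as an assumption.

```python
from dataclasses import dataclass
from typing import List

# Toy model only: hypothetical names, not Iceberg or parquet-mr APIs.


@dataclass
class Page:
    values: List[int]  # one column's values stored in this page


@dataclass
class RowGroup:
    pages: List[Page]


def may_contain(values: List[int], target: int) -> bool:
    """Min/max statistics check: False means the chunk can be skipped safely.

    Stats are computed on the fly here; a real reader takes them from file metadata.
    """
    return min(values) <= target <= max(values)


def values_read_row_group_only(row_groups: List[RowGroup], target: int) -> int:
    """Skip whole row groups by their stats, then read every page of the survivors."""
    total = 0
    for rg in row_groups:
        all_values = [v for page in rg.pages for v in page.values]
        if may_contain(all_values, target):
            total += len(all_values)
    return total


def values_read_with_page_skipping(row_groups: List[RowGroup], target: int) -> int:
    """Same row-group check, then also skip pages whose own stats exclude target."""
    total = 0
    for rg in row_groups:
        all_values = [v for page in rg.pages for v in page.values]
        if not may_contain(all_values, target):
            continue
        for page in rg.pages:
            if may_contain(page.values, target):
                total += len(page.values)
    return total


FILE = [
    RowGroup([Page([1, 2, 3]), Page([4, 5, 6]), Page([7, 8, 9])]),
    RowGroup([Page([10, 11, 12]), Page([13, 14, 15])]),
]

# Predicate: col = 5
print(values_read_row_group_only(FILE, 5))      # 9
print(values_read_with_page_skipping(FILE, 5))  # 3
```

For `col = 5`, row-group filtering alone skips the second row group but still reads all 9 values of the first; also skipping pages by their own statistics reads only the 3 values of the page that can contain a match.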
   
   We would like the community's feedback on this PR: does this approach make
sense, or should it be avoided in favor of relying only on row-group filtering
in Iceberg?


