[I] Spark iceberg runtime - predicate pushdown in parquet reader [iceberg]

via GitHub Fri, 28 Feb 2025 17:33:38 -0800


nateagr opened a new issue, #12428:
URL: https://github.com/apache/iceberg/issues/12428


   ### Query engine
   
   _No response_
   
   ### Question
   
   Hello!
   
   After migrating some of our Parquet tables (in Hive) to Iceberg (still 
Parquet), I've noticed that reading the new Iceberg tables with Spark is much 
slower (at least 4x) than reading the original Parquet tables. I've been 
trying to understand the slowdown, and it seems that Iceberg doesn't push the 
predicates down to the Parquet reader. I've written a unit test that reads one 
of our new Iceberg tables with Spark, and I always see the NoOp row group 
filter in the Parquet reader. However, when reading one of our original 
Parquet tables, I see a row group filter that actually filters row groups based 
on statistics, dictionaries, etc.
   Is my understanding correct? If so, I've read many times that Iceberg 
supports predicate pushdown, so when is it done? After reading the Parquet 
files?
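
For readers unfamiliar with the term, the difference between a no-op row group filter and a statistics-based one can be sketched roughly as below. This is only a minimal illustration in plain Python, not Iceberg's or Parquet's actual reader code: `RowGroupStats`, `noop_filter`, and `stats_filter` are hypothetical names, and the min/max values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for one column in one row group."""
    row_group: int
    min_value: int
    max_value: int


def noop_filter(all_stats: List[RowGroupStats], value: int) -> List[int]:
    # A no-op filter ignores the predicate and keeps every row group.
    return [s.row_group for s in all_stats]


def stats_filter(all_stats: List[RowGroupStats], value: int) -> List[int]:
    # For a predicate `col = value`, a row group can be skipped when the
    # value falls outside its [min, max] range.
    return [s.row_group for s in all_stats
            if s.min_value <= value <= s.max_value]


# Made-up statistics for three row groups of a single file.
stats = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]

read_without_pruning = noop_filter(stats, 150)
read_with_pruning = stats_filter(stats, 150)
```

With the no-op filter all three row groups are read and the predicate has to be evaluated on every row afterwards; with the statistics-based filter only the row group whose range covers the value is read.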


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


