Re: [PR] Spark 4.0: Make `SparkBatch.createReaderFactory` customizable [iceberg]

via GitHub Tue, 01 Jul 2025 20:38:31 -0700


zhztheplayer commented on PR #13433:
URL: https://github.com/apache/iceberg/pull/13433#issuecomment-3026256229


   @pvary 
   
   > The DeleteFilter is pushed down in the Parquet vectorized reader case.
   
   Yes, it seems so, but from what I can see, the filter is still applied 
row by row in the columnar reader code:
   
   
https://github.com/apache/iceberg/blob/f24f0c093e55743c291b5835cd19931d1ca787d0/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java#L110-L135
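
   To illustrate what "row-based" means here, a minimal sketch of deciding 
liveness one row at a time over a batch. This is not the actual 
`ColumnarBatchUtil` code; the class, method, and parameter names are 
hypothetical, and only position deletes are modelled:

```java
import java.util.Arrays;
import java.util.Set;

// Simplified illustration only; NOT the actual Iceberg implementation.
// All names here are hypothetical.
public class RowWiseDeleteSketch {

  /**
   * Walks every row of a batch and collects the in-batch indexes of rows
   * whose file position is not in the set of deleted positions.
   *
   * @param batchStartPos file position of the first row in the batch
   * @param numRows number of rows in the batch
   * @param deletedPositions file positions marked as deleted
   */
  static int[] liveRowIndexes(long batchStartPos, int numRows, Set<Long> deletedPositions) {
    int[] live = new int[numRows];
    int count = 0;
    // One membership check per row: the data is columnar, but the delete
    // decision is still made row by row.
    for (int i = 0; i < numRows; i++) {
      if (!deletedPositions.contains(batchStartPos + i)) {
        live[count++] = i;
      }
    }
    return Arrays.copyOf(live, count);
  }
}
```

   A native engine would presumably want the delete files themselves rather 
than the result of a per-row loop like this one.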
   
   Gluten doesn't handle delete files this way, so ideally, for Gluten's 
case, could we simply avoid inheriting from the `SupportsDeleteFilter` 
interface? I can look into this further, but my intuition is that we need a 
way to get the original delete file information so that it can be pushed down 
to the C++ code.
   
   By the way, I am not sure whether the community is interested in adding a 
Velox-based reader / writer implementation to the code base. It would make 
Gluten + Iceberg development and testing easier, and it could also help 
Iceberg's new reader / writer API evolve in place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


