Re: [PR] Spark 4.0: Make `SparkBatch.createReaderFactory` customizable [iceberg]

via GitHub Tue, 01 Jul 2025 03:18:20 -0700


zhztheplayer commented on PR #13433:
URL: https://github.com/apache/iceberg/pull/13433#issuecomment-3023201062


   @pvary Thank you for the response!
   
   > @zhztheplayer: What is the information you need when you push the execution to Velox?
   
   This is a utility in Gluten that wraps the essential information for 
offloading: 
[GlutenIcebergSourceUtil](https://github.com/apache/incubator-gluten/blob/4884911532fae2c4ea5c2a77c8f615973cde22e9/gluten-iceberg/src/main/scala/org/apache/iceberg/spark/source/GlutenIcebergSourceUtil.scala#L39-L85),
 where the original `SparkInputPartition` is captured and translated.
   
   > Could you just create your own reader/writer implementing the proposed ReadBuilder/WriterBuilder interfaces in https://github.com/apache/iceberg/pull/12774?
   
   I have been carefully evaluating the new APIs, but so far they don't 
feel like the right place for Gluten to plug in. Gluten offloads more 
information to native code for processing, including the row delete files, 
which means they are processed natively, so the JVM-side row-based 
`DeleteFilter` is not needed.
   
   I would like to see more Iceberg code reused in Gluten, but at the moment 
that looks challenging, though I will keep tracking it. Meanwhile, a 
developer API like the one this PR proposes would definitely help a lot in 
our case: with it we would have maximum flexibility to create the Spark 
partition reader using all the information we can access from the Iceberg 
Spark partition.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


