Re: [PR] Core: Interface based DataFile reader and writer API [iceberg]

via GitHub Wed, 09 Apr 2025 07:54:22 -0700


pvary commented on PR #12298:
URL: https://github.com/apache/iceberg/pull/12298#issuecomment-2788854677


   > Overall, +1 from me 🎉
   > 
   > I made a prototype Lance implementation here: 
[84bf5c5](https://github.com/apache/iceberg/commit/84bf5c53bc5ea19101bb7f21d72f24666c2b3804)
   
   @westonpace: Thanks for the feedback. I really appreciate that you took the 
time to implement the API for Lance and shared what you learned!
   
   > That being said, I think a really cool addition in the future would be a 
base implementation that uses Arrow. As long as a reader/writer can 
produce/consume VectorSchemaRoot and it puts the field ids in the Arrow field 
schema, then 80% of the glue code will be provided for them. The name mapping, 
field id handling, constant handling, and spark<->arrow conversion could all be 
part of the base implementation.
   
   Are you suggesting that we use Arrow as an intermediate format? In other 
words, Iceberg would implement the transformations between an Arrow 
`VectorSchemaRoot` and the engine-specific `ObjectModel`s (Generic/Spark/Flink), 
and each File Format would implement the transformation between its internal 
model and the Arrow `VectorSchemaRoot`? What do you think about the memory and 
CPU overhead of the double transformation? Do you have experience with this on 
the hot path for reading/writing? I specifically tried to avoid the double 
transformation so that performance doesn't suffer.
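   
   A minimal sketch of the two read paths being compared may help make the 
question concrete. Every name in it (`FileBatch`, `IntermediateBatch`, 
`readDirect`, `readViaIntermediate`) is a hypothetical stand-in and not an 
Iceberg, Arrow, or Lance API. The array copy in the first hop is an assumption 
that models the worst case; a real Arrow integration might avoid it.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleTransformSketch {

  // Hypothetical stand-in for a file format's internal batch (not a real Lance/Parquet type).
  static final class FileBatch {
    final long[] values;

    FileBatch(long[] values) {
      this.values = values;
    }
  }

  // Hypothetical stand-in for an Arrow VectorSchemaRoot holding a single column.
  static final class IntermediateBatch {
    final long[] column;

    IntermediateBatch(long[] column) {
      this.column = column;
    }
  }

  // Direct path: file format model -> engine rows in a single pass.
  static List<Long> readDirect(FileBatch batch) {
    List<Long> rows = new ArrayList<>(batch.values.length);
    for (long value : batch.values) {
      rows.add(value);
    }
    return rows;
  }

  // Hop 1 (would be owned by the file format): internal model -> intermediate batch.
  // The clone() is an assumption that models materializing the intermediate form.
  static IntermediateBatch toIntermediate(FileBatch batch) {
    return new IntermediateBatch(batch.values.clone());
  }

  // Hop 2 (would be owned by the shared base implementation): intermediate batch -> engine rows.
  static List<Long> toEngineRows(IntermediateBatch batch) {
    List<Long> rows = new ArrayList<>(batch.column.length);
    for (long value : batch.column) {
      rows.add(value);
    }
    return rows;
  }

  // Two-hop path: same result as readDirect, but with an extra pass and an extra buffer.
  static List<Long> readViaIntermediate(FileBatch batch) {
    return toEngineRows(toIntermediate(batch));
  }

  public static void main(String[] args) {
    FileBatch batch = new FileBatch(new long[] {1L, 2L, 3L});
    System.out.println(readDirect(batch));
    System.out.println(readViaIntermediate(batch));
  }
}
```

   Both paths return the same rows; the difference is the additional pass and 
buffer in the two-hop version, which is the overhead the questions above are 
about. Whether that cost is significant in practice would need measuring on a 
real reader.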
   
   Thanks,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


