[PR] Iceberg/Comet integration POC [iceberg]

via GitHub Thu, 29 Feb 2024 18:34:35 -0800


huaxingao opened a new pull request, #9841:
URL: https://github.com/apache/iceberg/pull/9841


   This PR shows how I will integrate [Comet](https://github.com/apache/arrow-datafusion-comet) with Iceberg. The PR doesn't compile yet because we haven't released Comet yet. Also, Comet doesn't support Spark 3.5 yet, so I am doing this on Spark 3.4, but we will add Spark 3.5 support to Comet.
   
   In `VectorizedSparkParquetReaders.buildReader`, if the Comet library is available, a `CometIcebergColumnarBatchReader` is created, which uses the Comet batch reader to read data. We can also add a property later to control whether Comet is used.
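   A minimal sketch of the availability check described above, assuming the decision is made by probing the classpath for a Comet class without initializing it. The `ReaderChoice` enum, the `ReaderSelector` class, the probe class name passed in, and the `useComet` flag (standing in for the property that may be added later) are all hypothetical and not part of the PR.

```java
import java.util.Objects;

/** Which batch reader to build (hypothetical names for illustration only). */
enum ReaderChoice {
  COMET,   // stands in for CometIcebergColumnarBatchReader
  ICEBERG  // stands in for the existing ColumnarBatchReader
}

final class ReaderSelector {
  private ReaderSelector() {}

  /** Returns true if the named class can be loaded; the class is not initialized. */
  static boolean isClassAvailable(String className) {
    Objects.requireNonNull(className, "className");
    try {
      Class.forName(className, false, ReaderSelector.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Picks Comet only when the hypothetical opt-in flag is set and the probe class
   * is on the classpath; otherwise falls back to the existing Iceberg reader.
   */
  static ReaderChoice choose(boolean useComet, String cometProbeClass) {
    if (useComet && isClassAvailable(cometProbeClass)) {
      return ReaderChoice.COMET;
    }
    return ReaderChoice.ICEBERG;
  }
}
```

   With `useComet` defaulting to `true`, this reproduces the "use Comet whenever it is available" behavior; the flag is where a configuration property could plug in later.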
   
   The logic in `CometIcebergVectorizedReaderBuilder` is very similar to `VectorizedReaderBuilder`, except that it builds Comet column readers instead of Iceberg column readers.
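   Because the two builders differ mainly in which column reader they construct, one possible way to share the traversal is to parameterize it over a reader factory. This is a minimal sketch under that assumption, not how the PR is written: `ProjectedField` and `ReaderTreeBuilder` are hypothetical simplifications, and the real builders walk an Iceberg schema and Parquet file schema, which is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** A projected column: its name and Parquet column path (hypothetical, simplified). */
final class ProjectedField {
  private final String name;
  private final String path;

  ProjectedField(String name, String path) {
    this.name = name;
    this.path = path;
  }

  String name() { return name; }

  String path() { return path; }
}

/** Shared traversal; only the column reader factory differs between Comet and Iceberg. */
final class ReaderTreeBuilder<R> {
  private final Function<ProjectedField, R> readerFactory;

  ReaderTreeBuilder(Function<ProjectedField, R> readerFactory) {
    this.readerFactory = readerFactory;
  }

  /** Builds one column reader per projected field, preserving projection order. */
  List<R> build(List<ProjectedField> projectedFields) {
    List<R> readers = new ArrayList<>(projectedFields.size());
    for (ProjectedField field : projectedFields) {
      readers.add(readerFactory.apply(field));
    }
    return readers;
  }
}
```

   An Iceberg-side builder and a Comet-side builder would then be two instances with different factories; whether that trade-off suits the real visitor-based builders is an open design question.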
   
   The delete logic in `CometIcebergColumnarBatchReader` is exactly the same as the one in `ColumnarBatchReader`. I will extract the common code into a base class.
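   A minimal sketch of that refactoring, assuming the shared part is filtering out deleted row positions after a batch is read: a hypothetical abstract `BaseDeleteAwareBatchReader` owns the delete handling, and subclasses only supply how raw rows are read. Batches are modeled as `long[]` row positions and deletes as a plain `Set<Long>`; neither reflects Spark's `ColumnarBatch` or Iceberg's actual delete-filter API.

```java
import java.util.Arrays;
import java.util.Set;

/** Shared delete handling, extracted so both batch readers can reuse it (hypothetical). */
abstract class BaseDeleteAwareBatchReader {
  private final Set<Long> deletedPositions;

  protected BaseDeleteAwareBatchReader(Set<Long> deletedPositions) {
    this.deletedPositions = Set.copyOf(deletedPositions);
  }

  /** Subclass hook: reads the next batch as file row positions (Comet or Iceberg path). */
  protected abstract long[] readRawBatch(long startPosition, int batchSize);

  /** Reads a batch and drops rows whose positions are marked as deleted. */
  public final long[] readBatch(long startPosition, int batchSize) {
    long[] raw = readRawBatch(startPosition, batchSize);
    return Arrays.stream(raw).filter(pos -> !deletedPositions.contains(pos)).toArray();
  }
}

/** Stand-in for one concrete reader; a Comet-backed reader would override the same hook. */
final class SequentialBatchReader extends BaseDeleteAwareBatchReader {
  SequentialBatchReader(Set<Long> deletedPositions) {
    super(deletedPositions);
  }

  @Override
  protected long[] readRawBatch(long startPosition, int batchSize) {
    long[] positions = new long[batchSize];
    for (int i = 0; i < batchSize; i++) {
      positions[i] = startPosition + i;
    }
    return positions;
  }
}
```

   Making `readBatch` final keeps the delete semantics identical across both readers, which is the point of extracting the shared code.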


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


