huaxingao opened a new pull request, #9841: URL: https://github.com/apache/iceberg/pull/9841
This PR shows how I plan to integrate [Comet](https://github.com/apache/arrow-datafusion-comet) with Iceberg. The PR doesn't compile yet because Comet hasn't been released. Comet also doesn't support Spark 3.5 yet, so I am doing this on Spark 3.4; we will add 3.5 support in Comet.

In `VectorizedSparkParquetReaders.buildReader`, if the Comet library is available, a `CometIcebergColumnarBatchReader` is created, which uses the Comet batch reader to read data. We can also add a property later to control whether Comet is used.

The logic in `CometIcebergVectorizedReaderBuilder` is very similar to `VectorizedReaderBuilder`, except that it builds Comet column readers instead of Iceberg column readers. The delete logic in `CometIcebergColumnarBatchReader` is exactly the same as in `ColumnarBatchReader`; I will extract the common code into a base class.
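For illustration only, the "use Comet if the library is available" decision described for `VectorizedSparkParquetReaders.buildReader` could look roughly like the sketch below. It is not the code in this PR: the `ReaderSelection` class, the `spark.iceberg.comet.enabled` property name, and the marker class passed to `chooseReader` are all hypothetical, and the method returns the name of the reader that would be built rather than constructing a real reader.

```java
import java.util.Map;

// Hypothetical helper, not part of Iceberg or Comet.
final class ReaderSelection {
  // Hypothetical property name; the PR only says a property may be added later.
  static final String ENABLED_PROPERTY = "spark.iceberg.comet.enabled";

  private ReaderSelection() {}

  // Returns true if the named class can be loaded, without initializing it.
  static boolean isClassAvailable(String className) {
    try {
      Class.forName(className, false, ReaderSelection.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Picks the Comet-backed reader only when the (assumed) property allows it
  // and the marker class is on the classpath; otherwise falls back to the
  // existing Iceberg reader.
  static String chooseReader(String cometMarkerClass, Map<String, String> props) {
    boolean enabled =
        Boolean.parseBoolean(props.getOrDefault(ENABLED_PROPERTY, "true"));
    if (enabled && isClassAvailable(cometMarkerClass)) {
      return "CometIcebergColumnarBatchReader";
    }
    return "ColumnarBatchReader";
  }
}
```

A caller would pass the fully qualified name of whichever Comet class is chosen as the probe, together with the read options. Loading the class with `initialize = false` keeps the probe cheap and avoids running Comet's static initializers just to check for its presence.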