Re: [PR] Iceberg/Comet integration POC [iceberg]

via GitHub Tue, 30 Apr 2024 14:38:20 -0700


aokolnychyi commented on code in PR #9841:
URL: https://github.com/apache/iceberg/pull/9841#discussion_r1585438330



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java:
##########
@@ -32,23 +32,27 @@
 import org.apache.iceberg.orc.ORC;
 import org.apache.iceberg.parquet.Parquet;
 import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.spark.ParquetReaderType;
 import org.apache.iceberg.spark.data.vectorized.VectorizedSparkOrcReaders;
 import org.apache.iceberg.spark.data.vectorized.VectorizedSparkParquetReaders;
 import org.apache.iceberg.types.TypeUtil;
 import org.apache.spark.sql.vectorized.ColumnarBatch;
 
 abstract class BaseBatchReader<T extends ScanTask> extends BaseReader<ColumnarBatch, T> {
   private final int batchSize;
+  private final ParquetReaderType parquetReaderType;

Review Comment:
   Instead of passing this variable on its own, let's create a `BatchReadConf` 
class and pass a reference to it here. I'd consider adding a `batchReadConf()` 
method to `SparkReadConf` that returns it. The class would expose the following 
fields (if you want, we can add a builder for `BatchReadConf` as well).
   
   ```
   class BatchReadConf {
     int orcBatchSize() {...}
     ParquetReaderType parquetReaderType() {...}
     int parquetBatchSize() {...}
     boolean cometLazyMaterializationEnabled() {...}
     ...
   }
   ```
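
One way the suggestion above could look with the optional builder is sketched below. This is only an illustration, not the code that landed in the PR: the `ParquetReaderType` constants, the default values, and the validation in `build()` are all assumptions made so the example is self-contained, and the `...` in the outline above is left unfilled.

```java
// Stand-in for org.apache.iceberg.spark.ParquetReaderType so the sketch compiles
// on its own; the constant names are an assumption.
enum ParquetReaderType {
  ICEBERG,
  COMET
}

// Immutable holder for batch-read settings, as outlined in the review comment.
final class BatchReadConf {
  private final int orcBatchSize;
  private final ParquetReaderType parquetReaderType;
  private final int parquetBatchSize;
  private final boolean cometLazyMaterializationEnabled;

  private BatchReadConf(Builder builder) {
    this.orcBatchSize = builder.orcBatchSize;
    this.parquetReaderType = builder.parquetReaderType;
    this.parquetBatchSize = builder.parquetBatchSize;
    this.cometLazyMaterializationEnabled = builder.cometLazyMaterializationEnabled;
  }

  static Builder builder() {
    return new Builder();
  }

  int orcBatchSize() {
    return orcBatchSize;
  }

  ParquetReaderType parquetReaderType() {
    return parquetReaderType;
  }

  int parquetBatchSize() {
    return parquetBatchSize;
  }

  boolean cometLazyMaterializationEnabled() {
    return cometLazyMaterializationEnabled;
  }

  static final class Builder {
    // Illustrative defaults only; real defaults would come from SparkReadConf.
    private int orcBatchSize = 5000;
    private ParquetReaderType parquetReaderType = ParquetReaderType.ICEBERG;
    private int parquetBatchSize = 5000;
    private boolean cometLazyMaterializationEnabled = false;

    Builder orcBatchSize(int newOrcBatchSize) {
      this.orcBatchSize = newOrcBatchSize;
      return this;
    }

    Builder parquetReaderType(ParquetReaderType newParquetReaderType) {
      this.parquetReaderType = newParquetReaderType;
      return this;
    }

    Builder parquetBatchSize(int newParquetBatchSize) {
      this.parquetBatchSize = newParquetBatchSize;
      return this;
    }

    Builder cometLazyMaterializationEnabled(boolean enabled) {
      this.cometLazyMaterializationEnabled = enabled;
      return this;
    }

    BatchReadConf build() {
      // Hypothetical validation: reject settings a reader could not use.
      if (orcBatchSize <= 0 || parquetBatchSize <= 0) {
        throw new IllegalArgumentException("Batch sizes must be positive");
      }
      if (parquetReaderType == null) {
        throw new IllegalArgumentException("Parquet reader type must be set");
      }
      return new BatchReadConf(this);
    }
  }
}
```

With something like this, `BaseBatchReader` would hold a single `BatchReadConf` field instead of separate `batchSize` and `parquetReaderType` fields, and `SparkReadConf.batchReadConf()` could assemble it via `BatchReadConf.builder()...build()`.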



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


