huaxingao commented on code in PR #9841: URL: https://github.com/apache/iceberg/pull/9841#discussion_r1899171737
########## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnVectorBuilder.java: ##########

@@ -46,8 +46,8 @@ public ColumnVector build(VectorHolder holder, int numRows) {
      } else {
        throw new IllegalStateException("Unknown dummy vector holder: " + holder);
      }
-    } else if (rowIdMapping != null) {
-      return new ColumnVectorWithFilter(holder, rowIdMapping);
+    } else if (withDelete) {

Review Comment:
   The new approach is to load all data vectors first, and then apply the delete logic to the ColumnarBatch by mutating all of its ColumnVectors in place via `setRowIdMapping`. Initially every rowIdMapping is null; the mapping is computed and set on the ColumnVectors only after they are built. Because the mapping is set later, it is still null at this point, so `rowIdMapping != null` can no longer decide whether to construct a `ColumnVectorWithFilter` or a regular `IcebergArrowColumnVector`. Instead, `withDelete` (computed as `deletes != null` in `ColumnarBatchReader.readDataToColumnVectors`) is used.
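   To make the ordering concrete, here is a minimal sketch of the two-phase pattern the comment describes. The names `ColumnVectorWithFilter`, `IcebergArrowColumnVector`, and `setRowIdMapping` come from the discussion above; the stub types and the `applyDeletes` helper are illustrative stand-ins, not the actual Iceberg implementation.

   ```java
   import java.util.List;

   class TwoPhaseSketch {

     interface ColumnVector {}

     // Plain vector: rows are read through directly, no delete filtering.
     static class IcebergArrowColumnVector implements ColumnVector {
       final Object holder; // stands in for VectorHolder

       IcebergArrowColumnVector(Object holder) {
         this.holder = holder;
       }
     }

     // Filtered vector: constructed without a mapping, mutated in place later.
     static class ColumnVectorWithFilter implements ColumnVector {
       final Object holder;
       int[] rowIdMapping; // null until the delete logic runs

       ColumnVectorWithFilter(Object holder) {
         this.holder = holder;
       }

       void setRowIdMapping(int[] rowIdMapping) {
         this.rowIdMapping = rowIdMapping;
       }
     }

     // Phase 1: build every vector. The mapping does not exist yet, so the
     // choice is driven by withDelete (deletes != null), not by the mapping.
     static ColumnVector build(Object holder, boolean withDelete) {
       return withDelete
           ? new ColumnVectorWithFilter(holder)
           : new IcebergArrowColumnVector(holder);
     }

     // Phase 2: after all data vectors are loaded, compute the mapping from
     // the delete positions and push it into every filtered vector.
     static void applyDeletes(List<ColumnVector> batch, int[] rowIdMapping) {
       for (ColumnVector v : batch) {
         if (v instanceof ColumnVectorWithFilter) {
           ((ColumnVectorWithFilter) v).setRowIdMapping(rowIdMapping);
         }
       }
     }
   }
   ```

   The design point is that phase 1 is mapping-agnostic: every vector in the batch is created before any delete position is inspected, which is exactly why the constructor-time check had to move from `rowIdMapping != null` to `withDelete`.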