Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

via GitHub Fri, 25 Oct 2024 16:01:42 -0700


huaxingao commented on code in PR #11390:
URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817436902



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java:
##########
@@ -125,4 +129,28 @@ private CloseableIterable<ColumnarBatch> newOrcIterable(
         .withNameMapping(nameMapping())
         .build();
   }
+
+  private Schema vectorizationSchema(SparkDeleteFilter deleteFilter) {
+    // For pos delete, deleteFilter has appended _pos to the required schema.
+    // For example, SELECT id, data FROM test, the requested schema is id and data. If there
+    // is position delete, deleteFilter will append _pos to the schema so the schema becomes
+    // id, data and _pos. However, vectorization reader only needs to read the requested columns,
+    // i.e. id and data, so we want to remove the _pos from the schema when building the
+    // vectorization reader. Before removing _pos, we need to make sure _pos is not explicitly
+    // selected in the query.
+    if (deleteFilter != null) {
+      if (deleteFilter.hasPosDeletes() && expectedSchema().findType("_pos") == null) {

Review Comment:
   Yes, we do. Changed to `MetadataColumns.ROW_POSITION.name()`
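
   For readers following the thread, the pruning rule described in the code comment above can be sketched roughly as follows. This is a simplified, stdlib-only model and not the PR's implementation: the class `VectorizationSchemaSketch`, the method `vectorizationColumns`, and the use of plain column-name lists in place of Iceberg `Schema` objects are all assumptions made for illustration. The constant `"_pos"` is assumed to stand in for `MetadataColumns.ROW_POSITION.name()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VectorizationSchemaSketch {
  // Stand-in for MetadataColumns.ROW_POSITION.name(); assumed to be "_pos".
  static final String ROW_POSITION = "_pos";

  /**
   * Hypothetical helper: returns the columns the vectorized reader should read.
   *
   * @param requiredColumns columns required by the delete filter (may include an appended _pos)
   * @param expectedColumns columns explicitly requested by the query
   * @param hasPosDeletes whether position deletes apply to the file being read
   */
  static List<String> vectorizationColumns(
      List<String> requiredColumns, List<String> expectedColumns, boolean hasPosDeletes) {
    // Drop _pos only if it was added for position deletes and the query did not select it.
    if (hasPosDeletes && !expectedColumns.contains(ROW_POSITION)) {
      List<String> pruned = new ArrayList<>(requiredColumns);
      pruned.remove(ROW_POSITION);
      return pruned;
    }
    // Otherwise keep the required columns unchanged.
    return requiredColumns;
  }

  public static void main(String[] args) {
    List<String> required = Arrays.asList("id", "data", "_pos");

    // SELECT id, data FROM test: _pos was appended for deletes, so it is pruned.
    System.out.println(vectorizationColumns(required, Arrays.asList("id", "data"), true));

    // SELECT id, data, _pos FROM test: _pos was explicitly selected, so it is kept.
    System.out.println(vectorizationColumns(required, Arrays.asList("id", "data", "_pos"), true));
  }
}
```

   The sketch copies the list before removing `_pos` so the delete filter's own required columns stay untouched, which matches the intent stated in the comment that only the reader's schema should shrink.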



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

