huaxingao commented on code in PR #11390:
URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817447914


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java:
##########
@@ -81,14 +84,15 @@ private CloseableIterable<ColumnarBatch> newParquetIterable(
       SparkDeleteFilter deleteFilter) {
     // get required schema if there are deletes
     Schema requiredSchema = deleteFilter != null ? deleteFilter.requiredSchema() : expectedSchema();
+    Schema vectorizationSchema = vectorizationSchema(deleteFilter);
 
     return Parquet.read(inputFile)
         .project(requiredSchema)
         .split(start, length)
         .createBatchedReaderFunc(
             fileSchema ->
                 VectorizedSparkParquetReaders.buildReader(
-                    requiredSchema, fileSchema, idToConstant, deleteFilter))
+                    vectorizationSchema, fileSchema, idToConstant, deleteFilter))

Review Comment:
   Not exactly.
   If there are no deletes, it's `expectedSchema`.
   If there is an equality delete, it's `deleteFilter.requiredSchema()`, because that can be `expectedSchema` plus the equality filter columns. For example:
   ```
   SELECT id FROM table
   ```
   Suppose the equality delete has `data == 'aaa'`; then we need to read the `data` column too, so the schema is `deleteFilter.requiredSchema()`, which is `id` + `data`.
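   The selection rule above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical stand-in types (`Schema`, `DeleteFilter` here are toy records, not Iceberg's real classes), showing why an equality delete widens the read schema:
   ```java
   import java.util.List;

   public class VectorizationSchemaSketch {
     // Toy stand-ins for Iceberg's Schema and SparkDeleteFilter (assumptions, not the real API)
     record Schema(List<String> columns) {}
     record DeleteFilter(Schema requiredSchema) {}

     // The query's projection: SELECT id FROM table
     static final Schema EXPECTED_SCHEMA = new Schema(List.of("id"));

     static Schema vectorizationSchema(DeleteFilter deleteFilter) {
       if (deleteFilter == null) {
         // No deletes: read only what the query projects
         return EXPECTED_SCHEMA;
       }
       // Equality deletes may reference columns outside the projection
       // (e.g. data == 'aaa'), so those columns must be read as well
       return deleteFilter.requiredSchema();
     }

     public static void main(String[] args) {
       // Equality delete on `data` forces `data` into the read schema
       DeleteFilter eqDelete = new DeleteFilter(new Schema(List.of("id", "data")));
       System.out.println(vectorizationSchema(null).columns());     // [id]
       System.out.println(vectorizationSchema(eqDelete).columns()); // [id, data]
     }
   }
   ```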



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

