Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817603690 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -81,14 +84,15 @@ private CloseableIterable newParquetIterable(

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817447914 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -81,14 +84,15 @@ private CloseableIterable newParquetIterable(

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
viirya commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817444036 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -81,14 +84,15 @@ private CloseableIterable newParquetIterable( Spa

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817436902 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -125,4 +129,28 @@ private CloseableIterable newOrcIterable( .

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
viirya commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817429106 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -125,4 +129,28 @@ private CloseableIterable newOrcIterable( .wit

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438965356 @pvary Thank you for your suggestion! You're correct that adding such a test would help prevent future changes from inadvertently affecting this behavior without notice. Currently, Sp

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
dramaticlly commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438914981 > @huaxingao it's a good find, I'm just wondering, where do we add _pos to the schema? Can we just not do it there? Just curious if it's possible. I think it might be from here h

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438938087 @szehon-ho I think we still need the `_pos` in the `requiredSchema` to build [`posAccessor`](https://github.com/apache/iceberg/blob/main/data/src/main/java/org/apache/iceberg/data/Dele

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
szehon-ho commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438907910 @huaxingao it's a good find, I'm just wondering, where do we add _pos to the schema? Can we just not do it there? Just curious if it's possible.

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
pvary commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2437760336 @huaxingao: I'm not an expert in the Spark codebase, but I think having a test which fails before the change and succeeds after the change would be nice. Otherwise we risk future PRs chan

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-24 Thread via GitHub
huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2436567828 also cc @flyrain -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-24 Thread via GitHub
huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2436559370 cc @szehon-ho @pvary @viirya

[PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-24 Thread via GitHub
huaxingao opened a new pull request, #11390: URL: https://github.com/apache/iceberg/pull/11390 In Spark batch reading, Iceberg reads additional columns when there are delete files. For instance, if we have a table `test (int id, string data)` and a query `SELECT id FROM test`, the reques
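
The description above is cut off, so the following is only a rough sketch of the idea named in the PR title, not the code from the PR. It assumes that, for `SELECT id FROM test` with position delete files present, the required schema becomes `id, _pos` (the thread notes that `_pos` is still needed in `requiredSchema` to build `posAccessor`), and that `_pos` can be left out of what the batch reader reads when the query did not select it. The class `PosColumnPruning`, the method `columnsToRead`, and the use of plain column-name lists instead of Iceberg `Schema` objects are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

public class PosColumnPruning {
    // Name of the row-position metadata column as written in the thread.
    public static final String ROW_POSITION = "_pos";

    /**
     * Hypothetical helper, not an Iceberg API.
     * required: the query projection plus columns added for delete handling.
     * scanList: the columns the query actually selected.
     * Returns the required columns, minus _pos when the query did not ask for it.
     */
    public static List<String> columnsToRead(List<String> required, List<String> scanList) {
        List<String> result = new ArrayList<>();
        for (String column : required) {
            boolean addedOnlyForDeletes =
                column.equals(ROW_POSITION) && !scanList.contains(ROW_POSITION);
            if (!addedOnlyForDeletes) {
                result.add(column);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed required schema for SELECT id FROM test with position deletes.
        List<String> required = List.of("id", "_pos");

        // The query selected only id, so _pos is dropped from the read list.
        System.out.println(columnsToRead(required, List.of("id")));         // prints [id]

        // The query selected _pos explicitly, so it is kept.
        System.out.println(columnsToRead(required, List.of("id", "_pos"))); // prints [id, _pos]
    }
}
```

The sketch keeps `required` untouched and derives a separate read list, which mirrors the point made in the thread that `_pos` has to stay in the required schema even if it is not read as a column.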