Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-03 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2512236843 @flyrain I will have a follow-up PR. Thanks

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2513346295 Thanks @flyrain for reviewing and merging the PR! Also thanks @singhpk234 for reviewing!

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
flyrain merged PR #11551: URL: https://github.com/apache/iceberg/pull/11551

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2512632326 @flyrain Thanks for the quick reply. I will have a follow-up PR for this.

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2512581027 @flyrain I thought this over. The `missingIds` could be from [`ROW_POSITION.fieldId()`](https://github.com/apache/iceberg/blob/main/data/src/main/java/org/apache/iceberg/data/DeleteFilte
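
To make the question of where `missingIds` can come from concrete, here is a minimal sketch under a simplified model in which fields are identified only by integer IDs. It assumes that the IDs the delete filter must read are the equality-delete field IDs plus, when the `needRowPosCol` flag seen in the `DeleteFilter` constructor diff in this thread is set, the row-position column; any of those not in the requested projection counts as missing. The class name, method shape, and `ROW_POSITION_ID` constant are hypothetical and are not Iceberg's actual `DeleteFilter` API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MissingIdsSketch {
  // Hypothetical stand-in for the field ID of the row-position metadata column.
  static final int ROW_POSITION_ID = Integer.MAX_VALUE - 2;

  // Returns the field IDs the delete filter must read but the query did not request.
  static Set<Integer> missingIds(
      List<Integer> requestedIds, List<Integer> eqDeleteIds, boolean needRowPosCol) {
    // Start from the columns needed to evaluate equality deletes.
    Set<Integer> required = new LinkedHashSet<>(eqDeleteIds);
    // The row-position column is an extra, non-requested column when position info is needed.
    if (needRowPosCol) {
      required.add(ROW_POSITION_ID);
    }
    // Whatever the query already projects is not "missing".
    required.removeAll(requestedIds);
    return required;
  }

  public static void main(String[] args) {
    Set<Integer> missing = missingIds(List.of(1, 2), List.of(2, 3), true);
    System.out.println(missing); // prints [3, 2147483645]
  }
}
```

With requested IDs `[1, 2]`, equality-delete IDs `[2, 3]`, and the row-position flag set, the sketch reports field 3 and the row-position stand-in as missing, which illustrates why `missingIds` is not limited to equality-delete columns.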

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-01 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1865324477 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -245,5 +247,16 @@ void applyEqDelete(ColumnarBatch column

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-01 Thread via GitHub
flyrain commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1865274246 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -245,5 +247,16 @@ void applyEqDelete(ColumnarBatch columnar

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-01 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1865249456 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -622,6 +623,50 @@ public void testPosDeletesOnParquetFileWithM

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-01 Thread via GitHub
flyrain commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1865218949 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -622,6 +623,50 @@ public void testPosDeletesOnParquetFileWithMul

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-30 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1864760863 ## data/src/main/java/org/apache/iceberg/data/DeleteFilter.java: ## @@ -73,6 +74,7 @@ protected DeleteFilter( boolean needRowPosCol) { this.filePath = f

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-30 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1864760757 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/VectorizedSparkParquetReaders.java: ## @@ -125,4 +126,25 @@ protected VectorizedReader v

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-30 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1864760687 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -622,6 +624,41 @@ public void testPosDeletesOnParquetFileWithM

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-29 Thread via GitHub
flyrain commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1863894343 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -245,5 +259,16 @@ void applyEqDelete(ColumnarBatch columnar

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-18 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1847131830 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -45,11 +45,23 @@ public class ColumnarBatchReader extends

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-18 Thread via GitHub
singhpk234 commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1847025083 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -45,11 +45,23 @@ public class ColumnarBatchReader extend

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-15 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2479588343 cc @flyrain @szehon-ho @viirya

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-14 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1843189307 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -45,11 +45,23 @@ public class ColumnarBatchReader extends

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-14 Thread via GitHub
singhpk234 commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1842720884 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -45,11 +45,23 @@ public class ColumnarBatchReader extend

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-14 Thread via GitHub
huaxingao commented on code in PR #11551: URL: https://github.com/apache/iceberg/pull/11551#discussion_r1842606127 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -622,6 +624,41 @@ public void testPosDeletesOnParquetFileWithM

[PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-11-14 Thread via GitHub
huaxingao opened a new pull request, #11551: URL: https://github.com/apache/iceberg/pull/11551 For equality deletes, we build a `ColumnarBatchReader` for the equality delete filter columns to read their values and determine which rows are deleted. If these filter columns are not among the reque
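
The idea in the description can be sketched as follows. This is a simplified, hypothetical model, not Spark's `ColumnarBatch` or Iceberg's `ColumnarBatchReader`: a batch is a set of `long[]` columns tagged with field IDs, the equality-delete column is read only to decide which rows are deleted, and the returned batch keeps only the requested columns. The class and method names are invented for illustration, and the sketch copies surviving rows, whereas a real vectorized reader may track live rows differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class DropExtraColumnsSketch {
  // Hypothetical stand-in for a columnar batch: columns[i] holds field fieldIds[i] for every row.
  static final class Batch {
    final int[] fieldIds;
    final long[][] columns;

    Batch(int[] fieldIds, long[][] columns) {
      this.fieldIds = fieldIds;
      this.columns = columns;
    }

    int numRows() {
      return columns.length == 0 ? 0 : columns[0].length;
    }
  }

  // Drops rows whose equality-delete column value is in deletedValues,
  // then drops every column whose field ID the query did not request.
  static Batch applyEqDeleteAndProject(
      Batch batch, int eqFieldId, Set<Long> deletedValues, Set<Integer> requestedIds) {
    int eqCol = indexOf(batch.fieldIds, eqFieldId);

    // Rows that survive the equality delete.
    List<Integer> liveRows = new ArrayList<>();
    for (int row = 0; row < batch.numRows(); row++) {
      if (!deletedValues.contains(batch.columns[eqCol][row])) {
        liveRows.add(row);
      }
    }

    // Columns the caller actually asked for; extra delete-filter columns are skipped.
    List<Integer> keptCols = new ArrayList<>();
    for (int c = 0; c < batch.fieldIds.length; c++) {
      if (requestedIds.contains(batch.fieldIds[c])) {
        keptCols.add(c);
      }
    }

    int[] ids = new int[keptCols.size()];
    long[][] cols = new long[keptCols.size()][liveRows.size()];
    for (int j = 0; j < keptCols.size(); j++) {
      int c = keptCols.get(j);
      ids[j] = batch.fieldIds[c];
      for (int r = 0; r < liveRows.size(); r++) {
        cols[j][r] = batch.columns[c][liveRows.get(r)];
      }
    }
    return new Batch(ids, cols);
  }

  static int indexOf(int[] ids, int id) {
    for (int i = 0; i < ids.length; i++) {
      if (ids[i] == id) {
        return i;
      }
    }
    throw new IllegalArgumentException("field " + id + " not in batch");
  }

  public static void main(String[] args) {
    // Fields 1 and 2 are requested; field 3 is read only for the equality delete.
    Batch in = new Batch(new int[] {1, 2, 3}, new long[][] {{10, 20, 30}, {100, 200, 300}, {7, 8, 7}});
    Batch out = applyEqDeleteAndProject(in, 3, Set.of(7L), Set.of(1, 2));
    System.out.println(Arrays.toString(out.fieldIds)); // prints [1, 2]
    System.out.println(Arrays.deepToString(out.columns)); // prints [[20], [200]]
  }
}
```

Here field 3 exists only so the delete can be evaluated: rows with value 7 in that column are dropped, and the column itself is removed before the batch is returned, so the caller sees only fields 1 and 2.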