Re: [PR] Spark3.5: Add 'skip_file_list' option to RewriteTablePathProcedure for optional file-list generation [iceberg]

2025-04-18 Thread via GitHub
szehon-ho commented on code in PR #12844: URL: https://github.com/apache/iceberg/pull/12844#discussion_r2051407254 ## api/src/main/java/org/apache/iceberg/actions/RewriteTablePath.java: ## @@ -86,6 +86,16 @@ public interface RewriteTablePath extends Action

Re: [PR] Spark3.5: Add 'skip_file_list' option to RewriteTablePathProcedure for optional file-list generation [iceberg]

2025-04-18 Thread via GitHub
slfan1989 commented on PR #12844: URL: https://github.com/apache/iceberg/pull/12844#issuecomment-2816562105 > Interesting, is that all you need to do the Hive -> Iceberg conversion? Seems simple and makes sense to me. cc @flyrain @dramaticlly for any thoughts @szehon-ho Thank you for yo

Re: [I] Hive metastore 4.0.1 remove deprecated thrift APIs [iceberg-python]

2025-04-18 Thread via GitHub
Fokko commented on issue #1222: URL: https://github.com/apache/iceberg-python/issues/1222#issuecomment-2816554420 @rcsmith27 That would be great! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Spark3.5: Add 'skip_file_list' option to RewriteTablePathProcedure for optional file-list generation [iceberg]

2025-04-18 Thread via GitHub
szehon-ho commented on PR #12844: URL: https://github.com/apache/iceberg/pull/12844#issuecomment-2816544324 Interesting, is that all you need to do the Hive -> Iceberg conversion? Seems simple and makes sense to me. cc @flyrain @dramaticlly for any thoughts

Re: [I] `s3.force-virtual-addressing` don't work [iceberg-python]

2025-04-18 Thread via GitHub
helmiazizm closed issue #1922: `s3.force-virtual-addressing` don't work URL: https://github.com/apache/iceberg-python/issues/1922

Re: [PR] refactor: Adopt ObjectProvider in Table [iceberg-rust]

2025-04-18 Thread via GitHub
liurenjie1024 commented on code in PR #1227: URL: https://github.com/apache/iceberg-rust/pull/1227#discussion_r2051359067 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -656,7 +658,29 @@ impl ManifestFile { /// Load [`Manifest`]. /// /// This method will also

Re: [I] [DISCUSS] A catalog loader api. [iceberg-rust]

2025-04-18 Thread via GitHub
liurenjie1024 commented on issue #1228: URL: https://github.com/apache/iceberg-rust/issues/1228#issuecomment-2816484998 > One thing I'm concerned about is that there may be a specific parameter to create the catalog, e.g. [S3Catalog](https://github.com/apache/iceberg-rust/blob/609b792e3f85c

Re: [PR] `validation_history` and `ancestors_between` [iceberg-python]

2025-04-18 Thread via GitHub
sungwy commented on code in PR #1935: URL: https://github.com/apache/iceberg-python/pull/1935#discussion_r2051346707 ## pyiceberg/table/snapshots.py: ## @@ -429,3 +429,13 @@ def ancestors_of(current_snapshot: Optional[Snapshot], table_metadata: TableMeta if snapshot.pa

[I] [Consult] planTask tasks a lot of time, consult for how to accelerate this [iceberg]

2025-04-18 Thread via GitHub
littleDrew opened a new issue, #12845: URL: https://github.com/apache/iceberg/issues/12845 ### Query engine I write and read an Iceberg table with Spark, and mainly perform the following operations - insert data with MERGE INTO SQL, with `write.merge.mode='merge-on-read'`, this oper

Re: [I] Support IsolationLevels and Concurrency Safety Validation Checks [iceberg-python]

2025-04-18 Thread via GitHub
jayceslesar commented on issue #819: URL: https://github.com/apache/iceberg-python/issues/819#issuecomment-2816430126 @guptaakashdeep @sungwy see https://github.com/apache/iceberg-python/pull/1935 which should be the building blocks needed to crank out the 4 Sub-issues

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
morhidi commented on code in PR #12839: URL: https://github.com/apache/iceberg/pull/12839#discussion_r2051301525 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java: ## @@ -173,50 +190,10 @@ public void testTableScan

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
morhidi commented on code in PR #12839: URL: https://github.com/apache/iceberg/pull/12839#discussion_r2051301180 ## docs/docs/flink-configuration.md: ## @@ -102,30 +102,30 @@ env.getConfig() `Read option` has the highest priority, followed by `Flink configuration` and then `

Re: [PR] Spark 3.5 row lineage [iceberg]

2025-04-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #12736: URL: https://github.com/apache/iceberg/pull/12736#discussion_r2051292495 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/iceberg/spark/extensions/RemoveRowLineageOutputFromOriginalTable.scala: ## @@ -0,0 +1,56 @@ +/* + *

Re: [PR] API: Speed up Timestamps#toHumanString [iceberg]

2025-04-18 Thread via GitHub
github-actions[bot] commented on PR #12447: URL: https://github.com/apache/iceberg/pull/12447#issuecomment-2816385436 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] API: Speed up Timestamps#toHumanString [iceberg]

2025-04-18 Thread via GitHub
github-actions[bot] closed pull request #12447: API: Speed up Timestamps#toHumanString URL: https://github.com/apache/iceberg/pull/12447

Re: [PR] Spark-3.5: Add spark action to compute partition stats [iceberg]

2025-04-18 Thread via GitHub
github-actions[bot] commented on PR #12450: URL: https://github.com/apache/iceberg/pull/12450#issuecomment-2816385445 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
stevenzwu commented on code in PR #12839: URL: https://github.com/apache/iceberg/pull/12839#discussion_r2051255683 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java: ## @@ -173,50 +190,10 @@ public void testTableSc

[PR] Spark3.5: Add 'skip_file_list' option to RewriteTablePathProcedure for optional file-list generation [iceberg]

2025-04-18 Thread via GitHub
slfan1989 opened a new pull request, #12844: URL: https://github.com/apache/iceberg/pull/12844 This is a minor feature improvement. The background is that we are using `RewriteTablePathProcedure` to convert Hive tables to Iceberg tables, as detailed in #12762. `RewriteTablePathProcedure` ge

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
aihuaxu commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2051252315 ## format/spec.md: ## @@ -648,6 +648,21 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct access

Re: [PR] Core: use ALL_VERSIONS constant in TestBase [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on PR #12748: URL: https://github.com/apache/iceberg/pull/12748#issuecomment-2816348914 Thanks @sullis for dealing with all my comments, and @reevik and @manuzhang for reviewing!

Re: [PR] Core: use ALL_VERSIONS constant in TestBase [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer merged PR #12748: URL: https://github.com/apache/iceberg/pull/12748

Re: [PR] Spark3.4: Backport ProcedureInput for MigrateTableProcedure And SnapshotTableProcedure (#12782 #12783) [iceberg]

2025-04-18 Thread via GitHub
slfan1989 commented on PR #12837: URL: https://github.com/apache/iceberg/pull/12837#issuecomment-2816315955 @manuzhang @huaxingao Thank you very much for helping to review the code!

Re: [PR] Spark: Add _row_id and _last_updated_sequence_number readers [iceberg]

2025-04-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #12836: URL: https://github.com/apache/iceberg/pull/12836#discussion_r2051174438 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java: ## @@ -174,6 +194,27 @@ public static ParquetValueReader recordReader( ret

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2025-04-18 Thread via GitHub
stevenzwu commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r2050965696 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/TestDataFileRewriteCommitter.java: ## @@ -0,0 +1,221 @@ +/* + * Licensed to the Apa

[PR] feat(playground): Add S3Tables catalog support (#1161) [iceberg-rust]

2025-04-18 Thread via GitHub
ananthaksr opened a new pull request, #1229: URL: https://github.com/apache/iceberg-rust/pull/1229 Allow configuring S3Tables catalog type in the playground CLI config. ## Which issue does this PR close? Closes https://github.com/apache/iceberg-rust/issues/1161 ## What ch

Re: [PR] API: Use normalized JSON path to identify Variant fields [iceberg]

2025-04-18 Thread via GitHub
ajantha-bhat commented on PR #12835: URL: https://github.com/apache/iceberg/pull/12835#issuecomment-2816245897 Will cherry-pick this to the next RC once merged. cc: @nastra

[PR] Core: Add test cases for row lineage metadata [iceberg]

2025-04-18 Thread via GitHub
rdblue opened a new pull request, #12843: URL: https://github.com/apache/iceberg/pull/12843 This PR adds additional test cases and fixes for core row lineage metadata.

Re: [PR] feat: validation history [iceberg-python]

2025-04-18 Thread via GitHub
jayceslesar commented on code in PR #1935: URL: https://github.com/apache/iceberg-python/pull/1935#discussion_r2051112627 ## pyiceberg/table/snapshots.py: ## @@ -255,6 +255,14 @@ def manifests(self, io: FileIO) -> List[ManifestFile]: """Return the manifests for the give

Re: [PR] AWS: Use custom Execution interceptor to support multiple storage credentials [iceberg]

2025-04-18 Thread via GitHub
singhpk234 commented on PR #12827: URL: https://github.com/apache/iceberg/pull/12827#issuecomment-2816174946 I see, thanks for the explanation @danielcweeks. If we are **_sure_** that we don't want to have a large number of prefixes and this is only there to support case is like _support case

Re: [PR] feat: validation history [iceberg-python]

2025-04-18 Thread via GitHub
jayceslesar commented on code in PR #1935: URL: https://github.com/apache/iceberg-python/pull/1935#discussion_r2051103956 ## pyiceberg/table/snapshots.py: ## @@ -255,6 +255,14 @@ def manifests(self, io: FileIO) -> List[ManifestFile]: """Return the manifests for the give

Re: [PR] Site: Remove Iceberg Summit Link from the Homepage [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on PR #12842: URL: https://github.com/apache/iceberg/pull/12842#issuecomment-2816155935 Thanks @singhpk234 and @danielcweeks for the review, same time next year? :)

Re: [PR] Site: Remove Iceberg Summit Link from the Homepage [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer merged PR #12842: URL: https://github.com/apache/iceberg/pull/12842

[PR] feat: validation history [iceberg-python]

2025-04-18 Thread via GitHub
jayceslesar opened a new pull request, #1935: URL: https://github.com/apache/iceberg-python/pull/1935 # Rationale for this change Adds `validation_history` that will be used in support of https://github.com/apache/iceberg-python/issues/819 # Are these changes te

Re: [PR] Catalog: Add BigQuery Metastore Catalog Support [iceberg]

2025-04-18 Thread via GitHub
talatuyarer commented on code in PR #12808: URL: https://github.com/apache/iceberg/pull/12808#discussion_r2051086168 ## bigquery/src/test/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreTestUtils.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Catalog: Add BigQuery Metastore Catalog Support [iceberg]

2025-04-18 Thread via GitHub
talatuyarer commented on PR #12808: URL: https://github.com/apache/iceberg/pull/12808#issuecomment-2816148094 Thank you all @nastra @ebyhr @kravikumar @gkalra18 for your review. I addressed all your comments; feel free to add more :)

Re: [PR] Catalog: Add BigQuery Metastore Catalog Support [iceberg]

2025-04-18 Thread via GitHub
talatuyarer commented on code in PR #12808: URL: https://github.com/apache/iceberg/pull/12808#discussion_r2051081131 ## bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetaStoreClientImpl.java: ## @@ -0,0 +1,620 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Site: Remove Iceberg Summit Link from the Homepage [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on PR #12842: URL: https://github.com/apache/iceberg/pull/12842#issuecomment-2816139845 😢

Re: [PR] Catalog: Add BigQuery Metastore Catalog Support [iceberg]

2025-04-18 Thread via GitHub
talatuyarer commented on code in PR #12808: URL: https://github.com/apache/iceberg/pull/12808#discussion_r2051080534 ## bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetaStoreClient.java: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Catalog: Add BigQuery Metastore Catalog Support [iceberg]

2025-04-18 Thread via GitHub
talatuyarer commented on code in PR #12808: URL: https://github.com/apache/iceberg/pull/12808#discussion_r2051079641 ## bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetaStoreClient.java: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Catalog: Add BigQuery Metastore Catalog Support [iceberg]

2025-04-18 Thread via GitHub
talatuyarer commented on code in PR #12808: URL: https://github.com/apache/iceberg/pull/12808#discussion_r2051079976 ## bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetaStoreClientImpl.java: ## @@ -0,0 +1,620 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Catalog: Add BigQuery Metastore Catalog Support [iceberg]

2025-04-18 Thread via GitHub
talatuyarer commented on code in PR #12808: URL: https://github.com/apache/iceberg/pull/12808#discussion_r2051079331 ## bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetaStoreClientImpl.java: ## @@ -0,0 +1,620 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Infra: Add 1.8.x to protected branch [iceberg]

2025-04-18 Thread via GitHub
danielcweeks commented on PR #12830: URL: https://github.com/apache/iceberg/pull/12830#issuecomment-2816125276 We haven't done this for any of the other branches. Help me understand the motivation for this (maybe include a description?).

[PR] Site: Remove Iceberg Summit Link from the Homepage [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer opened a new pull request, #12842: URL: https://github.com/apache/iceberg/pull/12842 (no comment)

Re: [I] Support native Pydantic schemas [iceberg-python]

2025-04-18 Thread via GitHub
potatochipcoconut commented on issue #1934: URL: https://github.com/apache/iceberg-python/issues/1934#issuecomment-2816103494 Came up with naive attempt, open to feedback. Not sure how it would handle e.g. int vs long, float vs double, etc ``` import builtins import datetime imp

Re: [PR] Spark: Add _row_id and _last_updated_sequence_number readers [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12836: URL: https://github.com/apache/iceberg/pull/12836#discussion_r2051043116 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java: ## @@ -161,6 +162,25 @@ public static ParquetValueReader position() { return new Pos

Re: [PR] Spark: Add _row_id and _last_updated_sequence_number readers [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12836: URL: https://github.com/apache/iceberg/pull/12836#discussion_r2051039187 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -237,55 +236,37 @@ public ParquetValueReader struct( int fieldD = ty

Re: [PR] AWS: Use custom Execution interceptor to support multiple storage credentials [iceberg]

2025-04-18 Thread via GitHub
danielcweeks commented on PR #12827: URL: https://github.com/apache/iceberg/pull/12827#issuecomment-2816031634 @singhpk234 > My only concern with the following was each client maintaining its own connection pool, with large number of prefixes, which would not be scalable IMHO > an

Re: [PR] Spark: Add _row_id and _last_updated_sequence_number readers [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12836: URL: https://github.com/apache/iceberg/pull/12836#discussion_r2051001564 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java: ## @@ -322,6 +333,88 @@ public void setPageSource(PageReadStore pageStore) { } }

Re: [PR] Parquet: Add variant array reader in Parquet [iceberg]

2025-04-18 Thread via GitHub
aihuaxu commented on code in PR #12512: URL: https://github.com/apache/iceberg/pull/12512#discussion_r2050999864 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetVariantReaders.java: ## @@ -332,6 +346,57 @@ public void setPageSource(PageReadStore pageStore) { }

Re: [PR] Spark: Add _row_id and _last_updated_sequence_number readers [iceberg]

2025-04-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #12836: URL: https://github.com/apache/iceberg/pull/12836#discussion_r2050851734 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java: ## @@ -322,6 +333,88 @@ public void setPageSource(PageReadStore pageStore) {

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
morhidi commented on code in PR #12839: URL: https://github.com/apache/iceberg/pull/12839#discussion_r2050973783 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java: ## @@ -210,6 +210,43 @@ public void testIncrementa

Re: [PR] Core: use ALL_VERSIONS constant in TestBase [iceberg]

2025-04-18 Thread via GitHub
sullis commented on PR #12748: URL: https://github.com/apache/iceberg/pull/12748#issuecomment-2815976489 Rebased.

[PR] Adding new rewrite manifest spark action to accept custom partition order [iceberg]

2025-04-18 Thread via GitHub
zachdisc opened a new pull request, #12840: URL: https://github.com/apache/iceberg/pull/12840 **Note** this is a fresh PR replacing https://github.com/apache/iceberg/pull/9731. It had too many accumulated conflicts and changes; I rebased and messed it up. This is a clean start with all pre

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2050962969 ## format/spec.md: ## @@ -648,6 +648,21 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct access

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2050961138 ## format/spec.md: ## @@ -648,6 +648,21 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct access

Re: [PR] Adding new rewrite manifest spark action to accept custom partition order [iceberg]

2025-04-18 Thread via GitHub
zachdisc commented on PR #11881: URL: https://github.com/apache/iceberg/pull/11881#issuecomment-2815964169 Reopening

Re: [PR] Avoid Avro recursive schema for Variant schema. [iceberg]

2025-04-18 Thread via GitHub
aihuaxu commented on PR #12459: URL: https://github.com/apache/iceberg/pull/12459#issuecomment-2815963045 @flyrain Can you help review this PR? Thanks.

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2050953304 ## format/spec.md: ## @@ -648,6 +648,21 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2050954034 ## format/spec.md: ## @@ -648,6 +648,21 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2050950859 ## format/spec.md: ## @@ -648,6 +648,21 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
rodmeneses commented on code in PR #12839: URL: https://github.com/apache/iceberg/pull/12839#discussion_r2050952065 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java: ## @@ -210,6 +210,43 @@ public void testIncreme

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2050951788 ## format/spec.md: ## @@ -648,6 +648,21 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct

Re: [PR] Build and test hive-metastore with Hive 2, 3 and 4 with a single source set [iceberg]

2025-04-18 Thread via GitHub
danielcweeks commented on PR #12721: URL: https://github.com/apache/iceberg/pull/12721#issuecomment-2815929135 @wypoon I appreciate the effort put into getting this working, but I'm concerned about the approach, complexity, and unintended impacts this approach may have. I see the goa

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2025-04-18 Thread via GitHub
stevenzwu commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r2045041541 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewritePlanner.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache So

Re: [PR] Core: Support first-row-id for manifests and manifest lists [iceberg]

2025-04-18 Thread via GitHub
rdblue merged PR #12672: URL: https://github.com/apache/iceberg/pull/12672

Re: [PR] Core: Support first-row-id for manifests and manifest lists [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on PR #12672: URL: https://github.com/apache/iceberg/pull/12672#issuecomment-2815924365 Thanks for the reviews, @RussellSpitzer and @danielcweeks! I'm going to merge this so that we can get working on the next set of changes, including #12836.

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
rodmeneses commented on code in PR #12839: URL: https://github.com/apache/iceberg/pull/12839#discussion_r2050916538 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/source/enumerator/ContinuousSplitPlannerImpl.java: ## @@ -180,6 +180,15 @@ private ContinuousEnumeratio

Re: [PR] Spec: Update row lineage requirements for upgrading tables [iceberg]

2025-04-18 Thread via GitHub
RussellSpitzer commented on code in PR #12781: URL: https://github.com/apache/iceberg/pull/12781#discussion_r2049413879 ## format/spec.md: ## @@ -786,9 +790,11 @@ Notes: First Row ID Assignment -When adding a new data manifest file, its `first_row_id` field is assigned

[I] Support Concurrency Safety Validation: Implement `validateNoNewDeletesForDataFiles` [iceberg-python]

2025-04-18 Thread via GitHub
sungwy opened a new issue, #1931: URL: https://github.com/apache/iceberg-python/issues/1931 (no comment)

Re: [PR] Build and test hive-metastore with Hive 2, 3 and 4 with a single source set [iceberg]

2025-04-18 Thread via GitHub
wypoon commented on PR #12721: URL: https://github.com/apache/iceberg/pull/12721#issuecomment-2814205860 @danielcweeks I think you misunderstand me. I'm not advocating doing what the Spark and Flink modules now do, which is separate source sets for each version. I have worked hard to g

Re: [PR] Core: Support first-row-id for manifests and manifest lists [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12672: URL: https://github.com/apache/iceberg/pull/12672#discussion_r2047915344 ## spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/data/TestHelpers.java: ## @@ -887,11 +887,14 @@ public static void asMetadataRecord(GenericData.Record file

Re: [PR] Core/REST: generify AuthSessionCache [iceberg]

2025-04-18 Thread via GitHub
github-actions[bot] commented on PR #12562: URL: https://github.com/apache/iceberg/pull/12562#issuecomment-2814237675 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Flink: backport fix TriggerManager to unlock task execution when previous job left an orphaned lock for Flink 1.19 [iceberg]

2025-04-18 Thread via GitHub
Guosmilesmile commented on code in PR #12801: URL: https://github.com/apache/iceberg/pull/12801#discussion_r2049714118 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/TriggerManager.java: ## @@ -189,6 +189,9 @@ public void initializeState(Functio

Re: [PR] Core: Support first-row-id for manifests and manifest lists [iceberg]

2025-04-18 Thread via GitHub
danielcweeks commented on code in PR #12672: URL: https://github.com/apache/iceberg/pull/12672#discussion_r2049744572 ## core/src/test/java/org/apache/iceberg/TestManifestWriterVersions.java: ## @@ -213,27 +228,125 @@ public void testV2ManifestRewriteWithInheritance() throws IO

Re: [PR] spec: Variant lower/upper bounds [iceberg]

2025-04-18 Thread via GitHub
aihuaxu commented on code in PR #12658: URL: https://github.com/apache/iceberg/pull/12658#discussion_r2049538827 ## format/spec.md: ## @@ -648,6 +648,9 @@ Notes: 5. The `content_offset` and `content_size_in_bytes` fields are used to reference a specific blob for direct access

Re: [I] [discuss] PyIceberg Near-Term Roadmap [iceberg-python]

2025-04-18 Thread via GitHub
jayceslesar commented on issue #1856: URL: https://github.com/apache/iceberg-python/issues/1856#issuecomment-2815867005 Some loose ideas in terms of any rust integration: Fancy CI work to enable testing python bindings from the rust repo directly against tests from this one (when/whe

Re: [PR] Core: Support first-row-id for manifests and manifest lists [iceberg]

2025-04-18 Thread via GitHub
danielcweeks commented on code in PR #12672: URL: https://github.com/apache/iceberg/pull/12672#discussion_r2049743934 ## core/src/test/java/org/apache/iceberg/TestManifestWriterVersions.java: ## @@ -213,27 +228,125 @@ public void testV2ManifestRewriteWithInheritance() throws IO

Re: [PR] Update-schema: Add support for `initial-default` [iceberg-python]

2025-04-18 Thread via GitHub
Fokko commented on PR #1770: URL: https://github.com/apache/iceberg-python/pull/1770#issuecomment-2813873855 @sungwy I've included setting the default value in this PR in `set_default_value`. PTAL :)

Re: [PR] Use assumeThat instead of assumeTrue [iceberg]

2025-04-18 Thread via GitHub
slfan1989 commented on code in PR #12822: URL: https://github.com/apache/iceberg/pull/12822#discussion_r2049753143 ## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ## @@ -222,7 +221,7 @@ public void testCreateExistingNamespace() { @Test public void tes

Re: [PR] feat: validate snapshot write compatibility [iceberg-python]

2025-04-18 Thread via GitHub
sungwy commented on PR #1772: URL: https://github.com/apache/iceberg-python/pull/1772#issuecomment-2814294884 I've created some subtasks on https://github.com/apache/iceberg-python/issues/819 that will help us implement the required validation functions that we can invoke to check that no

Re: [PR] Spark3.4: Backport ProcedureInput for MigrateTableProcedure And SnapshotTableProcedure (#12782 #12783) [iceberg]

2025-04-18 Thread via GitHub
huaxingao merged PR #12837: URL: https://github.com/apache/iceberg/pull/12837

Re: [PR] Spark3.4: Backport ProcedureInput for MigrateTableProcedure And SnapshotTableProcedure (#12782 #12783) [iceberg]

2025-04-18 Thread via GitHub
huaxingao commented on PR #12837: URL: https://github.com/apache/iceberg/pull/12837#issuecomment-2815863278 Merged. Thanks @slfan1989 for the PR! Also thanks @manuzhang for reviewing!

Re: [PR] feat: Add rest catalog support for cli [iceberg-rust]

2025-04-18 Thread via GitHub
liurenjie1024 commented on PR #1220: URL: https://github.com/apache/iceberg-rust/pull/1220#issuecomment-2815447484 cc @Xuanwo @sdd I've resolved conflicts, PTAL

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
stevenzwu commented on PR #12839: URL: https://github.com/apache/iceberg/pull/12839#issuecomment-2815856582 maybe also update the doc on starting strategy https://iceberg.apache.org/docs/nightly/flink-configuration/#read-options

Re: [PR] Flink: Add StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT_EXCLUSIVE [iceberg]

2025-04-18 Thread via GitHub
stevenzwu commented on code in PR #12839: URL: https://github.com/apache/iceberg/pull/12839#discussion_r2050853542 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/source/StreamingStartingStrategy.java: ## @@ -34,6 +34,13 @@ public enum StreamingStartingStrategy {

Re: [PR] Spark 3.5 row lineage [iceberg]

2025-04-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #12736: URL: https://github.com/apache/iceberg/pull/12736#discussion_r2050867791 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRowLineagePropagation.java: ## @@ -0,0 +1,441 @@ +/* + * Licensed to the

Re: [PR] Core: Pass storage credentials from LoadTableResponse to FileIO [iceberg]

2025-04-18 Thread via GitHub
tedyu commented on code in PR #12591: URL: https://github.com/apache/iceberg/pull/12591#discussion_r2050867855 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java: ## @@ -547,4 +563,28 @@ private boolean recoverObject(ObjectVersion version, String bucket) { retu

Re: [PR] Spark 3.5 row lineage [iceberg]

2025-04-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #12736: URL: https://github.com/apache/iceberg/pull/12736#discussion_r2050866505 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRowLineagePropagation.java: ## @@ -0,0 +1,441 @@ +/* + * Licensed to the

Re: [PR] [1.8.x] Add 1.8.x to protected branch [iceberg]

2025-04-18 Thread via GitHub
nastra commented on code in PR #12826: URL: https://github.com/apache/iceberg/pull/12826#discussion_r2048420100 ## .asf.yaml: ## @@ -34,7 +34,7 @@ github: rebase: true protected_branches: -main: +1.8.x: Review Comment: wouldn't we need to update this for al

Re: [PR] Scan Delete Support Part 4: Delete File Loading; Skeleton for Processing [iceberg-rust]

2025-04-18 Thread via GitHub
sdd commented on code in PR #982: URL: https://github.com/apache/iceberg-rust/pull/982#discussion_r2048330115 ## crates/iceberg/src/delete_vector.rs: ## @@ -38,6 +40,10 @@ impl DeleteVector { _ => DeleteVectorIterator { inner: None }, } } + +pub fn

Re: [I] Convert the Hive table partition deletion syntax to Iceberg partition deletion syntax. [iceberg]

2025-04-18 Thread via GitHub
deniskuzZ commented on issue #12753: URL: https://github.com/apache/iceberg/issues/12753#issuecomment-2812227265 @guixiaowen, you can use `DELETE FROM %s WHERE ..` SQL in Hive-4.x as well

Re: [PR] Core: Support first-row-id for manifests and manifest lists [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12672: URL: https://github.com/apache/iceberg/pull/12672#discussion_r2049522685 ## core/src/test/java/org/apache/iceberg/TestRowLineageMetadata.java: ## @@ -359,12 +359,40 @@ public void testReplace() { table.newRewrite().deleteFile(filePa

Re: [PR] Core: Support first-row-id for manifests and manifest lists [iceberg]

2025-04-18 Thread via GitHub
rdblue commented on code in PR #12672: URL: https://github.com/apache/iceberg/pull/12672#discussion_r2049521699 ## core/src/test/java/org/apache/iceberg/TestRowLineageAssignment.java: ## @@ -0,0 +1,672 @@ +/* + * + * * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Spark 3.5: Use ProcedureInput for SnapshotTableProcedure. [iceberg]

2025-04-18 Thread via GitHub
nastra commented on PR #12783: URL: https://github.com/apache/iceberg/pull/12783#issuecomment-2811853778 thanks @slfan1989 for improving this and @manuzhang for reviewing

Re: [PR] feat:add init expression interface. [iceberg-cpp]

2025-04-18 Thread via GitHub
Fokko commented on PR #58: URL: https://github.com/apache/iceberg-cpp/pull/58#issuecomment-2812067720 Thanks @yingcai-cy for working on this, and thanks @alonesniper, @gty404, @lidavidm, @wgtmac, @Xuanwo and @zhjwpku for the review 🙌 Great to have this in!

Re: [PR] Spark 3.5 row lineage [iceberg]

2025-04-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #12736: URL: https://github.com/apache/iceberg/pull/12736#discussion_r2050842658 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -327,7 +327,11 @@ public void pruneColumns(StructType requeste
