Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
zhongqishang commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683819074 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergFilesCommitter.java: ## @@ -204,7 +207,7 @@ public void testCommitTxn() throws Except

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1683757259 ## open-api/rest-catalog-open-api.yaml: ## @@ -3647,6 +3786,173 @@ components: type: integer description: "List of equality field IDs"

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
zhongqishang commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683742893 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -426,30 +425,44 @@ private void commitOperation( } @Ov

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1683681004 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1683661895 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Support Spark Column Stats [iceberg]

2024-07-18 Thread via GitHub
huaxingao commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1683659774 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -175,7 +184,37 @@ public Statistics estimateStatistics() { protected Stat

Re: [PR] Support Spark Column Stats [iceberg]

2024-07-18 Thread via GitHub
huaxingao commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1683659065 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -189,9 +192,8 @@ protected Statistics estimateStatistics(Snapshot snapshot)

Re: [PR] Support Spark Column Stats [iceberg]

2024-07-18 Thread via GitHub
huaxingao commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1683657327 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -175,7 +184,37 @@ public Statistics estimateStatistics() { protected Stat

Re: [PR] Support Spark Column Stats [iceberg]

2024-07-18 Thread via GitHub
huaxingao commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1683657244 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -90,4 +90,8 @@ private SparkSQLProperties() {} public static final Stri

Re: [PR] Support Spark Column Stats [iceberg]

2024-07-18 Thread via GitHub
huaxingao commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1683657141 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -175,7 +184,37 @@ public Statistics estimateStatistics() { protected Stat

Re: [PR] Support Spark Column Stats [iceberg]

2024-07-18 Thread via GitHub
huaxingao commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1683657052 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -90,4 +90,8 @@ private SparkSQLProperties() {} public static final Stri

Re: [PR] Remove unnecessary class-level synchronized in ManifestFiles [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar merged PR #10544: URL: https://github.com/apache/iceberg/pull/10544 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@

Re: [PR] Remove unnecessary class-level synchronized in ManifestFiles [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar commented on PR #10544: URL: https://github.com/apache/iceberg/pull/10544#issuecomment-2237848340 Sorry for the late review on this @findepi , I was going through https://github.com/apache/iceberg/pull/10494 and trying to understand more about this path and saw this change

Re: [I] Unable to use GlueCatalog in flink environments without hadoop [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3044: URL: https://github.com/apache/iceberg/issues/3044#issuecomment-2237833788 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] [ICEBERG-FLINK]support read hive configuration from HIVE_HOME&HIVE_CONF_DIR env [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #3034: URL: https://github.com/apache/iceberg/pull/3034#issuecomment-2237833731 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Arrow: FIXED type support [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #3029: URL: https://github.com/apache/iceberg/pull/3029#issuecomment-2237833681 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] #2468 fix the catalog interface cast exception. [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #3032: URL: https://github.com/apache/iceberg/pull/3032#issuecomment-2237833702 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Spark: Add Spark extension for table encryption key [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #3013: URL: https://github.com/apache/iceberg/pull/3013#issuecomment-2237833626 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #3012: URL: https://github.com/apache/iceberg/pull/3012#issuecomment-2237833605 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] GitHub Actions Run Twice on Initial Push in Forks [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3003: URL: https://github.com/apache/iceberg/issues/3003#issuecomment-2237833543 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] AvroSchemaUtil.buildAvroProjection produces duplicate "rnull" record schemas for optional fields [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3005: URL: https://github.com/apache/iceberg/issues/3005#issuecomment-2237833560 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] truncate's width (W) , W (width) Only supports integer , Why not supported long type [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #2993: URL: https://github.com/apache/iceberg/issues/2993#issuecomment-2237833525 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] After creating a table with the bucket function, bulk-importing data into it raises an error [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #2958: URL: https://github.com/apache/iceberg/issues/2958#issuecomment-2237833474 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Add more description to the Write-audit-publish feature [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] closed issue #2802: Add more description to the Write-audit-publish feature URL: https://github.com/apache/iceberg/issues/2802

Re: [PR] Revert #2960 and commit no-op partition replacement operations [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #3043: URL: https://github.com/apache/iceberg/pull/3043#issuecomment-2237833775 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] Help needed in migrating parquet file to iceberg table. [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3042: URL: https://github.com/apache/iceberg/issues/3042#issuecomment-2237833753 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] HiveTableTest#testDropWithoutPurgeLeavesTableData seems to be flaky on CI [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3033: URL: https://github.com/apache/iceberg/issues/3033#issuecomment-2237833719 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] i can't import class which start with org.apache.iceberg.relocated [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3028: URL: https://github.com/apache/iceberg/issues/3028#issuecomment-2237833661 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Is Iceberg support ranger partition? [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3026: URL: https://github.com/apache/iceberg/issues/3026#issuecomment-2237833648 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Core: How to store the data of a table two months ago to oss [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3011: URL: https://github.com/apache/iceberg/issues/3011#issuecomment-2237833593 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] How do I realize the upsert of flink sql through setting, in iceberg 0.12.0 [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #3009: URL: https://github.com/apache/iceberg/issues/3009#issuecomment-2237833575 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Spark: remove object storage data path in destination table for snapshot table action [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #2966: URL: https://github.com/apache/iceberg/pull/2966#issuecomment-2237833493 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Handle OVERWRITE snapshot on spark streaming for table v1 [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on PR #2944: URL: https://github.com/apache/iceberg/pull/2944#issuecomment-2237833458 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] Add more description to the Write-audit-publish feature [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #2802: URL: https://github.com/apache/iceberg/issues/2802#issuecomment-2237833306 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Alter table to generic types [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] closed issue #2791: Alter table to generic types URL: https://github.com/apache/iceberg/issues/2791

Re: [I] Alter table to generic types [iceberg]

2024-07-18 Thread via GitHub
github-actions[bot] commented on issue #2791: URL: https://github.com/apache/iceberg/issues/2791#issuecomment-2237833284 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Formal verification discovers potential consistency issue [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar commented on issue #10720: URL: https://github.com/apache/iceberg/issues/10720#issuecomment-2237831914 @Vanlightly I was focused on the validation path, and perhaps this may be where the formal verification model is missing. We already set a field which will fail if

Re: [PR] Docs: Add note on write distribution change when adding local order [iceberg]

2024-07-18 Thread via GitHub
szehon-ho commented on PR #10647: URL: https://github.com/apache/iceberg/pull/10647#issuecomment-2237794775 To confirm , you mean the code could change to not override the default distribution mode in this scenario right? Yea I think that in line with our original idea here.

Re: [PR] Docs: Add note on write distribution change when adding local order [iceberg]

2024-07-18 Thread via GitHub
aokolnychyi commented on PR #10647: URL: https://github.com/apache/iceberg/pull/10647#issuecomment-2237793353 I replied on the dev list. Let me know if you all agree, @manuzhang @RussellSpitzer @szehon-ho.

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-18 Thread via GitHub
danielcweeks commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1683602812 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-18 Thread via GitHub
danielcweeks commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1683601891 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Core: Support appending files with different specs [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar merged PR #9860: URL: https://github.com/apache/iceberg/pull/9860

Re: [PR] Repair manifest action [iceberg]

2024-07-18 Thread via GitHub
danielcweeks commented on PR #10445: URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2237752049 Thanks for the assessment @amogh-jahagirdar. @szehon-ho, yes I think we do want to support the work you did and add it to this action. Overall, this was focused on fixing broken

Re: [PR] Spec: Clarify time travel implementation in Iceberg [iceberg]

2024-07-18 Thread via GitHub
rdblue commented on code in PR #8982: URL: https://github.com/apache/iceberg/pull/8982#discussion_r1683583050 ## format/spec.md: ## @@ -1370,3 +1370,16 @@ Writing v2 metadata: * `sort_columns` was removed Note that these requirements apply when writing data to a v2 table

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on PR #10526: URL: https://github.com/apache/iceberg/pull/10526#issuecomment-2237743264 @zhongqishang @pvary I have a uber question. let's say checkpoint N was cancelled or timed out and checkpoint N+1 completed successfully. In this case, do we know all the writer

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683576927 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergFilesCommitter.java: ## @@ -204,7 +207,7 @@ public void testCommitTxn() throws Exception

[PR] Bump getdaft from 0.2.29 to 0.2.31 [iceberg-python]

2024-07-18 Thread via GitHub
dependabot[bot] opened a new pull request, #942: URL: https://github.com/apache/iceberg-python/pull/942 Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.2.29 to 0.2.31. Release notes: sourced from [getdaft's releases](https://github.com/Eventual-Inc/Daft/releases).

Re: [PR] Bump getdaft from 0.2.29 to 0.2.30 [iceberg-python]

2024-07-18 Thread via GitHub
dependabot[bot] closed pull request #940: Bump getdaft from 0.2.29 to 0.2.30 URL: https://github.com/apache/iceberg-python/pull/940

Re: [PR] Bump getdaft from 0.2.29 to 0.2.30 [iceberg-python]

2024-07-18 Thread via GitHub
dependabot[bot] commented on PR #940: URL: https://github.com/apache/iceberg-python/pull/940#issuecomment-2237724308 Superseded by #942.

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683550064 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -426,30 +425,44 @@ private void commitOperation( } @Overr

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683536734 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -426,30 +425,44 @@ private void commitOperation( } @Overr

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683518028 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -426,30 +425,44 @@ private void commitOperation( } @Overr

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683534328 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -426,30 +425,44 @@ private void commitOperation( } @Overr

Re: [PR] feat(visitors): Implement basic boolean expression visitors [iceberg-go]

2024-07-18 Thread via GitHub
zeroshade commented on code in PR #108: URL: https://github.com/apache/iceberg-go/pull/108#discussion_r1683521368 ## table/metadata.go: ## @@ -156,6 +166,42 @@ type commonMetadata struct { Refs map[string]SnapshotRef `json:"refs"` } +func (c *commonMeta

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1683519816 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -426,30 +425,44 @@ private void commitOperation( } @Overr

Re: [PR] feat(visitors): Implement basic boolean expression visitors [iceberg-go]

2024-07-18 Thread via GitHub
zeroshade commented on code in PR #108: URL: https://github.com/apache/iceberg-go/pull/108#discussion_r1683516529 ## table/metadata.go: ## @@ -156,6 +166,42 @@ type commonMetadata struct { Refs map[string]SnapshotRef `json:"refs"` } +func (c *commonMeta

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-18 Thread via GitHub
rdblue commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1683515066 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -136,30 +169,33 @@ private boolean checkTasks() { } } - return !close

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-18 Thread via GitHub
rdblue commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1683506642 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -88,16 +92,26 @@ private ParallelIterator( @Override public void close() {

Re: [PR] feat(visitors): Implement basic boolean expression visitors [iceberg-go]

2024-07-18 Thread via GitHub
Fokko commented on code in PR #108: URL: https://github.com/apache/iceberg-go/pull/108#discussion_r1675971993 ## exprs.go: ## @@ -538,11 +557,11 @@ func (up *unboundUnaryPredicate) Bind(schema *Schema, caseSensitive bool) (Boole // fast case optimizations switch

Re: [PR] Core, Spark: Spark writes/actions should only perform cleanup if failure is cleanable [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on PR #10373: URL: https://github.com/apache/iceberg/pull/10373#issuecomment-2237525207 > instead of normalizing non-cleanable failures to CommitStateUnknownExcpetion now engines, in this case Spark will explicitly handle CleanableFailure @amogh-jahagirdar thanks

Re: [PR] Core, Spark: Spark writes/actions should only perform cleanup if failure is cleanable [iceberg]

2024-07-18 Thread via GitHub
stevenzwu commented on PR #10373: URL: https://github.com/apache/iceberg/pull/10373#issuecomment-2237511439 @amogh-jahagirdar sorry, I have been OOO for a while > it may be better to go ahead and change the engines to explicitly handle CleanableFailure. I'll look into this! it

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-18 Thread via GitHub
danielcweeks commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1683411769 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [I] Formal verification discovers potential consistency issue [iceberg]

2024-07-18 Thread via GitHub
Vanlightly commented on issue #10720: URL: https://github.com/apache/iceberg/issues/10720#issuecomment-2237357673 Thanks, I'll take a look and see if I can repro in a test.

Re: [I] Formal verification discovers potential consistency issue [iceberg]

2024-07-18 Thread via GitHub
amogh-jahagirdar commented on issue #10720: URL: https://github.com/apache/iceberg/issues/10720#issuecomment-2237339155 Thanks Jack, if you're looking for a place to write a test the `TestConflictValidation` class https://github.com/apache/iceberg/blob/main/spark/v3.5/spark-extensions/src/t

[I] Formal verification discovers potential consistency issue [iceberg]

2024-07-18 Thread via GitHub
Vanlightly opened a new issue, #10720: URL: https://github.com/apache/iceberg/issues/10720 ### Apache Iceberg version 1.5.2 (latest release) ### Query engine Spark ### Please describe the bug 🐞 I do a lot of formal verification of distributed systems and I h

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1683284743 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensitive, +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
sdd commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2237194797 Thanks @odysa - I must be going crazy, I thought I tried that already but you were right, that worked! 👍🏼

Re: [PR] [AWS] S3FileIO - Add Cross-Region Bucket Access [iceberg]

2024-07-18 Thread via GitHub
sfc-gh-schen commented on PR #9804: URL: https://github.com/apache/iceberg/pull/9804#issuecomment-2237119397 Hi, what is blocking from merging this change?

[PR] Feature/write to branch [iceberg-python]

2024-07-18 Thread via GitHub
vinjai opened a new pull request, #941: URL: https://github.com/apache/iceberg-python/pull/941 Fixes: https://github.com/apache/iceberg-python/issues/306

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-18 Thread via GitHub
karuppayya commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1683209368 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchGenerator.java: ## @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Core: Refactor ZOrderByteUtils [iceberg]

2024-07-18 Thread via GitHub
RussellSpitzer commented on PR #10624: URL: https://github.com/apache/iceberg/pull/10624#issuecomment-2237069008 I think this is generally fine, but I do think the internal methods being called should probably be renamed rather that continuing to use the "long" and "double" method names.

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
odysa commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2236795680 @sdd `JoinHandle` in `Tokio` and `async-std` have different return types. In [Tokio](https://docs.rs/tokio/latest/src/tokio/runtime/task/join.rs.html#324) ```rs impl Future for J

Re: [PR] [DOCS] Fix link on Concepts page [iceberg]

2024-07-18 Thread via GitHub
nastra merged PR #10718: URL: https://github.com/apache/iceberg/pull/10718

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-18 Thread via GitHub
snazy commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1682988347 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-18 Thread via GitHub
RussellSpitzer commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1682980871 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[PR] Build: verify `gradle-wrapper jar` integrity [iceberg]

2024-07-18 Thread via GitHub
snazy opened a new pull request, #10719: URL: https://github.com/apache/iceberg/pull/10719 Verifies the integrity of the `gradle-wrapper.jar` by checking the sha256 checksum and storing it locally. This also ensures that the `gradle-wrapper.jar` automatically matches the Gradle version.
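The checksum comparison described in this pull request can be illustrated with a minimal sketch. This is not the code from PR #10719; the function names `sha256_of` and `verify_checksum` are hypothetical, and the sketch assumes the expected digest is available as a hex string (for example, read from a checksum file published alongside the jar).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string (hypothetical helper).

    Surrounding whitespace and letter case in the expected value are ignored,
    since checksum files often end with a newline.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

A caller would typically fail the build when `verify_checksum` returns `False` rather than continue with an unverified jar.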

Re: [PR] [DOCS] Fix link on Concepts page [iceberg]

2024-07-18 Thread via GitHub
gaborkaszab commented on PR #10718: URL: https://github.com/apache/iceberg/pull/10718#issuecomment-2236664941 > @gaborkaszab does the link work when you deploy the website locally as described in https://github.com/apache/iceberg/blob/main/site/README.md? Yes, I deployed the page loca

Re: [PR] [DOCS] Fix link on Concepts page [iceberg]

2024-07-18 Thread via GitHub
nastra commented on PR #10718: URL: https://github.com/apache/iceberg/pull/10718#issuecomment-2236643716 @gaborkaszab does the link work when you deploy the website locally as described in https://github.com/apache/iceberg/blob/main/site/README.md?

[PR] [DOCS] Fix link on Concepts page [iceberg]

2024-07-18 Thread via GitHub
gaborkaszab opened a new pull request, #10718: URL: https://github.com/apache/iceberg/pull/10718 The link to the catalog properties was broken.

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-18 Thread via GitHub
pvary commented on PR #10526: URL: https://github.com/apache/iceberg/pull/10526#issuecomment-2236539203 @zhongqishang: Finally found time to start the upstream discussion: https://lists.apache.org/thread/n5c85hd7psf2tmgych6scmynonscp2q4

Re: [PR] Support convert orc timestamptz [iceberg]

2024-07-18 Thread via GitHub
ming95 commented on PR #9905: URL: https://github.com/apache/iceberg/pull/9905#issuecomment-2236449363 > On #9784 you mentioned: > > > I think this is because hive and spark treat `timestamp` data type as timestamp with time zone and the orc file format is also stored as orc `tim

Re: [PR] Spark: Added ability to add uuid suffix to the table location in Hive catalog [iceberg]

2024-07-18 Thread via GitHub
deniskuzZ commented on PR #2850: URL: https://github.com/apache/iceberg/pull/2850#issuecomment-2236310737 > Finally I put together this in my head with the same issue in Hive: > > * https://issues.apache.org/jira/browse/HIVE-24445 > > Wouldn't this solve both the issues? >

Re: [I] write.metadata.metrics.max-inferred-column-defaults doesn't work for rewrite_data_file? [iceberg]

2024-07-18 Thread via GitHub
nk1506 commented on issue #10707: URL: https://github.com/apache/iceberg/issues/10707#issuecomment-2236218110 Hi @chenwyi2 , As I mentioned before if `write.metadata.metrics.default` has been configured it won't honour `write.metadata.metrics.max-inferred-column-defaults`. Since in your ca

Re: [PR] Fixes RemoveOrphanFiles delete files unexpected [iceberg]

2024-07-18 Thread via GitHub
findepi commented on PR #2890: URL: https://github.com/apache/iceberg/pull/2890#issuecomment-2236196722 > In this patch, we only compare the pure path (remove the schema and authority) when doing the `leftanti join`. This sounds reasonable to me, but there is one caveat. The table

Re: [PR] Fixes RemoveOrphanFiles delete files unexpected [iceberg]

2024-07-18 Thread via GitHub
findepi commented on code in PR #2890: URL: https://github.com/apache/iceberg/pull/2890#discussion_r1682626007 ## spark/src/test/java/org/apache/iceberg/actions/MockFileSystem.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-18 Thread via GitHub
findepi commented on PR #10691: URL: https://github.com/apache/iceberg/pull/10691#issuecomment-2236165774 > i will address first two now, addressed. @rdblue PTAL

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-18 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1682603568 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -20,65 +20,69 @@ import java.io.Closeable; import java.io.IOException; +import java.uti

Re: [I] An interesting name proposed: riceberg [iceberg-rust]

2024-07-18 Thread via GitHub
Xuanwo closed issue #449: An interesting name proposed: riceberg URL: https://github.com/apache/iceberg-rust/issues/449

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-18 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1682590569 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -88,16 +92,26 @@ private ParallelIterator( @Override public void close() {

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-18 Thread via GitHub
findepi commented on PR #10691: URL: https://github.com/apache/iceberg/pull/10691#issuecomment-2236063813 Thanks @rdblue for your thorough review. I applied style fixes and outstanding items are - resuming background tasks earlier https://github.com/apache/iceberg/pull/10691#discus

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-18 Thread via GitHub
jeesou commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1682528208 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchGenerator.java: ## @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-18 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1682523817 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -88,16 +92,26 @@ private ParallelIterator( @Override public void close() {

Re: [PR] Update the version in deprecation messages [iceberg]

2024-07-18 Thread via GitHub
findepi commented on PR #10715: URL: https://github.com/apache/iceberg/pull/10715#issuecomment-2236014654 thank you @amogh-jahagirdar !

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1682309951 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensitive, +