Re: [PR] Core: Keep track of data files to be removed for orphaned DV detection [iceberg]

2025-07-03 Thread via GitHub
nastra commented on code in PR #13222: URL: https://github.com/apache/iceberg/pull/13222#discussion_r2184567240 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -224,7 +235,9 @@ List filterManifests(Schema tableSchema, List manife private boolean ca

Re: [PR] Core: Keep track of data files to be removed for orphaned DV detection [iceberg]

2025-07-03 Thread via GitHub
nastra commented on code in PR #13222: URL: https://github.com/apache/iceberg/pull/13222#discussion_r2184551512 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -1130,6 +1132,11 @@ protected ManifestReader newManifestReader(ManifestFile manifest) {

Re: [PR] site: update daft docs [iceberg]

2025-07-03 Thread via GitHub
nastra merged PR #13463: URL: https://github.com/apache/iceberg/pull/13463 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] S3: Add LegacyMd5Plugin to S3 client builder [iceberg]

2025-07-03 Thread via GitHub
nastra commented on code in PR #12264: URL: https://github.com/apache/iceberg/pull/12264#discussion_r2184495084 ## aws/src/main/java/org/apache/iceberg/aws/s3/DefaultS3FileIOAwsClientFactory.java: ## @@ -46,6 +46,7 @@ public void initialize(Map properties) { public S3Client s

Re: [PR] core: Support DV for partition stats [iceberg]

2025-07-03 Thread via GitHub
nastra commented on code in PR #13425: URL: https://github.com/apache/iceberg/pull/13425#discussion_r2184490432 ## core/src/main/java/org/apache/iceberg/PartitionStats.java: ## @@ -109,7 +114,12 @@ public void liveEntry(ContentFile file, Snapshot snapshot) { break;

Re: [I] Table metadata corruption during parallel upsert operations [iceberg-python]

2025-07-03 Thread via GitHub
arul-cc commented on issue #2120: URL: https://github.com/apache/iceberg-python/issues/2120#issuecomment-3034456297 Thanks @kevinjqliu

Re: [I] Table metadata corruption during parallel upsert operations [iceberg-python]

2025-07-03 Thread via GitHub
arul-cc closed issue #2120: Table metadata corruption during parallel upsert operations URL: https://github.com/apache/iceberg-python/issues/2120

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
pan3793 commented on PR #13106: URL: https://github.com/apache/iceberg/pull/13106#issuecomment-3034287282 @RussellSpitzer @amogh-jahagirdar thanks for the review. - I have tuned the test cases you pointed to make them meaningful, please take a look at each inline reply for details.

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
pan3793 commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2184185956 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/AncestorsOfProcedure.java: ## @@ -62,18 +67,23 @@ protected AncestorsOfProcedure doBuild() {

Re: [PR] Implement snapshot expiration and related unit tests [iceberg-python]

2025-07-03 Thread via GitHub
ForeverAngry closed pull request #2170: Implement snapshot expiration and related unit tests URL: https://github.com/apache/iceberg-python/pull/2170

[PR] Implement snapshot expiration and related unit tests [iceberg-python]

2025-07-03 Thread via GitHub
ForeverAngry opened a new pull request, #2170: URL: https://github.com/apache/iceberg-python/pull/2170 # Rationale for this change Removing unused member variable. # Are these changes tested? yes. # Are there any user-facing changes? No.

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
pan3793 commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2184173549 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestFastForwardBranchProcedure.java: ## @@ -167,18 +167,21 @@ public void testInvalid

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
pan3793 commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2184170197 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRollbackToSnapshotProcedure.java: ## @@ -250,27 +250,33 @@ public void testInvali

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
pan3793 commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2184169715 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCherrypickSnapshotProcedure.java: ## @@ -168,22 +168,26 @@ public void testInvali

Re: [PR] core: Support DV for partition stats [iceberg]

2025-07-03 Thread via GitHub
ajantha-bhat commented on code in PR #13425: URL: https://github.com/apache/iceberg/pull/13425#discussion_r2184121767 ## core/src/main/java/org/apache/iceberg/PartitionStats.java: ## @@ -109,7 +114,12 @@ public void liveEntry(ContentFile file, Snapshot snapshot) { brea

[PR] update daft links [iceberg-python]

2025-07-03 Thread via GitHub
ccmao1130 opened a new pull request, #2169: URL: https://github.com/apache/iceberg-python/pull/2169 # Rationale for this change We recently changed our site domain so want to update all Daft documentation links. And noticed that our package should be updated from `

Re: [PR] feat: basic table scan planning [iceberg-cpp]

2025-07-03 Thread via GitHub
gty404 commented on code in PR #112: URL: https://github.com/apache/iceberg-cpp/pull/112#discussion_r2183983587 ## src/iceberg/table_scan.cc: ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See

Re: [PR] Encryption for REST catalog [iceberg]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #13224: URL: https://github.com/apache/iceberg/pull/13224#issuecomment-3033994276 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Enhanced License and Notice Report Generation [iceberg]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #13220: URL: https://github.com/apache/iceberg/pull/13220#issuecomment-3033994222 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Encryption integration and test [iceberg]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #13066: URL: https://github.com/apache/iceberg/pull/13066#issuecomment-3033994122 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Build: Bump com.aliyun.oss:aliyun-sdk-oss from 3.10.2 to 3.18.2 [iceberg]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #12968: URL: https://github.com/apache/iceberg/pull/12968#issuecomment-3033994081 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Error when running a job with Flink, could you help me take a look? [iceberg]

2025-07-03 Thread via GitHub
github-actions[bot] commented on issue #11822: URL: https://github.com/apache/iceberg/issues/11822#issuecomment-3033993853 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Error when running a job with Flink, could you help me take a look? [iceberg]

2025-07-03 Thread via GitHub
github-actions[bot] closed issue #11822: Error when running a job with Flink, could you help me take a look? URL: https://github.com/apache/iceberg/issues/11822

[PR] site: update daft docs [iceberg]

2025-07-03 Thread via GitHub
ccmao1130 opened a new pull request, #13463: URL: https://github.com/apache/iceberg/pull/13463 We recently changed our site domain so want to update all Daft documentation links (I think my last PR https://github.com/apache/iceberg/pull/12860 didn't make it into release so all the li

Re: [PR] [1.9.x] Cherry pick Stop retrying on 502 / 504 [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer commented on PR #13461: URL: https://github.com/apache/iceberg/pull/13461#issuecomment-3033899350 Merged, Thanks @singhpk234 !

Re: [PR] [1.9.x] Cherry pick Stop retrying on 502 / 504 [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer merged PR #13461: URL: https://github.com/apache/iceberg/pull/13461

Re: [PR] Spec: Add DV information in overview [iceberg]

2025-07-03 Thread via GitHub
stevenzwu commented on code in PR #13189: URL: https://github.com/apache/iceberg/pull/13189#discussion_r2183702710 ## format/spec.md: ## @@ -101,10 +101,10 @@ Inheriting the sequence number from manifest metadata allows writing a new manif Row-level deletes are stored in del

Re: [PR] Spark: Use native table FileIO instead of Hadoop to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
singhpk234 commented on code in PR #13459: URL: https://github.com/apache/iceberg/pull/13459#discussion_r2183858089 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -312,22 +316,24 @@ private String rebuildMetadata() {

Re: [PR] [1.9.x] Cherry pick Stop retrying on 502 / 504 [iceberg]

2025-07-03 Thread via GitHub
singhpk234 commented on PR #13461: URL: https://github.com/apache/iceberg/pull/13461#issuecomment-3033835294 cc @amogh-jahagirdar @RussellSpitzer @stevenzwu

Re: [PR] Spark: Use native table FileIO instead of Hadoop to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
NikitaMatskevich commented on code in PR #13459: URL: https://github.com/apache/iceberg/pull/13459#discussion_r2183847443 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -312,22 +316,24 @@ private String rebuildMetadata(

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183832620 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestWriteAborts.java: ## @@ -121,7 +121,7 @@ public void testBatchAppend() t

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183827290 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/BaseProcedure.java: ## @@ -44,19 +46,37 @@ import org.apache.spark.sql.connector.catalog.Iden

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183806637 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/BaseProcedure.java: ## @@ -44,19 +46,37 @@ import org.apache.spark.sql.connector.catalog

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183804631 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/AncestorsOfProcedure.java: ## @@ -62,18 +67,23 @@ protected AncestorsOfProcedure doBuild(

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183801608 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteManifestsProcedure.java: ## @@ -265,7 +266,7 @@ public void testRew

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183798986 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestFastForwardBranchProcedure.java: ## @@ -167,18 +167,21 @@ public void test

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183797309 ## spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala: ## @@ -76,7 +71,7 @@ case class

Re: [I] Plugin class for catalog 'spark_catalog' does not implement CatalogPlugin: org.apache.spark.sql.hive.HiveSessionCatalog. [iceberg]

2025-07-03 Thread via GitHub
RussellSpitzer commented on issue #13460: URL: https://github.com/apache/iceberg/issues/13460#issuecomment-3033757352 I'm not sure this is related to Iceberg, but the code you are using assumes that the Spark instance is set up with the default spark_catalog which uses a Hive-based implement

Re: [I] BigQuery: bigquery/src does not exist in the latest release (1.9.1) [iceberg]

2025-07-03 Thread via GitHub
ebyhr commented on issue #13456: URL: https://github.com/apache/iceberg/issues/13456#issuecomment-3033752679 I'm not sure why this is marked as a bug. Iceberg 1.9.1 doesn't contain `BigQueryMetastoreCatalog` as far as I know. You will need to wait for 1.10.0 release.

Re: [PR] Spark-3.5: Add spark action to compute partition stats [iceberg]

2025-07-03 Thread via GitHub
amogh-jahagirdar merged PR #12450: URL: https://github.com/apache/iceberg/pull/12450

Re: [PR] Spark: Use native table FileIO instead of Hadoop to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
singhpk234 commented on code in PR #13459: URL: https://github.com/apache/iceberg/pull/13459#discussion_r2183779315 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -312,22 +316,24 @@ private String rebuildMetadata() {

Re: [I] [Spec] filed path update for Variant array metadata [iceberg]

2025-07-03 Thread via GitHub
aihuaxu commented on issue #13462: URL: https://github.com/apache/iceberg/issues/13462#issuecomment-3033728508 cc @rdblue @danielcweeks, @RussellSpitzer. Let me know if we have a solution to address this without making such spec change.

Re: [PR] Documented `row_filter` expressions [iceberg-python]

2025-07-03 Thread via GitHub
norton120 commented on PR #1862: URL: https://github.com/apache/iceberg-python/pull/1862#issuecomment-3033724018 > Thanks @norton120 I think this is a great addition to the docs 🙌 I left some small suggestions, let me know what you think of it 👍 @Fokko sorry, somehow this slipped th

[I] [Spec] filed path update for Variant array metadata [iceberg]

2025-07-03 Thread via GitHub
aihuaxu opened a new issue, #13462: URL: https://github.com/apache/iceberg/issues/13462 ### Proposed Change https://github.com/apache/iceberg/pull/12658 defines the spec for variant metadata. For a Variant column with the schema as follows: ``` { "event_type": "l

Re: [PR] Core: Keep track of data files to be removed for orphaned DV detection [iceberg]

2025-07-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #13222: URL: https://github.com/apache/iceberg/pull/13222#discussion_r2183765078 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -1130,6 +1132,11 @@ protected ManifestReader newManifestReader(ManifestFile man

[PR] [1.9.x] Cherry pick Stop retrying on 502 / 504 [iceberg]

2025-07-03 Thread via GitHub
singhpk234 opened a new pull request, #13461: URL: https://github.com/apache/iceberg/pull/13461 ### About the change This change cherry-picks 502 / 504 not being retried to 1.9 branch so that we can proceed with 1.9.2 release

Re: [PR] Cleanup in `expression-dsl.md` [iceberg-python]

2025-07-03 Thread via GitHub
Fokko commented on code in PR #2168: URL: https://github.com/apache/iceberg-python/pull/2168#discussion_r2183728459 ## mkdocs/docs/expression-dsl.md: ## @@ -151,20 +151,6 @@ age_in_range = Not( ) ``` -### Type Safety Review Comment: I think this is confusing, since this

[PR] Cleanup in `expression-dsl.md` [iceberg-python]

2025-07-03 Thread via GitHub
Fokko opened a new pull request, #2168: URL: https://github.com/apache/iceberg-python/pull/2168 # Rationale for this change # Are these changes tested? # Are there any user-facing changes?

Re: [PR] Documented `row_filter` expressions [iceberg-python]

2025-07-03 Thread via GitHub
Fokko commented on PR #1862: URL: https://github.com/apache/iceberg-python/pull/1862#issuecomment-3033589429 I think this is very valuable, and let's merge this and follow up with a PR to clean up the two nits

Re: [PR] Documented `row_filter` expressions [iceberg-python]

2025-07-03 Thread via GitHub
Fokko merged PR #1862: URL: https://github.com/apache/iceberg-python/pull/1862

[I] Plugin class for catalog 'spark_catalog' does not implement CatalogPlugin: org.apache.spark.sql.hive.HiveSessionCatalog. [iceberg]

2025-07-03 Thread via GitHub
brunolnetto opened a new issue, #13460: URL: https://github.com/apache/iceberg/issues/13460 ### Query engine Spark ``` > pip show pyspark Name: pyspark Version: 4.0.0 Summary: Apache Spark Python API Home-page: https://github.com/apache/spark/tree/master/python

Re: [PR] core: Support DV for partition stats [iceberg]

2025-07-03 Thread via GitHub
stevenzwu commented on code in PR #13425: URL: https://github.com/apache/iceberg/pull/13425#discussion_r2183690496 ## core/src/main/java/org/apache/iceberg/PartitionStats.java: ## @@ -109,7 +114,12 @@ public void liveEntry(ContentFile file, Snapshot snapshot) { break;

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183684054 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/BaseProcedure.java: ## @@ -178,6 +184,29 @@ protected InternalRow newInternalRow(Object... val

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183678138 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAncestorsOfProcedure.java: ## @@ -147,14 +147,16 @@ public void testAncestorOfU

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183677105 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAncestorsOfProcedure.java: ## @@ -147,14 +147,16 @@ public void testAncestorOfU

Re: [PR] Spark: Use native table FileIO instead of Hadoop to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13459: URL: https://github.com/apache/iceberg/pull/13459#discussion_r2183670641 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -312,22 +316,24 @@ private String rebuildMetadata() {

[PR] Added support for Bodo DataFrame [iceberg-python]

2025-07-03 Thread via GitHub
ehsantn opened a new pull request, #2167: URL: https://github.com/apache/iceberg-python/pull/2167 # Rationale for this change Adds support for Bodo DataFrame library, which is a drop in replacement for Pandas that accelerates and scales Python code automaticall

Re: [PR] Detect the case to identify missing column from the file using file's max field id in StrictMetricsEvaluator #13397 [iceberg]

2025-07-03 Thread via GitHub
Fokko commented on code in PR #13398: URL: https://github.com/apache/iceberg/pull/13398#discussion_r2183634318 ## api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java: ## @@ -69,13 +71,26 @@ public StrictMetricsEvaluator(Schema schema, Expression unbound

Re: [PR] Spark: Use ResolvingFileIO to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
NikitaMatskevich commented on code in PR #13459: URL: https://github.com/apache/iceberg/pull/13459#discussion_r2183601029 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -312,22 +316,42 @@ private String rebuildMetadata(

Re: [PR] Spark: Use ResolvingFileIO to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
NikitaMatskevich commented on code in PR #13459: URL: https://github.com/apache/iceberg/pull/13459#discussion_r2183594589 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -312,22 +316,42 @@ private String rebuildMetadata(

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
jayceslesar commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3033347997 @Fokko @kevinjqliu do you think it's worth setting up a roadmap for what should be candidates for rolling wheels from rust? Would really help focus efforts on lacking part

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
koenvo commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-309213 Totally agree. Let's start exploring the iceberg-rust codebase

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
pan3793 commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183558595 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAncestorsOfProcedure.java: ## @@ -147,14 +147,16 @@ public void testAncestorOfUsi

Re: [PR] Spark: Use ResolvingFileIO to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13459: URL: https://github.com/apache/iceberg/pull/13459#discussion_r2183529878 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -312,22 +316,42 @@ private String rebuildMetadata() {

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
jayceslesar commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3033171165 > Honestly, I think it would be a better use of community resources to invest more in the iceberg-rust/datafusion path so that the bulk of this logic can be moved out

Re: [PR] Detect the case to identify missing column from the file using file's max field id in StrictMetricsEvaluator #13397 [iceberg]

2025-07-03 Thread via GitHub
manirajv06 commented on code in PR #13398: URL: https://github.com/apache/iceberg/pull/13398#discussion_r2183470286 ## api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java: ## @@ -69,13 +71,26 @@ public StrictMetricsEvaluator(Schema schema, Expression un

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183446598 ## spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala: ## @@ -76,7 +71,7 @@ case class Exte

Re: [PR] Spark 4.0: Migrate Iceberg Stored Procedures to Spark built-in implementations [iceberg]

2025-07-03 Thread via GitHub
szehon-ho commented on code in PR #13106: URL: https://github.com/apache/iceberg/pull/13106#discussion_r2183443385 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAncestorsOfProcedure.java: ## @@ -147,14 +147,16 @@ public void testAncestorOfU

Re: [PR] Core: Keep track of data files to be removed for orphaned DV detection [iceberg]

2025-07-03 Thread via GitHub
stevenzwu commented on code in PR #13222: URL: https://github.com/apache/iceberg/pull/13222#discussion_r2183335091 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -920,6 +920,8 @@ protected Map summary() { @Override public List apply(TableMet

Re: [I] Spark: Use ResolvingFileIO to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
NikitaMatskevich commented on issue #13458: URL: https://github.com/apache/iceberg/issues/13458#issuecomment-3033030561 PR was opened here: https://github.com/apache/iceberg/pull/13459 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Core: Keep track of data files to be removed for orphaned DV detection [iceberg]

2025-07-03 Thread via GitHub
stevenzwu commented on code in PR #13222: URL: https://github.com/apache/iceberg/pull/13222#discussion_r2183270318 ## core/src/test/java/org/apache/iceberg/TestRewriteFiles.java: ## @@ -777,4 +778,40 @@ public void testNewDeleteFile() { .rewriteFiles(Sets.newSet(FIL

[PR] Spark: Use ResolvingFileIO to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
NikitaMatskevich opened a new pull request, #13459: URL: https://github.com/apache/iceberg/pull/13459 RewriteTablePath leverages iceberg native IO for everything except for writing a "file list" file. For this task it currently leverages a Spark writer, which in turn uses Hadoop. This might

[I] Spark: Use ResolvingFileIO to save file list in RewriteTablePath [iceberg]

2025-07-03 Thread via GitHub
NikitaMatskevich opened a new issue, #13458: URL: https://github.com/apache/iceberg/issues/13458 ### Feature Request / Improvement RewriteTablePath leverages iceberg native IO for everything except for writing a "file list" file. For this task it currently leverages a Spark writer, w

Re: [PR] AWS: Add support to run all integration tests when S3 Analytics Accelerator is enabled [iceberg]

2025-07-03 Thread via GitHub
SanjayMarreddi commented on PR #13347: URL: https://github.com/apache/iceberg/pull/13347#issuecomment-3033021243 Hi @geruh, I have updated the PR with a new commit that enables running all the relevant integration tests with and without AAL as discussed above. Please have a look at it. Than

Re: [PR] feat(transaction): Add retry logic to transaction [iceberg-rust]

2025-07-03 Thread via GitHub
CTTY commented on PR #1484: URL: https://github.com/apache/iceberg-rust/pull/1484#issuecomment-3033003159 cc @liurenjie1024 @Xuanwo this PR is ready for review, PTAL, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Parquet: Fix column pruning for deeply nested fields [iceberg]

2025-07-03 Thread via GitHub
sriharshaj commented on PR #12634: URL: https://github.com/apache/iceberg/pull/12634#issuecomment-3032971075 Commenting to keep it alive. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
corleyma commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3032961475 I think @Anton-Tarazi's original point -- creating a bunch of (Python object) filter expressions for every row in a large dataframe is going to be slow, and we do that befor

Re: [I] Kafka Connect sink fails to write snapshot when using dynamic routing with SMTs [iceberg]

2025-07-03 Thread via GitHub
rmoff commented on issue #13457: URL: https://github.com/apache/iceberg/issues/13457#issuecomment-3032947176 The problem here is the SMT. If I use dynamic routing and just rely on an existing field in the data, it works fine: ``` echo '{"target":"tmp.scratch","order_id": "001", "cu

[I] Kafka Connect sink fails to write snapshot when using dynamic routing with SMTs [iceberg]

2025-07-03 Thread via GitHub
rmoff opened a new issue, #13457: URL: https://github.com/apache/iceberg/issues/13457 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 ## Summary If I use ``` "iceberg.tables":"tmp.static_orders_json",

Re: [PR] Exclude org.apache.arrow.c.** from shading [iceberg]

2025-07-03 Thread via GitHub
huaxingao commented on PR #13410: URL: https://github.com/apache/iceberg/pull/13410#issuecomment-3032927845 We have more [discussion](https://github.com/apache/datafusion-comet/issues/1934) in Comet community and decided to keep the current arrow shading in iceberg. I will close this PR fo

Re: [PR] Exclude org.apache.arrow.c.** from shading [iceberg]

2025-07-03 Thread via GitHub
huaxingao closed pull request #13410: Exclude org.apache.arrow.c.** from shading URL: https://github.com/apache/iceberg/pull/13410 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] Spark 4.0: Support Spark Partial Limit Push Down [iceberg]

2025-07-03 Thread via GitHub
xiaoxuandev commented on PR #13451: URL: https://github.com/apache/iceberg/pull/13451#issuecomment-3032835154 @manuzhang That makes sense. I’ve updated the PR to target 4.0 only. We could backport to 3.4 as well. -- This is an automated message from the Apache Git Service. To respond t

Re: [PR] Kafka Connect: Add mechanisms for routing records by topic name [iceberg]

2025-07-03 Thread via GitHub
igorvoltaic commented on PR #11623: URL: https://github.com/apache/iceberg/pull/11623#issuecomment-3032806210 BTW, faced an issue with the current type of routing. We are using DataHub as the metastore, and there is no way to correctly map a table to a topic to produce lineage in the metastore. -- Th

Re: [PR] refine: remove data_file_content in FileScanTask [iceberg-rust]

2025-07-03 Thread via GitHub
ZENOTME commented on PR #1485: URL: https://github.com/apache/iceberg-rust/pull/1485#issuecomment-3032757551 cc @liurenjie1024 @Xuanwo @Fokko @sdd -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[PR] refine: remove data_file_content in FileScanTask [iceberg-rust]

2025-07-03 Thread via GitHub
ZENOTME opened a new pull request, #1485: URL: https://github.com/apache/iceberg-rust/pull/1485 ## Which issue does this PR close? I think we don't need data_file_content in FileScanTask, since it seems to always be `Data` and the delete files are stored in `deletes`. ## Wha

Re: [PR] Kafka Connect: Add mechanisms for routing records by topic name [iceberg]

2025-07-03 Thread via GitHub
mun1r0b0t commented on PR #11623: URL: https://github.com/apache/iceberg/pull/11623#issuecomment-3032681544 Still here, still waiting. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Core/REST: generify AuthSessionCache [iceberg]

2025-07-03 Thread via GitHub
danielcweeks commented on code in PR #12562: URL: https://github.com/apache/iceberg/pull/12562#discussion_r2183055209 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Manager.java: ## @@ -61,7 +61,7 @@ public class OAuth2Manager extends RefreshingAuthManager { private
