Re: [I] Add user-agent in GCSFileIO to identify Iceberg traffic on GCS [iceberg]

2025-06-30 Thread via GitHub
nastra closed issue #13393: Add user-agent in GCSFileIO to identify Iceberg traffic on GCS URL: https://github.com/apache/iceberg/issues/13393 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Core/REST: generify AuthSessionCache [iceberg]

2025-06-30 Thread via GitHub
nastra commented on code in PR #12562: URL: https://github.com/apache/iceberg/pull/12562#discussion_r2176580216 ## .palantir/revapi.yml: ## @@ -1178,6 +1178,33 @@ acceptedBreaks: new: "class org.apache.iceberg.Metrics" justification: "Java serialization across vers

Re: [PR] Spark 4.0: Avoid relying on `SparkScan.hashCode()` for `SparkBatch.equals()` [iceberg]

2025-06-30 Thread via GitHub
zhztheplayer commented on PR #13437: URL: https://github.com/apache/iceberg/pull/13437#issuecomment-3022135511 @aokolnychyi Would you like to review? Thanks!

Re: [PR] Core: introduce shared authentication refresh executor [iceberg]

2025-06-30 Thread via GitHub
nastra commented on code in PR #12563: URL: https://github.com/apache/iceberg/pull/12563#discussion_r2176566947 ## core/src/main/java/org/apache/iceberg/rest/auth/RefreshingAuthManager.java: ## @@ -18,73 +18,27 @@ */ package org.apache.iceberg.rest.auth; -import java.util.L

Re: [I] Request - Example with AWS s3 [iceberg-go]

2025-06-30 Thread via GitHub
gabrosys closed issue #469: Request - Example with AWS s3 URL: https://github.com/apache/iceberg-go/issues/469

Re: [I] Request - Example with AWS s3 [iceberg-go]

2025-06-30 Thread via GitHub
gabrosys commented on issue #469: URL: https://github.com/apache/iceberg-go/issues/469#issuecomment-3022090834 Thanks a lot for your support 😄

[PR] Spark: Avoid using `Object.hashCode()` for equality of `SparkScan` implementations [iceberg]

2025-06-30 Thread via GitHub
zhztheplayer opened a new pull request, #13437: URL: https://github.com/apache/iceberg/pull/13437 Currently `SparkBatch` relies on the parent instance of `SparkScan`'s `hashCode()` for checking the equality. https://github.com/apache/iceberg/blob/28b90ea1870643fcdb3afca5426656ab6caa8

Re: [PR] Spark: Registering tables to nonexistent target namespace leads to metadata deletion in HiveCatalog [iceberg]

2025-06-30 Thread via GitHub
nastra commented on code in PR #13434: URL: https://github.com/apache/iceberg/pull/13434#discussion_r2176556165 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/RegisterTableProcedure.java: ## @@ -86,6 +87,12 @@ public InternalRow[] call(InternalRow args) {

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-30 Thread via GitHub
manuzhang commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-3022067355 Are you able to connect to hive metastore for Spark tables without iceberg runtime?

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-30 Thread via GitHub
pan3793 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-3021935200 The setup looks correct to me, sorry, I may not be able to provide more effective suggestions.

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-30 Thread via GitHub
atinvento100 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-3021905978 I havent removed jars. And what i'm trying with is in pyspark and it isnt a spark submit. I have just installed pyspark and created a sparksession in local mode. Here is the s

Re: [PR] Read ManifestList V1 with V2 projection. [iceberg-rust]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #1482: URL: https://github.com/apache/iceberg-rust/pull/1482#discussion_r2176483200 ## crates/iceberg/src/spec/manifest/entry.rs: ## @@ -563,6 +563,16 @@ pub(super) fn manifest_schema_v2(partition_type: &StructType) -> Result Vec { vec![ +

Re: [PR] Spark,Core: Refactor Delete OrphanFiles by moving common code from Spark to core [iceberg]

2025-06-30 Thread via GitHub
pvary commented on PR #13429: URL: https://github.com/apache/iceberg/pull/13429#issuecomment-3021855541 Hi @Guosmilesmile, This highlights what I missed during the review of the main PR. We need to have unit tests for the new API. Could you please create them? Thanks, Peter

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-30 Thread via GitHub
atinvento100 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-3021763423 @pan3793 , here's the full stacktrace ``` py4j.protocol.Py4JJavaError: An error occurred while calling o76.sql. : org.apache.iceberg.hive.RuntimeMetaException: Failed t

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-30 Thread via GitHub
pan3793 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-3021789936 class `org.apache.thrift.transport.TFramedTransport` should exist at `libthrift-0.16.0.jar` ``` $ jar tf $SPARK_HOME/jars/libthrift-0.16.0.jar | grep TFramedTransport o

Re: [I] Spark Structured Streaming: Data not immediately visible after merge operations [iceberg]

2025-06-30 Thread via GitHub
nareshbab commented on issue #13431: URL: https://github.com/apache/iceberg/issues/13431#issuecomment-3021744943 @RussellSpitzer Here's the rudimentary code to replicate this. I hope this provides the clarity on the process Overall steps executed: - Start streaming pipeline

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-30 Thread via GitHub
pan3793 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-3021722979 @atinvento100 what's the full stacktrace?

Re: [PR] Spark 3.4: Backport #13061 fix for row lineage inheritance in distributed planning [iceberg]

2025-06-30 Thread via GitHub
stevenzwu merged PR #13436: URL: https://github.com/apache/iceberg/pull/13436

Re: [PR] Spark 3.4: Backport #13061 fix for row lineage inheritance in distributed planning [iceberg]

2025-06-30 Thread via GitHub
stevenzwu commented on PR #13436: URL: https://github.com/apache/iceberg/pull/13436#issuecomment-3021701072 thanks @amogh-jahagirdar for the backport

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-30 Thread via GitHub
atinvento100 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-3021690071 @pan3793 , @manuzhang please help with issue faced when using iceberg-spark-runtime-4.0_2.13:1.10.0 jar. (Caused by: java.lang.ClassNotFoundException: org.apache.thrift

Re: [PR] Spark: Make `SparkBatch.createReaderFactory` customizable [iceberg]

2025-06-30 Thread via GitHub
zhztheplayer commented on PR #13433: URL: https://github.com/apache/iceberg/pull/13433#issuecomment-3021626382 cc @pvary @huaxingao Would you like to have a look? Thanks!

Re: [PR] Spark: Make `SparkBatch.createReaderFactory` customizable [iceberg]

2025-06-30 Thread via GitHub
zhztheplayer commented on PR #13433: URL: https://github.com/apache/iceberg/pull/13433#issuecomment-3021620518 The feature is now only effective for Spark 4.0. It could be ported to 3.4 and 3.5 after being landed.

Re: [PR] Spark: Make `SparkBatch.createReaderFactory` customizable [iceberg]

2025-06-30 Thread via GitHub
zhztheplayer commented on code in PR #13433: URL: https://github.com/apache/iceberg/pull/13433#discussion_r2175544751 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -19,11 +19,18 @@ package org.apache.iceberg.spark; import java.time

Re: [PR] Spark 3.5, 4.0: ERROR when executing DML queries with identifier fields [iceberg]

2025-06-30 Thread via GitHub
manuzhang commented on code in PR #13435: URL: https://github.com/apache/iceberg/pull/13435#discussion_r2176316357 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkMetadataColumns.java: ## @@ -316,6 +316,42 @@ public void testConflictingColumns() {

Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

2025-06-30 Thread via GitHub
manuzhang commented on PR #13194: URL: https://github.com/apache/iceberg/pull/13194#issuecomment-3021535378 cc @nastra @RussellSpitzer please help review.

[PR] Build: Bump huggingface-hub from 0.33.0 to 0.33.1 [iceberg-python]

2025-06-30 Thread via GitHub
dependabot[bot] opened a new pull request, #2165: URL: https://github.com/apache/iceberg-python/pull/2165 Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub) from 0.33.0 to 0.33.1. Release notes Sourced from https://github.com/huggingface/huggingface_hub/release

[PR] Build: Bump mypy-boto3-glue from 1.38.42 to 1.39.0 [iceberg-python]

2025-06-30 Thread via GitHub
dependabot[bot] opened a new pull request, #2164: URL: https://github.com/apache/iceberg-python/pull/2164 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.38.42 to 1.39.0. Release notes Sourced from https://github.com/youtype/mypy_boto3_builder/releases

[PR] Build: Bump mypy-boto3-dynamodb from 1.38.4 to 1.39.0 [iceberg-python]

2025-06-30 Thread via GitHub
dependabot[bot] opened a new pull request, #2163: URL: https://github.com/apache/iceberg-python/pull/2163 Bumps [mypy-boto3-dynamodb](https://github.com/youtype/mypy_boto3_builder) from 1.38.4 to 1.39.0. Release notes Sourced from https://github.com/youtype/mypy_boto3_builder/relea

[PR] Build: Bump pyroaring from 1.0.1 to 1.0.2 [iceberg-python]

2025-06-30 Thread via GitHub
dependabot[bot] opened a new pull request, #2162: URL: https://github.com/apache/iceberg-python/pull/2162 Bumps [pyroaring](https://github.com/Ezibenroc/PyRoaringBitMap) from 1.0.1 to 1.0.2. Release notes Sourced from https://github.com/Ezibenroc/PyRoaringBitMap/releases

Re: [I] ERROR when executing UPDATE/DELETE queries in Iceberg 1.6.0: "Cannot add fieldId 1 as an identifier field" [iceberg]

2025-06-30 Thread via GitHub
szehon-ho commented on issue #11341: URL: https://github.com/apache/iceberg/issues/11341#issuecomment-3021413540 Trying a fix at #13435

Re: [PR] Spark 3.5, 4.0: ERROR when executing DML queries with identifier fields [iceberg]

2025-06-30 Thread via GitHub
szehon-ho commented on PR #13435: URL: https://github.com/apache/iceberg/pull/13435#issuecomment-3021411446 FYI @manuzhang, @dramaticlly. Also @amogh-jahagirdar @huaxingao can you help take a look?

[PR] Spark 3.4: Backport #13061 fix for row lineage inheritance in distributed planning [iceberg]

2025-06-30 Thread via GitHub
amogh-jahagirdar opened a new pull request, #13436: URL: https://github.com/apache/iceberg/pull/13436 Backport #13061 fix for row lineage inheritance in distributed planning to 3.4. This backport is a clean backport.

Re: [PR] Spark, Avro: Add support for row lineage in Avro reader [iceberg]

2025-06-30 Thread via GitHub
amogh-jahagirdar commented on code in PR #13070: URL: https://github.com/apache/iceberg/pull/13070#discussion_r2176182992 ## core/src/main/java/org/apache/iceberg/avro/ValueReaders.java: ## @@ -1235,4 +1265,64 @@ public void setRowPositionSupplier(Supplier posSupplier) {

Re: [D] Table Schema / Partition update via Rest Catalog [iceberg-rust]

2025-06-30 Thread via GitHub
GitHub user macpie closed a discussion: Table Schema / Partition update via Rest Catalog I am confused about whether the Rest Catalog supports table updates or not. It seems that the catalog has a function [update_table](https://github.com/apache/iceberg-rust/blob/main/crates/catalog/rest/src/cat

Re: [PR] AWS: Fix DynamoDB and Glue integration test failures [iceberg]

2025-06-30 Thread via GitHub
github-actions[bot] commented on PR #12718: URL: https://github.com/apache/iceberg/pull/12718#issuecomment-3021252522 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

2025-06-30 Thread via GitHub
github-actions[bot] commented on PR #13194: URL: https://github.com/apache/iceberg/pull/13194#issuecomment-3021253005 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Proposal: IRC Events endpoint [iceberg]

2025-06-30 Thread via GitHub
github-actions[bot] commented on PR #12584: URL: https://github.com/apache/iceberg/pull/12584#issuecomment-3021252467 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Spark: Avoid closing deserialized copies of shared resources like FileIO [iceberg]

2025-06-30 Thread via GitHub
github-actions[bot] commented on PR #12868: URL: https://github.com/apache/iceberg/pull/12868#issuecomment-3021252861 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] AWS: Refactor DynamoDB and Glue properties into separated properties classes [iceberg]

2025-06-30 Thread via GitHub
github-actions[bot] commented on PR #12722: URL: https://github.com/apache/iceberg/pull/12722#issuecomment-3021252636 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Structured streaming writes to partitioned table fails when spark.sql.extensions is set to IcebergSparkSessionExtensions [iceberg]

2025-06-30 Thread via GitHub
github-actions[bot] commented on issue #7226: URL: https://github.com/apache/iceberg/issues/7226#issuecomment-3021252350 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Structured streaming writes to partitioned table fails when spark.sql.extensions is set to IcebergSparkSessionExtensions [iceberg]

2025-06-30 Thread via GitHub
github-actions[bot] closed issue #7226: Structured streaming writes to partitioned table fails when spark.sql.extensions is set to IcebergSparkSessionExtensions URL: https://github.com/apache/iceberg/issues/7226

Re: [PR] spark 4.0 : SPJ : add hour to day reducer [iceberg]

2025-06-30 Thread via GitHub
huaxingao commented on PR #13166: URL: https://github.com/apache/iceberg/pull/13166#issuecomment-3021235439 Merged. Thanks @himadripal for the PR! Thanks @szehon-ho for the review!

Re: [PR] spark 4.0 : SPJ : add hour to day reducer [iceberg]

2025-06-30 Thread via GitHub
huaxingao merged PR #13166: URL: https://github.com/apache/iceberg/pull/13166

Re: [PR] Spark 3.5, 4.0: ERROR when executing DML queries with identifier fields [iceberg]

2025-06-30 Thread via GitHub
szehon-ho closed pull request #13435: Spark 3.5, 4.0: ERROR when executing DML queries with identifier fields URL: https://github.com/apache/iceberg/pull/13435

Re: [PR] Add encryption key support for v3 [iceberg-python]

2025-06-30 Thread via GitHub
rambleraptor commented on PR #2118: URL: https://github.com/apache/iceberg-python/pull/2118#issuecomment-3021137388 @kevinjqliu happy to add that e2e test! Would you prefer I wait for the release + add the test or just file an issue? I know v3 writing is blocked on this feature.

Re: [PR] Row lineage fields for v3 [iceberg-python]

2025-06-30 Thread via GitHub
rambleraptor commented on PR #2129: URL: https://github.com/apache/iceberg-python/pull/2129#issuecomment-3021136694 @kevinjqliu happy to add that e2e test! Would you prefer I wait for the release + add the test or just file an issue? I know v3 writing is blocked on this feature.

Re: [PR] Row lineage fields for v3 [iceberg-python]

2025-06-30 Thread via GitHub
rambleraptor commented on PR #2129: URL: https://github.com/apache/iceberg-python/pull/2129#issuecomment-3021136430 @kevinjqliu happy to add that e2e test! Would you prefer I wait for the release + add the test or just file an issue? I know v3 writing is blocked on this feature.

[PR] Spark 3.5: ERROR when executing DML queries with identifier fields [iceberg]

2025-06-30 Thread via GitHub
szehon-ho opened a new pull request, #13435: URL: https://github.com/apache/iceberg/pull/13435 Fixes #11341 This fixes a bug introduced in https://github.com/apache/iceberg/pull/10547, where metadata tables are broken for tables with identifier columns. Metadata table schemas got th

Re: [PR] spark 4.0 : SPJ : add hour to day reducer [iceberg]

2025-06-30 Thread via GitHub
szehon-ho commented on code in PR #13166: URL: https://github.com/apache/iceberg/pull/13166#discussion_r2176120906 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/functions/DaysFunction.java: ## @@ -70,6 +73,11 @@ public String name() { public DataType resultType

[PR] Read ManifestList V1 with V2 projection. [iceberg-rust]

2025-06-30 Thread via GitHub
rambleraptor opened a new pull request, #1482: URL: https://github.com/apache/iceberg-rust/pull/1482 ## Which issue does this PR close? - Closes #1471 ## What changes are included in this PR? On ManifestList data files in v1, this sets the default content-type

Re: [I] V3 Tracking issue [iceberg-python]

2025-06-30 Thread via GitHub
rambleraptor commented on issue #1818: URL: https://github.com/apache/iceberg-python/issues/1818#issuecomment-3020820458 @stevie9868 are you still planning on taking on the deletion vector work? I'm happy to contribute some cycles if you don't have any

Re: [PR] Build: Bump derby from 10.15.2.0 to 10.17.1.0 [iceberg]

2025-06-30 Thread via GitHub
Fokko commented on PR #13419: URL: https://github.com/apache/iceberg/pull/13419#issuecomment-3020768113 @dependabot rebase

Re: [PR] Build: Bump com.azure:azure-sdk-bom from 1.2.31 to 1.2.35 [iceberg]

2025-06-30 Thread via GitHub
Fokko merged PR #13201: URL: https://github.com/apache/iceberg/pull/13201

Re: [I] Read ManifestList V1 with V2 projection. [iceberg-rust]

2025-06-30 Thread via GitHub
Fokko commented on issue #1471: URL: https://github.com/apache/iceberg-rust/issues/1471#issuecomment-3020704939 @rambleraptor Sure thing! I wanted to work on this myself, but I'm occupied with other things to fix first. Let me know if you bump into anything. Happy to help!

Re: [PR] Ignore partition fields that are dropped from the current-schema [iceberg]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #11868: URL: https://github.com/apache/iceberg/pull/11868#discussion_r2175923490 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java: ## @@ -583,4 +587,44 @@ private void createTable(St

Re: [PR] Ignore partition fields that are dropped from the current-schema [iceberg]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #11868: URL: https://github.com/apache/iceberg/pull/11868#discussion_r2175922779 ## core/src/main/java/org/apache/iceberg/Partitioning.java: ## @@ -239,7 +239,8 @@ public static StructType groupingKeyType(Schema schema, Collection specs = table.spe

Re: [PR] feat(partitions): Add support for get partition field name [iceberg-go]

2025-06-30 Thread via GitHub
zeroshade merged PR #468: URL: https://github.com/apache/iceberg-go/pull/468

Re: [I] Ensure absolute path when referencing any file paths [iceberg-python]

2025-06-30 Thread via GitHub
rambleraptor commented on issue #1730: URL: https://github.com/apache/iceberg-python/issues/1730#issuecomment-3020645758 Do you mind assigning this to me? Happy to take it on.

Re: [I] Read ManifestList V1 with V2 projection. [iceberg-rust]

2025-06-30 Thread via GitHub
rambleraptor commented on issue #1471: URL: https://github.com/apache/iceberg-rust/issues/1471#issuecomment-3020633957 Can I have this assigned to me? Happy to take it on!

Re: [PR] Spark: Table registration to nonexistent target namespace leads to metadata deletion in HiveCatalog [iceberg]

2025-06-30 Thread via GitHub
hsiang-c commented on code in PR #13434: URL: https://github.com/apache/iceberg/pull/13434#discussion_r2175876377 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/SparkCatalogConfig.java: ## @@ -28,8 +28,12 @@ public enum SparkCatalogConfig { "testhive",

Re: [PR] feat(table): Implement snapshot expiration [iceberg-go]

2025-06-30 Thread via GitHub
arnaudbriche commented on code in PR #401: URL: https://github.com/apache/iceberg-go/pull/401#discussion_r2175852857 ## table/updates.go: ## @@ -382,7 +390,85 @@ func NewRemoveSnapshotsUpdate(ids []int64) Update { } func (u *removeSnapshotsUpdate) Apply(builder *MetadataBuil

Re: [PR] Table registration to nonexistent target namespace leads to metadata deletion in HiveCatalog [iceberg]

2025-06-30 Thread via GitHub
hsiang-c commented on code in PR #13434: URL: https://github.com/apache/iceberg/pull/13434#discussion_r2175850296 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/SparkCatalogConfig.java: ## @@ -28,8 +28,12 @@ public enum SparkCatalogConfig { "testhive",

[PR] Table registration to nonexistent target namespace leads to metadata deletion in HiveCatalog [iceberg]

2025-06-30 Thread via GitHub
hsiang-c opened a new pull request, #13434: URL: https://github.com/apache/iceberg/pull/13434 Related to: https://github.com/apache/iceberg/issues/1533 ### Context - We're registering existing Iceberg tables to `HiveCatalog` and realize that the `metadata.json` files used are delet

Re: [PR] Spark 4.0: Row Lineage support [iceberg]

2025-06-30 Thread via GitHub
RussellSpitzer commented on code in PR #13310: URL: https://github.com/apache/iceberg/pull/13310#discussion_r2175791136 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java: ## @@ -177,7 +177,19 @@ public static Ob

Re: [I] Request - Example with AWS s3 [iceberg-go]

2025-06-30 Thread via GitHub
zeroshade commented on issue #469: URL: https://github.com/apache/iceberg-go/issues/469#issuecomment-3020432654 @laskoviymishka We need to get around to adding in some examples for the docs :smile:

Re: [PR] API: Support bucketing by struct [iceberg]

2025-06-30 Thread via GitHub
stevenzwu commented on PR #13430: URL: https://github.com/apache/iceberg/pull/13430#issuecomment-3020420405 Is this needed with the work on https://github.com/apache/iceberg/pull/12897? When multi-column bucketing is actually implemented, we probably also need to clarify the behavior

Re: [PR] Scan Delete Support Part 5: Positional Delete Parsing [iceberg-rust]

2025-06-30 Thread via GitHub
sdd commented on code in PR #1011: URL: https://github.com/apache/iceberg-rust/pull/1011#discussion_r2175684449 ## crates/iceberg/src/delete_vector.rs: ## @@ -38,6 +38,15 @@ impl DeleteVector { let outer = self.inner.bitmaps(); DeleteVectorIterator { outer, inn

Re: [I] Request - Example with AWS s3 [iceberg-go]

2025-06-30 Thread via GitHub
laskoviymishka commented on issue #469: URL: https://github.com/apache/iceberg-go/issues/469#issuecomment-3020329541 To see how to integrate with this library take a look [here](https://github.com/transferia/iceberg) 1. [Reader](https://github.com/transferia/iceberg/blob/main/storage.

Re: [PR] Scan Delete Support Part 5: Positional Delete Parsing [iceberg-rust]

2025-06-30 Thread via GitHub
sdd merged PR #1011: URL: https://github.com/apache/iceberg-rust/pull/1011

Re: [PR] Scan Delete Support Part 5: Positional Delete Parsing [iceberg-rust]

2025-06-30 Thread via GitHub
sdd commented on code in PR #1011: URL: https://github.com/apache/iceberg-rust/pull/1011#discussion_r2175650926 ## crates/iceberg/src/delete_vector.rs: ## @@ -38,6 +38,15 @@ impl DeleteVector { let outer = self.inner.bitmaps(); DeleteVectorIterator { outer, inn

Re: [PR] Flink 2.0: Replace Caffeine maxSize cache with LRUCache [iceberg]

2025-06-30 Thread via GitHub
pvary commented on PR #13382: URL: https://github.com/apache/iceberg/pull/13382#issuecomment-3020218058 > using a time-based eviction policy, we have seen worse performance IIUC, we access this cache only a few times every checkpoint. Sum of (table x parallelism for the table). Not ve

Re: [PR] feat(partitions): Add support for get partition field name [iceberg-go]

2025-06-30 Thread via GitHub
zeroshade commented on code in PR #468: URL: https://github.com/apache/iceberg-go/pull/468#discussion_r2175581133 ## partitions.go: ## @@ -293,3 +293,27 @@ func AssignFreshPartitionSpecIDs(spec *PartitionSpec, old, fresh *Schema) (Parti return NewPartitionSpec(newFiel

Re: [PR] feat(partitions): Add support for get partition field name [iceberg-go]

2025-06-30 Thread via GitHub
lliangyu-lin commented on code in PR #468: URL: https://github.com/apache/iceberg-go/pull/468#discussion_r2175578401 ## partitions.go: ## @@ -293,3 +293,27 @@ func AssignFreshPartitionSpecIDs(spec *PartitionSpec, old, fresh *Schema) (Parti return NewPartitionSpec(newF

Re: [PR] AWS: Add support to run all integration tests when S3 Analytics Accelerator is enabled [iceberg]

2025-06-30 Thread via GitHub
SanjayMarreddi commented on PR #13347: URL: https://github.com/apache/iceberg/pull/13347#issuecomment-3020109000 > > @nastra @jackye1995 @geruh May I request for a review on this PR please? > > PS: There are few other PRs ( #13348, #13361 ) lined up depending on this. > > I don't r

Re: [I] Iceberg BatchScan & SparkDistributedDataScan to support `limit` pushdown [iceberg]

2025-06-30 Thread via GitHub
xiaoxuandev commented on issue #13383: URL: https://github.com/apache/iceberg/issues/13383#issuecomment-3020102980 Hi @devanshuraj, I am working on this, will raise the PR soon.

Re: [I] Spark Structured Streaming: Data not immediately visible after merge operations [iceberg]

2025-06-30 Thread via GitHub
RussellSpitzer commented on issue #13431: URL: https://github.com/apache/iceberg/issues/13431#issuecomment-3020082336 Could you elaborate a bit more? It sounds like the behavior is Batch N Starts Batch N Stops ForEach Executes and Writes Some Gap Time Here? Batch N+

Re: [PR] Make `SparkBatch.createReaderFactory` customizable [iceberg]

2025-06-30 Thread via GitHub
zhztheplayer commented on code in PR #13433: URL: https://github.com/apache/iceberg/pull/13433#discussion_r2175544751 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -19,11 +19,18 @@ package org.apache.iceberg.spark; import java.time

Re: [PR] feat(partitions): Add support for get partition field name [iceberg-go]

2025-06-30 Thread via GitHub
zeroshade commented on code in PR #468: URL: https://github.com/apache/iceberg-go/pull/468#discussion_r2175541974 ## partitions.go: ## @@ -293,3 +293,27 @@ func AssignFreshPartitionSpecIDs(spec *PartitionSpec, old, fresh *Schema) (Parti return NewPartitionSpec(newFiel

[PR] Make `SparkBatch.createReaderFactory` customizable [iceberg]

2025-06-30 Thread via GitHub
zhztheplayer opened a new pull request, #13433: URL: https://github.com/apache/iceberg/pull/13433 A patch to make the API `SparkBatch.createReaderFactory` customizable. ### Reason User might need to customize the Spark partition reader in deep without going through Iceberg's b

Re: [PR] Spark, Avro: Add support for row lineage in Avro reader [iceberg]

2025-06-30 Thread via GitHub
stevenzwu commented on code in PR #13070: URL: https://github.com/apache/iceberg/pull/13070#discussion_r2175492438 ## core/src/main/java/org/apache/iceberg/avro/ValueReaders.java: ## @@ -1235,4 +1265,64 @@ public void setRowPositionSupplier(Supplier posSupplier) { this.c

Re: [PR] feat(partitions): Add support for get partition field name [iceberg-go]

2025-06-30 Thread via GitHub
lliangyu-lin commented on code in PR #468: URL: https://github.com/apache/iceberg-go/pull/468#discussion_r2175505847 ## partitions_test.go: ## @@ -189,3 +189,42 @@ func TestPartitionSpecToPath(t *testing.T) { assert.Equal(t, "my%23str%25bucket=my%2Bstr/other+str%2Bbucke

Re: [PR] Update schema projection to support `initial-defaults` [iceberg-python]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#discussion_r2175497964 ## pyiceberg/io/pyarrow.py: ## @@ -1814,9 +1814,12 @@ def struct( array = self._cast_if_needed(field, field_array) field_arrays.a

Re: [PR] AWS: Refactor S3FileIOProperties to use common builder interface [iceberg]

2025-06-30 Thread via GitHub
RussellSpitzer merged PR #13183: URL: https://github.com/apache/iceberg/pull/13183

Re: [PR] Update schema projection to support `initial-defaults` [iceberg-python]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#discussion_r2175469014 ## pyiceberg/expressions/visitors.py: ## @@ -893,15 +893,28 @@ def visit_unbound_predicate(self, predicate: UnboundPredicate[L]) -> BooleanExpr raise Typ

Re: [PR] dep: pin transitive dep `google-cloud-storage >=2.0.0` [iceberg-python]

2025-06-30 Thread via GitHub
Fokko commented on PR #2161: URL: https://github.com/apache/iceberg-python/pull/2161#issuecomment-3019844000 I'm confused by both the dependabot PR, but also this one 🤣 PyIceberg directly depends on `cachetools`: https://github.com/apache/iceberg-python/blob/5e975d569e243f1e67e8021a6

Re: [PR] Update schema projection to support `initial-defaults` [iceberg-python]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#discussion_r2175447965 ## tests/integration/test_reads.py: ## @@ -1024,3 +1025,31 @@ def test_scan_with_datetime(catalog: Catalog) -> None: df = table.scan(row_filter=LessThan("da

Re: [PR] Update schema projection to support `initial-defaults` [iceberg-python]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#discussion_r2175447526 ## pyiceberg/expressions/visitors.py: ## @@ -893,15 +893,28 @@ def visit_unbound_predicate(self, predicate: UnboundPredicate[L]) -> BooleanExpr raise Typ

Re: [PR] Update schema projection to support `initial-defaults` [iceberg-python]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#discussion_r2175445140 ## tests/io/test_pyarrow.py: ## @@ -2398,6 +2398,17 @@ def test_identity_partition_on_multi_columns() -> None: ) == arrow_table.sort_by([("born_year", "as

Re: [PR] Update schema projection to support `initial-defaults` [iceberg-python]

2025-06-30 Thread via GitHub
Fokko commented on code in PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#discussion_r2175440267 ## tests/integration/test_reads.py: ## @@ -1024,3 +1025,31 @@ def test_scan_with_datetime(catalog: Catalog) -> None: df = table.scan(row_filter=LessThan("da

Re: [PR] feat(table): Implement snapshot expiration [iceberg-go]

2025-06-30 Thread via GitHub
zeroshade commented on code in PR #401: URL: https://github.com/apache/iceberg-go/pull/401#discussion_r2175434203 ## table/transaction.go: ## @@ -142,6 +143,120 @@ func (t *Transaction) SetProperties(props iceberg.Properties) error { return nil } +type expireSnapshot

[PR] dep: pin transitive dep `google-cloud-storage >=2.0.0` [iceberg-python]

2025-06-30 Thread via GitHub
kevinjqliu opened a new pull request, #2161: URL: https://github.com/apache/iceberg-python/pull/2161 # Rationale for this change Older versions of google libraries throw ``` E UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en

Re: [PR] Core, Data: File Format API interfaces [iceberg]

2025-06-30 Thread via GitHub
pvary commented on code in PR #12774: URL: https://github.com/apache/iceberg/pull/12774#discussion_r2175427527 ## core/src/main/java/org/apache/iceberg/io/ObjectModel.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

Re: [PR] feat(table): Implement snapshot expiration [iceberg-go]

2025-06-30 Thread via GitHub
zeroshade commented on code in PR #401: URL: https://github.com/apache/iceberg-go/pull/401#discussion_r2175426593 ## table/updates.go: ## @@ -382,7 +390,85 @@ func NewRemoveSnapshotsUpdate(ids []int64) Update { } func (u *removeSnapshotsUpdate) Apply(builder *MetadataBuilder

Re: [PR] refactor: add factory functions for primitive types [iceberg-cpp]

2025-06-30 Thread via GitHub
mapleFU commented on code in PR #134: URL: https://github.com/apache/iceberg-cpp/pull/134#discussion_r2175398602 ## src/iceberg/type.h: ## @@ -446,4 +446,48 @@ class ICEBERG_EXPORT UuidType : public PrimitiveType { /// @} +/// \defgroup type-factories Factory functions for

[PR] Docs: metadata deletion doc fix [iceberg]

2025-06-30 Thread via GitHub
yguy-ryft opened a new pull request, #13432: URL: https://github.com/apache/iceberg/pull/13432 The current documentation around metadata files, and when they become untracked and deleted, is a bit confusing IMO. Seems like there's quite a few threads in the community about this: https:

Re: [PR] feat: add avro reader to registry [iceberg-cpp]

2025-06-30 Thread via GitHub
mapleFU commented on code in PR #133: URL: https://github.com/apache/iceberg-cpp/pull/133#discussion_r2175372250 ## src/iceberg/file_reader.h: ## @@ -130,12 +76,12 @@ struct ICEBERG_EXPORT ReaderOptions { std::optional split; /// \brief The batch size to read. Only applies
