[PR] Build: Bump software.amazon.awssdk:bom from 2.31.63 to 2.31.68 [iceberg]

2025-06-21 Thread via GitHub
dependabot[bot] opened a new pull request, #13364: URL: https://github.com/apache/iceberg/pull/13364 Bumps software.amazon.awssdk:bom from 2.31.63 to 2.31.68. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=soft

[PR] Build: Bump testcontainers from 1.21.1 to 1.21.2 [iceberg]

2025-06-21 Thread via GitHub
dependabot[bot] opened a new pull request, #13363: URL: https://github.com/apache/iceberg/pull/13363 Bumps `testcontainers` from 1.21.1 to 1.21.2. Updates `org.testcontainers:testcontainers` from 1.21.1 to 1.21.2 Release notes Sourced from https://github.com/testcontainers/testco

[PR] Build: Bump datamodel-code-generator from 0.31.0 to 0.31.1 [iceberg]

2025-06-21 Thread via GitHub
dependabot[bot] opened a new pull request, #13362: URL: https://github.com/apache/iceberg/pull/13362 Bumps [datamodel-code-generator](https://github.com/koxudaxi/datamodel-code-generator) from 0.31.0 to 0.31.1. Release notes Sourced from https://github.com/koxudaxi/datamodel-code-

Re: [I] Before expiring snapshots is there need to provide history snapshot file statistics [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] commented on issue #11213: URL: https://github.com/apache/iceberg/issues/11213#issuecomment-2993835603 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache

Re: [PR] Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] commented on PR #9335: URL: https://github.com/apache/iceberg/pull/9335#issuecomment-2993835582 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

[PR] Create devcontainer.json [iceberg-python]

2025-06-21 Thread via GitHub
Kelleretoro opened a new pull request, #2135: URL: https://github.com/apache/iceberg-python/pull/2135 # Rationale for this change # Are these changes tested? # Are there any user-facing changes?

Re: [I] why spark ddl rename iceberg table name not change location? does it matter? [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] closed issue #10436: why spark ddl rename iceberg table name not change location? does it matter? URL: https://github.com/apache/iceberg/issues/10436

[I] k1 [iceberg-python]

2025-06-21 Thread via GitHub
Kelleretoro opened a new issue, #2134: URL: https://github.com/apache/iceberg-python/issues/2134 ### Feature Request / Improvement [https://catalog.cloudflarestorage.com/b0dc3a871242199bf154bc084ec0df45/k92](url)

Re: [PR] Spark 3.5: Verify base snapshot hasn't changed before commit in RemoveDanglingDeletesSparkAction [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] commented on PR #13120: URL: https://github.com/apache/iceberg/pull/13120#issuecomment-2993835681 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] why spark ddl rename iceberg table name not change location? does it matter? [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] commented on issue #10436: URL: https://github.com/apache/iceberg/issues/10436#issuecomment-2993835594 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Before expiring snapshots is there need to provide history snapshot file statistics [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] closed issue #11213: Before expiring snapshots is there need to provide history snapshot file statistics URL: https://github.com/apache/iceberg/issues/11213

Re: [I] Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] closed issue #11465: Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes URL: https://github.com/apache/iceberg/issues/11465

Re: [I] Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes [iceberg]

2025-06-21 Thread via GitHub
github-actions[bot] commented on issue #11465: URL: https://github.com/apache/iceberg/issues/11465#issuecomment-2993835621 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[PR] perf: `table.add_files` and `inspect.files` [iceberg-python]

2025-06-21 Thread via GitHub
jayceslesar opened a new pull request, #2133: URL: https://github.com/apache/iceberg-python/pull/2133 Should close #2130 and #2132 I didn't see anywhere else where looping over manifest entries was parallelized I don't think so seems better to parallelize across manifests than within

Re: [PR] Table commit retries based on table properties [iceberg-python]

2025-06-21 Thread via GitHub
potatochipcoconut commented on PR #330: URL: https://github.com/apache/iceberg-python/pull/330#issuecomment-2993704719 @Buktoria is this still going to move forward?

Re: [I] Duplicate File Remediation [iceberg-python]

2025-06-21 Thread via GitHub
jayceslesar commented on issue #2130: URL: https://github.com/apache/iceberg-python/issues/2130#issuecomment-2993695822 Looks like the performance hit comes from https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L850

Re: [PR] [Avro] Accept dict with only `'type': 'null'` as representation of `null` [iceberg-python]

2025-06-21 Thread via GitHub
kevinjqliu commented on PR #2109: URL: https://github.com/apache/iceberg-python/pull/2109#issuecomment-2993688033 @Tishj I was not able to find or generate an avro manifest list file to verify this. Do you have one?

Re: [I] pyiceberg produces invalid avro if a partition name has an emoji (any non-alphanumeric character I guess, including dots or starting with digits) [iceberg-python]

2025-06-21 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-2993641323 we already have this helper function https://github.com/apache/iceberg-python/blob/89e71c36f26d1f3da48090ddfa137a698e2a06fc/pyiceberg/schema.py#L1364-L1374

Re: [I] Support Concurrency Safety Validation: Implement `validateNoNewDeleteFiles` [iceberg-python]

2025-06-21 Thread via GitHub
gabeiglio commented on issue #1930: URL: https://github.com/apache/iceberg-python/issues/1930#issuecomment-2993677257 I'll start working on this one :)

Re: [I] How can I speed up batched add_files calls? [iceberg-python]

2025-06-21 Thread via GitHub
thijsheijden commented on issue #2132: URL: https://github.com/apache/iceberg-python/issues/2132#issuecomment-2993662177 It is most definitely that, I did not realise there was an option to turn off duplicate file checking 🤦🏼. Running it now every batch takes the same time, and it is 100x

Re: [PR] Flink: Supports delete orphan files in TableMaintenance [iceberg]

2025-06-21 Thread via GitHub
Guosmilesmile commented on code in PR #13302: URL: https://github.com/apache/iceberg/pull/13302#discussion_r2160070532 ## flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/TableReader.java: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software F

Re: [PR] feat: delete orphaned files [iceberg-python]

2025-06-21 Thread via GitHub
jayceslesar commented on code in PR #1958: URL: https://github.com/apache/iceberg-python/pull/1958#discussion_r2160070215 ## pyiceberg/table/inspect.py: ## @@ -678,6 +689,32 @@ def all_manifests(self) -> "pa.Table": ) return pa.concat_tables(manifests_by_snapsh

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-21 Thread via GitHub
atinvento100 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-2993632194 Thank you @manuzhang

Re: [PR] feat: delete orphaned files [iceberg-python]

2025-06-21 Thread via GitHub
jayceslesar commented on code in PR #1958: URL: https://github.com/apache/iceberg-python/pull/1958#discussion_r2160069635 ## pyiceberg/table/maintenance.py: ## @@ -0,0 +1,117 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreemen

Re: [I] How can I speed up batched add_files calls? [iceberg-python]

2025-06-21 Thread via GitHub
jayceslesar commented on issue #2132: URL: https://github.com/apache/iceberg-python/issues/2132#issuecomment-2993649449 It would be awesome if you could run https://github.com/benfred/py-spy and attach the produced flame graph! I want to bet that it is the `self._table.inspect.files()` loc

Re: [PR] feat: support pagination in `list_*` methods in rest catalog [iceberg-python]

2025-06-21 Thread via GitHub
jayceslesar commented on PR #2089: URL: https://github.com/apache/iceberg-python/pull/2089#issuecomment-2993646960 lmk what you think this needs, imo if a user really needs to be able to make single requests they can write the thin layer to do that but if we want to provide the most basic l

Re: [PR] feat: delete orphaned files [iceberg-python]

2025-06-21 Thread via GitHub
jayceslesar commented on code in PR #1958: URL: https://github.com/apache/iceberg-python/pull/1958#discussion_r2160069692 ## pyiceberg/table/maintenance.py: ## @@ -0,0 +1,117 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreemen

Re: [I] pyiceberg produces invalid avro if a partition name has an emoji (any non-alphanumeric character I guess, including dots or starting with digits) [iceberg-python]

2025-06-21 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-2993640670 😎😎😎 Thanks for reporting this issue! It looks like this is due to the avro naming convention https://avro.apache.org/docs/1.11.1/specification/#names
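
For readers following this thread, the Avro naming rule referenced above can be sketched in a few lines. This is a minimal illustration, assuming only the rule stated in the Avro specification (a name starts with `[A-Za-z_]` and continues with `[A-Za-z0-9_]`). The function name `is_valid_avro_name` is hypothetical and is not the pyiceberg helper linked in the follow-up comment.

```python
import re

# Avro spec: names start with [A-Za-z_] and then contain only [A-Za-z0-9_].
# Hypothetical helper for illustration; not part of pyiceberg.
_AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` satisfies the Avro name rule."""
    return _AVRO_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # Candidate partition names like those described in the issue title.
    for candidate in ["event_date", "1st_col", "a.b", "😎"]:
        print(repr(candidate), is_valid_avro_name(candidate))
```

Names that fail this check (leading digits, dots, emoji) would need to be sanitized before being written as Avro field names, which appears to be the kind of case the issue reports.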

[I] How can I speed up batched add_files calls? [iceberg-python]

2025-06-21 Thread via GitHub
thijsheijden opened a new issue, #2132: URL: https://github.com/apache/iceberg-python/issues/2132 ### Question Hi! I am trying to add 1 million existing Parquet files to an Iceberg table using the `add_files` procedure. I am inserting in 1000 batches of 1000 files. Every batch takes

Re: [PR] Flink: Supports delete orphan files in TableMaintenance [iceberg]

2025-06-21 Thread via GitHub
Guosmilesmile commented on code in PR #13302: URL: https://github.com/apache/iceberg/pull/13302#discussion_r2160063006 ## core/src/main/java/org/apache/iceberg/actions/FileURI.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-21 Thread via GitHub
manuzhang commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-2993630279 The target date is end of June.

Re: [PR] feature: expire snapshots action [iceberg-rust]

2025-06-21 Thread via GitHub
cmcarthur commented on PR #1455: URL: https://github.com/apache/iceberg-rust/pull/1455#issuecomment-2993589198 @liurenjie1024 thanks for the feedback! > This pr is too large to review. agreed, it is large, I will break it into at least two smaller PRs > I don't think it'

Re: [PR] feature: expire snapshots action [iceberg-rust]

2025-06-21 Thread via GitHub
liurenjie1024 commented on PR #1455: URL: https://github.com/apache/iceberg-rust/pull/1455#issuecomment-2993575061 Hi, @cmcarthur This pr is too large to review. I would suggest to split them into several small prs, for example, the `ExpireSnapshotAction` could be a good start. Also as ment

Re: [I] Proposal: Implement table maintenance operations [iceberg-rust]

2025-06-21 Thread via GitHub
liurenjie1024 commented on issue #1453: URL: https://github.com/apache/iceberg-rust/issues/1453#issuecomment-2993572461 Thanks @cmcarthur for raising this. I think it would be a great feature to add to this library. However, please note that currently in this repo there are several crates:

Re: [PR] feat(transaction): Remove current_table, updates, and requirements from Transaction [iceberg-rust]

2025-06-21 Thread via GitHub
liurenjie1024 merged PR #1451: URL: https://github.com/apache/iceberg-rust/pull/1451

[PR] refine: refine manifest_evaluator to reject not explicitly [iceberg-rust]

2025-06-21 Thread via GitHub
ZENOTME opened a new pull request, #1462: URL: https://github.com/apache/iceberg-rust/pull/1462 ## Which issue does this PR close? Closes #1355 ## What changes are included in this PR? As discussed in https://github.com/apache/iceberg-rust/issues/1355#issue

Re: [PR] Flink: Supports delete orphan files in TableMaintenance [iceberg]

2025-06-21 Thread via GitHub
Guosmilesmile commented on code in PR #13302: URL: https://github.com/apache/iceberg/pull/13302#discussion_r2159817224 ## flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/maintenance/api/DeleteOrphanFiles.java: ## @@ -0,0 +1,358 @@ +/* + * Licensed to the Apache Software

Re: [PR] feat: implement Primitive type Literal [iceberg-cpp]

2025-06-21 Thread via GitHub
Fokko commented on code in PR #117: URL: https://github.com/apache/iceberg-cpp/pull/117#discussion_r2160010262 ## src/iceberg/literal.h: ## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the N

Re: [I] Guidance Needed: Iceberg-Spark Runtime JAR for Apache Spark 4.0.0 [iceberg]

2025-06-21 Thread via GitHub
atinvento100 commented on issue #13358: URL: https://github.com/apache/iceberg/issues/13358#issuecomment-2993535089 Thank you for confirming. Would you be able to share an estimated release date or timeline for version 1.10.0?

Re: [PR] feat: add support for avro to arrow data conversion [iceberg-cpp]

2025-06-21 Thread via GitHub
zhjwpku commented on code in PR #124: URL: https://github.com/apache/iceberg-cpp/pull/124#discussion_r2159975860 ## src/iceberg/avro/avro_data_util.cc: ## @@ -17,16 +17,383 @@ * under the License. */ +#include +#include +#include +#include +#include +#include +#inclu

Re: [I] Unable to use GlueCatalog in flink environments without hadoop [iceberg]

2025-06-21 Thread via GitHub
dyrnq commented on issue #3044: URL: https://github.com/apache/iceberg/issues/3044#issuecomment-2993483975 stare

Re: [I] Flink: Decouple the iceberg integration work from hadoop libraries [iceberg]

2025-06-21 Thread via GitHub
dyrnq commented on issue #3117: URL: https://github.com/apache/iceberg/issues/3117#issuecomment-2993480009 stare

Re: [PR] feat(transaction): Remove current_table, updates, and requirements from Transaction [iceberg-rust]

2025-06-21 Thread via GitHub
CTTY commented on code in PR #1451: URL: https://github.com/apache/iceberg-rust/pull/1451#discussion_r2159899760 ## crates/iceberg/src/transaction/mod.rs: ## @@ -157,41 +114,45 @@ impl Transaction { /// Commit transaction. pub async fn commit(mut self, catalog: &dyn

Re: [PR] fix(Catalog): Handle NotFound exception for missing metadata file [iceberg]

2025-06-21 Thread via GitHub
coded9 commented on code in PR #13143: URL: https://github.com/apache/iceberg/pull/13143#discussion_r2159971727 ## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ## @@ -959,6 +959,16 @@ public void testLoadMissingTable() { .hasMessageStartingWith("Tabl

Re: [I] [feature request] docs for IRC catalog connection [iceberg-python]

2025-06-21 Thread via GitHub
james5418 commented on issue #2096: URL: https://github.com/apache/iceberg-python/issues/2096#issuecomment-299365 Hi @kevinjqliu, I would like to work on this issue!