Re: [PR] update PartitionSpec with snapshot'schema [iceberg]

2025-01-25 Thread via GitHub
lurnagao-dahua commented on PR #11196: URL: https://github.com/apache/iceberg/pull/11196#issuecomment-2614237241 > @lurnagao-dahua thank you for creating a PR, are there any plans to re-open this PR? I can reopen it at any time, but the community seems to have no plans. -- This is an a

Re: [PR] Spark 3.4: Backport Spark actions and procedures for RewriteTablePath [iceberg]

2025-01-25 Thread via GitHub
dramaticlly commented on PR #12111: URL: https://github.com/apache/iceberg/pull/12111#issuecomment-2614224927 The [previous Spark CI build failure](https://github.com/apache/iceberg/actions/runs/12971614921/job/36178209205) on REST server port binding seems to be unrelated to this change.

Re: [PR] Spark 3.4: Backport Spark actions and procedures for RewriteTablePath [iceberg]

2025-01-25 Thread via GitHub
dramaticlly closed pull request #12111: Spark 3.4: Backport Spark actions and procedures for RewriteTablePath URL: https://github.com/apache/iceberg/pull/12111

[PR] Core, Spark: Exclude non live content file in RewriteTablePathUtil [iceberg]

2025-01-25 Thread via GitHub
dramaticlly opened a new pull request, #12006: URL: https://github.com/apache/iceberg/pull/12006 Instead of scanning all entries in data/manifest to identify the list of content files to copy, scan only the live ones. This is essential to prevent rewrite table path from carrying the files already
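
A minimal sketch of the filtering idea described in this PR, assuming a simplified manifest-entry model in which each entry carries a status of ADDED, EXISTING, or DELETED (the three entry states in the Iceberg table spec). Entries marked DELETED are skipped, so only live content files are collected for copying. The `Entry` record and `liveFilePaths` helper are hypothetical illustrations, not the actual `RewriteTablePathUtil` code.

```java
import java.util.List;
import java.util.stream.Collectors;

public class LiveEntries {
  // Simplified entry states, mirroring the manifest entry status values in the spec.
  enum Status { EXISTING, ADDED, DELETED }

  // Hypothetical stand-in for a manifest entry: only the fields this sketch needs.
  record Entry(Status status, String filePath) {}

  // Keep only live entries (ADDED or EXISTING); DELETED entries point at files
  // that are no longer part of the current table state and need not be copied.
  static List<String> liveFilePaths(List<Entry> entries) {
    return entries.stream()
        .filter(e -> e.status() != Status.DELETED)
        .map(Entry::filePath)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Entry> entries = List.of(
        new Entry(Status.ADDED, "s3://bucket/data/a.parquet"),
        new Entry(Status.DELETED, "s3://bucket/data/b.parquet"),
        new Entry(Status.EXISTING, "s3://bucket/data/c.parquet"));
    System.out.println(liveFilePaths(entries));
  }
}
```

Filtering on entry status rather than on file existence keeps the copy list tied to table metadata alone, which is the distinction the PR description draws between all entries and live ones.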

[PR] Spark 3.4: Backport Spark actions and procedures for RewriteTablePath [iceberg]

2025-01-25 Thread via GitHub
dramaticlly opened a new pull request, #12111: URL: https://github.com/apache/iceberg/pull/12111 Minor unit test modifications to accommodate the differences between JUnit 4 and JUnit 5. Can you take a look? @amogh-jahagirdar

[PR] Build: Bump actions/stale from 9.0.0 to 9.1.0 [iceberg]

2025-01-25 Thread via GitHub
dependabot[bot] opened a new pull request, #12110: URL: https://github.com/apache/iceberg/pull/12110 Bumps [actions/stale](https://github.com/actions/stale) from 9.0.0 to 9.1.0. Release notes sourced from actions/stale's releases (https://github.com/actions/stale/releases).

[PR] Build: Bump software.amazon.awssdk:bom from 2.29.50 to 2.30.6 [iceberg]

2025-01-25 Thread via GitHub
dependabot[bot] opened a new pull request, #12109: URL: https://github.com/apache/iceberg/pull/12109 Bumps software.amazon.awssdk:bom from 2.29.50 to 2.30.6.

Re: [I] [DISCUSS] Columnar data protocol: Arrow or implement a new one? [iceberg-cpp]

2025-01-25 Thread via GitHub
wgtmac commented on issue #33: URL: https://github.com/apache/iceberg-cpp/issues/33#issuecomment-2614199668 Thanks @paleolimbot! I will keep you updated when we have made progress or need anything from nanoarrow.

[PR] Build: Bump org.openapitools:openapi-generator-gradle-plugin from 7.10.0 to 7.11.0 [iceberg]

2025-01-25 Thread via GitHub
dependabot[bot] opened a new pull request, #12108: URL: https://github.com/apache/iceberg/pull/12108 Bumps [org.openapitools:openapi-generator-gradle-plugin](https://github.com/OpenAPITools/openapi-generator) from 7.10.0 to 7.11.0. Release notes Sourced from https://github.com/Ope

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.29.50 to 2.30.2 [iceberg]

2025-01-25 Thread via GitHub
dependabot[bot] commented on PR #11998: URL: https://github.com/apache/iceberg/pull/11998#issuecomment-2614204930 Superseded by #12109.

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.29.50 to 2.30.2 [iceberg]

2025-01-25 Thread via GitHub
dependabot[bot] closed pull request #11998: Build: Bump software.amazon.awssdk:bom from 2.29.50 to 2.30.2 URL: https://github.com/apache/iceberg/pull/11998

[PR] Build: Bump nessie from 0.101.3 to 0.102.2 [iceberg]

2025-01-25 Thread via GitHub
dependabot[bot] opened a new pull request, #12107: URL: https://github.com/apache/iceberg/pull/12107 Bumps `nessie` from 0.101.3 to 0.102.2. Updates `org.projectnessie.nessie:nessie-client` from 0.101.3 to 0.102.2 Updates `org.projectnessie.nessie:nessie-jaxrs-testextension` from 0.

Re: [PR] Core, Spark: Exclude non live content file in RewriteTablePathUtil [iceberg]

2025-01-25 Thread via GitHub
dramaticlly closed pull request #12006: Core, Spark: Exclude non live content file in RewriteTablePathUtil URL: https://github.com/apache/iceberg/pull/12006

Re: [PR] Parquet: Clean up Parquet generic and internal readers [iceberg]

2025-01-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #12102: URL: https://github.com/apache/iceberg/pull/12102#discussion_r1929650157 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -67,8 +77,18 @@ protected ParquetValueReader createReader( }

Re: [I] [DISCUSS] Columnar data protocol: Arrow or implement a new one? [iceberg-cpp]

2025-01-25 Thread via GitHub
wgtmac commented on issue #33: URL: https://github.com/apache/iceberg-cpp/issues/33#issuecomment-2614188731 > It also depends whether Arrow would be a public or private dependency for iceberg-cpp. If a public dependency, going for nanoarrow is certainly safer at this point. I'd say t

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-25 Thread via GitHub
zhjwpku commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929632431 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOT

Re: [I] Flink: Fix flaky TestIcebergSourceFailover > testBoundedWithSavepoint [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] commented on issue #10671: URL: https://github.com/apache/iceberg/issues/10671#issuecomment-2614145603 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur
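
The bot's message gives two thresholds: an item is marked stale after 180 days without activity, and closed 14 days after that if nothing else happens. A minimal sketch of that date arithmetic, assuming no activity at all after the given date; the repository's actual workflow configuration is not shown here, and because such a bot runs on a schedule, a real run may act somewhat later than the computed dates. `StaleTimeline` and its methods are hypothetical.

```java
import java.time.LocalDate;

public class StaleTimeline {
  // Thresholds taken from the bot messages above (assumed to be calendar days).
  static final int DAYS_BEFORE_STALE = 180;
  static final int DAYS_BEFORE_CLOSE = 14;

  // Earliest date the item can be labeled stale, given its last activity.
  static LocalDate staleDate(LocalDate lastActivity) {
    return lastActivity.plusDays(DAYS_BEFORE_STALE);
  }

  // Earliest date the item can be closed, assuming no activity after going stale.
  static LocalDate closeDate(LocalDate lastActivity) {
    return staleDate(lastActivity).plusDays(DAYS_BEFORE_CLOSE);
  }

  public static void main(String[] args) {
    LocalDate lastActivity = LocalDate.of(2024, 1, 1); // illustrative date only
    System.out.println("stale: " + staleDate(lastActivity) + ", close: " + closeDate(lastActivity));
  }
}
```

Any new comment or commit resets the clock, which is why a single reply is enough to keep an issue such as this one open.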

Re: [I] Accessing S3 Express one zone bucket from pyiceberg [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] commented on issue #10702: URL: https://github.com/apache/iceberg/issues/10702#issuecomment-2614145630 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Add linter for Markdown files [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] commented on issue #10790: URL: https://github.com/apache/iceberg/issues/10790#issuecomment-2614145678 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] flink iceberg may occur duplication when succeed to write datafile and commit but checkpoint fail [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] commented on issue #10765: URL: https://github.com/apache/iceberg/issues/10765#issuecomment-2614145663 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] Accessing S3 Express one zone bucket from pyiceberg [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] closed issue #10702: Accessing S3 Express one zone bucket from pyiceberg URL: https://github.com/apache/iceberg/issues/10702

Re: [I] AWS Glue skip validation does not work while using table overrrides [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] commented on issue #10701: URL: https://github.com/apache/iceberg/issues/10701#issuecomment-2614145617 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] commented on PR #9335: URL: https://github.com/apache/iceberg/pull/9335#issuecomment-2614145581 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Spark3.5 deprecate a few SparkCatalog APIs [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] closed pull request #11807: Spark3.5 deprecate a few SparkCatalog APIs URL: https://github.com/apache/iceberg/pull/11807

Re: [PR] Spark3.5 deprecate a few SparkCatalog APIs [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] commented on PR #11807: URL: https://github.com/apache/iceberg/pull/11807#issuecomment-2614145701 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [I] AWS Glue skip validation does not work while using table overrrides [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] closed issue #10701: AWS Glue skip validation does not work while using table overrrides URL: https://github.com/apache/iceberg/issues/10701

Re: [PR] Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable [iceberg]

2025-01-25 Thread via GitHub
github-actions[bot] closed pull request #9335: Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable URL: https://github.com/apache/iceberg/pull/9335

[PR] [Draft] Spark: Action to remove missing files [iceberg]

2025-01-25 Thread via GitHub
wypoon opened a new pull request, #12106: URL: https://github.com/apache/iceberg/pull/12106 In case data and/or delete files are inadvertently deleted from the storage, an Iceberg table becomes unreadable. We provide a Spark action for "repairing" such a table, by removing the missing fi
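
The repair described here starts by finding which referenced files no longer exist in storage. A minimal sketch of that detection step, assuming the list of referenced paths is already available and that existence can be checked through an injected predicate (in a real table the check would go through the table's file I/O layer). `MissingFiles` and `findMissing` are hypothetical; the draft PR's actual Spark action and its commit mechanism are not shown here.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class MissingFiles {
  // Returns the referenced paths whose existence check fails, preserving order.
  // The predicate is injected so the sketch stays self-contained (assumption).
  static List<String> findMissing(List<String> referenced, Predicate<String> exists) {
    return referenced.stream()
        .filter(exists.negate())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Simulated storage contents: b.parquet has been deleted out-of-band.
    Set<String> onStorage = Set.of("data/a.parquet", "data/c.parquet");
    List<String> referenced = List.of("data/a.parquet", "data/b.parquet", "data/c.parquet");
    System.out.println(findMissing(referenced, onStorage::contains));
  }
}
```

The returned paths are the candidates that a repair commit would then drop from table metadata; separating detection from the commit makes it easy to review the list before anything is changed.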

Re: [PR] Implement column projection [iceberg-python]

2025-01-25 Thread via GitHub
gabeiglio commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1929627882 ## tests/io/test_pyarrow.py: ## @@ -1122,6 +1127,127 @@ def test_projection_concat_files(schema_int: Schema, file_int: str) -> None: assert repr(result_t

Re: [I] test [iceberg]

2025-01-25 Thread via GitHub
Sabbir02 closed issue #12104: test URL: https://github.com/apache/iceberg/issues/12104

Re: [PR] feat(catalog): Initial implementation of sql catalog [iceberg-go]

2025-01-25 Thread via GitHub
kevinjqliu commented on PR #246: URL: https://github.com/apache/iceberg-go/pull/246#issuecomment-2614053350 Looks like there are a lot of overlapping changes with #266. I can review again once that's merged and this PR is rebased.

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-01-25 Thread via GitHub
ajantha-bhat commented on code in PR #12098: URL: https://github.com/apache/iceberg/pull/12098#discussion_r1929583064 ## format/spec.md: ## @@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f The schema of the partition stat

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-01-25 Thread via GitHub
ajantha-bhat commented on code in PR #12098: URL: https://github.com/apache/iceberg/pull/12098#discussion_r1929580136 ## format/spec.md: ## @@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f The schema of the partition stat

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-01-25 Thread via GitHub
ajantha-bhat commented on code in PR #12098: URL: https://github.com/apache/iceberg/pull/12098#discussion_r1929579528 ## format/spec.md: ## @@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f The schema of the partition stat

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2025-01-25 Thread via GitHub
ajantha-bhat commented on PR #11216: URL: https://github.com/apache/iceberg/pull/11216#issuecomment-2614035260 @deniskuzZ: While designing the spec (https://iceberg.apache.org/spec/#partition-statistics-file), we have added `totalRecordCount` to represent the record count after applying the

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-25 Thread via GitHub
mapleFU commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929570771 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOT

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-25 Thread via GitHub
mapleFU commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929571875 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOT

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-25 Thread via GitHub
mapleFU commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929571759 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOT

Re: [I] Decouple building and serialization [iceberg-rust]

2025-01-25 Thread via GitHub
Sl1mb0 commented on issue #778: URL: https://github.com/apache/iceberg-rust/issues/778#issuecomment-2614009219 > A builder method for ManifestFile/ManifestList, instead of building them using writers only? Yes - I think that would be ideal. This would help distinguish between buildin

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-25 Thread via GitHub
zhjwpku commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929540361 ## src/iceberg/type_fwd.h: ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the N

[PR] Core: Support removing keys from EnvironmentContext [iceberg]

2025-01-25 Thread via GitHub
rshkv opened a new pull request, #12103: URL: https://github.com/apache/iceberg/pull/12103 Tiny change to support removing keys from `EnvironmentContext`. Useful when a JVM is shared by multiple jobs and jobs want to express different environment properties.
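
A minimal sketch of why removal matters for a JVM-wide context shared by several jobs, assuming the context is a static concurrent map of string properties. `EnvContextSketch` and its methods are hypothetical and only loosely modeled on the idea in this PR; they are not Iceberg's `EnvironmentContext` API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class EnvContextSketch {
  // Hypothetical JVM-wide property map shared by every job in the process.
  private static final Map<String, String> PROPERTIES = new ConcurrentHashMap<>();

  private EnvContextSketch() {}

  static void put(String key, String value) {
    PROPERTIES.put(key, value);
  }

  // Removes a key so that a later job in the same JVM no longer reports it.
  // Returns the previous value, or null if the key was absent.
  static String remove(String key) {
    return PROPERTIES.remove(key);
  }

  // Immutable snapshot, so callers cannot mutate the shared state directly.
  static Map<String, String> get() {
    return Map.copyOf(PROPERTIES);
  }

  public static void main(String[] args) {
    put("job-name", "nightly-compaction"); // set by the first job
    System.out.println(get());
    remove("job-name"); // cleared before the next job reuses the JVM
    System.out.println(get());
  }
}
```

Without a removal operation, a key set by one job can only be overwritten, never cleared, so it would leak into the properties reported by every later job in the same JVM.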

Re: [I] [DISCUSS] Columnar data protocol: Arrow or implement a new one? [iceberg-cpp]

2025-01-25 Thread via GitHub
paleolimbot commented on issue #33: URL: https://github.com/apache/iceberg-cpp/issues/33#issuecomment-2613988183 > I found it is super lightweight and convenient to simply use the bundled nanoarrow.hpp and nanoarrow.cc files generated by ci/scripts/bundle.py --cpp --symbol-namespace=iceberg

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-25 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1929534587 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,19 +296,77 @@ private Dataset validFileIdentDS()

Re: [I] [DISCUSS] Columnar data protocol: Arrow or implement a new one? [iceberg-cpp]

2025-01-25 Thread via GitHub
pitrou commented on issue #33: URL: https://github.com/apache/iceberg-cpp/issues/33#issuecomment-2613946318 > I have also checked https://github.com/man-group/sparrow. TBH, I like its design and simplicity but a little bit concerned about its adoption outside of its sponsors. Maybe we need

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2025-01-25 Thread via GitHub
deniskuzZ commented on PR #11216: URL: https://github.com/apache/iceberg/pull/11216#issuecomment-2613938136 hi @ajantha-bhat, what is the purpose of `PartitionStats.totalRecordCount`? it's always 0 and there is no external setter either. Also `SnapshotSummary.TOTAL_FILE_SIZE_PROP` tracks

Re: [PR] Core: Partial Update [iceberg]

2025-01-25 Thread via GitHub
lurnagao-dahua commented on PR #6043: URL: https://github.com/apache/iceberg/pull/6043#issuecomment-2613927446 May I ask if you have already implemented some updates internally?

Re: [PR] feat: nan_value_counts support [iceberg-rust]

2025-01-25 Thread via GitHub
feniljain commented on code in PR #907: URL: https://github.com/apache/iceberg-rust/pull/907#discussion_r1929524955 ## crates/iceberg/src/writer/file_writer/parquet_writer.rs: ## @@ -396,13 +401,45 @@ impl ParquetWriter { impl FileWriter for ParquetWriter { async fn write(

Re: [PR] Enable pyiceberg.table.Table.add_files ns downcasting [iceberg-python]

2025-01-25 Thread via GitHub
fusion commented on PR #1572: URL: https://github.com/apache/iceberg-python/pull/1572#issuecomment-2613923916 The code linter has failed. Will adjust the code accordingly.

Re: [I] [DISCUSS] Columnar data protocol: Arrow or implement a new one? [iceberg-cpp]

2025-01-25 Thread via GitHub
wgtmac commented on issue #33: URL: https://github.com/apache/iceberg-cpp/issues/33#issuecomment-2613864579 Thanks for all the suggestions above! This is my first time playing with nanoarrow. After some research, I found it is super lightweight and convenient to simply use the bundled

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-01-25 Thread via GitHub
advancedxy commented on code in PR #12098: URL: https://github.com/apache/iceberg/pull/12098#discussion_r192959 ## format/spec.md: ## @@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f The schema of the partition statis