Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-17 Thread via GitHub
huaxingao commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2153736817 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/functions/BucketFunction.java: ## @@ -128,6 +133,25 @@ public String name() { public DataType result

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-17 Thread via GitHub
huaxingao commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2153733314 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -549,6 +555,146 @@ public void testJoinsWithMismatchingPartit

Re: [PR] Allow PartitionField's field_id is missing in Iceberg v1 [iceberg-cpp]

2025-06-17 Thread via GitHub
Fokko commented on PR #121: URL: https://github.com/apache/iceberg-cpp/pull/121#issuecomment-2982779576 > I believe @Fokko's idea has already been implemented Nice, that's great to hear! @Smith-Cruise Could you fix the CI? :) -- This is an automated message from the Apache Git Servi

Re: [PR] Flink: Dynamic Iceberg Sink Contribution [iceberg]

2025-06-17 Thread via GitHub
aiborodin commented on code in PR #12424: URL: https://github.com/apache/iceberg/pull/12424#discussion_r2153697594 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/RowDataEvolver.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundatio

[PR] Optimise RowData evolution [iceberg]

2025-06-17 Thread via GitHub
aiborodin opened a new pull request, #13340: URL: https://github.com/apache/iceberg/pull/13340 RowDataEvolver recomputes Flink RowType and field getters for every input record that needs to match a destination Iceberg table schema. Cache field getters and column converters to optimise RowDa

Re: [PR] Spark: Support Parquet dictionary encoded UUIDs [iceberg]

2025-06-17 Thread via GitHub
Fokko commented on PR #13324: URL: https://github.com/apache/iceberg/pull/13324#issuecomment-2982765349 @DinGo4DEV Yes, we have tests for plain encoded UUIDs, let me add one for dictionary encoded UUIDs as well 👍

Re: [PR] Core: Use Shared HttpClientContext to Persist "was-retried" Attribute [iceberg]

2025-06-17 Thread via GitHub
singhpk234 commented on code in PR #13339: URL: https://github.com/apache/iceberg/pull/13339#discussion_r2153686204 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -456,6 +462,39 @@ public void testCloseChild() throws IOException { .doesNotThrow

Re: [PR] Core: Use Shared HttpClientContext to Persist "was-retried" Attribute [iceberg]

2025-06-17 Thread via GitHub
XJDKC commented on code in PR #13339: URL: https://github.com/apache/iceberg/pull/13339#discussion_r2153649615 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -233,7 +233,7 @@ private static void throwFailure( ErrorResponse enrichedErrorResponse =

Re: [PR] Core: Use Shared HttpClientContext to Persist "was-retried" Attribute [iceberg]

2025-06-17 Thread via GitHub
sfc-gh-prsingh commented on code in PR #13339: URL: https://github.com/apache/iceberg/pull/13339#discussion_r2153634654 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -233,7 +233,7 @@ private static void throwFailure( ErrorResponse enrichedErrorRespo

Re: [PR] Core: Use Shared HttpClientContext to Persist "was-retried" Attribute [iceberg]

2025-06-17 Thread via GitHub
sfc-gh-rxing commented on code in PR #13339: URL: https://github.com/apache/iceberg/pull/13339#discussion_r2153649099 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -233,7 +233,7 @@ private static void throwFailure( ErrorResponse enrichedErrorRespons

Re: [PR] Core: Use Shared HttpClientContext to Persist "was-retried" Attribute [iceberg]

2025-06-17 Thread via GitHub
singhpk234 commented on code in PR #13339: URL: https://github.com/apache/iceberg/pull/13339#discussion_r2153635309 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -233,7 +233,7 @@ private static void throwFailure( ErrorResponse enrichedErrorResponse

Re: [I] iceberg table properties are saved in table metadata's properties field [iceberg-python]

2025-06-17 Thread via GitHub
Anton-Tarazi commented on issue #2064: URL: https://github.com/apache/iceberg-python/issues/2064#issuecomment-2982603395 I'd be happy to take a stab at fixing this for the sql catalog, but my understanding is that the iceberg reference jdbc implementation does not store table properties in

Re: [PR] feat(transaction): Implement TransactionAction for FastAppendAction [iceberg-rust]

2025-06-17 Thread via GitHub
liurenjie1024 commented on code in PR #1448: URL: https://github.com/apache/iceberg-rust/pull/1448#discussion_r2153527451 ## crates/iceberg/src/transaction/append.rs: ## @@ -16,43 +16,132 @@ // under the License. use std::collections::{HashMap, HashSet}; +use std::sync::Arc;

Re: [PR] fix: fix rewrite_not to process complex nested not [iceberg-rust]

2025-06-17 Thread via GitHub
liurenjie1024 commented on code in PR #1431: URL: https://github.com/apache/iceberg-rust/pull/1431#discussion_r2153524490 ## crates/iceberg/src/expr/visitors/rewrite_not.rs: ## @@ -0,0 +1,222 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] feat(transaction): Implement TransactionAction for FastAppendAction [iceberg-rust]

2025-06-17 Thread via GitHub
CTTY commented on code in PR #1448: URL: https://github.com/apache/iceberg-rust/pull/1448#discussion_r2153519378 ## crates/iceberg/src/transaction/append.rs: ## @@ -16,43 +16,132 @@ // under the License. use std::collections::{HashMap, HashSet}; +use std::sync::Arc; +use a

Re: [PR] Allow PartitionField's field_id is missing in Iceberg v1 [iceberg-cpp]

2025-06-17 Thread via GitHub
Smith-Cruise commented on PR #121: URL: https://github.com/apache/iceberg-cpp/pull/121#issuecomment-2982430729 @wgtmac CI run failed, could you help me to rerun it?

Re: [I] Partition file filtering logic is incorrect for logical `not()` function [iceberg-rust]

2025-06-17 Thread via GitHub
liurenjie1024 commented on issue #1355: URL: https://github.com/apache/iceberg-rust/issues/1355#issuecomment-2982417539 I agree with @ZENOTME that we should return Error in `not` visitor. > call rewrite_not in ManifestEvaluator::New to guarantee that there is not possible not in expres

Re: [PR] Build: Bump nessie from 0.104.1 to 0.104.2 [iceberg]

2025-06-17 Thread via GitHub
manuzhang commented on PR #13314: URL: https://github.com/apache/iceberg/pull/13314#issuecomment-2982395403 Then we are stuck at this nessie 0.104.1 until JDK 11 support is dropped.

[PR] Core: Use Shared HttpClientContext to Persist "was-retried" Attribute [iceberg]

2025-06-17 Thread via GitHub
XJDKC opened a new pull request, #13339: URL: https://github.com/apache/iceberg/pull/13339 (no comment)

Re: [PR] Core: Fix filed ids of partition stats file [iceberg]

2025-06-17 Thread via GitHub
ajantha-bhat commented on PR #13329: URL: https://github.com/apache/iceberg/pull/13329#issuecomment-2982309570 Flink new flaky test: https://github.com/apache/iceberg/issues/13338

Re: [PR] Core: Fix filed ids of partition stats file [iceberg]

2025-06-17 Thread via GitHub
ajantha-bhat closed pull request #13329: Core: Fix filed ids of partition stats file URL: https://github.com/apache/iceberg/pull/13329

[I] Flaky test TestIcebergSink > testTwoSinksInDisjointedDAG() [iceberg]

2025-06-17 Thread via GitHub
ajantha-bhat opened a new issue, #13338: URL: https://github.com/apache/iceberg/issues/13338 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 > Task :iceberg-flink:iceberg-flink-1.20:test ``` TestIcebergSink >

Re: [I] scan error if not embedded arrow schema in parquet [iceberg-rust]

2025-06-17 Thread via GitHub
liurenjie1024 closed issue #1435: scan error if not embedded arrow schema in parquet URL: https://github.com/apache/iceberg-rust/issues/1435

Re: [I] scan error if not embedded arrow schema in parquet [iceberg-rust]

2025-06-17 Thread via GitHub
liurenjie1024 commented on issue #1435: URL: https://github.com/apache/iceberg-rust/issues/1435#issuecomment-2982307688 > Thank you for a quick response. > > Ok, i see now. It look like i've confused absence of `arrow` scheme in metadata with absence of fields id's in parquet. I indee

Re: [I] Add `miri` to detect undefined behavior. [iceberg-rust]

2025-06-17 Thread via GitHub
liurenjie1024 commented on issue #446: URL: https://github.com/apache/iceberg-rust/issues/446#issuecomment-2982305534 > Hello, I'm new and would like to contribute to this project. If this task is still available, I'd be interested in working on it as it would help me familiarize myself wit

Re: [PR] feat(transaction): Implement TransactionAction for FastAppendAction [iceberg-rust]

2025-06-17 Thread via GitHub
liurenjie1024 commented on code in PR #1448: URL: https://github.com/apache/iceberg-rust/pull/1448#discussion_r2151820818 ## crates/iceberg/src/transaction/mod.rs: ## @@ -143,17 +142,15 @@ impl Transaction { /// Creates a fast append action. pub fn fast_append( -

Re: [PR] validate added data files for snapshot compatibility [iceberg-python]

2025-06-17 Thread via GitHub
kaushiksrini commented on code in PR #2050: URL: https://github.com/apache/iceberg-python/pull/2050#discussion_r2153431445 ## pyiceberg/table/update/validate.py: ## @@ -150,3 +178,60 @@ def _validate_deleted_data_files( if any(conflicting_entries): conflicting_snap

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-17 Thread via GitHub
szehon-ho commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2153381495 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -549,6 +555,146 @@ public void testJoinsWithMismatchingPartit

Re: [PR] fix(table): handle missing or nil stats + metadata field nil comparison [iceberg-go]

2025-06-17 Thread via GitHub
James-Gilbert- commented on code in PR #460: URL: https://github.com/apache/iceberg-go/pull/460#discussion_r2153303824 ## table/evaluators.go: ## @@ -732,6 +732,10 @@ func (m *inclusiveMetricsEval) TestRowGroup(rgmeta *metadata.RowGroupMetaData, c return

Re: [I] `catalog.load_table` raises Invalid JSON error [iceberg-python]

2025-06-17 Thread via GitHub
github-actions[bot] closed issue #1328: `catalog.load_table` raises Invalid JSON error URL: https://github.com/apache/iceberg-python/issues/1328

Re: [I] Deleting namespaces and tables of JDBC Catalog [iceberg-python]

2025-06-17 Thread via GitHub
github-actions[bot] commented on issue #1400: URL: https://github.com/apache/iceberg-python/issues/1400#issuecomment-2982193867 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Deleting namespaces and tables of JDBC Catalog [iceberg-python]

2025-06-17 Thread via GitHub
github-actions[bot] closed issue #1400: Deleting namespaces and tables of JDBC Catalog URL: https://github.com/apache/iceberg-python/issues/1400

Re: [I] `catalog.load_table` raises Invalid JSON error [iceberg-python]

2025-06-17 Thread via GitHub
github-actions[bot] commented on issue #1328: URL: https://github.com/apache/iceberg-python/issues/1328#issuecomment-2982193901 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Add Software Bill of Materials (SBOM) [iceberg]

2025-06-17 Thread via GitHub
github-actions[bot] commented on issue #11697: URL: https://github.com/apache/iceberg/issues/11697#issuecomment-2982189703 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Add Software Bill of Materials (SBOM) [iceberg]

2025-06-17 Thread via GitHub
github-actions[bot] closed issue #11697: Add Software Bill of Materials (SBOM) URL: https://github.com/apache/iceberg/issues/11697

Re: [I] flink restore failed with filenotfound [iceberg]

2025-06-17 Thread via GitHub
github-actions[bot] commented on issue #6066: URL: https://github.com/apache/iceberg/issues/6066#issuecomment-2982189618 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Further refactor Parquet readers for v2 support [iceberg]

2025-06-17 Thread via GitHub
wypoon commented on code in PR #13290: URL: https://github.com/apache/iceberg/pull/13290#discussion_r2153315760 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedValuesReader.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[PR] source-ids is not supported on Iceberg v1,2 [iceberg-python]

2025-06-17 Thread via GitHub
rambleraptor opened a new pull request, #2114: URL: https://github.com/apache/iceberg-python/pull/2114 Closes #1547 # Rationale for this change The field `source-ids` is being introduced for Iceberg v3. According to the spec, it should not be supported by Iceberg v1 + v

Re: [PR] Further refactor Parquet readers for v2 support [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on PR #13290: URL: https://github.com/apache/iceberg/pull/13290#issuecomment-2981976101 I have some small questions about the Roadmap for where we go from here but this makes sense to me as a first step. As long as we are more or less copying the Spark approach I th

Re: [PR] Further refactor Parquet readers for v2 support [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on code in PR #13290: URL: https://github.com/apache/iceberg/pull/13290#discussion_r2153253088 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedValuesReader.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Further refactor Parquet readers for v2 support [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on code in PR #13290: URL: https://github.com/apache/iceberg/pull/13290#discussion_r2153251554 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedPlainValuesReader.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] Further refactor Parquet readers for v2 support [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on code in PR #13290: URL: https://github.com/apache/iceberg/pull/13290#discussion_r2153223730 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedPageIterator.java: ## @@ -65,13 +64,13 @@ public void setAllPagesDictEncoded(boole

Re: [PR] build(deps): bump the gomod_updates group with 15 updates [iceberg-go]

2025-06-17 Thread via GitHub
zeroshade merged PR #462: URL: https://github.com/apache/iceberg-go/pull/462

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on PR #13191: URL: https://github.com/apache/iceberg/pull/13191#issuecomment-2981882586 Thanks @singhpk234 + @amogh-jahagirdar for Review

Re: [PR] fix(table): handle missing or nil stats + metadata field nil comparison [iceberg-go]

2025-06-17 Thread via GitHub
zeroshade commented on code in PR #460: URL: https://github.com/apache/iceberg-go/pull/460#discussion_r2153191373 ## table/evaluators.go: ## @@ -732,6 +732,10 @@ func (m *inclusiveMetricsEval) TestRowGroup(rgmeta *metadata.RowGroupMetaData, c return fals

Re: [PR] fix(cli/rest): Add custom scope for rest cli when using with Oauth [iceberg-go]

2025-06-17 Thread via GitHub
zeroshade commented on PR #461: URL: https://github.com/apache/iceberg-go/pull/461#issuecomment-2981884051 Thanks!

Re: [PR] fix(cli/rest): Add custom scope for rest cli when using with Oauth [iceberg-go]

2025-06-17 Thread via GitHub
zeroshade merged PR #461: URL: https://github.com/apache/iceberg-go/pull/461

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer merged PR #13191: URL: https://github.com/apache/iceberg/pull/13191

Re: [PR] Use gcp.NewHTTPClient and pass creds [iceberg-go]

2025-06-17 Thread via GitHub
zeroshade merged PR #454: URL: https://github.com/apache/iceberg-go/pull/454

Re: [PR] Use gcp.NewHTTPClient and pass creds [iceberg-go]

2025-06-17 Thread via GitHub
zeroshade commented on code in PR #454: URL: https://github.com/apache/iceberg-go/pull/454#discussion_r2153181812 ## io/gcs.go: ## @@ -46,6 +49,9 @@ func ParseGCSConfig(props map[string]string) *gcsblob.Options { if path := props[GCSKeyPath]; path != "" {

Re: [PR] Add deprecation message for source-id [iceberg-python]

2025-06-17 Thread via GitHub
rambleraptor closed pull request #2098: Add deprecation message for source-id URL: https://github.com/apache/iceberg-python/pull/2098

Re: [PR] Spark: option to set predicate for filtering files in compaction [iceberg]

2025-06-17 Thread via GitHub
bryanck commented on code in PR #13327: URL: https://github.com/apache/iceberg/pull/13327#discussion_r2153152743 ## core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java: ## @@ -291,7 +306,9 @@ private StructLikeMap>> planFileGroups() { scan = scan

Re: [PR] Spark: option to set predicate for filtering files in compaction [iceberg]

2025-06-17 Thread via GitHub
bryanck commented on PR #13327: URL: https://github.com/apache/iceberg/pull/13327#issuecomment-2981776792 > [doubt] does this means the table contains both the files in remote region as well as files in region table is in ? do you create a replace operation when these files from remote are

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on PR #13191: URL: https://github.com/apache/iceberg/pull/13191#issuecomment-2981640927 @amogh-jahagirdar any other followups from you?

Re: [I] Confusion around 'catalog' in the spark quickstart guide [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on issue #13312: URL: https://github.com/apache/iceberg/issues/13312#issuecomment-2981638372 Those links aren't working for me, can you link to the doc webpage? Or Paste the text you want to change?

Re: [PR] Add BigQuery Dependencies for Iceberg GCP Bundle [iceberg]

2025-06-17 Thread via GitHub
jbonofre commented on PR #13111: URL: https://github.com/apache/iceberg/pull/13111#issuecomment-2981577922 @talatuyarer thanks, it's better but instead of `NOTE: it's shaded in autovalue` can you please add the actual license of the shaded dependency ? We should have something like (f

[PR] fix config test when running locally [iceberg-python]

2025-06-17 Thread via GitHub
kevinjqliu opened a new pull request, #2113: URL: https://github.com/apache/iceberg-python/pull/2113 # Rationale for this change `Config().get_known_catalogs()` also reads my local `~/.pyiceberg.yaml` Follow up to #2088 # Are these changes tested?

Re: [PR] Spark: option to set predicate for filtering files in compaction [iceberg]

2025-06-17 Thread via GitHub
singhpk234 commented on code in PR #13327: URL: https://github.com/apache/iceberg/pull/13327#discussion_r2152902933 ## core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java: ## @@ -291,7 +306,9 @@ private StructLikeMap>> planFileGroups() { scan = s

Re: [PR] AWS, GCP: Fix double-checked-locking pattern in S3FileIO, GCSFileIO. [iceberg]

2025-06-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #13276: URL: https://github.com/apache/iceberg/pull/13276#discussion_r2152868234 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java: ## @@ -434,9 +434,9 @@ private Map clientByPrefix() { if (null == clientByPrefix) {

Re: [PR] AWS, GCP: Fix double-checked-locking pattern in S3FileIO, GCSFileIO. [iceberg]

2025-06-17 Thread via GitHub
amogh-jahagirdar merged PR #13276: URL: https://github.com/apache/iceberg/pull/13276

Re: [PR] AWS, GCP: Fix double-checked-locking pattern in S3FileIO, GCSFileIO. [iceberg]

2025-06-17 Thread via GitHub
amogh-jahagirdar commented on PR #13276: URL: https://github.com/apache/iceberg/pull/13276#issuecomment-2981331400 Thanks @ChaladiMohanVamsi!

Re: [PR] Core: Fix filed ids of partition stats file [iceberg]

2025-06-17 Thread via GitHub
ajantha-bhat commented on code in PR #13329: URL: https://github.com/apache/iceberg/pull/13329#discussion_r2152834762 ## parquet/src/test/java/org/apache/iceberg/TestParquetPartitionStatsHandler.java: ## @@ -18,9 +18,36 @@ */ package org.apache.iceberg; +import static org.a

Re: [PR] Spark-3.5: Add spark action to compute partition stats [iceberg]

2025-06-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #12450: URL: https://github.com/apache/iceberg/pull/12450#discussion_r2152802240 ## api/src/main/java/org/apache/iceberg/actions/ComputePartitionStats.java: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Support ADLS with Pyarrow file IO [iceberg-python]

2025-06-17 Thread via GitHub
Fokko commented on PR #2111: URL: https://github.com/apache/iceberg-python/pull/2111#issuecomment-2981224114 @NikitaMatskevich Thanks for working on this, I know a lot of users are waiting for this. It looks like some tests are failing (you can run the linters locally using `make lint`), co

Re: [PR] Support ADLS with Pyarrow file IO [iceberg-python]

2025-06-17 Thread via GitHub
Fokko commented on code in PR #2111: URL: https://github.com/apache/iceberg-python/pull/2111#discussion_r2152806261 ## pyproject.toml: ## @@ -62,7 +62,7 @@ pyparsing = ">=3.1.0,<4.0.0" zstandard = ">=0.13.0,<1.0.0" tenacity = ">=8.2.3,<10.0.0" pyroaring = ">=1.0.0,<2.0.0" -py

[I] OAuth2: `OAuth2Manager#newSessionFromCredential` shouldn't pass `Authorization: Bearer xxx` from `parent` [iceberg]

2025-06-17 Thread via GitHub
szymonorz opened a new issue, #13337: URL: https://github.com/apache/iceberg/issues/13337 ### Apache Iceberg version 1.9.1 (latest release) ### Query engine Trino ### Please describe the bug 🐞 Hi, while trying to integrate Trino into the analytic stack at

Re: [PR] API: Override StructProjection toString with that of underlying struct [iceberg]

2025-06-17 Thread via GitHub
stevenzwu commented on code in PR #13251: URL: https://github.com/apache/iceberg/pull/13251#discussion_r2152748231 ## api/src/main/java/org/apache/iceberg/util/StructProjection.java: ## @@ -216,4 +216,9 @@ public T get(int pos, Class javaClass) { public void set(int pos, T

Re: [PR] S3FILEIO integration with SSEC support in AAL [iceberg]

2025-06-17 Thread via GitHub
singhpk234 commented on code in PR #13335: URL: https://github.com/apache/iceberg/pull/13335#discussion_r2152743289 ## aws/src/main/java/org/apache/iceberg/aws/s3/AnalyticsAcceleratorUtil.java: ## @@ -59,14 +61,21 @@ private AnalyticsAcceleratorUtil() {} public static Seekabl

Re: [PR] Docs: Update Kafka Connect installation instructions and fix minor typos [iceberg]

2025-06-17 Thread via GitHub
arun-dhingra commented on PR #12994: URL: https://github.com/apache/iceberg/pull/12994#issuecomment-2981071597 I have noticed one more issue with the kafka connect document. Should I add to this branch or create a separate PR?

Re: [PR] Core: Fix filed ids of partition stats file [iceberg]

2025-06-17 Thread via GitHub
ajantha-bhat commented on PR #13329: URL: https://github.com/apache/iceberg/pull/13329#issuecomment-2981014331 > off-topic: partition stats file format is highly coupled with table write.format. If CU is using ORC, he automatically loses the partition stats. Yes. In the initial propos

Re: [PR] Spec: Add manifests fields to Snapshot Summary metrics [iceberg]

2025-06-17 Thread via GitHub
Fokko commented on PR #13238: URL: https://github.com/apache/iceberg/pull/13238#issuecomment-2980933661 @manuzhang Thanks for raising this, could you maybe give a bit more background on the reason why this is useful to track?

Re: [PR] Spec: Add manifests fields to Snapshot Summary metrics [iceberg]

2025-06-17 Thread via GitHub
manuzhang commented on PR #13238: URL: https://github.com/apache/iceberg/pull/13238#issuecomment-2980949611 @Fokko It's already in the Java implementation. This is to update the documentation.

Re: [PR] Spec: Add manifests fields to Snapshot Summary metrics [iceberg]

2025-06-17 Thread via GitHub
Fokko commented on code in PR #13238: URL: https://github.com/apache/iceberg/pull/13238#discussion_r2152630636 ## format/spec.md: ## @@ -1826,6 +1826,10 @@ Snapshot summary can include metrics fields to track numeric stats of the snapsh | **`total-equality-deletes`**|

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
singhpk234 commented on code in PR #13191: URL: https://github.com/apache/iceberg/pull/13191#discussion_r2152617025 ## .palantir/revapi.yml: ## @@ -1285,6 +1285,12 @@ acceptedBreaks: - code: "java.field.removedWithConstant" old: "field org.apache.iceberg.TablePropert

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-17 Thread via GitHub
himadripal commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2152612762 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/functions/BucketFunction.java: ## @@ -128,6 +133,23 @@ public String name() { public DataType resul

Re: [PR] Flink: Dynamic Iceberg Sink: Add sink / core processing logic / benchmarking [iceberg]

2025-06-17 Thread via GitHub
mxm commented on PR #13304: URL: https://github.com/apache/iceberg/pull/13304#issuecomment-2980850525 Thanks for reviewing / merging @pvary!

Re: [PR] [infra] publish rc to pypi as part of release process [iceberg-rust]

2025-06-17 Thread via GitHub
kevinjqliu commented on code in PR #1449: URL: https://github.com/apache/iceberg-rust/pull/1449#discussion_r2152578305 ## .github/workflows/publish.yml: ## @@ -20,7 +20,9 @@ name: Publish on: push: tags: - - "*" + # Trigger this workflow when tag follows the v

[PR] [infra] publish rc to pypi as part of release process [iceberg-rust]

2025-06-17 Thread via GitHub
kevinjqliu opened a new pull request, #1449: URL: https://github.com/apache/iceberg-rust/pull/1449 ## Which issue does this PR close? - Closes #1409. ## What changes are included in this PR? ## Are these changes tested? Yes, tested against perso

Re: [I] we should publish rc to pypi as part of the release process [iceberg-rust]

2025-06-17 Thread via GitHub
kevinjqliu commented on issue #1409: URL: https://github.com/apache/iceberg-rust/issues/1409#issuecomment-2980828442 hey @KrishnaSindhur thanks for volunteering to work on this. I've already started to work on this issue. Since it involves github CI, there's a lot of infrastructure set up t

Re: [PR] Flink: Dynamic Iceberg Sink: Add sink / core processing logic / benchmarking [iceberg]

2025-06-17 Thread via GitHub
pvary commented on PR #13304: URL: https://github.com/apache/iceberg/pull/13304#issuecomment-2980800408 Merged to main. Thanks for the PR @mxm!

Re: [PR] Flink: Dynamic Iceberg Sink: Add sink / core processing logic / benchmarking [iceberg]

2025-06-17 Thread via GitHub
pvary merged PR #13304: URL: https://github.com/apache/iceberg/pull/13304

Re: [PR] Azure: KeyManagementClient implementation for Azure Key Vault [iceberg]

2025-06-17 Thread via GitHub
nandorKollar commented on PR #13186: URL: https://github.com/apache/iceberg/pull/13186#issuecomment-2980780411 @szlta you might be interested in this change, as you did similar for AWS and GCP.

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #13191: URL: https://github.com/apache/iceberg/pull/13191#discussion_r2152527589 ## core/src/main/java/org/apache/iceberg/rest/ParserContext.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] [Avro] Accept dict with only `'type': 'null'` as representation of `null` [iceberg-python]

2025-06-17 Thread via GitHub
kevinjqliu commented on PR #2109: URL: https://github.com/apache/iceberg-python/pull/2109#issuecomment-2980757609 ty that makes sense, thanks for the pointer

Re: [PR] [Avro] Accept dict with only `'type': 'null'` as representation of `null` [iceberg-python]

2025-06-17 Thread via GitHub
kevinjqliu commented on PR #2109: URL: https://github.com/apache/iceberg-python/pull/2109#issuecomment-2980758867 Looks like there's a linter issue, could you run `make lint` locally?

Re: [PR] [Avro] Accept dict with only `'type': 'null'` as representation of `null` [iceberg-python]

2025-06-17 Thread via GitHub
Tishj commented on PR #2109: URL: https://github.com/apache/iceberg-python/pull/2109#issuecomment-2980751917 The spec seems to indicate that this is valid: https://avro.apache.org/docs/1.11.1/specification/ Right under `Schema Declaration` > {"type": "typeName" ...attributes...}
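
Note on the point above: the Avro specification treats a primitive type name written as `"null"` and the object form `{"type": "null"}` as equivalent schemas. A minimal sketch of what accepting both forms could look like follows; the helper names `is_null_schema` and `non_null_members` are hypothetical and are not taken from `pyiceberg/utils/schema_conversion.py`.

```python
from typing import Any, List


def is_null_schema(schema: Any) -> bool:
    """True for either spelling of the Avro null type.

    Only a dict whose sole entry is 'type': 'null' is accepted, mirroring
    the wording of the PR title.
    """
    return schema == "null" or schema == {"type": "null"}


def non_null_members(union: List[Any]) -> List[Any]:
    """Return the members of an Avro union that are not the null type."""
    return [member for member in union if not is_null_schema(member)]


if __name__ == "__main__":
    # Both unions describe an optional string.
    print(non_null_members(["null", "string"]))
    print(non_null_members([{"type": "null"}, {"type": "string"}]))
```

Normalizing the check in one place keeps the rest of the union handling unaware of which spelling the schema author used.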

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on code in PR #13191: URL: https://github.com/apache/iceberg/pull/13191#discussion_r2152499787 ## core/src/main/java/org/apache/iceberg/rest/ParserContext.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on code in PR #13191: URL: https://github.com/apache/iceberg/pull/13191#discussion_r2152486978 ## core/src/main/java/org/apache/iceberg/rest/ParserContext.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] [Avro] Accept dict with only `'type': 'null'` as representation of `null` [iceberg-python]

2025-06-17 Thread via GitHub
kevinjqliu commented on code in PR #2109: URL: https://github.com/apache/iceberg-python/pull/2109#discussion_r2152474286 ## pyiceberg/utils/schema_conversion.py: ## @@ -171,11 +171,11 @@ def _resolve_union( # This means that null has to come first: # https://av

Re: [PR] Core: Clean expired metadata even if there is no snapshot to expire [iceberg]

2025-06-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #13322: URL: https://github.com/apache/iceberg/pull/13322#discussion_r2152480164 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1799,6 +1799,43 @@ public void testExpireSnapshotsWithExecutor() { .isGrea

Re: [PR] Add BigQuery Dependencies for Iceberg GCP Bundle [iceberg]

2025-06-17 Thread via GitHub
jbonofre commented on code in PR #13111: URL: https://github.com/apache/iceberg/pull/13111#discussion_r2152171035 ## gcp-bundle/LICENSE: ## @@ -319,17 +540,65 @@ License: Apache-2.0 - https://www.apache.org/licenses/LICENSE-2.0.txt --

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-17 Thread via GitHub
RussellSpitzer commented on code in PR #13191: URL: https://github.com/apache/iceberg/pull/13191#discussion_r2152469836 ## .palantir/revapi.yml: ## @@ -1285,6 +1285,12 @@ acceptedBreaks: - code: "java.field.removedWithConstant" old: "field org.apache.iceberg.TablePro

Re: [PR] Add warehouse parameter to the REST Catalog doc [iceberg-python]

2025-06-17 Thread via GitHub
kevinjqliu commented on code in PR #2066: URL: https://github.com/apache/iceberg-python/pull/2066#discussion_r2152415316 ## mkdocs/docs/configuration.md: ## @@ -346,6 +346,7 @@ catalog: | rest.signing-name | execute-api | The service signing name to use

Re: [PR] Core: Fix filed ids of partition stats file [iceberg]

2025-06-17 Thread via GitHub
deniskuzZ commented on code in PR #13329: URL: https://github.com/apache/iceberg/pull/13329#discussion_r2152362678 ## core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java: ## @@ -239,30 +263,30 @@ private static OutputFile newPartitionStatsFile( private static Par

Re: [PR] Core: Fix filed ids of partition stats file [iceberg]

2025-06-17 Thread via GitHub
deniskuzZ commented on PR #13329: URL: https://github.com/apache/iceberg/pull/13329#issuecomment-2980502271 off-topic: partition stats file format is highly coupled with table `write.format`. If CU is using `ORC`, he automatically loses the partition stats. Do you think we can decouple
