[I] Column Stats Improvements [iceberg]

2025-05-25 Thread via GitHub
nastra opened a new issue, #13153: URL: https://github.com/apache/iceberg/issues/13153 ### Proposed Change ## Motivation Column statistics are currently stored as a mapping of field id to values across multiple columns (lower/upper bounds, value/nan/null counts, sizes). This stora

Re: [PR] AWS: update test cases to verify credentials for the prefixed S3 client [iceberg]

2025-05-25 Thread via GitHub
nastra commented on code in PR #13118: URL: https://github.com/apache/iceberg/pull/13118#discussion_r2106651609 ## aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIO.java: ## @@ -862,6 +919,110 @@ public void multipleStorageCredentialsConfigured() { s3FileIOPro

Re: [PR] Docs: add Tinybird to the list of vendors and blog posts [iceberg]

2025-05-25 Thread via GitHub
futurepastori commented on code in PR #13128: URL: https://github.com/apache/iceberg/pull/13128#discussion_r2106660755 ## site/docs/vendors.md: ## @@ -1,6 +1,7 @@ --- title: "Vendors" --- + Review Comment: @nastra sorry about that, last minute formatter mess-up. fixed

Re: [PR] Resolves #13103 [iceberg]

2025-05-25 Thread via GitHub
nastra commented on PR #13152: URL: https://github.com/apache/iceberg/pull/13152#issuecomment-2908719981 can you please update the PR title to reflect what the PR does? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] chore: introduce `nightly` feature flag to provide error backtrace [iceberg-rust]

2025-05-25 Thread via GitHub
BugenZhao commented on PR #1340: URL: https://github.com/apache/iceberg-rust/pull/1340#issuecomment-2908678446 I don't think this change imposes actual "requirement" on the nightly toolchain: it's completely optional and not enabled by default. Additionally, gating code that requires

Re: [PR] feat: add avro schema projection [iceberg-cpp]

2025-05-25 Thread via GitHub
Xuanwo merged PR #109: URL: https://github.com/apache/iceberg-cpp/pull/109

Re: [PR] refactor: use nesting enum for DataFile and ManifestFile content [iceberg-cpp]

2025-05-25 Thread via GitHub
Xuanwo merged PR #110: URL: https://github.com/apache/iceberg-cpp/pull/110

Re: [PR] feat: support decompress gzip metadata [iceberg-cpp]

2025-05-25 Thread via GitHub
wgtmac commented on code in PR #108: URL: https://github.com/apache/iceberg-cpp/pull/108#discussion_r2106515316 ## src/iceberg/util/gzip_internal.cc: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements

Re: [PR] Feature: Write to branches [iceberg-python]

2025-05-25 Thread via GitHub
vinjai commented on PR #941: URL: https://github.com/apache/iceberg-python/pull/941#issuecomment-2908486545 Identified and fixed a bug related to empty tables. Planning to add test cases to cover this scenario.

Re: [PR] Core: Add basic classes for writing table format-version 4 [iceberg]

2025-05-25 Thread via GitHub
ajantha-bhat commented on code in PR #13123: URL: https://github.com/apache/iceberg/pull/13123#discussion_r2106490854 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -54,7 +54,7 @@ public class TestHelpers { private TestHelpers() {} - public static final

Re: [I] Storage Partitioned Join (SPJ) fails when >2 tables are joined [iceberg]

2025-05-25 Thread via GitHub
bryanck commented on issue #10450: URL: https://github.com/apache/iceberg/issues/10450#issuecomment-2908474465 @mrbrahman I can't reproduce this using the code you posted, using Spark 3.5 and Iceberg 1.4 or 1.9.

Re: [PR] Resolves #13103 [iceberg]

2025-05-25 Thread via GitHub
Bhargavkonidena commented on PR #13152: URL: https://github.com/apache/iceberg/pull/13152#issuecomment-2908440088 .take-issue

Re: [I] Make FileIO a Trait [iceberg-rust]

2025-05-25 Thread via GitHub
linhr commented on issue #1314: URL: https://github.com/apache/iceberg-rust/issues/1314#issuecomment-2908438919 A good `FileIO`/`Storage` abstraction not only benefits object_store integration, but also makes it easier to work with OpenDAL in Iceberg. For example, I noticed that ther

[PR] Resolves #13103 [iceberg]

2025-05-25 Thread via GitHub
Bhargavkonidena opened a new pull request, #13152: URL: https://github.com/apache/iceberg/pull/13152 Resolves #13103

Re: [I] OpenAPI spec missing `schema` field for `TableMetadata` [iceberg]

2025-05-25 Thread via GitHub
Bhargavkonidena commented on issue #13103: URL: https://github.com/apache/iceberg/issues/13103#issuecomment-2908434920 .take-issue

Re: [PR] Spark 3.5, Arrow: Support for Row lineage when using the Parquet Vectorized reader [iceberg]

2025-05-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #12928: URL: https://github.com/apache/iceberg/pull/12928#discussion_r2106432131 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRowLevelOperationsWithLineage.java: ## @@ -91,7 +91,6 @@ public void bef

Re: [PR] feat: support decompress gzip metadata [iceberg-cpp]

2025-05-25 Thread via GitHub
dongxiao1198 commented on code in PR #108: URL: https://github.com/apache/iceberg-cpp/pull/108#discussion_r2106439616 ## src/iceberg/table_metadata.cc: ## @@ -153,14 +154,70 @@ Result TableMetadataUtil::CodecFromFileName( return MetadataFileCodecType::kNone; } +class GZip

Re: [PR] Website: Add PyIceberg, IcebergRust, and IcebergGo to top nav bar [iceberg]

2025-05-25 Thread via GitHub
manuzhang commented on PR #12950: URL: https://github.com/apache/iceberg/pull/12950#issuecomment-2908385299 @petern48 Thanks, I've [opened a discussion on the dev list](https://lists.apache.org/thread/ndmtzdjwzw5tjs47zg5owphx0p3pzqv1) for a more comprehensive restructure of the side navigat

Re: [PR] Spark 3.5, Arrow: Support for Row lineage when using the Parquet Vectorized reader [iceberg]

2025-05-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #12928: URL: https://github.com/apache/iceberg/pull/12928#discussion_r2106437961 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/data/parquet/vectorized/TestParquetVectorizedReads.java: ## @@ -104,7 +119,30 @@ private void writ

Re: [PR] Spark 3.5, Arrow: Support for Row lineage when using the Parquet Vectorized reader [iceberg]

2025-05-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #12928: URL: https://github.com/apache/iceberg/pull/12928#discussion_r2106437659 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java: ## @@ -567,6 +608,164 @@ public void close() { } } + privat

Re: [I] Spark: add IcebergConnectHiveDelegationTokenProvider [iceberg]

2025-05-25 Thread via GitHub
zhangwl9 commented on issue #13116: URL: https://github.com/apache/iceberg/issues/13116#issuecomment-2908380194 @pvary @rdblue @szehon-ho @gaborgsomogyi what do you think about adding delegation token provider support to Hive Catalog?

Re: [PR] Spark 3.5, Arrow: Support for Row lineage when using the Parquet Vectorized reader [iceberg]

2025-05-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #12928: URL: https://github.com/apache/iceberg/pull/12928#discussion_r2106433365 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java: ## @@ -567,6 +608,164 @@ public void close() { } } + privat

Re: [PR] Spark 3.5, Arrow: Support for Row lineage when using the Parquet Vectorized reader [iceberg]

2025-05-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #12928: URL: https://github.com/apache/iceberg/pull/12928#discussion_r2106432458 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java: ## @@ -461,6 +461,53 @@ public static VectorizedArrowReader positions

Re: [PR] feat: support decompress gzip metadata [iceberg-cpp]

2025-05-25 Thread via GitHub
wgtmac commented on code in PR #108: URL: https://github.com/apache/iceberg-cpp/pull/108#discussion_r2106412663 ## src/iceberg/table_metadata.cc: ## @@ -153,14 +154,70 @@ Result TableMetadataUtil::CodecFromFileName( return MetadataFileCodecType::kNone; } +class GZipDecomp

Re: [PR] Flink: Support compact in iceberg sink v2 [iceberg]

2025-05-25 Thread via GitHub
Guosmilesmile commented on code in PR #12979: URL: https://github.com/apache/iceberg/pull/12979#discussion_r2106413917 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/CommittableToTableChangeConverter.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Sof

Re: [PR] feat: add avro schema projection [iceberg-cpp]

2025-05-25 Thread via GitHub
wgtmac commented on PR #109: URL: https://github.com/apache/iceberg-cpp/pull/109#issuecomment-2908325006 This is an equivalent feature of https://github.com/apache/iceberg-cpp/pull/102 but it projects on an Avro schema. After this is merged, I can go ahead to implement the Avro file reader.

Re: [PR] Flink: Dynamic Iceberg Sink Contribution [iceberg]

2025-05-25 Thread via GitHub
b-rick commented on code in PR #12424: URL: https://github.com/apache/iceberg/pull/12424#discussion_r2106390743 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/RowDataEvolver.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [I] Add Python 3.13 to the test matrix [iceberg-python]

2025-05-25 Thread via GitHub
github-actions[bot] commented on issue #1372: URL: https://github.com/apache/iceberg-python/issues/1372#issuecomment-2908193501 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity

Re: [PR] Spark: Support rewrite file with z-order for nested Struct type [iceberg]

2025-05-25 Thread via GitHub
github-actions[bot] commented on PR #9818: URL: https://github.com/apache/iceberg/pull/9818#issuecomment-2908190516 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

[PR] Add Variant toString for time type [iceberg]

2025-05-25 Thread via GitHub
aihuaxu opened a new pull request, #13151: URL: https://github.com/apache/iceberg/pull/13151 This implements toString for time type in Variant to show friendly time string in test cases instead of long value.

Re: [PR] Revert "Core: Enhance remove snapshots efficiency by executing them in bulk (#12670) [iceberg]

2025-05-25 Thread via GitHub
aihuaxu closed pull request #13098: Revert "Core: Enhance remove snapshots efficiency by executing them in bulk (#12670) URL: https://github.com/apache/iceberg/pull/13098

Re: [PR] test: Add missing tests for update_namespace method in sql catalog [iceberg-rust]

2025-05-25 Thread via GitHub
kyteware commented on PR #1373: URL: https://github.com/apache/iceberg-rust/pull/1373#issuecomment-2907985029 I'm looking at the behaviour of the method, and it seems as though an edge case may cause unwanted behaviour. The method is structured into two parts 1. Read the table of n

[I] JDBCMetricReporter to support governance and compliance use cases [iceberg]

2025-05-25 Thread via GitHub
blcksrx opened a new issue, #13150: URL: https://github.com/apache/iceberg/issues/13150 ### Feature Request / Improvement It would be valuable to introduce a JDBCMetricReporter for Apache Iceberg to report table metrics into an external JDBC-compatible database (e.g., Postgres, MySQL

Re: [PR] fix: add metadata_properties to _construct_parameters when update hive table [iceberg-python]

2025-05-25 Thread via GitHub
kadai0308 commented on code in PR #2013: URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2106190785 ## pyiceberg/catalog/hive.py: ## @@ -541,6 +548,7 @@ def commit_table( hive_table.parameters = _construct_parameters(

[PR] chore(deps): Bump aws-sdk-glue from 1.94.0 to 1.97.0 [iceberg-rust]

2025-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #1376: URL: https://github.com/apache/iceberg-rust/pull/1376 Bumps [aws-sdk-glue](https://github.com/awslabs/aws-sdk-rust) from 1.94.0 to 1.97.0. Commits: see the full diff in the [compare view](https://github.com/awslabs/aws-sdk-rust/commits)

[PR] chore(deps): Bump ordered-float from 2.10.1 to 4.6.0 [iceberg-rust]

2025-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #1374: URL: https://github.com/apache/iceberg-rust/pull/1374 Bumps [ordered-float](https://github.com/reem/rust-ordered-float) from 2.10.1 to 4.6.0. Release notes: sourced from https://github.com/reem/rust-ordered-float/releases ordered-f

[PR] chore(deps): Bump aws-sdk-s3tables from 1.20.0 to 1.22.0 [iceberg-rust]

2025-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #1377: URL: https://github.com/apache/iceberg-rust/pull/1377 Bumps [aws-sdk-s3tables](https://github.com/awslabs/aws-sdk-rust) from 1.20.0 to 1.22.0. Commits: see full diff in https://github.com/awslabs/aws-sdk-rust/commits compare vi

[PR] chore(deps): Bump uuid from 1.16.0 to 1.17.0 [iceberg-rust]

2025-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #1375: URL: https://github.com/apache/iceberg-rust/pull/1375 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.16.0 to 1.17.0. Release notes: sourced from [uuid's releases](https://github.com/uuid-rs/uuid/releases). v1.17.0 What'

Re: [PR] fix: add metadata_properties to _construct_parameters when update hive table [iceberg-python]

2025-05-25 Thread via GitHub
kadai0308 commented on code in PR #2013: URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2106145510 ## pyiceberg/catalog/hive.py: ## @@ -211,11 +211,18 @@ def _construct_hive_storage_descriptor( DEFAULT_PROPERTIES = {TableProperties.PARQUET_COMPRESSION: Tab

Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]

2025-05-25 Thread via GitHub
liziyan-lzy commented on code in PR #12254: URL: https://github.com/apache/iceberg/pull/12254#discussion_r2106131025 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -384,7 +412,11 @@ public void testMetadataFolderIsIntac

[PR] build(deps): bump the gomod_updates group with 3 updates [iceberg-go]

2025-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #440: URL: https://github.com/apache/iceberg-go/pull/440 Bumps the gomod_updates group with 3 updates: [github.com/aws/aws-sdk-go-v2/service/glue](https://github.com/aws/aws-sdk-go-v2), [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/