Re: [PR] ci: add asan and ubsan support [iceberg-cpp]

2025-06-02 Thread via GitHub
Fokko merged PR #107: URL: https://github.com/apache/iceberg-cpp/pull/107 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apa

Re: [PR] Docs: add column descriptions for entries metadata table [iceberg]

2025-06-02 Thread via GitHub
nastra merged PR #13104: URL: https://github.com/apache/iceberg/pull/13104

Re: [PR] Spark 3.5: Fix flaky testParallelPartialProgressWithMaxFailedCommitsLargerThanTotalFileGroup [iceberg]

2025-06-02 Thread via GitHub
nastra commented on code in PR #13208: URL: https://github.com/apache/iceberg/pull/13208#discussion_r2122802074 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1341,7 +1341,7 @@ public void testParallelPartialProgressWi

Re: [PR] Hive: Throw exception for when listing a non-existing namespace [iceberg]

2025-06-02 Thread via GitHub
nastra merged PR #13130: URL: https://github.com/apache/iceberg/pull/13130

Re: [PR] Add BigQuery Dependencies for Iceberg GCP Bundle [iceberg]

2025-06-02 Thread via GitHub
nastra commented on code in PR #13111: URL: https://github.com/apache/iceberg/pull/13111#discussion_r2122789567 ## gcp-bundle/LICENSE: ## @@ -201,403 +201,656 @@ See the License for the specific language governing permissions and limitations under the License. +

Re: [I] Add description of columns for entries metadata table [iceberg]

2025-06-02 Thread via GitHub
nastra closed issue #13076: Add description of columns for entries metadata table URL: https://github.com/apache/iceberg/issues/13076

Re: [PR] Core: Make pageToken query parameter optional [iceberg]

2025-06-02 Thread via GitHub
nastra closed pull request #13129: Core: Make pageToken query parameter optional URL: https://github.com/apache/iceberg/pull/13129

Re: [I] iceberg table properties are saved in table metadata's properties field [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on issue #2064: URL: https://github.com/apache/iceberg-python/issues/2064#issuecomment-2933471078 We should have a way to override the behavior for applying `SetPropertiesUpdate`

Re: [I] Hive should throw a NoSuchNamespaceException when listing a non-existing namespace [iceberg]

2025-06-02 Thread via GitHub
nastra closed issue #12874: Hive should throw a NoSuchNamespaceException when listing a non-existing namespace URL: https://github.com/apache/iceberg/issues/12874

Re: [PR] Core: Catch IAE when decoding JWT [iceberg]

2025-06-02 Thread via GitHub
nastra merged PR #13192: URL: https://github.com/apache/iceberg/pull/13192

Re: [PR] Build, Core: Move assertions to AssertJ / Fix checkstyle rules [iceberg]

2025-06-02 Thread via GitHub
nastra merged PR #13213: URL: https://github.com/apache/iceberg/pull/13213

Re: [PR] Use batchreader in upsert [iceberg-python]

2025-06-02 Thread via GitHub
koenvo commented on code in PR #1995: URL: https://github.com/apache/iceberg-python/pull/1995#discussion_r2122696555 ## pyiceberg/io/pyarrow.py: ## @@ -1643,8 +1646,20 @@ def to_record_batches(self, tasks: Iterable[FileScanTask]) -> Iterator[pa.Record ResolveError:

Re: [PR] fix: add metadata_properties to _construct_parameters when update hive table [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on code in PR #2013: URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2122698502 ## tests/integration/test_reads.py: ## @@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None: table.transaction().set_properties(

Re: [I] iceberg table properties are saved in table metadata's properties field [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on issue #2064: URL: https://github.com/apache/iceberg-python/issues/2064#issuecomment-2933471189 cc @Fokko wdyt?

Re: [I] iceberg table properties are saved in table metadata's properties field [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on issue #2064: URL: https://github.com/apache/iceberg-python/issues/2064#issuecomment-2933469934 `Transaction.set_properties` creates a `SetPropertiesUpdate` table update which is then applied to the table metadata here https://github.com/apache/iceberg-python/

Re: [PR] fix: add metadata_properties to _construct_parameters when update hive table [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on code in PR #2013: URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2122689242 ## tests/integration/test_reads.py: ## @@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None: table.transaction().set_properties(

[I] iceberg table properties are saved in table metadata's properties field [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu opened a new issue, #2064: URL: https://github.com/apache/iceberg-python/issues/2064 ### Apache Iceberg version None ### Please describe the bug 🐞 Context https://github.com/apache/iceberg-python/pull/2013#discussion_r2122682998 Because `Table.propertie

Re: [PR] fix: add metadata_properties to _construct_parameters when update hive table [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on code in PR #2013: URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2122646431 ## pyiceberg/catalog/hive.py: ## @@ -211,11 +211,18 @@ def _construct_hive_storage_descriptor( DEFAULT_PROPERTIES = {TableProperties.PARQUET_COMPRESSION: Ta

Re: [PR] Spark 3.5: Disable executor cache for delete files in RewriteDataFilesSparkAction [iceberg]

2025-06-02 Thread via GitHub
manuzhang commented on PR #12893: URL: https://github.com/apache/iceberg/pull/12893#issuecomment-2933379435 @anuragmantri can you please address the comments?

Re: [PR] Build: Bump pyspark from 3.5.5 to 3.5.6 [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu merged PR #2062: URL: https://github.com/apache/iceberg-python/pull/2062

Re: [PR] Build: Bump pyspark from 3.5.5 to 3.5.6 [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on PR #2062: URL: https://github.com/apache/iceberg-python/pull/2062#issuecomment-2933355438 @dependabot rebase

Re: [I] Support writing V3 tables [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on issue #1551: URL: https://github.com/apache/iceberg-python/issues/1551#issuecomment-2933359072 @b-phi i've added V3 tags for issues to support Iceberg V3 https://github.com/apache/iceberg-python/issues?q=state%3Aopen%20label%3A%22V3%22 Also here are some o

Re: [PR] Build: Bump getdaft from 0.4.16 to 0.4.18 [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu merged PR #2060: URL: https://github.com/apache/iceberg-python/pull/2060

Re: [PR] Flink: Dynamic Iceberg Sink Contribution [iceberg]

2025-06-02 Thread via GitHub
b-rick commented on code in PR #12424: URL: https://github.com/apache/iceberg/pull/12424#discussion_r2122617578 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/RowDataEvolver.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Build: Bump huggingface-hub from 0.32.2 to 0.32.3 [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu merged PR #2061: URL: https://github.com/apache/iceberg-python/pull/2061

Re: [PR] Build: Bump datafusion from 46.0.0 to 47.0.0 [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu merged PR #2063: URL: https://github.com/apache/iceberg-python/pull/2063

Re: [PR] Use batchreader in upsert [iceberg-python]

2025-06-02 Thread via GitHub
corleyma commented on code in PR #1995: URL: https://github.com/apache/iceberg-python/pull/1995#discussion_r2122591607 ## pyiceberg/io/pyarrow.py: ## @@ -1643,8 +1646,20 @@ def to_record_batches(self, tasks: Iterable[FileScanTask]) -> Iterator[pa.Record ResolveErro

Re: [PR] only add `.db` suffix in warehouse location for dynamo/hive/glue catalogs [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on PR #2059: URL: https://github.com/apache/iceberg-python/pull/2059#issuecomment-2933185012 @jayceslesar made it so it aligns with the Java implementation

Re: [I] Implement saving `TableMetadata` to new location. [iceberg-rust]

2025-06-02 Thread via GitHub
CTTY commented on issue #1388: URL: https://github.com/apache/iceberg-rust/issues/1388#issuecomment-2933204403 I think this will need to be implemented for different catalogs individually, because catalogs have different ways to persist metadata. Or we are mainly talking about adding traits

Re: [I] Tracking issues of Iceberg Rust 0.5.0 Release (May 2025) [iceberg-rust]

2025-06-02 Thread via GitHub
kevinjqliu commented on issue #1325: URL: https://github.com/apache/iceberg-rust/issues/1325#issuecomment-2933194163 Opened https://issues.apache.org/jira/browse/INFRA-26882 to set the secret in github

Re: [I] Reorganize Spark Time Travel doc [iceberg]

2025-06-02 Thread via GitHub
manuzhang closed issue #13064: Reorganize Spark Time Travel doc URL: https://github.com/apache/iceberg/issues/13064

Re: [I] Reorganize Spark Time Travel doc [iceberg]

2025-06-02 Thread via GitHub
manuzhang commented on issue #13064: URL: https://github.com/apache/iceberg/issues/13064#issuecomment-2933138628 Resolved by #13113

Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]

2025-06-02 Thread via GitHub
liziyan-lzy commented on PR #12254: URL: https://github.com/apache/iceberg/pull/12254#issuecomment-2933062276 > > Hi @pvary , I've noticed there are some conflicts in this PR. Would you recommend that I rebase onto the current main branch and resolve conflicts? > > Definitely do it pl

[PR] Docs: use latest minio client command configuration parameters [iceberg]

2025-06-02 Thread via GitHub
jerry153fish opened a new pull request, #13221: URL: https://github.com/apache/iceberg/pull/13221 Hi team, This PR updates the `spark-quickstart` documentation to use the latest MinIO client configuration command. The previous `config host` command appears to be deprecated.

Re: [PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-02 Thread via GitHub
szehon-ho commented on code in PR #12885: URL: https://github.com/apache/iceberg/pull/12885#discussion_r2122411984 ## core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java: ## @@ -373,14 +385,15 @@ private static RewriteResult writeDataFileEntry( DataFiles.bu

Re: [I] Does the add_files procedure add column lower and upper bounds statistics to manifest files? [iceberg]

2025-06-02 Thread via GitHub
JeonDaehong commented on issue #13218: URL: https://github.com/apache/iceberg/issues/13218#issuecomment-2933032741 From what I understand as someone who's currently learning and using Iceberg, the add_files procedure is designed to quickly register external Parquet files into an Iceberg tab

Re: [PR] spark 4.0 : SPJ : add hour to day reducer [iceberg]

2025-06-02 Thread via GitHub
himadripal commented on code in PR #13166: URL: https://github.com/apache/iceberg/pull/13166#discussion_r2122410657 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/functions/DaysFunction.java: ## @@ -70,6 +73,11 @@ public String name() { public DataType resultTyp

Re: [PR] spark 4.0 : SPJ : add hour to day reducer [iceberg]

2025-06-02 Thread via GitHub
himadripal commented on code in PR #13166: URL: https://github.com/apache/iceberg/pull/13166#discussion_r2122411024 ## api/src/main/java/org/apache/iceberg/util/DateTimeUtil.java: ## @@ -183,6 +183,11 @@ public static long isoTimestampToNanos(CharSequence timestampString) {

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-02 Thread via GitHub
himadripal commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2122408493 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -549,6 +555,88 @@ public void testJoinsWithMismatchingPartit

Re: [PR] spark 4.0 : SPJ : add hour to day reducer [iceberg]

2025-06-02 Thread via GitHub
szehon-ho commented on code in PR #13166: URL: https://github.com/apache/iceberg/pull/13166#discussion_r2122404958 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -578,6 +584,62 @@ public void testAggregates() throws NoSuchT

Re: [PR] Build: Add plugin to generate license and notice files [iceberg]

2025-06-02 Thread via GitHub
jbonofre commented on PR #11977: URL: https://github.com/apache/iceberg/pull/11977#issuecomment-2931565362 The plugin is definitely better than nothing as soon as there's a manual pass, because: 1. The plugin lists all license for a dependency (dual license), which is wrong, license shou

Re: [PR] Introduce MetricsMaxInferredColumnDefaultsStrategy [iceberg]

2025-06-02 Thread via GitHub
jkolash commented on PR #13039: URL: https://github.com/apache/iceberg/pull/13039#issuecomment-2933006390 So while doing a self review I came to question why a user provided default should not be bounded. this was the behavior before, but I'm not quite sure it is right. Users can set the `

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2025-06-02 Thread via GitHub
wypoon commented on PR #11661: URL: https://github.com/apache/iceberg/pull/11661#issuecomment-2932993513 @pvary I reran `VectorizedReadDictionaryEncodedFlatParquetDataBenchmark` on my branch before and after the commits in this PR. Before: ``` Benchmark

Re: [I] Deleting namespaces and tables of JDBC Catalog [iceberg-python]

2025-06-02 Thread via GitHub
github-actions[bot] commented on issue #1400: URL: https://github.com/apache/iceberg-python/issues/1400#issuecomment-2932965463 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity

Re: [PR] Spark: Include manifest lists in `allFiles` in `TestRemoveOrphanFilesProcedure` [iceberg]

2025-06-02 Thread via GitHub
github-actions[bot] commented on PR #12957: URL: https://github.com/apache/iceberg/pull/12957#issuecomment-2932961634 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Status: Combine Table spec tables [iceberg]

2025-06-02 Thread via GitHub
github-actions[bot] commented on PR #12959: URL: https://github.com/apache/iceberg/pull/12959#issuecomment-2932961674 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Add Software Bill of Materials (SBOM) [iceberg]

2025-06-02 Thread via GitHub
github-actions[bot] commented on issue #11697: URL: https://github.com/apache/iceberg/issues/11697#issuecomment-2932961460 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] `Table.add_files` fails for Parquet files with `DecimalType` columns stored as `FIXED_LEN_BYTE_ARRAY` when precision allows `INT32`/`INT64` [iceberg-python]

2025-06-02 Thread via GitHub
basil-soma commented on issue #2057: URL: https://github.com/apache/iceberg-python/issues/2057#issuecomment-2932945040 I'm running into the same issue myself with table upserts since 0.9.1. Thanks for raising this!

Re: [PR] Enhanced License and Notice Report Generation [iceberg]

2025-06-02 Thread via GitHub
danielcweeks commented on code in PR #13220: URL: https://github.com/apache/iceberg/pull/13220#discussion_r2122261429 ## api/src/main/java/org/apache/iceberg/RowDelta.java: ## @@ -47,12 +47,12 @@ public interface RowDelta extends SnapshotUpdate { RowDelta addDeletes(DeleteFil

[PR] Build: Bump pyspark from 3.5.5 to 3.5.6 [iceberg-python]

2025-06-02 Thread via GitHub
dependabot[bot] opened a new pull request, #2062: URL: https://github.com/apache/iceberg-python/pull/2062 Bumps [pyspark](https://github.com/apache/spark) from 3.5.5 to 3.5.6. Commits https://github.com/apache/spark/commit/303c18c74664f161b9b969ac343784c088b47593 303c18c Prep

Re: [I] pyiceberg write an extra .db in the schema path [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on issue #2052: URL: https://github.com/apache/iceberg-python/issues/2052#issuecomment-2932844413 Thanks @amitgilad3, that's a good point. [defaultWarehouseLocation](https://github.com/apache/iceberg/blob/9fa50f3b82b321a98698c07977096d1638a9b185/core/src/main/java

Re: [PR] remove `.db` suffix in warehouse location [iceberg-python]

2025-06-02 Thread via GitHub
jayceslesar commented on PR #2059: URL: https://github.com/apache/iceberg-python/pull/2059#issuecomment-2932735358 could do something fancy with `os.path.splitext` but that seems like it would lead down a path where we are assuming what the user wants
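
To make the concern in the comment above concrete, here is a minimal sketch of what a `os.path.splitext`-based approach could look like. The helper names and the example locations are hypothetical and are not taken from the PR; the sketch only shows why stripping whatever suffix is present amounts to guessing the user's intent.

```python
import os.path


def strip_db_suffix(location: str) -> str:
    """Hypothetical helper: drop a trailing ".db" and nothing else."""
    root, ext = os.path.splitext(location)
    return root if ext == ".db" else location


def strip_any_suffix(location: str) -> str:
    """Hypothetical helper: drop whatever os.path.splitext treats as an extension."""
    return os.path.splitext(location)[0]


# Illustrative locations only (assumed, not from the PR).
with_db = "s3://bucket/warehouse/my_ns.db"
with_dot = "s3://bucket/warehouse/sales.v2"

print(strip_db_suffix(with_db))    # the ".db" suffix is removed
print(strip_db_suffix(with_dot))   # left untouched
print(strip_any_suffix(with_dot))  # ".v2" is silently dropped
```

The last call is the risky case: a namespace that legitimately contains a dot loses part of its name, which is the kind of assumption the comment warns against.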

Re: [PR] [CORE][REST]: Add context aware response parsing [iceberg]

2025-06-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #13191: URL: https://github.com/apache/iceberg/pull/13191#discussion_r2121603778 ## core/src/main/java/org/apache/iceberg/rest/BaseHTTPClient.java: ## @@ -77,6 +78,18 @@ public T get( return execute(request, responseType, errorHandl

Re: [PR] Enhanced License and Notice Report Generation [iceberg]

2025-06-02 Thread via GitHub
talatuyarer commented on PR #13220: URL: https://github.com/apache/iceberg/pull/13220#issuecomment-2932725678 CI/CD failing because i put LICENCE change as build failure. I just wanted to show you guys. :)

Re: [PR] Enhanced License and Notice Report Generation [iceberg]

2025-06-02 Thread via GitHub
talatuyarer commented on code in PR #13220: URL: https://github.com/apache/iceberg/pull/13220#discussion_r2122265004 ## api/src/main/java/org/apache/iceberg/RowDelta.java: ## @@ -47,12 +47,12 @@ public interface RowDelta extends SnapshotUpdate { RowDelta addDeletes(DeleteFile

[PR] Build: Bump datafusion from 46.0.0 to 47.0.0 [iceberg-python]

2025-06-02 Thread via GitHub
dependabot[bot] opened a new pull request, #2063: URL: https://github.com/apache/iceberg-python/pull/2063 Bumps [datafusion](https://github.com/apache/datafusion-python) from 46.0.0 to 47.0.0. Commits https://github.com/apache/datafusion-python/commit/ebd62d052cd54fccbcf73fedab

[PR] Build: Bump huggingface-hub from 0.32.2 to 0.32.3 [iceberg-python]

2025-06-02 Thread via GitHub
dependabot[bot] opened a new pull request, #2061: URL: https://github.com/apache/iceberg-python/pull/2061 Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub) from 0.32.2 to 0.32.3. Release notes Sourced from https://github.com/huggingface/huggingface_hub/release

[PR] Build: Bump getdaft from 0.4.16 to 0.4.18 [iceberg-python]

2025-06-02 Thread via GitHub
dependabot[bot] opened a new pull request, #2060: URL: https://github.com/apache/iceberg-python/pull/2060 Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.4.16 to 0.4.18. Release notes Sourced from https://github.com/Eventual-Inc/Daft/releases getdaft's releases.

Re: [PR] Build: Add plugin to generate license and notice files [iceberg]

2025-06-02 Thread via GitHub
talatuyarer commented on PR #11977: URL: https://github.com/apache/iceberg/pull/11977#issuecomment-2932642580 @jbonofre @bryanck @RussellSpitzer Based on @jbonofre's comments I enhanced @bryanck's plugin in his PR: https://github.com/apache/iceberg/pull/13220 I also ran license u

[PR] Enhanced License and Notice Report Generation [iceberg]

2025-06-02 Thread via GitHub
talatuyarer opened a new pull request, #13220: URL: https://github.com/apache/iceberg/pull/13220 This pull request enhances the functionality originally introduced in PR #11977 for generating LICENSE and NOTICE files. The primary goals of this update are to improve accuracy, simplify mainte

Re: [PR] Use batchreader in upsert [iceberg-python]

2025-06-02 Thread via GitHub
koenvo commented on PR #1995: URL: https://github.com/apache/iceberg-python/pull/1995#issuecomment-2932586312 Did an update and ran a quick benchmark with different `concurrent_tasks` settings on `to_arrow_batch_reader()`: ```python table = catalog.get_table("some_table") #

Re: [PR] Build: Bump junit-platform from 1.12.2 to 1.13.0 [iceberg]

2025-06-02 Thread via GitHub
Fokko commented on PR #13198: URL: https://github.com/apache/iceberg/pull/13198#issuecomment-2932448161 @dependabot rebase

Re: [I] [EPIC] Iceberg Cache [iceberg-rust]

2025-06-02 Thread via GitHub
kyteware commented on issue #1226: URL: https://github.com/apache/iceberg-rust/issues/1226#issuecomment-2932432117 I'm aware that there is some progress being made on possibly restructuring `FileIO`, but I'd still like to take a go at writing an in-FileIO cache. I'll be happy to rebase my c

Re: [PR] Build: Bump com.google.errorprone:error_prone_annotations from 2.37.0 to 2.38.0 [iceberg]

2025-06-02 Thread via GitHub
Fokko merged PR #12852: URL: https://github.com/apache/iceberg/pull/12852

Re: [PR] validate added data files for snapshot compatibility [iceberg-python]

2025-06-02 Thread via GitHub
Fokko commented on PR #2050: URL: https://github.com/apache/iceberg-python/pull/2050#issuecomment-2932019390 @kaushiksrini Could you resolve the conflicts?

Re: [I] Glue catalog: avoid reloading table in `commit_table` [iceberg-python]

2025-06-02 Thread via GitHub
geruh commented on issue #2051: URL: https://github.com/apache/iceberg-python/issues/2051#issuecomment-2932378816 Hey @psavalle, thanks for raising this. In the Glue commit path, we're eagerly converting the Glue table to an Iceberg table to get the latest metadata pointer from Glue

Re: [PR] Use batchreader in upsert [iceberg-python]

2025-06-02 Thread via GitHub
koenvo commented on PR #1995: URL: https://github.com/apache/iceberg-python/pull/1995#issuecomment-2932336792 > fwiw I think we should try to get this merged in at some point. Some ideas: > > 1. Make it a flag to use the batchreader or not, some users might have basically infinite mem

Re: [PR] Build: Bump guava from 33.4.7-jre to 33.4.8-jre [iceberg]

2025-06-02 Thread via GitHub
Fokko merged PR #12851: URL: https://github.com/apache/iceberg/pull/12851

Re: [PR] Parquet: Fix column pruning for deeply nested fields [iceberg]

2025-06-02 Thread via GitHub
sriharshaj commented on PR #12634: URL: https://github.com/apache/iceberg/pull/12634#issuecomment-2932275548 Can someone please take a look?

Re: [PR] Add Avro compression [iceberg-python]

2025-06-02 Thread via GitHub
Fokko commented on code in PR #1976: URL: https://github.com/apache/iceberg-python/pull/1976#discussion_r2122038913 ## pyiceberg/table/update/snapshot.py: ## @@ -126,6 +128,11 @@ def __init__( self._deleted_data_files = set() self.snapshot_properties = snapshot

Re: [PR] Flink: port range distribution to v2 iceberg sink [iceberg]

2025-06-02 Thread via GitHub
rodmeneses commented on code in PR #12071: URL: https://github.com/apache/iceberg/pull/12071#discussion_r2121884952 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -645,72 +711,135 @@ private DataStream distributeDataStream(DataStream in

Re: [I] Support writing V3 tables [iceberg-python]

2025-06-02 Thread via GitHub
b-phi commented on issue #1551: URL: https://github.com/apache/iceberg-python/issues/1551#issuecomment-2932165956 As I understand the v3 spec has been [finalized](https://github.com/apache/iceberg/pull/13175). How can I help with supporting v3 writes from pyiceberg?

[PR] Spark: Add basic Variant read/write support for Spark Iceberg tables without shredding [iceberg]

2025-06-02 Thread via GitHub
aihuaxu opened a new pull request, #13219: URL: https://github.com/apache/iceberg/pull/13219 This PR is to add the support for Spark to read and write Variant (without shredding) data against Iceberg tables. Basically when reading the Variant data, Spark VariantReader reads an Iceberg `Vari

Re: [PR] Hive: Throw exception for when listing a non-existing namespace [iceberg]

2025-06-02 Thread via GitHub
jmelinav commented on code in PR #13130: URL: https://github.com/apache/iceberg/pull/13130#discussion_r2121969015 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -1209,11 +1208,4 @@ public void testDatabaseLocationWithSlashInWarehouseDir() {

Re: [PR] Hive: Throw exception for when listing a non-existing namespace [iceberg]

2025-06-02 Thread via GitHub
jmelinav commented on code in PR #13130: URL: https://github.com/apache/iceberg/pull/13130#discussion_r2121966954 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -508,7 +508,7 @@ public void createNamespace(Namespace namespace, Map meta) { @

Re: [PR] feat: `ManifestEntryEvaluator` [iceberg-python]

2025-06-02 Thread via GitHub
jayceslesar closed pull request #2056: feat: `ManifestEntryEvaluator` URL: https://github.com/apache/iceberg-python/pull/2056

Re: [PR] Spark: Add basic Variant read/write support for Spark Iceberg tables without shredding [iceberg]

2025-06-02 Thread via GitHub
aihuaxu commented on PR #13219: URL: https://github.com/apache/iceberg/pull/13219#issuecomment-2932064681 @aokolnychyi, @szehon-ho Can you help to check if it's the right direction? Thanks.

Re: [PR] validate added data files for snapshot compatibility [iceberg-python]

2025-06-02 Thread via GitHub
Fokko commented on PR #2050: URL: https://github.com/apache/iceberg-python/pull/2050#issuecomment-2932020981 @kaushiksrini could you resolve the conflicts? Thanks!

Re: [PR] Build: Bump guava from 33.4.7-jre to 33.4.8-jre [iceberg]

2025-06-02 Thread via GitHub
Fokko commented on PR #12851: URL: https://github.com/apache/iceberg/pull/12851#issuecomment-2932010907 @dependabot rebase

Re: [PR] Build: Bump testcontainers from 1.21.0 to 1.21.1 [iceberg]

2025-06-02 Thread via GitHub
Fokko merged PR #13199: URL: https://github.com/apache/iceberg/pull/13199

Re: [I] pyiceberg write an extra .db in the schema path [iceberg-python]

2025-06-02 Thread via GitHub
amitgilad3 commented on issue #2052: URL: https://github.com/apache/iceberg-python/issues/2052#issuecomment-2931993745 this is the behavior that also happens in spark if i'm not mistaken , for example glue - https://github.com/apache/iceberg/blob/a4b2a0dab092821d4843749b8abc30208622e164/aw

Re: [I] `table.upsert` works only with batching [iceberg-python]

2025-06-02 Thread via GitHub
jayceslesar commented on issue #2058: URL: https://github.com/apache/iceberg-python/issues/2058#issuecomment-2931945784 Of course, and dont be afraid to comment on #1995 or make another issue! I think that should definitely get some more attention -- This is an automated message from the

Re: [PR] Flink: port range distribution to v2 iceberg sink [iceberg]

2025-06-02 Thread via GitHub
rodmeneses commented on code in PR #12071: URL: https://github.com/apache/iceberg/pull/12071#discussion_r2121884952 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -645,72 +711,135 @@ private DataStream distributeDataStream(DataStream in

Re: [PR] feat(storage-azdls): Add Azure Datalake Storage support [iceberg-rust]

2025-06-02 Thread via GitHub
DerGut commented on code in PR #1368: URL: https://github.com/apache/iceberg-rust/pull/1368#discussion_r2120674812 ## crates/iceberg/src/io/storage_azdls.rs: ## @@ -0,0 +1,130 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agree

Re: [PR] Expose ref_name parameter for table scans [iceberg-python]

2025-06-02 Thread via GitHub
b-phi closed pull request #1765: Expose ref_name parameter for table scans URL: https://github.com/apache/iceberg-python/pull/1765 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[PR] remove `.db` suffix in warehouse location [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu opened a new pull request, #2059: URL: https://github.com/apache/iceberg-python/pull/2059 Closes #2052 # Rationale for this change # Are these changes tested? # Are there any user-facing changes? -- This is an automated message fr

Re: [PR] Hive: Throw exception for when listing a non-existing namespace [iceberg]

2025-06-02 Thread via GitHub
stevenzwu commented on code in PR #13130: URL: https://github.com/apache/iceberg/pull/13130#discussion_r2121798975 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -508,7 +508,7 @@ public void createNamespace(Namespace namespace, Map meta) {

Re: [I] pyiceberg write an extra .db in the schema path [iceberg-python]

2025-06-02 Thread via GitHub
kevinjqliu commented on issue #2052: URL: https://github.com/apache/iceberg-python/issues/2052#issuecomment-2931706304 Thanks for reporting this! For context here is where the `.db` suffix is appended. https://github.com/apache/iceberg-python/blob/91853898c89f5376aa7fe874e8c2c6e

[PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-02 Thread via GitHub
dramaticlly opened a new pull request, #12885: URL: https://github.com/apache/iceberg/pull/12885 Allow rewrite table path to use snapshot id to filter both 1. `added_snapshot_id` in ManifestFile 1. `snapshot_id` in ManifestEntry This PR help add 2nd filter, this help avoid rep

Re: [PR] Hive: Throw exception for when listing a non-existing namespace [iceberg]

2025-06-02 Thread via GitHub
stevenzwu commented on code in PR #13130: URL: https://github.com/apache/iceberg/pull/13130#discussion_r2121763827 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -1209,11 +1208,4 @@ public void testDatabaseLocationWithSlashInWarehouseDir() {

Re: [PR] Kafka Connect: Add mechanisms for routing records by topic name [iceberg]

2025-06-02 Thread via GitHub
mun1r0b0t commented on PR #11623: URL: https://github.com/apache/iceberg/pull/11623#issuecomment-2931657978 Yeah, not sure why. I'm leaving it open in hope that they'll reconsider the feature. -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Core: Catch IAE when decoding JWT [iceberg]

2025-06-02 Thread via GitHub
nika-qubit commented on code in PR #13192: URL: https://github.com/apache/iceberg/pull/13192#discussion_r2121731819 ## core/src/test/java/org/apache/iceberg/rest/auth/TestOAuth2Util.java: ## @@ -73,6 +73,10 @@ public void testOAuthScopeTokenValidation() { public void testExpi

Re: [PR] feat(storage-azdls): Add Azure Datalake Storage support [iceberg-rust]

2025-06-02 Thread via GitHub
DerGut commented on PR #1368: URL: https://github.com/apache/iceberg-rust/pull/1368#issuecomment-2931632298 ## 🏁 My plan to get this PR to the finish line - **abandon Azurite support** (for now): Unfortunately this sacrifices integration tests and easy local development. But for the l

Re: [I] REST Catalog fixture is particular about query params [iceberg]

2025-06-02 Thread via GitHub
elphastori commented on issue #13119: URL: https://github.com/apache/iceberg/issues/13119#issuecomment-2931317681 @nastra Yes, I would like to do this (with some guidance of course!) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] feat(storage-azdls): Add Azure Datalake Storage support [iceberg-rust]

2025-06-02 Thread via GitHub
DerGut commented on PR #1368: URL: https://github.com/apache/iceberg-rust/pull/1368#issuecomment-2931598528 🗞️ Since this was already taking longer than expected, here is an update about what I learned in the past week or so 🎉 Some of these were misconceptions that led me to revamp p

Re: [PR] Build: Add plugin to generate license and notice files [iceberg]

2025-06-02 Thread via GitHub
talatuyarer commented on PR #11977: URL: https://github.com/apache/iceberg/pull/11977#issuecomment-2931540021 I see. i used this PR's gradle plugin looks like it works most of case. (I dont have any idea about edge cases) It is better than having nothing. What do you think ? -- This is

Re: [PR] Part 1: Support Scan Planning in Rest Client [iceberg]

2025-06-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #13004: URL: https://github.com/apache/iceberg/pull/13004#discussion_r2121655388 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -329,7 +341,8 @@ protected T execute( } try { -return mapper.re

Re: [PR] Build: Add plugin to generate license and notice files [iceberg]

2025-06-02 Thread via GitHub
jbonofre commented on PR #11977: URL: https://github.com/apache/iceberg/pull/11977#issuecomment-2931483744 @talatuyarer not yet, but I worked "help tool" for LICENSE/NOTICE (the content is done manually with assistance). By experience, it's very hard to have a tool generated clean LICENSE/N
