Re: [PR] Build: Bump pydantic from 2.10.5 to 2.10.6 [iceberg-python]

2025-01-24 Thread via GitHub
Fokko merged PR #1576: URL: https://github.com/apache/iceberg-python/pull/1576 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [PR] Build: Upgrade to Gradle 8.12.1 [iceberg]

2025-01-24 Thread via GitHub
jbonofre commented on PR #12093: URL: https://github.com/apache/iceberg/pull/12093#issuecomment-2613816895 > Thanks @jbonofre 🙌 My pleasure ! Happy to help :) And thanks for the review and merge @Fokko !

Re: [PR] Core: add variant type support [iceberg]

2025-01-24 Thread via GitHub
aihuaxu commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1929488590 ## core/src/test/java/org/apache/iceberg/TestMetadataUpdateParser.java: ## @@ -108,19 +121,19 @@ public void testUpgradeFormatVersionFromJson() { } /** AddSch

Re: [PR] Core: add variant type support [iceberg]

2025-01-24 Thread via GitHub
aihuaxu commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1929488493 ## api/src/main/java/org/apache/iceberg/types/Types.java: ## @@ -61,6 +61,14 @@ private Types() {} private static final Pattern DECIMAL = Pattern.compile("d

Re: [PR] Build: Upgrade to Gradle 8.12.1 [iceberg]

2025-01-24 Thread via GitHub
Fokko commented on PR #12093: URL: https://github.com/apache/iceberg/pull/12093#issuecomment-2613805828 Thanks @jbonofre 🙌

Re: [PR] Build: Upgrade to Gradle 8.12.1 [iceberg]

2025-01-24 Thread via GitHub
Fokko merged PR #12093: URL: https://github.com/apache/iceberg/pull/12093

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-24 Thread via GitHub
lidavidm commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929484489 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NO

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-24 Thread via GitHub
lidavidm commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929484432 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NO

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-24 Thread via GitHub
lidavidm commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1929484350 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NO

Re: [PR] fix: allow nullable field of equality delete writer [iceberg-rust]

2025-01-24 Thread via GitHub
ZENOTME commented on code in PR #834: URL: https://github.com/apache/iceberg-rust/pull/834#discussion_r1929473404 ## crates/iceberg/src/arrow/record_batch_projector.rs: ## @@ -148,8 +150,11 @@ impl RecordBatchProjector { ))? .column(*idx)

Re: [PR] Core: Check referencedDataFile existence for DV [iceberg]

2025-01-24 Thread via GitHub
ebyhr commented on code in PR #12088: URL: https://github.com/apache/iceberg/pull/12088#discussion_r1929470728 ## core/src/main/java/org/apache/iceberg/FileMetadata.java: ## @@ -255,6 +255,8 @@ public DeleteFile build() { if (format == FileFormat.PUFFIN) { Precon

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2025-01-24 Thread via GitHub
ajantha-bhat commented on PR #11216: URL: https://github.com/apache/iceberg/pull/11216#issuecomment-2613726289 @aokolnychyi, @rdblue, @RussellSpitzer: I have worked on Internal writers, readers for Avro, parquet and PRs got merged. I have rebased this PR to use the internal writers and r

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2025-01-24 Thread via GitHub
ajantha-bhat commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1916789379 ## core/src/main/java/org/apache/iceberg/PartitionStatsUtil.java: ## @@ -133,4 +134,27 @@ private static Collection mergeStats( return statsMap.values();

Re: [I] set tblproperties, spark action expireSnapshots is not work. [iceberg]

2025-01-24 Thread via GitHub
cosen-wu commented on issue #12078: URL: https://github.com/apache/iceberg/issues/12078#issuecomment-2613723439 so sorry. I didn't express myself clearly. What I'm confused about is why the new properties haven't been updated in the metadata file after they were added in hive. This

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2025-01-24 Thread via GitHub
ajantha-bhat commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1929434675 ## data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java: ## @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Core: add variant type support [iceberg]

2025-01-24 Thread via GitHub
aihuaxu commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1929433492 ## core/src/test/java/org/apache/iceberg/avro/TestBuildAvroProjection.java: ## @@ -401,4 +402,32 @@ public void projectMapWithLessFieldInValueSchema() { .as(

[PR] Parquet: Clean up Parquet generic and internal readers [iceberg]

2025-01-24 Thread via GitHub
rdblue opened a new pull request, #12102: URL: https://github.com/apache/iceberg/pull/12102 This is a refactor that cleans up a few issues I noticed while reviewing #11904 and while working on Parquet variant readers. - Updates INT and UINT handling to reject unsupported unsigned type

Re: [PR] Make s3.request_timeout configurable [iceberg-python]

2025-01-24 Thread via GitHub
kevinjqliu commented on code in PR #1568: URL: https://github.com/apache/iceberg-python/pull/1568#discussion_r1929414445 ## mkdocs/docs/configuration.md: ## @@ -116,6 +116,7 @@ For the FileIO there are several configuration options available: | s3.region| us-west-2

Re: [PR] Parquet: Fix Reader leak by removing useless copy [iceberg]

2025-01-24 Thread via GitHub
zizon commented on code in PR #12079: URL: https://github.com/apache/iceberg/pull/12079#discussion_r1929399227 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetIO.java: ## @@ -82,22 +82,10 @@ static OutputFile file(org.apache.iceberg.io.OutputFile file, Configuration

Re: [PR] Docs: add note for `day` transform [iceberg]

2025-01-24 Thread via GitHub
github-actions[bot] closed pull request #11749: Docs: add note for `day` transform URL: https://github.com/apache/iceberg/pull/11749

Re: [PR] Core: Use FileIO for hadoop table metadata file operations [iceberg]

2025-01-24 Thread via GitHub
github-actions[bot] commented on PR #11690: URL: https://github.com/apache/iceberg/pull/11690#issuecomment-2613635653 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Docs: add note for `day` transform [iceberg]

2025-01-24 Thread via GitHub
github-actions[bot] commented on PR #11749: URL: https://github.com/apache/iceberg/pull/11749#issuecomment-2613635699 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Spark3.5: Resolve IDENTIFIER FIELDS with merge-on-read bug [iceberg]

2025-01-24 Thread via GitHub
github-actions[bot] closed pull request #11757: Spark3.5: Resolve IDENTIFIER FIELDS with merge-on-read bug URL: https://github.com/apache/iceberg/pull/11757

Re: [I] The unit test for class TestFlinkIcebergSink cannot be executed [iceberg]

2025-01-24 Thread via GitHub
github-actions[bot] closed issue #10694: The unit test for class TestFlinkIcebergSink cannot be executed URL: https://github.com/apache/iceberg/issues/10694

Re: [I] The unit test for class TestFlinkIcebergSink cannot be executed [iceberg]

2025-01-24 Thread via GitHub
github-actions[bot] commented on issue #10694: URL: https://github.com/apache/iceberg/issues/10694#issuecomment-2613635494 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Spark 3.5: Fix broadcasting specs in RewriteTablePath [iceberg]

2025-01-24 Thread via GitHub
amogh-jahagirdar commented on code in PR #11982: URL: https://github.com/apache/iceberg/pull/11982#discussion_r1929381993 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -96,6 +97,7 @@ public class RewriteTablePathSparkA

Re: [PR] OpenAPI: add initial/write defaults to schema [iceberg]

2025-01-24 Thread via GitHub
rdblue commented on code in PR #12094: URL: https://github.com/apache/iceberg/pull/12094#discussion_r1929384425 ## core/src/test/java/org/apache/iceberg/rest/requests/TestCreateTableRequest.java: ## @@ -59,7 +60,7 @@ public class TestCreateTableRequest extends RequestResponseTe

Re: [I] Publish Iceberg kafka connect runtime to Confluet hub [iceberg]

2025-01-24 Thread via GitHub
amogh-jahagirdar commented on issue #10745: URL: https://github.com/apache/iceberg/issues/10745#issuecomment-2613617136 Naive question, Is this something that requires automation to push to Confluent Hub or is it something that for the 1.8 milestone, someone needs to manually do? If we're s

Re: [PR] OpenAPI: add initial/write defaults to schema [iceberg]

2025-01-24 Thread via GitHub
rdblue commented on code in PR #12094: URL: https://github.com/apache/iceberg/pull/12094#discussion_r1929383792 ## open-api/rest-catalog-open-api.yaml: ## @@ -2052,6 +2052,10 @@ components: type: boolean doc: type: string +initial-default:

Re: [PR] Spark 3.5: Fix broadcasting specs in RewriteTablePath [iceberg]

2025-01-24 Thread via GitHub
amogh-jahagirdar merged PR #11982: URL: https://github.com/apache/iceberg/pull/11982

Re: [PR] Core: add variant type support [iceberg]

2025-01-24 Thread via GitHub
aihuaxu commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1929379839 ## core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java: ## @@ -56,6 +56,10 @@ class BuildAvroProjection extends AvroCustomOrderSchemaVisitor names,

[PR] Build: Bump pydantic from 2.10.5 to 2.10.6 [iceberg-python]

2025-01-24 Thread via GitHub
dependabot[bot] opened a new pull request, #1576: URL: https://github.com/apache/iceberg-python/pull/1576 Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.10.5 to 2.10.6. Release notes Sourced from [pydantic's releases](https://github.com/pydantic/pydantic/releases).

Re: [PR] Core, Spark: Scan only live entries in RewriteTablePathUtil [iceberg]

2025-01-24 Thread via GitHub
flyrain commented on PR #12006: URL: https://github.com/apache/iceberg/pull/12006#issuecomment-2613545458 > > Yes, thanks for fixing the issue (found by our internal usage). > > I wonder, because the deleted entry may be important for CDC (to mark that this file at some point existed), is

Re: [PR] Core: add variant type support [iceberg]

2025-01-24 Thread via GitHub
aihuaxu commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1929307357 ## api/src/test/java/org/apache/iceberg/types/TestTypeUtil.java: ## @@ -24,38 +24,43 @@ import static org.assertj.core.api.Assertions.assertThatThrownBy; import j

Re: [I] BaseCommitService consumes 100% CPU when idle [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on issue #12086: URL: https://github.com/apache/iceberg/issues/12086#issuecomment-2613492182 I wonder if it should be sleeping even if inProgressCommits() has elements in it. I don't think we want to loop unless work is actually finished
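
A rough sketch of the idea in this comment, written in Python rather than the project's Java: instead of spinning while commits are still in progress, the waiting thread blocks until a commit actually finishes. `CommitTracker` and its methods are hypothetical names for illustration only and are not Iceberg's `BaseCommitService` API.

```python
import threading


class CommitTracker:
    """Hypothetical stand-in for a commit service (not Iceberg code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_progress = 0
        self.completed = 0

    def start(self):
        # Record that one more commit is in flight.
        with self._cond:
            self._in_progress += 1

    def finish(self):
        # Record a finished commit and wake any waiting thread.
        with self._cond:
            self._in_progress -= 1
            self.completed += 1
            self._cond.notify_all()

    def wait_until_idle(self, timeout=None):
        # Block (no busy loop) until nothing is in flight or the timeout expires.
        # Returns True if idle, False if the timeout was hit first.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._in_progress == 0, timeout=timeout
            )
```

The waiter only wakes when `finish()` signals, so an idle or slow service costs no CPU while it waits; whether the real fix should take this shape is for the issue thread to decide.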

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-01-24 Thread via GitHub
aokolnychyi commented on code in PR #12098: URL: https://github.com/apache/iceberg/pull/12098#discussion_r1929239820 ## format/spec.md: ## @@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f The schema of the partition stati

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-01-24 Thread via GitHub
aokolnychyi commented on PR #12098: URL: https://github.com/apache/iceberg/pull/12098#issuecomment-2613398783 cc @ajantha-bhat @rdblue @nastra @Fokko @RussellSpitzer @huaxingao @amogh-jahagirdar

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-24 Thread via GitHub
rdblue merged PR #11904: URL: https://github.com/apache/iceberg/pull/11904

Re: [PR] Make s3.request_timeout configurable [iceberg-python]

2025-01-24 Thread via GitHub
metadaddy commented on PR #1568: URL: https://github.com/apache/iceberg-python/pull/1568#issuecomment-2613455318 @kevinjqliu Ah - it wanted imports in alphabetical order - I'd just inserted `S3_REQUEST_TIMEOUT` immediately after `S3_CONNECT_TIMEOUT`. All fixed now.

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-24 Thread via GitHub
rdblue commented on PR #11904: URL: https://github.com/apache/iceberg/pull/11904#issuecomment-2613443297 Thanks, @ajantha-bhat! Good to get this in.

Re: [I] The List APIs currently lack pagination functionality, resulting in inefficient data retrieval for large datasets. When retrieving a substantial number of records, the absence of pagination le

2025-01-24 Thread via GitHub
RussellSpitzer closed issue #12099: The List APIs currently lack pagination functionality, resulting in inefficient data retrieval for large datasets. When retrieving a substantial number of records, the absence of pagination leads to performance bottlenecks such as slow response times, increa

[PR] Spec: Adds in missing ChangeLog metadata columns - Reassigns Row Line… [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer opened a new pull request, #12100: URL: https://github.com/apache/iceberg/pull/12100 …age Columns It turns out that while our spec currently has assigned field ID's for Row Lineage fields, those ids have already been used in the reference library for ChangeLog views. I

Re: [I] The List APIs currently lack pagination functionality, resulting in inefficient data retrieval for large datasets. When retrieving a substantial number of records, the absence of pagination le

2025-01-24 Thread via GitHub
RussellSpitzer commented on issue #12099: URL: https://github.com/apache/iceberg/issues/12099#issuecomment-2613362984 Closing this as another GenAI issue

Re: [I] KeyError raised when calling inspect.entries() [iceberg-python]

2025-01-24 Thread via GitHub
summermousa-vendia commented on issue #1574: URL: https://github.com/apache/iceberg-python/issues/1574#issuecomment-2613349005 Based on the iceberg spec, the field attribute that is raising an error is listed as optional: https://iceberg.apache.org/spec/#manifests:~:text=in%20the%20column-

Re: [I] The List APIs currently lack pagination functionality, resulting in inefficient data retrieval for large datasets. When retrieving a substantial number of records, the absence of pagination le

2025-01-24 Thread via GitHub
theamit45 commented on issue #12099: URL: https://github.com/apache/iceberg/issues/12099#issuecomment-2613324335 ### Context - **Current Behavior:** List APIs return all records in a single response without pagination. - **Observed Issues:** Large datasets cause performance issues,

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-24 Thread via GitHub
HonahX merged PR #11660: URL: https://github.com/apache/iceberg/pull/11660

Re: [I] Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-24 Thread via GitHub
HonahX closed issue #11659: Document Snapshot Summary Optional Fields for Standardization URL: https://github.com/apache/iceberg/issues/11659

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-24 Thread via GitHub
HonahX commented on PR #11660: URL: https://github.com/apache/iceberg/pull/11660#issuecomment-2613344942 Thanks @kevinjqliu @RussellSpitzer @sfc-gh-aixu @danielcweeks @rdblue for reviewing! The vote has passed: https://lists.apache.org/thread/mz01jwt69osqxhx9d3dd9xzncv9yncd0 I

Re: [PR] OpenAPI: add initial/write defaults to schema [iceberg]

2025-01-24 Thread via GitHub
kevinjqliu commented on code in PR #12094: URL: https://github.com/apache/iceberg/pull/12094#discussion_r1929207089 ## open-api/rest-catalog-open-api.yaml: ## @@ -2052,6 +2052,10 @@ components: type: boolean doc: type: string +initial-defau

[I] KeyError raised when calling inspect.entries() [iceberg-python]

2025-01-24 Thread via GitHub
summermousa-vendia opened a new issue, #1574: URL: https://github.com/apache/iceberg-python/issues/1574 ### Apache Iceberg version 0.8.1 (latest release) ### Please describe the bug 🐞 # Description When connecting to an iceberg glue catalog, I am unable to retrieve the

Re: [I] Support for timestamp downcasting when loading data to iceberg tables [iceberg-python]

2025-01-24 Thread via GitHub
fusion commented on issue #1045: URL: https://github.com/apache/iceberg-python/issues/1045#issuecomment-2613243778 The link mentioned by @lloyd-EA does not seem to work. I created another PR based on the patch from my previous comment, as you suggested @sungwy. https://github.com/ap

Re: [PR] Spark 3.5: Fix broadcasting specs in RewriteTablePath [iceberg]

2025-01-24 Thread via GitHub
szehon-ho commented on code in PR #11982: URL: https://github.com/apache/iceberg/pull/11982#discussion_r1929128400 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -96,6 +97,7 @@ public class RewriteTablePathSparkAction e

Re: [PR] API, CORE: Adds Row Lineage Fields [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer closed pull request #11930: API, CORE: Adds Row Lineage Fields URL: https://github.com/apache/iceberg/pull/11930

Re: [PR] Make s3.request_timeout configurable [iceberg-python]

2025-01-24 Thread via GitHub
kevinjqliu commented on PR #1568: URL: https://github.com/apache/iceberg-python/pull/1568#issuecomment-2613211967 Looks like there's a lint issue; can you run `make lint` locally? @metadaddy

[PR] Enable pyiceberg.table.Table.add_files ns downcasting [iceberg-python]

2025-01-24 Thread via GitHub
fusion opened a new pull request, #1572: URL: https://github.com/apache/iceberg-python/pull/1572 (no comment)

Re: [PR] Make s3.request_timeout configurable [iceberg-python]

2025-01-24 Thread via GitHub
metadaddy commented on code in PR #1568: URL: https://github.com/apache/iceberg-python/pull/1568#discussion_r1929058246 ## pyiceberg/io/fsspec.py: ## @@ -150,6 +151,9 @@ def _s3(properties: Properties) -> AbstractFileSystem: if connect_timeout := properties.get(S3_CONNECT_T
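
A minimal sketch of the pattern visible in the truncated diff above: read an optional timeout from the properties mapping with an assignment expression and pass it on only when it is set. The property key strings and the `connect_timeout` / `read_timeout` config names below are assumptions for illustration; the actual keys and wiring are defined in the PR and the pyiceberg configuration docs.

```python
from typing import Any, Dict

# Assumed key names, for illustration only.
S3_CONNECT_TIMEOUT = "s3.connect-timeout"
S3_REQUEST_TIMEOUT = "s3.request-timeout"


def timeout_config(properties: Dict[str, str]) -> Dict[str, Any]:
    """Build client timeout settings from optional string properties."""
    config: Dict[str, Any] = {}

    # Each key is copied only when present and non-empty, so unset
    # properties fall back to the client's own defaults.
    if connect_timeout := properties.get(S3_CONNECT_TIMEOUT):
        config["connect_timeout"] = float(connect_timeout)

    if request_timeout := properties.get(S3_REQUEST_TIMEOUT):
        config["read_timeout"] = float(request_timeout)

    return config
```

Calling `timeout_config({})` yields an empty dict, so callers that never set either property see no behavior change.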

Re: [PR] Core: Check referencedDataFile existence for DV [iceberg]

2025-01-24 Thread via GitHub
amogh-jahagirdar commented on code in PR #12088: URL: https://github.com/apache/iceberg/pull/12088#discussion_r1929076093 ## core/src/main/java/org/apache/iceberg/FileMetadata.java: ## @@ -255,6 +255,8 @@ public DeleteFile build() { if (format == FileFormat.PUFFIN) {

Re: [PR] Make s3.request_timeout configurable [iceberg-python]

2025-01-24 Thread via GitHub
metadaddy commented on PR #1568: URL: https://github.com/apache/iceberg-python/pull/1568#issuecomment-2613110724 Hi @Fokko - your suggested correction is integrated and pushed. Thanks!

Re: [I] "Request for New Feature: Optimized Query Execution" [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on issue #12097: URL: https://github.com/apache/iceberg/issues/12097#issuecomment-2613097485 Closed, as I'm pretty sure this is a purely AI-generated issue. I will reopen if I'm wrong.

Re: [I] "Request for New Feature: Optimized Query Execution" [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer closed issue #12097: "Request for New Feature: Optimized Query Execution" URL: https://github.com/apache/iceberg/issues/12097

Re: [PR] Spark 3.5: Make ColumnVectorWithFilter generic and refactor batch load [iceberg]

2025-01-24 Thread via GitHub
aokolnychyi commented on code in PR #12056: URL: https://github.com/apache/iceberg/pull/12056#discussion_r1929047717 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnVectorWithFilter.java: ## @@ -18,78 +18,138 @@ */ package org.apache.iceberg.s

[PR] Add relevant NOTICE portions from ALv2 bundled dependencies [iceberg]

2025-01-24 Thread via GitHub
jbonofre opened a new pull request, #12095: URL: https://github.com/apache/iceberg/pull/12095 Bundle jar files actually bundle a few ALv2 dependencies. These dependencies are correctly listed in the `LICENSE` file, but the `NOTICE` file doesn't contain relevant portions when the dependency pr

Re: [PR] Spec, OpenAPI: Adds EnableRowLineage Metadata Update [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on PR #12050: URL: https://github.com/apache/iceberg/pull/12050#issuecomment-2613082288 Merged, Thanks for the review @danielcweeks + @flyrain + @amogh-jahagirdar

Re: [PR] Spec, OpenAPI: Adds EnableRowLineage Metadata Update [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer merged PR #12050: URL: https://github.com/apache/iceberg/pull/12050

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-24 Thread via GitHub
rahil-c commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1929038949 ## core/src/main/java/org/apache/iceberg/ContentFileParser.java: ## @@ -27,7 +27,7 @@ import org.apache.iceberg.relocated.com.google.common.base.Preconditions; impo

Re: [PR] Parquet: Fix Reader leak by removing useless copy [iceberg]

2025-01-24 Thread via GitHub
nastra commented on code in PR #12079: URL: https://github.com/apache/iceberg/pull/12079#discussion_r1929021070 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetIO.java: ## @@ -82,22 +82,10 @@ static OutputFile file(org.apache.iceberg.io.OutputFile file, Configuration

Re: [PR] Spark 3.5: Procedure to rewrite table path [iceberg]

2025-01-24 Thread via GitHub
dramaticlly commented on code in PR #11931: URL: https://github.com/apache/iceberg/pull/11931#discussion_r1928991587 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the

Re: [PR] Core: Add metadataFileLocation in TableUtil [iceberg]

2025-01-24 Thread via GitHub
nastra commented on PR #12082: URL: https://github.com/apache/iceberg/pull/12082#issuecomment-2613043318 I'll wait a bit with merging in case @amogh-jahagirdar has any comments

Re: [PR] Fix statistics documentation by removing snapshot_id references [iceberg-python]

2025-01-24 Thread via GitHub
kevinjqliu merged PR #1570: URL: https://github.com/apache/iceberg-python/pull/1570

Re: [PR] Spark 3.4: Refactor delete logic in batch reading [iceberg]

2025-01-24 Thread via GitHub
huaxingao commented on PR #12061: URL: https://github.com/apache/iceberg/pull/12061#issuecomment-2613034312 Thanks @aokolnychyi

Re: [PR] Parquet: Fix Reader leak by removing useless copy [iceberg]

2025-01-24 Thread via GitHub
zizon commented on PR #12079: URL: https://github.com/apache/iceberg/pull/12079#issuecomment-2613024665 cc @nastra @Fokko

Re: [PR] Core, API, Spec: Metadata Row Lineage [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on code in PR #11948: URL: https://github.com/apache/iceberg/pull/11948#discussion_r1929006431 ## core/src/test/java/org/apache/iceberg/TestRowLineageMetadata.java: ## @@ -0,0 +1,328 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-24 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2613020929 I'm still working on this and will continue working on this. I find it quite difficult to get Arrow and Iceberg to agree on types because of field ids. The Iceberg schema requires

Re: [PR] Spark 3.4: Refactor delete logic in batch reading [iceberg]

2025-01-24 Thread via GitHub
aokolnychyi merged PR #12061: URL: https://github.com/apache/iceberg/pull/12061

Re: [PR] Spark 3.5: Procedure to rewrite table path [iceberg]

2025-01-24 Thread via GitHub
dramaticlly commented on code in PR #11931: URL: https://github.com/apache/iceberg/pull/11931#discussion_r1928992930 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the

Re: [PR] Core, API, Spec: Metadata Row Lineage [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on code in PR #11948: URL: https://github.com/apache/iceberg/pull/11948#discussion_r1928991487 ## core/src/test/java/org/apache/iceberg/TestRowLineageMetadata.java: ## @@ -0,0 +1,328 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [I] Spark rewrite_data_files failing with java.lang.IllegalStateException: Connection pool shut down [iceberg]

2025-01-24 Thread via GitHub
mgmarino commented on issue #12046: URL: https://github.com/apache/iceberg/issues/12046#issuecomment-2612986973 After doing some further investigation, my initial conclusion is the following: - I can see `SerializableTableWithSize` being generated on the driver at least in two differ

Re: [PR] [Docs] Update spark-getting-started docs page to make the example valid [iceberg]

2025-01-24 Thread via GitHub
nickdelnano commented on PR #11923: URL: https://github.com/apache/iceberg/pull/11923#issuecomment-2612984557 Sorry for the delay here - I was on vacation for a bit. a9cbadd3dc11ed082e42fdae12a640373027bb38 should fix the tests ``` ./gradlew spotlessApply -DallModules ```

Re: [I] Cannot use MERGE INTO query on Iceberg table. Getting `java.lang.IllegalArgumentException: Comparison method violates its general contract!` error. [iceberg]

2025-01-24 Thread via GitHub
stevencarpenter commented on issue #9650: URL: https://github.com/apache/iceberg/issues/9650#issuecomment-2612974723 @nastra thank you for the quick response! I will open a ticket with AWS.

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-24 Thread via GitHub
ajantha-bhat commented on PR #11904: URL: https://github.com/apache/iceberg/pull/11904#issuecomment-2612954116 rebasing the PR as Flink hit a flaky test https://github.com/apache/iceberg/issues/11833#issuecomment-2581584085

Re: [PR] Parquet: Fix Reader leak by removing useless copy [iceberg]

2025-01-24 Thread via GitHub
zizon commented on PR #12079: URL: https://github.com/apache/iceberg/pull/12079#issuecomment-2612949803 I think I found the root cause. ``` 2025-01-24T10:39:58.963+0800 WARN Finalizer org.apache.iceberg.hadoop.HadoopStreams Unclosed input stream created by: org.ap

Re: [PR] Core, Spark: Scan only live entries in RewriteTablePathUtil [iceberg]

2025-01-24 Thread via GitHub
dramaticlly commented on code in PR #12006: URL: https://github.com/apache/iceberg/pull/12006#discussion_r1928946351 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -710,7 +710,7 @@ private boolean fileExist(String path)

[I] Overwrite with Filter Conditions Example - Large Amount of Filter Conditions [iceberg-python]

2025-01-24 Thread via GitHub
lelandroling opened a new issue, #1571: URL: https://github.com/apache/iceberg-python/issues/1571 ### Question Checking through the GitHub issues, I noticed very few examples and I did see the open requests for improved documentation. Understandably, I understand that I can use MERGE

Re: [PR] refactor(catalog): restructure catalog package [iceberg-go]

2025-01-24 Thread via GitHub
zeroshade commented on PR #266: URL: https://github.com/apache/iceberg-go/pull/266#issuecomment-2612911737 CC @achille-roussel

Re: [PR] Core, Spark: Scan only live entries in RewriteTablePathUtil [iceberg]

2025-01-24 Thread via GitHub
dramaticlly commented on code in PR #12006: URL: https://github.com/apache/iceberg/pull/12006#discussion_r1928932547 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java: ## @@ -923,16 +1005,20 @@ protected void checkFileNum(

[PR] OpenAPI: add initial/write defaults to schema [iceberg]

2025-01-24 Thread via GitHub
danielcweeks opened a new pull request, #12094: URL: https://github.com/apache/iceberg/pull/12094 Adds `initial-default` and `write-default` to the REST API spec. Includes updates to a couple of small round-trip request/response serde tests.

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-24 Thread via GitHub
mapleFU commented on code in PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#discussion_r1928877725 ## src/iceberg/type.cc: ## @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOT

[PR] Fix statistics documentation by removing snapshot_id references [iceberg-python]

2025-01-24 Thread via GitHub
ndrluis opened a new pull request, #1570: URL: https://github.com/apache/iceberg-python/pull/1570 @Fokko I forgot to update the statistics documentation in the previous PR.

Re: [PR] Core, API, Spec: Metadata Row Lineage [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on code in PR #11948: URL: https://github.com/apache/iceberg/pull/11948#discussion_r1928910159 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -282,6 +283,13 @@ public Snapshot apply() { throw new RuntimeIOException(e, "Faile

Re: [I] Deprecate `snapshot-id` of `SetStatisticsUpdate` [iceberg-python]

2025-01-24 Thread via GitHub
Fokko closed issue #1556: Deprecate `snapshot-id` of `SetStatisticsUpdate` URL: https://github.com/apache/iceberg-python/issues/1556

Re: [PR] Remove redundant snapshot_id from SetStatisticsUpdate [iceberg-python]

2025-01-24 Thread via GitHub
Fokko merged PR #1566: URL: https://github.com/apache/iceberg-python/pull/1566

Re: [PR] Remove redundant snapshot_id from SetStatisticsUpdate [iceberg-python]

2025-01-24 Thread via GitHub
ndrluis commented on PR #1566: URL: https://github.com/apache/iceberg-python/pull/1566#issuecomment-2612827799 @Fokko Done!

Re: [PR] Core, API, Spec: Metadata Row Lineage [iceberg]

2025-01-24 Thread via GitHub
amogh-jahagirdar commented on code in PR #11948: URL: https://github.com/apache/iceberg/pull/11948#discussion_r1928151867 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -290,7 +298,27 @@ public Snapshot apply() { operation(), summary(base

Re: [I] set tblproperties, spark action expireSnapshots is not work. [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on issue #12078: URL: https://github.com/apache/iceberg/issues/12078#issuecomment-2612815164 I'm not sure what you are asking here. Are you saying that the table properties are not being set correctly in metadata json when set in flink? Or are you saying t

Re: [PR] Core: Change RemoveSnapshots to remove unused schemas [iceberg]

2025-01-24 Thread via GitHub
gaborkaszab commented on PR #12089: URL: https://github.com/apache/iceberg/pull/12089#issuecomment-2612804377 cc @RussellSpitzer @rdblue @danielcweeks Would you mind taking a look?

Re: [I] Iceberg + Spark + Hive Catalog + warehouse in s3a [iceberg]

2025-01-24 Thread via GitHub
Fokko commented on issue #11984: URL: https://github.com/apache/iceberg/issues/11984#issuecomment-2612781668 @2MD Can you try: ``` .config(s"spark.sql.catalog.$icebergCatalog.s3.access-key-id", minioContainer.getMinioAccessValue) .config(s"spark.sql.catalog.$icebergCatalog

Re: [I] `partial-progress.max-failed-commits` Incorrectly compare the failureCommit value [iceberg]

2025-01-24 Thread via GitHub
RussellSpitzer commented on issue #12076: URL: https://github.com/apache/iceberg/issues/12076#issuecomment-2612777076 Got it! That makes sense

Re: [I] `partial-progress.max-failed-commits` Incorrectly compare the failureCommit value [iceberg]

2025-01-24 Thread via GitHub
ruotianwang commented on issue #12076: URL: https://github.com/apache/iceberg/issues/12076#issuecomment-2612767531 @RussellSpitzer Basically this line: https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java

Re: [PR] Kafka Connect: Add delta writer support [iceberg]

2025-01-24 Thread via GitHub
bryanck commented on PR #12070: URL: https://github.com/apache/iceberg/pull/12070#issuecomment-2612762278 > @bryanck copied over the code as is. > > I'm planning to refactor the upsert mode (delta writer) code and add a few improvements to it, potentially changing existing behavior.
