Re: [PR] Remove slf4j-api reference in LICENSE as it's now excluded from the distributed jar files [iceberg]

2025-01-22 Thread via GitHub
jbonofre commented on code in PR #12052: URL: https://github.com/apache/iceberg/pull/12052#discussion_r1926463459 ## gcp-bundle/LICENSE: ## @@ -549,12 +549,6 @@ License: Apache 2 - https://www.apache.org/licenses/LICENSE-2.0 -

Re: [I] Spark rewrite_data_files failing with java.lang.IllegalStateException: Connection pool shut down [iceberg]

2025-01-22 Thread via GitHub
mgmarino commented on issue #12046: URL: https://github.com/apache/iceberg/issues/12046#issuecomment-2609058634 I tried to trace where the connection pool is being closed. Aside from calls stemming from finalizers on Thread shutdown (which seem perfectly legitimate), I see: ```

Re: [I] Rename the partition field and add a field with the same name as the old partition field GOT ERROR [iceberg]

2025-01-22 Thread via GitHub
huan233usc commented on issue #11762: URL: https://github.com/apache/iceberg/issues/11762#issuecomment-2609038962 IIUC the data file's name doesn't matter. I propose a fix that generates a partition spec when an identity partition key is renamed. -- This is an automated message from the Apache Git Servic

[PR] Fix rename then add column with same name failure if the renamed columns was an identity partition key [iceberg]

2025-01-22 Thread via GitHub
huan233usc opened a new pull request, #12064: URL: https://github.com/apache/iceberg/pull/12064 This PR tries to fix https://github.com/apache/iceberg/issues/11762 for the identity partition case. Other partitions may also require the fix.

Re: [PR] Remove slf4j-api reference in LICENSE as it's now excluded from the distributed jar files [iceberg]

2025-01-22 Thread via GitHub
jbonofre commented on PR #12052: URL: https://github.com/apache/iceberg/pull/12052#issuecomment-2608984497 @Fokko let me check, but I'm not sure it's the case for all bundle jar files, and I'm also not sure it's actually slf4j-api. I'll do a new pass.

Re: [PR] Remove slf4j-api reference in LICENSE as it's now excluded from the distributed jar files [iceberg]

2025-01-22 Thread via GitHub
jbonofre commented on code in PR #12052: URL: https://github.com/apache/iceberg/pull/12052#discussion_r1926463235 ## azure-bundle/LICENSE: ## @@ -511,9 +511,3 @@ License: The Apache Software License, Version 2.0 - http://www.apache.org/licens Group: org.reactivestreams Name:

Re: [I] [Bug] Error in overwrite(): pyarrow.lib.ArrowInvalid: offset overflow with large dataset (~3M rows) [iceberg-python]

2025-01-22 Thread via GitHub
Fokko closed issue #1491: [Bug] Error in overwrite(): pyarrow.lib.ArrowInvalid: offset overflow with large dataset (~3M rows) URL: https://github.com/apache/iceberg-python/issues/1491

Re: [PR] PyArrow: Avoid buffer-overflow by avoid doing a sort [iceberg-python]

2025-01-22 Thread via GitHub
Fokko merged PR #1555: URL: https://github.com/apache/iceberg-python/pull/1555

Re: [PR] Delta, AWS, REST: Remove redundant charset lookup [iceberg]

2025-01-22 Thread via GitHub
nastra merged PR #12057: URL: https://github.com/apache/iceberg/pull/12057

Re: [I] Rename the partition field and add a field with the same name as the old partition field GOT ERROR [iceberg]

2025-01-22 Thread via GitHub
madeirak commented on issue #11762: URL: https://github.com/apache/iceberg/issues/11762#issuecomment-2608982066 > It seems that on column rename, if the column is a partition column, its name is not updated. metadata file from my local testing -> "schemas" : [ { "type" : "struct", "schema

Re: [PR] Flink: Add null check to writers to prevent resurrecting null values [iceberg]

2025-01-22 Thread via GitHub
pvary commented on PR #12049: URL: https://github.com/apache/iceberg/pull/12049#issuecomment-2608966180 @mxm: Please remove the 1.18, 1.19 changes from the PR. It is much easier to review this way, and apply changes required by the reviewer. When the PR has been merged, we can backport the

Re: [PR] Remove slf4j-api reference in LICENSE as it's now excluded from the distributed jar files [iceberg]

2025-01-22 Thread via GitHub
Fokko commented on code in PR #12052: URL: https://github.com/apache/iceberg/pull/12052#discussion_r1926395983 ## azure-bundle/LICENSE: ## @@ -511,9 +511,3 @@ License: The Apache Software License, Version 2.0 - http://www.apache.org/licens Group: org.reactivestreams Name: rea

Re: [PR] [DO NOT MERGE]Test pr title [iceberg-go]

2025-01-22 Thread via GitHub
liurenjie1024 closed pull request #267: [DO NOT MERGE]Test pr title URL: https://github.com/apache/iceberg-go/pull/267

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-22 Thread via GitHub
ajantha-bhat commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1926371943 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -50,6 +46,31 @@ protected ParquetValueWriter createWriter(MessageType typ

Re: [I] Rename the partition field and add a field with the same name as the old partition field GOT ERROR [iceberg]

2025-01-22 Thread via GitHub
huan233usc commented on issue #11762: URL: https://github.com/apache/iceberg/issues/11762#issuecomment-2608847755 It seems that on column rename, if the column is a partition column, its name is not updated. metadata file from my local testing -> "schemas" : [ { "type" : "

Re: [PR] API: Add `UnknownType` [iceberg]

2025-01-22 Thread via GitHub
Fokko commented on code in PR #12012: URL: https://github.com/apache/iceberg/pull/12012#discussion_r1926350102 ## core/src/test/java/org/apache/iceberg/TestSortOrder.java: ## @@ -342,6 +342,22 @@ public void testVariantUnsupported() { .hasMessage("Unsupported type for i

Re: [PR] API: Add `UnknownType` [iceberg]

2025-01-22 Thread via GitHub
Fokko commented on code in PR #12012: URL: https://github.com/apache/iceberg/pull/12012#discussion_r1926348050 ## core/src/test/java/org/apache/iceberg/TestSortOrder.java: ## @@ -342,6 +342,22 @@ public void testVariantUnsupported() { .hasMessage("Unsupported type for i

Re: [PR] API: Add `UnknownType` [iceberg]

2025-01-22 Thread via GitHub
Fokko commented on code in PR #12012: URL: https://github.com/apache/iceberg/pull/12012#discussion_r1926344616 ## api/src/main/java/org/apache/iceberg/transforms/Identity.java: ## @@ -39,7 +39,9 @@ class Identity implements Transform { @Deprecated public static Identity g

Re: [I] Typedefs such as Identifier, Properties, and RecursiveDict are not hyperlinked in the generated documentation [iceberg-python]

2025-01-22 Thread via GitHub
Fokko closed issue #1529: Typedefs such as Identifier, Properties, and RecursiveDict are not hyperlinked in the generated documentation URL: https://github.com/apache/iceberg-python/issues/1529

Re: [PR] docs: Add docstrings for Identifier, Properties, RecursiveDict [iceberg-python]

2025-01-22 Thread via GitHub
Fokko merged PR #1530: URL: https://github.com/apache/iceberg-python/pull/1530

Re: [PR] Refactor `{year,month,day,hour}` transform [iceberg-python]

2025-01-22 Thread via GitHub
Fokko commented on PR #1563: URL: https://github.com/apache/iceberg-python/pull/1563#issuecomment-2608808180 Thanks @kevinjqliu :)

Re: [PR] Refactor `{year,month,day,hour}` transform [iceberg-python]

2025-01-22 Thread via GitHub
Fokko merged PR #1563: URL: https://github.com/apache/iceberg-python/pull/1563

Re: [PR] Spark 3.5: Fix broadcasting specs in RewriteTablePath [iceberg]

2025-01-22 Thread via GitHub
manuzhang commented on code in PR #11982: URL: https://github.com/apache/iceberg/pull/11982#discussion_r1926329595 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -728,4 +710,13 @@ private String getMetadataLocation(Tabl

Re: [PR] Add Doxygen for generating API documentation [iceberg-cpp]

2025-01-22 Thread via GitHub
lidavidm commented on PR #27: URL: https://github.com/apache/iceberg-cpp/pull/27#issuecomment-2608784051 https://github.com/apache/iceberg-cpp/issues/36 :sweat_smile:

Re: [PR] Add Doxygen for generating API documentation [iceberg-cpp]

2025-01-22 Thread via GitHub
lidavidm commented on PR #27: URL: https://github.com/apache/iceberg-cpp/pull/27#issuecomment-2608783393 I am not sure what I was doing there, good catch. Let me fix that.

Re: [PR] Add Doxygen for generating API documentation [iceberg-cpp]

2025-01-22 Thread via GitHub
wgtmac commented on PR #27: URL: https://github.com/apache/iceberg-cpp/pull/27#issuecomment-2608781211 > Thanks, I filed [apache/arrow-java#555](https://github.com/apache/arrow-java/issues/555) to keep track of that. It seems that the issue was created against the wrong repo...

Re: [PR] Spark 3.5: Fix broadcasting specs in RewriteTablePath [iceberg]

2025-01-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #11982: URL: https://github.com/apache/iceberg/pull/11982#discussion_r1926310867 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -728,4 +710,13 @@ private String getMetadataLocati

Re: [PR] Add Doxygen for generating API documentation [iceberg-cpp]

2025-01-22 Thread via GitHub
lidavidm commented on PR #27: URL: https://github.com/apache/iceberg-cpp/pull/27#issuecomment-2608767701 Thanks, I filed https://github.com/apache/arrow-java/issues/555 to keep track of that.

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-22 Thread via GitHub
lidavidm commented on PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#issuecomment-2608739486 I added a bunch of unit tests + fixed some copy-paste errors I discovered in the process.

Re: [PR] test: Introduce datafusion engine for executing sqllogictest. [iceberg-rust]

2025-01-22 Thread via GitHub
liurenjie1024 commented on PR #895: URL: https://github.com/apache/iceberg-rust/pull/895#issuecomment-2608730957 > I published datafusion-sqllogictest to crates.io (details here [apache/datafusion#14229 (comment)](https://github.com/apache/datafusion/discussions/14229#discussioncomment-1191

Re: [PR] [DO NOT MERGE]Test pr title [iceberg-cpp]

2025-01-22 Thread via GitHub
liurenjie1024 closed pull request #35: [DO NOT MERGE]Test pr title URL: https://github.com/apache/iceberg-cpp/pull/35

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-22 Thread via GitHub
ajantha-bhat commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1926276720 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -50,6 +54,32 @@ protected ParquetValueWriter createWriter(MessageType typ

Re: [PR] Spark 3.5: Make ColumnVectorWithFilter generic and refactor batch load [iceberg]

2025-01-22 Thread via GitHub
huaxingao commented on PR #12056: URL: https://github.com/apache/iceberg/pull/12056#issuecomment-2608719647 Thanks for the refactor! The code is much cleaner now.

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-22 Thread via GitHub
ajantha-bhat commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1926274339 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -50,6 +54,32 @@ protected ParquetValueWriter createWriter(MessageType typ

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-22 Thread via GitHub
ajantha-bhat commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1926270438 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java: ## @@ -850,4 +919,42 @@ private TripleIterator firstNonNullColumn(List> columns) {

[PR] [DO NOT MERGE]Test pr title [iceberg-go]

2025-01-22 Thread via GitHub
liurenjie1024 opened a new pull request, #267: URL: https://github.com/apache/iceberg-go/pull/267 Test if desc is set as default git message.

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-22 Thread via GitHub
manuzhang commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1926261998 ## format/spec.md: ## @@ -1633,3 +1633,47 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time quer

[PR] Test pr title [iceberg-cpp]

2025-01-22 Thread via GitHub
liurenjie1024 opened a new pull request, #35: URL: https://github.com/apache/iceberg-cpp/pull/35 Test to see if pr desc has been enabled as default git message.

Re: [PR] feat(puffin): Add PuffinReader [iceberg-rust]

2025-01-22 Thread via GitHub
fqaiser94 commented on code in PR #892: URL: https://github.com/apache/iceberg-rust/pull/892#discussion_r1926260077 ## crates/iceberg/src/puffin/blob.rs: ## @@ -0,0 +1,38 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

Re: [PR] feat(puffin): Add PuffinReader [iceberg-rust]

2025-01-22 Thread via GitHub
fqaiser94 commented on code in PR #892: URL: https://github.com/apache/iceberg-rust/pull/892#discussion_r1926246953 ## crates/iceberg/src/puffin/reader.rs: ## @@ -0,0 +1,126 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

Re: [PR] Spark 3.5: Fix broadcasting specs in RewriteTablePath [iceberg]

2025-01-22 Thread via GitHub
manuzhang commented on code in PR #11982: URL: https://github.com/apache/iceberg/pull/11982#discussion_r1926244324 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -728,4 +724,22 @@ private String getMetadataLocation(Tabl

Re: [PR] Spark 3.5: Fix broadcasting specs in RewriteTablePath [iceberg]

2025-01-22 Thread via GitHub
manuzhang commented on code in PR #11982: URL: https://github.com/apache/iceberg/pull/11982#discussion_r1926243669 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -728,4 +724,22 @@ private String getMetadataLocation(Tabl

[I] Drop table failed when metadata.json file is missing [iceberg]

2025-01-22 Thread via GitHub
SGITLOGIN opened a new issue, #12062: URL: https://github.com/apache/iceberg/issues/12062 ### Feature Request / Improvement ### Question: drop table failed when metadata.json file is missing When the metadata.json file is lost, I want to delete this table record from the hiv

Re: [PR] Spark 3.5: Fix Javadoc in ColumnarBatchUtil [iceberg]

2025-01-22 Thread via GitHub
huaxingao commented on code in PR #12058: URL: https://github.com/apache/iceberg/pull/12058#discussion_r1926237909 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java: ## @@ -31,10 +31,13 @@ public class ColumnarBatchUtil { priv

Re: [PR] Spark 3.5: Make ColumnVectorWithFilter generic and refactor batch load [iceberg]

2025-01-22 Thread via GitHub
aokolnychyi commented on PR #12056: URL: https://github.com/apache/iceberg/pull/12056#issuecomment-2608648993 @huaxingao, could you check this one? I've been meaning to refactor our `ColumnVectorWithFilter`.

Re: [PR] Spark 3.5: Make ColumnVectorWithFilter generic and refactor batch load [iceberg]

2025-01-22 Thread via GitHub
aokolnychyi commented on code in PR #12056: URL: https://github.com/apache/iceberg/pull/12056#discussion_r1926229097 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnVectorWithFilter.java: ## @@ -18,78 +18,121 @@ */ package org.apache.iceberg.s

Re: [PR] Spark 3.5: Fix Javadoc in ColumnarBatchUtil [iceberg]

2025-01-22 Thread via GitHub
aokolnychyi commented on code in PR #12058: URL: https://github.com/apache/iceberg/pull/12058#discussion_r1926222183 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java: ## @@ -31,10 +31,13 @@ public class ColumnarBatchUtil { pr

Re: [PR] Spark 3.5: Fix Javadoc in ColumnarBatchUtil [iceberg]

2025-01-22 Thread via GitHub
aokolnychyi commented on code in PR #12058: URL: https://github.com/apache/iceberg/pull/12058#discussion_r1926221855 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java: ## @@ -31,10 +31,13 @@ public class ColumnarBatchUtil { pr

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926214561 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -125,6 +126,18 @@ public class Parquet { private Parquet() {} + public void register()

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926214945 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -1171,6 +1188,16 @@ public ReadBuilder withNameMapping(NameMapping newNameMapping) { re

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926214256 ## core/src/main/java/org/apache/iceberg/avro/SupportsCustomRecords.java: ## @@ -20,7 +20,7 @@ import java.util.Map; -/** An interface for Avro DatumReaders to su

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926212266 ## core/src/main/java/org/apache/iceberg/avro/InternalReader.java: ## @@ -76,6 +76,15 @@ public void setSchema(Schema schema) { initReader(); } + @Override

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926211300 ## core/src/main/java/org/apache/iceberg/avro/Avro.java: ## @@ -90,14 +103,18 @@ private enum Codec { } public static WriteBuilder write(OutputFile file) { +

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926210150 ## core/src/main/java/org/apache/iceberg/V1Metadata.java: ## @@ -330,32 +333,44 @@ public ManifestEntry copyWithoutStats() { } } - static class IndexedDataF

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926209609 ## core/src/main/java/org/apache/iceberg/V1Metadata.java: ## @@ -64,17 +59,21 @@ public ManifestFile wrap(ManifestFile file) { } @Override -public org.

Re: [PR] OpenAPI: Changes for freshness-aware table loading [iceberg]

2025-01-22 Thread via GitHub
flyrain commented on code in PR #11946: URL: https://github.com/apache/iceberg/pull/11946#discussion_r1926204614 ## open-api/rest-catalog-open-api.yaml: ## @@ -1873,6 +1886,15 @@ components: type: integer minimum: 1 +etag: + name: ETag + in: hea

Re: [PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #12060: URL: https://github.com/apache/iceberg/pull/12060#discussion_r1926206956 ## core/src/main/java/org/apache/iceberg/InternalData.java: ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

[PR] Core: Add InternalData read and write builders [iceberg]

2025-01-22 Thread via GitHub
rdblue opened a new pull request, #12060: URL: https://github.com/apache/iceberg/pull/12060 This adds `InternalData` with read and write builder interfaces that can be used with Avro and Parquet by passing a `FileFormat`. Formats are registered by calling `InternalData.register` with callba

Re: [I] Reference apache/iceberg repo's contribute doc in this repo [iceberg-python]

2025-01-22 Thread via GitHub
github-actions[bot] commented on issue #970: URL: https://github.com/apache/iceberg-python/issues/970#issuecomment-2608548182 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [PR] Fix Hive FileIO closing with FileIOTracker [iceberg]

2025-01-22 Thread via GitHub
github-actions[bot] commented on PR #11782: URL: https://github.com/apache/iceberg/pull/11782#issuecomment-2608543572 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Azure Databricks Azure databricks 14.3 LTS runtime with apache iceberg 1.5.2 spark 3.5.0 and scala 2.12 Error [iceberg]

2025-01-22 Thread via GitHub
github-actions[bot] commented on issue #10789: URL: https://github.com/apache/iceberg/issues/10789#issuecomment-2608543474 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Parquet: add variant type support [iceberg]

2025-01-22 Thread via GitHub
github-actions[bot] commented on PR #11653: URL: https://github.com/apache/iceberg/pull/11653#issuecomment-2608543531 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] JDBC: Escape table names when checking the existence [iceberg]

2025-01-22 Thread via GitHub
github-actions[bot] commented on PR #11863: URL: https://github.com/apache/iceberg/pull/11863#issuecomment-2608543609 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Should be null_value_counts updated after adding a new column to the schema? [iceberg]

2025-01-22 Thread via GitHub
github-actions[bot] commented on issue #10773: URL: https://github.com/apache/iceberg/issues/10773#issuecomment-2608543354 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] Error configuration key in document [iceberg]

2025-01-22 Thread via GitHub
github-actions[bot] commented on issue #10785: URL: https://github.com/apache/iceberg/issues/10785#issuecomment-2608543452 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Implement update for `remove-snapshots` action [iceberg-python]

2025-01-22 Thread via GitHub
grihabor commented on code in PR #1561: URL: https://github.com/apache/iceberg-python/pull/1561#discussion_r1926141836 ## tests/table/test_init.py: ## @@ -793,6 +794,34 @@ def test_update_metadata_set_snapshot_ref(table_v2: Table) -> None: ) +def test_update_remove_sna

Re: [PR] Spec, OpenAPI: Adds EnableRowLineage Metadata Update [iceberg]

2025-01-22 Thread via GitHub
danielcweeks commented on code in PR #12050: URL: https://github.com/apache/iceberg/pull/12050#discussion_r1926140195 ## open-api/rest-catalog-open-api.yaml: ## @@ -2945,6 +2946,14 @@ components: items: type: integer +EnableRowLineageUpdate: Review

Re: [PR] Implement update for `remove-snapshots` action [iceberg-python]

2025-01-22 Thread via GitHub
grihabor commented on code in PR #1561: URL: https://github.com/apache/iceberg-python/pull/1561#discussion_r1926139112 ## pyiceberg/table/update/__init__.py: ## @@ -455,6 +455,19 @@ def _(update: SetSnapshotRefUpdate, base_metadata: TableMetadata, context: _Tabl return bas

Re: [PR] Spark 3.5: Fix Javadoc in ColumnarBatchUtil [iceberg]

2025-01-22 Thread via GitHub
huaxingao commented on PR #12058: URL: https://github.com/apache/iceberg/pull/12058#issuecomment-2608516483 cc @aokolnychyi @szehon-ho @dramaticlly

Re: [PR] Build: Bump mypy-boto3-glue from 1.36.0 to 1.36.4 [iceberg-python]

2025-01-22 Thread via GitHub
kevinjqliu merged PR #1565: URL: https://github.com/apache/iceberg-python/pull/1565

Re: [PR] Build: Bump boto3 from 1.36.1 to 1.36.3 [iceberg-python]

2025-01-22 Thread via GitHub
kevinjqliu merged PR #1564: URL: https://github.com/apache/iceberg-python/pull/1564

Re: [PR] API: Add `UnknownType` [iceberg]

2025-01-22 Thread via GitHub
HonahX commented on code in PR #12012: URL: https://github.com/apache/iceberg/pull/12012#discussion_r1926050463 ## core/src/test/java/org/apache/iceberg/TestSortOrder.java: ## @@ -342,6 +342,22 @@ public void testVariantUnsupported() { .hasMessage("Unsupported type for

Re: [PR] Auth Manager API part 4: RESTClient, HTTPClient [iceberg]

2025-01-22 Thread via GitHub
danielcweeks commented on code in PR #11992: URL: https://github.com/apache/iceberg/pull/11992#discussion_r1926123223 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -214,92 +225,63 @@ private static void throwFailure( throw new RESTException("Unhandled

Re: [PR] Auth Manager API part 4: RESTClient, HTTPClient [iceberg]

2025-01-22 Thread via GitHub
danielcweeks commented on code in PR #11992: URL: https://github.com/apache/iceberg/pull/11992#discussion_r1926121826 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -214,92 +225,63 @@ private static void throwFailure( throw new RESTException("Unhandled

Re: [PR] AWS: Add support for enabling access to S3 Requester Pays bucket [iceberg]

2025-01-22 Thread via GitHub
blitzmohit commented on PR #11915: URL: https://github.com/apache/iceberg/pull/11915#issuecomment-2608477998 Hi @steveloughran I just wanted to follow up on my previous comment and would love to hear any additional thoughts or suggestions you might have. I looked into the Hado

Re: [PR] Auth Manager API part 4: RESTClient, HTTPClient [iceberg]

2025-01-22 Thread via GitHub
danielcweeks commented on code in PR #11992: URL: https://github.com/apache/iceberg/pull/11992#discussion_r1926113976 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -88,33 +84,30 @@ public class HTTPClient implements RESTClient { @VisibleForTesting sta

Re: [PR] Auth Manager API part 4: RESTClient, HTTPClient [iceberg]

2025-01-22 Thread via GitHub
danielcweeks commented on code in PR #11992: URL: https://github.com/apache/iceberg/pull/11992#discussion_r1926105323 ## core/src/main/java/org/apache/iceberg/rest/BaseHTTPClient.java: ## @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] Auth Manager API part 4: RESTClient, HTTPClient [iceberg]

2025-01-22 Thread via GitHub
danielcweeks commented on code in PR #11992: URL: https://github.com/apache/iceberg/pull/11992#discussion_r1926099736 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -214,92 +225,63 @@ private static void throwFailure( throw new RESTException("Unhandled

Re: [PR] Spark3.5: Standardizing Error Handling in Iceberg Spark Module - TestViews [iceberg]

2025-01-22 Thread via GitHub
huaxingao commented on code in PR #11993: URL: https://github.com/apache/iceberg/pull/11993#discussion_r1926089153 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -213,10 +213,13 @@ public void readFromViewUsingNonExistingTa

[PR] Fix Javadoc in ColumnarBatchUtil [iceberg]

2025-01-22 Thread via GitHub
huaxingao opened a new pull request, #12058: URL: https://github.com/apache/iceberg/pull/12058 (no comment)

Re: [PR] PyArrow: Avoid buffer-overflow by avoid doing a sort [iceberg-python]

2025-01-22 Thread via GitHub
Fokko commented on code in PR #1555: URL: https://github.com/apache/iceberg-python/pull/1555#discussion_r1926067731 ## pyiceberg/partitioning.py: ## @@ -413,8 +413,10 @@ def partition_record_value(partition_field: PartitionField, value: Any, schema: the final partition rec

[PR] Build: Bump mypy-boto3-glue from 1.36.0 to 1.36.4 [iceberg-python]

2025-01-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1565: URL: https://github.com/apache/iceberg-python/pull/1565 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.36.0 to 1.36.4. Release notes Sourced from https://github.com/youtype/mypy_boto3_builder/releases

[PR] Build: Bump boto3 from 1.36.1 to 1.36.3 [iceberg-python]

2025-01-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1564: URL: https://github.com/apache/iceberg-python/pull/1564 Bumps [boto3](https://github.com/boto/boto3) from 1.36.1 to 1.36.3. Commits: 50e6c29 (https://github.com/boto/boto3/commit/50e6c29196fdb2ace0adc7ee65231b57ad8f1c74) Merge br

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1926051437 ## core/src/main/java/org/apache/iceberg/UnboundBaseFileScanTask.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Refactor `bucket` transform types [iceberg-python]

2025-01-22 Thread via GitHub
Fokko commented on PR #1562: URL: https://github.com/apache/iceberg-python/pull/1562#issuecomment-2608342763 @kevinjqliu Thanks, and yes, thanks for correcting the title :p -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] Refactor `bucket` transform types [iceberg-python]

2025-01-22 Thread via GitHub
Fokko merged PR #1562: URL: https://github.com/apache/iceberg-python/pull/1562 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-22 Thread via GitHub
huaxingao commented on PR #11933: URL: https://github.com/apache/iceberg/pull/11933#issuecomment-2608329677 Thanks a lot @aokolnychyi! I will back-port to 3.4 and fix the java doc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-01-22 Thread via GitHub
mattmartin14 commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2608311564 Also @kevinjqliu - To address your question on datafusion. When I looked into this feature, I explored these 3 options for an arrow processing engine: 1. Duckdb 2. Da

Re: [PR] PyArrow: Avoid buffer-overflow by avoid doing a sort [iceberg-python]

2025-01-22 Thread via GitHub
kevinjqliu commented on code in PR #1555: URL: https://github.com/apache/iceberg-python/pull/1555#discussion_r1925983653 ## pyiceberg/partitioning.py: ## @@ -413,8 +413,10 @@ def partition_record_value(partition_field: PartitionField, value: Any, schema: the final partitio

Re: [PR] PyArrow: Avoid buffer-overflow by avoid doing a sort [iceberg-python]

2025-01-22 Thread via GitHub
kevinjqliu commented on code in PR #1555: URL: https://github.com/apache/iceberg-python/pull/1555#discussion_r1925983134 ## pyiceberg/partitioning.py: ## @@ -413,8 +413,10 @@ def partition_record_value(partition_field: PartitionField, value: Any, schema: the final partitio

Re: [PR] Implement update for `remove-snapshots` action [iceberg-python]

2025-01-22 Thread via GitHub
kevinjqliu commented on code in PR #1561: URL: https://github.com/apache/iceberg-python/pull/1561#discussion_r1925977119 ## tests/table/test_init.py: ## @@ -793,6 +794,34 @@ def test_update_metadata_set_snapshot_ref(table_v2: Table) -> None: ) +def test_update_remove_s

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1925975943 ## core/src/main/java/org/apache/iceberg/UnboundBaseFileScanTask.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1925975079 ## core/src/main/java/org/apache/iceberg/UnboundBaseFileScanTask.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1925973465 ## core/src/main/java/org/apache/iceberg/ContentFileParser.java: ## @@ -48,6 +48,97 @@ class ContentFileParser { private ContentFileParser() {} + public static

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1925971968 ## core/src/main/java/org/apache/iceberg/ContentFileParser.java: ## @@ -48,6 +48,97 @@ class ContentFileParser { private ContentFileParser() {} + public static

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-22 Thread via GitHub
rdblue commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1925958651 ## core/src/main/java/org/apache/iceberg/ContentFileParser.java: ## @@ -48,6 +48,97 @@ class ContentFileParser { private ContentFileParser() {} + public static
