Re: [PR] fix(infra/testing): Enable iceberg spark interops integration tests on ci pipeline [iceberg-go]

2025-07-22 Thread via GitHub
lliangyu-lin commented on code in PR #493: URL: https://github.com/apache/iceberg-go/pull/493#discussion_r2223979653 ## table/transaction_test.go: ## @@ -114,21 +137,143 @@ func (s *SparkIntegrationTestSuite) TestAddFile() { tbl, err = tx.Commit(s.ctx) s.Require(

Re: [PR] fix(infra/testing): Enable iceberg spark interops integration tests on ci pipeline [iceberg-go]

2025-07-22 Thread via GitHub
lliangyu-lin commented on PR #493: URL: https://github.com/apache/iceberg-go/pull/493#issuecomment-3105060238 I'm thinking we should just use the testcontainers API to manage the container lifecycles instead of compose in [CI](https://github.com/apache/iceberg-go/blob/main/.github/workflows

Re: [PR] feat(datafusion): Support insert_into in IcebergTableProvider [iceberg-rust]

2025-07-22 Thread via GitHub
CTTY commented on code in PR #1511: URL: https://github.com/apache/iceberg-rust/pull/1511#discussion_r2223962611 ## crates/iceberg/src/arrow/value.rs: ## @@ -440,10 +440,12 @@ impl PartnerAccessor for ArrowArrayAccessor { Ok(schema_partner) } +// todo generat

Re: [PR] feat(datafusion): Support insert_into in IcebergTableProvider [iceberg-rust]

2025-07-22 Thread via GitHub
CTTY commented on code in PR #1511: URL: https://github.com/apache/iceberg-rust/pull/1511#discussion_r2223954393 ## crates/iceberg/src/spec/manifest/mod.rs: ## @@ -1056,4 +1089,120 @@ mod tests { assert!(!partitions[2].clone().contains_null); assert_eq!(partiti

Re: [I] RewriteTablePaths throws FileAlreadyExistsException [iceberg]

2025-07-22 Thread via GitHub
szehon-ho commented on issue #13630: URL: https://github.com/apache/iceberg/issues/13630#issuecomment-310413 Sounds good to me; looking forward to a patch. It sounds like RewriteTablePaths may not handle write.object-storage.enabled. -- This is an automated message from the Apach

Re: [PR] feat: support incremental scan between 2 snapshots [iceberg-rust]

2025-07-22 Thread via GitHub
CTTY commented on code in PR #1470: URL: https://github.com/apache/iceberg-rust/pull/1470#discussion_r2223906620 ## crates/iceberg/src/scan/context.rs: ## @@ -262,6 +346,61 @@ impl PlanContext { field_ids: self.field_ids.clone(), expression_evaluator_ca

Re: [PR] Rest: Implement register table [iceberg-rust]

2025-07-22 Thread via GitHub
CTTY commented on code in PR #1521: URL: https://github.com/apache/iceberg-rust/pull/1521#discussion_r2223848333 ## crates/catalog/rest/src/catalog.rs: ## @@ -745,10 +749,86 @@ impl Catalog for RestCatalog { _table_ident: &TableIdent, _metadata_location: String

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar merged PR #13555: URL: https://github.com/apache/iceberg/pull/13555

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on PR #13555: URL: https://github.com/apache/iceberg/pull/13555#issuecomment-3104878898 Thanks for the reviews @jeesou @stevenzwu @rdblue, I will go ahead and merge and prepare the backport PRs.

Re: [PR] refactor: Add `read_from()` and `write_to()` to `TableMetadata` [iceberg-rust]

2025-07-22 Thread via GitHub
CTTY commented on code in PR #1523: URL: https://github.com/apache/iceberg-rust/pull/1523#discussion_r2223839998 ## crates/catalog/glue/src/catalog.rs: ## @@ -395,10 +395,7 @@ impl Catalog for GlueCatalog { .metadata; let metadata_location = create_metadata

Re: [PR] Add support for DELTA_BINARY_PACKED Parquet encoding [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13391: URL: https://github.com/apache/iceberg/pull/13391#discussion_r2223826762 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedDeltaEncodedValuesReader.java: ## @@ -0,0 +1,276 @@ +/* + * Licensed to the Apac

Re: [PR] feat(table/updates): add stubs for the remove schemas & remove partition specs table updates [iceberg-go]

2025-07-22 Thread via GitHub
zeroshade commented on code in PR #491: URL: https://github.com/apache/iceberg-go/pull/491#discussion_r2223820008 ## table/updates.go: ## @@ -462,3 +468,35 @@ func NewRemoveSnapshotRefUpdate(ref string) *removeSnapshotRefUpdate { func (u *removeSnapshotRefUpdate) Apply(builder

[I] Feature request: add nightly for docs [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu opened a new issue, #2242: URL: https://github.com/apache/iceberg-python/issues/2242 Similar to https://iceberg.apache.org/docs/nightly/. Otherwise, our docs site is only updated on each release.

Re: [I] Scan with filtering on projected field return empty table [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2028: URL: https://github.com/apache/iceberg-python/issues/2028#issuecomment-3104793094 Great catch, we do project the value after reading from the data files. Maybe the order of operations here is wrong. I'll take a look at the PR.

[I] RewriteTablePaths throws FileAlreadyExistsException [iceberg]

2025-07-22 Thread via GitHub
hpinca98 opened a new issue, #13630: URL: https://github.com/apache/iceberg/issues/13630 ### Apache Iceberg version 1.8.0 ### Query engine Spark ### Please describe the bug 🐞 Currently, the RewriteTablePaths procedure will only copy file names inside the stagin

Re: [PR] Sanitize field names to ensure valid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on PR #2136: URL: https://github.com/apache/iceberg-python/pull/2136#issuecomment-3104683568 Removing the milestone from the PR and tagging the underlying issue #2123 instead.

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3104664890 Since we can validate that the field names are sanitized properly with #2241, I'm inclined to move this out of the 0.10 milestone.

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3104660911 To answer the original issue, I think that instead of rejecting invalid Avro identifiers in field names, we should (and already do) sanitize the name, just like the Java refere
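A minimal sketch of the sanitize-instead-of-reject approach, modeled on the rule in the Java reference implementation (`AvroSchemaUtil.makeCompatibleName`) as understood here: a leading character that is not a letter or `_` is rewritten, later characters that are not letters, digits, or `_` are rewritten, a rewritten digit gets an `_` prefix, and any other character becomes `_x` plus its uppercase hex code. `sanitize_avro_name` is a hypothetical name, not PyIceberg's API. Python's `str.isalpha`/`str.isalnum` only approximate Java's `Character` checks, and this sketch hex-encodes whole code points, so names containing characters outside the BMP (such as `😎`) may not match Java's output.

```python
def _sanitize_char(ch: str) -> str:
    # A digit keeps its value behind a leading underscore; anything else
    # becomes "_x" followed by the uppercase hex of its code point.
    if ch.isdigit():
        return "_" + ch
    return "_x" + format(ord(ch), "X")


def sanitize_avro_name(name: str) -> str:
    """Rewrite a field name into a valid Avro identifier (hypothetical helper)."""
    if not name:
        raise ValueError("cannot sanitize an empty name")
    out = []
    first = name[0]
    # Avro names must start with a letter or underscore.
    out.append(first if (first.isalpha() or first == "_") else _sanitize_char(first))
    # Remaining characters may be letters, digits, or underscores.
    for ch in name[1:]:
        out.append(ch if (ch.isalnum() or ch == "_") else _sanitize_char(ch))
    return "".join(out)
```

For example, `sanitize_avro_name("a.b")` returns `"a_x2Eb"` and `sanitize_avro_name("9x")` returns `"_9x"`, while an already-valid name is returned unchanged.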

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3104655823 #2241 shows that we're already using the right sanitization logic for column names. I also tested locally with `😎`. I wonder if it's something with `ICEBERG_FIELD_NAME_PROP`

[PR] add test for avro sanitization [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu opened a new pull request, #2241: URL: https://github.com/apache/iceberg-python/pull/2241 # Rationale for this change Related to #2123 Add tests that mirror the Avro column name sanitization test in the Java implementation https://github.com/apac

Re: [I] [Feature Request] Add Support for Multipart Namespace [iceberg-python]

2025-07-22 Thread via GitHub
dingo4dev commented on issue #2240: URL: https://github.com/apache/iceberg-python/issues/2240#issuecomment-3104618108 @kevinjqliu At the moment, I have found that the Hive and REST catalogs support multipart namespaces; I may need some time to test the remaining catalogs.

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
nvartolomei commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3104601357 I’m still confused about it and would love to see an end-to-end (e2e) test demonstrating the bug. The Avro projection logic seems to relate to reading Avro files and not parq

Re: [I] [Feature Request] Add Support for Multipart Namespace [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2240: URL: https://github.com/apache/iceberg-python/issues/2240#issuecomment-3104595584 Thanks for raising this! Do you know which catalogs do not currently support multipart namespaces?

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3104568464 Thanks @kris-gaudel, let me know if there's anything I can help with. https://github.com/apache/iceberg-python/pull/83 was the previous fix for the Avro column saniti

[I] [Feature Request] Add Support for Multipart Namespace [iceberg-python]

2025-07-22 Thread via GitHub
dingo4dev opened a new issue, #2240: URL: https://github.com/apache/iceberg-python/issues/2240 ### Feature Request / Improvement The Apache Iceberg specification provides robust support for multipart namespaces. To align with the implementation in Java Iceberg - [createNamespace](htt

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223652755 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -142,6 +143,11 @@ public SparkTable(Table icebergTable, String branc

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223645684 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1968,6 +2007,68 @@ public void testZOrderRewriteWi

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
kris-gaudel commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3104540558 I'll close my other PR and raise another one implementing this approach.

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3104520590 Thanks for the note @nvartolomei, that's a great point. I found the answer in the Java reference implementation https://github.com/apache/iceberg/blob/85cc58aa

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223632723 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -191,10 +198,18 @@ private Schema snapshotSchema() { if (iceber

Re: [PR] revert avro timestamp-millis mapping [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on PR #2223: URL: https://github.com/apache/iceberg-python/pull/2223#issuecomment-3104472070 Thanks for the review @amogh-jahagirdar @geruh

Re: [PR] revert avro timestamp-millis mapping [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu merged PR #2223: URL: https://github.com/apache/iceberg-python/pull/2223

Re: [PR] Trino: Add Trino Docker Compose for integration testing [iceberg-python]

2025-07-22 Thread via GitHub
Copilot commented on code in PR #2220: URL: https://github.com/apache/iceberg-python/pull/2220#discussion_r2223578764 ## tests/integration/test_rest_catalog.py: ## @@ -61,3 +62,22 @@ def test_create_namespace_if_already_existing(catalog: RestCatalog) -> None: catalog.creat

Re: [PR] Fix projected fields predicate evaluation [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on PR #2029: URL: https://github.com/apache/iceberg-python/pull/2029#issuecomment-3104265082 Removed the milestone tag since the referenced issue (https://github.com/apache/iceberg-python/issues/2028) is already tagged.

Re: [PR] Avoid local Mac issues for test_bodo_nan [iceberg-python]

2025-07-22 Thread via GitHub
ehsantn commented on PR #2237: URL: https://github.com/apache/iceberg-python/pull/2237#issuecomment-3104228109 No problem. Thanks for helping get it merged quickly.

Re: [PR] feat: add schema conversion from avro `timestamp-millis` and `uuid` [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on code in PR #2173: URL: https://github.com/apache/iceberg-python/pull/2173#discussion_r2223448960 ## pyiceberg/utils/schema_conversion.py: ## @@ -69,8 +69,10 @@ LOGICAL_FIELD_TYPE_MAPPING: Dict[Tuple[str, str], PrimitiveType] = { ("date", "int"): Dat

Re: [PR] Add Column Name to the Error Message in StatsAggregator [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu merged PR #2190: URL: https://github.com/apache/iceberg-python/pull/2190

Re: [I] (doc): Change error message to reference column that has mismatch [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu closed issue #2017: (doc): Change error message to reference column that has mismatch URL: https://github.com/apache/iceberg-python/issues/2017

[I] [feature request] provide data type conversion between avro and iceberg data types [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu opened a new issue, #2239: URL: https://github.com/apache/iceberg-python/issues/2239 ### Feature Request / Improvement See https://github.com/apache/iceberg-python/pull/2173#discussion_r2209233326 for more context. The thread addresses `timestamp-millis` conversion sp

Re: [PR] feat: add schema conversion from avro `timestamp-millis` and `uuid` [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on code in PR #2173: URL: https://github.com/apache/iceberg-python/pull/2173#discussion_r2223441722 ## pyiceberg/utils/schema_conversion.py: ## @@ -69,8 +69,10 @@ LOGICAL_FIELD_TYPE_MAPPING: Dict[Tuple[str, str], PrimitiveType] = { ("date", "int"): Dat

Re: [PR] revert avro timestamp-millis mapping [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on PR #2223: URL: https://github.com/apache/iceberg-python/pull/2223#issuecomment-3104196073 Moving this forward to unblock the 0.10 release.

[I] docs: link "Iceberg community events" to pyiceberg's community page [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu opened a new issue, #2238: URL: https://github.com/apache/iceberg-python/issues/2238 ### Feature Request / Improvement would be great to include https://iceberg.apache.org/community/#iceberg-community-events in https://py.iceberg.apache.org/community/#iceberg-community-ev

Re: [I] [discussion] dealing with multiple pyarrow versions [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2209: URL: https://github.com/apache/iceberg-python/issues/2209#issuecomment-3104099526 Thanks for the examples @dingo4dev. I agree we're balancing feature availability and project maintainability. > Another thing would be on agreeing how many pyarrow v

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223364763 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1829,6 +1830,32 @@ public void testRemoveSnapshotsNoOp() throws Exception { .isSameA

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223364190 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1829,6 +1830,32 @@ public void testRemoveSnapshotsNoOp() throws Exception { .isSameA

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223360345 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -236,7 +236,7 @@ public void testExpireOlderThanWithRollback() { Set deletedFiles = Set

Re: [I] EPIC: Implement register_table for catalogs. [iceberg-rust]

2025-07-22 Thread via GitHub
gabeiglio commented on issue #1508: URL: https://github.com/apache/iceberg-rust/issues/1508#issuecomment-3104058013 The [PR](https://github.com/apache/iceberg-rust/pull/1521) is up! @CTTY @liurenjie1024 let me know what you think.

Re: [PR] Avoid local Mac issues for test_bodo_nan [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu merged PR #2237: URL: https://github.com/apache/iceberg-python/pull/2237

Re: [PR] Avoid local Mac issues for test_bodo_nan [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on PR #2237: URL: https://github.com/apache/iceberg-python/pull/2237#issuecomment-3104040283 Thank you for the quick fix @ehsantn!

Re: [I] bug? `test_bodo_nan` in `tests/integration/test_reads.py` hangs locally [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu closed issue #2225: bug? `test_bodo_nan` in `tests/integration/test_reads.py` hangs locally URL: https://github.com/apache/iceberg-python/issues/2225

Re: [I] Use short string in Variant when possible [iceberg]

2025-07-22 Thread via GitHub
manirajv06 closed issue #13282: Use short string in Variant when possible URL: https://github.com/apache/iceberg/issues/13282

Re: [I] Use short string in Variant when possible [iceberg]

2025-07-22 Thread via GitHub
manirajv06 commented on issue #13282: URL: https://github.com/apache/iceberg/issues/13282#issuecomment-3103946684 PR merged, so closing this.

Re: [PR] Avoid local Mac issues for test_bodo_nan [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on code in PR #2237: URL: https://github.com/apache/iceberg-python/pull/2237#discussion_r2223284864 ## tests/integration/test_reads.py: ## @@ -342,7 +342,11 @@ def test_daft_nan_rewritten(catalog: Catalog) -> None: @pytest.mark.integration @pytest.mark.fil

Re: [I] Improve test coverage for 1-5 byte header string primitive in Variant [iceberg]

2025-07-22 Thread via GitHub
manirajv06 commented on issue #13376: URL: https://github.com/apache/iceberg/issues/13376#issuecomment-3103943099 @RussellSpitzer We discussed this follow-up fix in another PR review discussion. Please review. Thanks.

[PR] Improve test coverage for 1-5 byte header string primitive in Variant [iceberg]

2025-07-22 Thread via GitHub
manirajv06 opened a new pull request, #13629: URL: https://github.com/apache/iceberg/pull/13629 Improves test coverage for the 1-5 byte header string primitive in Variant. Fixes https://github.com/apache/iceberg/issues/13376

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223274853 ## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ## @@ -50,11 +49,6 @@ class IncrementalFileCleanup extends FileCleanupStrategy { @Ov

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223256780 ## core/src/main/java/org/apache/iceberg/RemoveSnapshots.java: ## @@ -375,7 +375,7 @@ private void cleanExpiredSnapshots() { } if (incrementalCle

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223240990 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -236,7 +236,7 @@ public void testExpireOlderThanWithRollback() { Set deletedF

[PR] Avoid local Mac issues for test_bodo_nan [iceberg-python]

2025-07-22 Thread via GitHub
ehsantn opened a new pull request, #2237: URL: https://github.com/apache/iceberg-python/pull/2237 Closes #2225. # Rationale for this change Some Mac laptops have MPI initialization issues that this fixes. # Are these changes tested? Tested on a Mac laptop t

Re: [I] bug? `test_bodo_nan` in `tests/integration/test_reads.py` hangs locally [iceberg-python]

2025-07-22 Thread via GitHub
ehsantn commented on issue #2225: URL: https://github.com/apache/iceberg-python/issues/2225#issuecomment-3103896507 Opened a PR: https://github.com/apache/iceberg-python/pull/2237

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223247804 ## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ## @@ -81,12 +75,11 @@ public void cleanFiles(TableMetadata beforeExpiration, TableMet

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223170737 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1968,6 +2007,68 @@ public void testZOrderRewriteWithSpecific

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223162845 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -944,16 +983,15 @@ public void testBinPackCombineMixedFiles()

Re: [I] ORC file format support [iceberg-python]

2025-07-22 Thread via GitHub
mccormickt12 commented on issue #20: URL: https://github.com/apache/iceberg-python/issues/20#issuecomment-3103758792 Initial progress: https://github.com/apache/iceberg-python/pull/2236

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223158213 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -477,71 +481,106 @@ public void testBinPackWithDeletes() thro

Re: [PR] feat: avro support applying field-ids based on name mapping [iceberg-cpp]

2025-07-22 Thread via GitHub
wgtmac commented on code in PR #127: URL: https://github.com/apache/iceberg-cpp/pull/127#discussion_r2223156940 ## test/avro_schema_test.cc: ## @@ -1057,4 +1059,366 @@ TEST(AvroSchemaProjectionTest, ProjectDecimalIncompatible) { ASSERT_THAT(projection_result, HasErrorMessage

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223125733 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -191,10 +198,18 @@ private Schema snapshotSchema() { if (icebergTable ins

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223124175 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -142,6 +143,11 @@ public SparkTable(Table icebergTable, String branch, boolea

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223120485 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -988,6 +999,8 @@ private Table loadFromPathIdentifier(PathIdentifier ident) {

Re: [PR] Spark 4.0: Preserve row lineage information on compaction [iceberg]

2025-07-22 Thread via GitHub
rdblue commented on code in PR #13555: URL: https://github.com/apache/iceberg/pull/13555#discussion_r2223118229 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -894,6 +895,10 @@ private Table load(Identifier ident) { } } +

Re: [PR] [Append Scan] Extract manifest group planning into separate class [iceberg-python]

2025-07-22 Thread via GitHub
smaheshwar-pltr commented on code in PR #2232: URL: https://github.com/apache/iceberg-python/pull/2232#discussion_r2223095477 ## pyiceberg/table/__init__.py: ## @@ -1819,76 +1819,19 @@ def _match_deletes_to_data_file(data_entry: ManifestEntry, positional_delete_ent class D

Re: [PR] [Append Scan] Extract manifest group planning into separate class [iceberg-python]

2025-07-22 Thread via GitHub
smaheshwar-pltr commented on code in PR #2232: URL: https://github.com/apache/iceberg-python/pull/2232#discussion_r2223090366 ## pyiceberg/table/__init__.py: ## @@ -2075,6 +1957,160 @@ def count(self) -> int: return res +class ManifestGroupPlanner: Review Comment:

Re: [PR] [Append Scan] Extract manifest group planning into separate class [iceberg-python]

2025-07-22 Thread via GitHub
smaheshwar-pltr commented on code in PR #2232: URL: https://github.com/apache/iceberg-python/pull/2232#discussion_r499670 ## pyiceberg/table/__init__.py: ## @@ -2075,6 +1957,160 @@ def count(self) -> int: return res +class ManifestGroupPlanner: Review Comment:

Re: [PR] [Append Scan] Extract manifest group planning into separate class [iceberg-python]

2025-07-22 Thread via GitHub
smaheshwar-pltr commented on code in PR #2232: URL: https://github.com/apache/iceberg-python/pull/2232#discussion_r490944 ## pyiceberg/table/__init__.py: ## @@ -1819,76 +1819,19 @@ def _match_deletes_to_data_file(data_entry: ManifestEntry, positional_delete_ent class D

[PR] Basic read/write support for ORC [iceberg-python]

2025-07-22 Thread via GitHub
mccormickt12 opened a new pull request, #2236: URL: https://github.com/apache/iceberg-python/pull/2236 # Rationale for this change # Are these changes tested? # Are there any user-facing changes?

Re: [I] bug? `test_bodo_nan` in `tests/integration/test_reads.py` hangs locally [iceberg-python]

2025-07-22 Thread via GitHub
ehsantn commented on issue #2225: URL: https://github.com/apache/iceberg-python/issues/2225#issuecomment-3103548838 Glad it worked! I'll open a PR ASAP. Will also work on a general fix in the Bodo package.

Re: [PR] Core: Implement Map comparator [iceberg]

2025-07-22 Thread via GitHub
nandorKollar commented on code in PR #13626: URL: https://github.com/apache/iceberg/pull/13626#discussion_r989822 ## api/src/main/java/org/apache/iceberg/types/Comparators.java: ## @@ -149,6 +155,51 @@ public int compare(List o1, List o2) { } } + private static cl

Re: [I] bug? `test_bodo_nan` in `tests/integration/test_reads.py` hangs locally [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2225: URL: https://github.com/apache/iceberg-python/issues/2225#issuecomment-3103457799 Thank you, works like a charm! Can we add those env vars to the test context? https://github.com/apache/iceberg-python/blob/b15937a14d7f0b8bad174fa15bc2eea4b3d8dbc7/tests/

Re: [I] bug? `test_bodo_nan` in `tests/integration/test_reads.py` hangs locally [iceberg-python]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #2225: URL: https://github.com/apache/iceberg-python/issues/2225#issuecomment-3103400451 > What Linux distro/container do you use with OrbStack? I'm just running `make test-integration-setup` > Does setting BODO_DATAFRAME_LIBRARY_RUN_PARALLEL=0 elimina

Re: [I] bug? `test_bodo_nan` in `tests/integration/test_reads.py` hangs locally [iceberg-python]

2025-07-22 Thread via GitHub
ehsantn commented on issue #2225: URL: https://github.com/apache/iceberg-python/issues/2225#issuecomment-3103402722 I finally found a Mac laptop that reproduces the issue! Here is a workaround to unblock while working on a fix: `BODO_DATAFRAME_LIBRARY_RUN_PARALLEL=0 FI_PROVIDER=tcp poetry run
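A minimal sketch of scoping the two workaround variables from this comment to a block of test code, in the spirit of the request to add them to the test context. It assumes that setting them in-process, before Bodo starts its MPI workers, has the same effect as prefixing the command; that is not confirmed in this thread. `scoped_env` and `BODO_MAC_WORKAROUND` are hypothetical names.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


@contextmanager
def scoped_env(overrides: Dict[str, str]) -> Iterator[None]:
    """Temporarily set environment variables, restoring prior values on exit."""
    saved: Dict[str, Optional[str]] = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is None:
                # The variable did not exist before; remove it again.
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


# Variables quoted in the workaround above (hypothetical constant name).
BODO_MAC_WORKAROUND = {
    "BODO_DATAFRAME_LIBRARY_RUN_PARALLEL": "0",
    "FI_PROVIDER": "tcp",
}
```

A test body would run inside `with scoped_env(BODO_MAC_WORKAROUND): ...`. In pytest, the built-in `monkeypatch.setenv` fixture gives the same set-and-restore behavior per test.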

Re: [I] Investigate trusted publishing in crates.io [iceberg-rust]

2025-07-22 Thread via GitHub
kevinjqliu commented on issue #1539: URL: https://github.com/apache/iceberg-rust/issues/1539#issuecomment-3103239628 i was just about to create this issue, good thing i double checked :)

Re: [I] [kafka-connect] org.apache.thrift.TApplicationException: Invalid method name: 'get_table' in HMS v4 [iceberg]

2025-07-22 Thread via GitHub
igorvoltaic commented on issue #13628: URL: https://github.com/apache/iceberg/issues/13628#issuecomment-3103222522 I believe fix might be applied over here https://github.com/apache/iceberg/blob/85cc58aa8acda26809b3c67bbc3452689490/hive-metastore/src/main/java/org/apache/iceberg/hive/Hiv

Re: [PR] infra: use `toml-cli` to manually set version in github action [iceberg-rust]

2025-07-22 Thread via GitHub
kevinjqliu merged PR #1537: URL: https://github.com/apache/iceberg-rust/pull/1537

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r832174 ## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ## @@ -50,7 +50,7 @@ class IncrementalFileCleanup extends FileCleanupStrategy { @Ove

Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

2025-07-22 Thread via GitHub
VerneZhong commented on PR #13194: URL: https://github.com/apache/iceberg/pull/13194#issuecomment-3103210696 Hi @nastra , I’ve addressed the review comments and updated the PR accordingly. I also deployed the site locally as described in the [README](https://github.com/apache/iceberg/blob/a

[I] [kafka-connect] org.apache.thrift.TApplicationException: Invalid method name: 'get_table' [iceberg]

2025-07-22 Thread via GitHub
igorvoltaic opened a new issue, #13628: URL: https://github.com/apache/iceberg/issues/13628 ### Apache Iceberg version 1.9.2 (latest release) ### Query engine Kafka Connect ### Please describe the bug 🐞 We're trying to use Kafka-connect IcebergSink with Hive

Re: [PR] feat: avro support applying field-ids based on name mapping [iceberg-cpp]

2025-07-22 Thread via GitHub
MisterRaindrop commented on code in PR #127: URL: https://github.com/apache/iceberg-cpp/pull/127#discussion_r798801 ## test/avro_schema_test.cc: ## @@ -1057,4 +1059,366 @@ TEST(AvroSchemaProjectionTest, ProjectDecimalIncompatible) { ASSERT_THAT(projection_result, HasErro

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
sqd commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r741581 ## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ## @@ -50,7 +50,7 @@ class IncrementalFileCleanup extends FileCleanupStrategy { @Override @Sup

Re: [PR] Core: Fix incorrect selection of incremental cleanup in expire snapshots [iceberg]

2025-07-22 Thread via GitHub
sqd commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r702883 ## core/src/main/java/org/apache/iceberg/RemoveSnapshots.java: ## @@ -375,7 +375,7 @@ private void cleanExpiredSnapshots() { } if (incrementalCleanup == null)

Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

2025-07-22 Thread via GitHub
VerneZhong commented on code in PR #13194: URL: https://github.com/apache/iceberg/pull/13194#discussion_r695632 ## docs/docs/spark-system.md: ## @@ -0,0 +1,263 @@ +--- +title: "System" +--- + + + +## Spark SQL Functions for Iceberg Transforms + +Iceberg provides Spark SQL fu

Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

2025-07-22 Thread via GitHub
nastra commented on PR #13194: URL: https://github.com/apache/iceberg/pull/13194#issuecomment-3102956861 can you please deploy the site locally as described in https://github.com/apache/iceberg/blob/a8d111eaa7bfb3f98a236578cee3b2ff14b7b338/site/README.md and share a screenshot? This is to m

Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

2025-07-22 Thread via GitHub
nastra commented on code in PR #13194: URL: https://github.com/apache/iceberg/pull/13194#discussion_r685213 ## docs/docs/spark-system.md: ## @@ -0,0 +1,263 @@ +--- +title: "System" Review Comment: I would rename the title to `Functions` or `SQL Functions` because `System

Re: [PR] Docs: Add docs for Spark SQL Iceberg transform functions (#13156) [iceberg]

2025-07-22 Thread via GitHub
nastra commented on code in PR #13194: URL: https://github.com/apache/iceberg/pull/13194#discussion_r682782 ## docs/docs/spark-system.md: ## @@ -0,0 +1,263 @@ +--- +title: "System" +--- + + + +## Spark SQL Functions for Iceberg Transforms + +Iceberg provides Spark SQL functi

Re: [I] Add docs of Spark SQL functions for Iceberg transforms [iceberg]

2025-07-22 Thread via GitHub
VerneZhong commented on issue #13156: URL: https://github.com/apache/iceberg/issues/13156#issuecomment-3102931291 Hi @manuzhang , I’ve rebased the branch onto the latest main and resolved the conflicts. Could you please take another look when you have time? Thanks!

Re: [PR] Docs: Add `nullable: true` to LoadTableResult metadata-location [iceberg]

2025-07-22 Thread via GitHub
nastra commented on PR #13624: URL: https://github.com/apache/iceberg/pull/13624#issuecomment-3102925959 > There is a difference between NULL and missing. > Fields that aren't required are allowed to be missing, they're not allowed to be null Yes I'm aware, I realized that I formul

Re: [PR] Core, Data: File Format API interfaces [iceberg]

2025-07-22 Thread via GitHub
pvary commented on code in PR #12774: URL: https://github.com/apache/iceberg/pull/12774#discussion_r665248 ## core/src/main/java/org/apache/iceberg/io/ObjectModel.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

Re: [PR] [docs] Flink can now add/drop/modify columns [iceberg]

2025-07-22 Thread via GitHub
rmoff commented on code in PR #13617: URL: https://github.com/apache/iceberg/pull/13617#discussion_r630232 ## docs/docs/flink.md: ## @@ -401,4 +401,3 @@ There are some features that are do not yet supported in the current Flink Icebe * Don't support creating iceberg table
