Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-11-26 Thread via GitHub
nastra commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1860118523 ## core/src/main/java/org/apache/iceberg/TableUtil.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor

Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-11-26 Thread via GitHub
Fokko commented on code in PR #349: URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1860112766 ## crates/iceberg/src/transaction.rs: ## @@ -96,6 +109,60 @@ impl<'a> Transaction<'a> { Ok(self) } +fn generate_unique_snapshot_id(&self) -> i64 {

Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-11-26 Thread via GitHub
Fokko commented on code in PR #349: URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1860102313 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -106,34 +106,38 @@ impl std::fmt::Debug for ManifestListWriter { impl ManifestListWriter { /// Construct a v

[PR] Write `null` when there is no parent-snapshot-id [iceberg-python]

2024-11-26 Thread via GitHub
Fokko opened a new pull request, #1383: URL: https://github.com/apache/iceberg-python/pull/1383 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-11-26 Thread via GitHub
Fokko commented on code in PR #349: URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1860074706 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -106,34 +106,38 @@ impl std::fmt::Debug for ManifestListWriter { impl ManifestListWriter { /// Construct a v

Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-11-26 Thread via GitHub
Fokko commented on code in PR #349: URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1860068128 ## crates/e2e_test/Cargo.toml: ## @@ -0,0 +1,37 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NO

Re: [PR] feat: support position delete writer [iceberg-rust]

2024-11-26 Thread via GitHub
liurenjie1024 commented on code in PR #704: URL: https://github.com/apache/iceberg-rust/pull/704#discussion_r1860049342 ## crates/iceberg/src/writer/base_writer/position_delete_file_writer.rs: ## @@ -0,0 +1,320 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

Re: [PR] fix: expand arrow to iceberg schema to handle nanosecond timestamp [iceberg-rust]

2024-11-26 Thread via GitHub
Fokko commented on code in PR #710: URL: https://github.com/apache/iceberg-rust/pull/710#discussion_r1860062199 ## crates/iceberg/src/arrow/schema.rs: ## @@ -382,12 +382,15 @@ impl ArrowSchemaVisitor for ArrowSchemaConverter { DataType::Time64(unit) if unit == &Time

Re: [I] Enhance `catalog.create_table` API to enable creation of table with matching `field_ids` to provided Schema [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on issue #1284: URL: https://github.com/apache/iceberg-python/issues/1284#issuecomment-2503087394 Thanks for the context yesterday, I was still noodling on it overnight. If I understand correctly (and please also share the video of you and @adrianqin; I must have miss

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503066336 @pvary Yes, you are right. We changed our approach to handle null values instead of filtering them out. Before serialization, we added a flag for each field, where true indicates
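
The per-field flag idea described above can be illustrated with a minimal sketch. This is not the PR's Flink code: the snippet is cut off before it says what `true` indicates, so the convention here (flag byte 1 means null) and the helper names `write_nullable_int`, `read_nullable_int`, and `roundtrip` are assumptions made purely for illustration.

```python
import io
import struct


def write_nullable_int(out, value):
    # Assumed convention: a leading flag byte of 1 marks a null value.
    if value is None:
        out.write(b"\x01")
    else:
        out.write(b"\x00")
        out.write(struct.pack(">q", value))  # 8-byte big-endian signed int


def read_nullable_int(inp):
    # Read the flag first; only read the payload when the value is present.
    flag = inp.read(1)
    if flag == b"\x01":
        return None
    return struct.unpack(">q", inp.read(8))[0]


def roundtrip(values):
    # Serialize all values, then read them back in the same order.
    buf = io.BytesIO()
    for v in values:
        write_nullable_int(buf, v)
    buf.seek(0)
    return [read_nullable_int(buf) for _ in values]
```

Writing the flag ahead of the payload lets the reader decide whether to consume any value bytes at all, so a null never reaches the value serializer.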

Re: [PR] Adding ComputeTableStats Procedure to Spark 3.4 [iceberg]

2024-11-26 Thread via GitHub
nastra merged PR #11652: URL: https://github.com/apache/iceberg/pull/11652

Re: [PR] Add Python version 3.13 to test matrix. [iceberg-python]

2024-11-26 Thread via GitHub
JE-Chen commented on PR #1377: URL: https://github.com/apache/iceberg-python/pull/1377#issuecomment-2503058477 Same error. ``` error: the configured Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12) ```

Re: [PR] feat: Add equality delete writer [iceberg-rust]

2024-11-26 Thread via GitHub
ZENOTME commented on code in PR #703: URL: https://github.com/apache/iceberg-rust/pull/703#discussion_r1860040274 ## crates/iceberg/src/writer/base_writer/equality_delete_writer.rs: ## @@ -0,0 +1,502 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Build: Bump pytest-checkdocs from 2.10.1 to 2.13.0 [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on PR #682: URL: https://github.com/apache/iceberg-python/pull/682#issuecomment-2503053345 @dependabot rebase

Re: [PR] feat: Add equality delete writer [iceberg-rust]

2024-11-26 Thread via GitHub
ZENOTME commented on code in PR #703: URL: https://github.com/apache/iceberg-rust/pull/703#discussion_r1860035033 ## crates/iceberg/src/writer/base_writer/equality_delete_writer.rs: ## @@ -0,0 +1,502 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] fix: Remove check of last_column_id [iceberg-rust]

2024-11-26 Thread via GitHub
Fokko merged PR #717: URL: https://github.com/apache/iceberg-rust/pull/717

Re: [PR] fix: Remove check of last_column_id [iceberg-rust]

2024-11-26 Thread via GitHub
Fokko commented on PR #717: URL: https://github.com/apache/iceberg-rust/pull/717#issuecomment-2503040380 Thanks for cleaning this up @liurenjie1024 and thanks for the quick review @ZENOTME

Re: [PR] feat: Add equality delete writer [iceberg-rust]

2024-11-26 Thread via GitHub
liurenjie1024 commented on code in PR #703: URL: https://github.com/apache/iceberg-rust/pull/703#discussion_r1860022862 ## crates/iceberg/src/arrow/record_batch_projector.rs: ## @@ -0,0 +1,288 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] fix: Remove check of last_column_id [iceberg-rust]

2024-11-26 Thread via GitHub
ZENOTME commented on PR #717: URL: https://github.com/apache/iceberg-rust/pull/717#issuecomment-2502997254 Thanks @liurenjie1024

Re: [PR] chore: Mark `last-field-id` as deprecated [iceberg-rust]

2024-11-26 Thread via GitHub
liurenjie1024 commented on PR #715: URL: https://github.com/apache/iceberg-rust/pull/715#issuecomment-2502994521 > Seems this PR caused clippy fail > > ``` > error: use of deprecated field `catalog::TableUpdate::AddSchema::last_column_id`: This field is handled internally, and sho

Re: [PR] Use Snapshot's statistics file in SparkScan [iceberg]

2024-11-26 Thread via GitHub
jeesou commented on PR #11040: URL: https://github.com/apache/iceberg/pull/11040#issuecomment-2502877154 Hi @karuppayya, @amogh-jahagirdar, kindly check the comment above.

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859956563 ## core/src/test/java/org/apache/iceberg/rest/responses/TestFetchScanTasksResponseParser.java: ## @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Adding ComputeTableStats Procedure to Spark 3.4 [iceberg]

2024-11-26 Thread via GitHub
jeesou commented on PR #11652: URL: https://github.com/apache/iceberg/pull/11652#issuecomment-2502859962 Hi @karuppayya @nastra, can you please check this PR?

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859959392 ## core/src/test/java/org/apache/iceberg/rest/requests/TestPlanTableScanRequestParser.java: ## @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859958795 ## core/src/test/java/org/apache/iceberg/rest/responses/TestPlanTableScanResponseParser.java: ## @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859957950 ## core/src/test/java/org/apache/iceberg/rest/responses/TestFetchScanTasksResponseParser.java: ## @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859939105 ## core/src/test/java/org/apache/iceberg/rest/responses/TestFetchScanTasksResponseParser.java: ## @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859932083 ## core/src/test/java/org/apache/iceberg/rest/responses/TestFetchScanTasksResponseParser.java: ## @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2502780003 @ConeyLiu Yes, I have added unit tests to cover the relevant changes.

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859909899 ## core/src/main/java/org/apache/iceberg/rest/requests/PlanTableScanRequestParser.java: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1859891750 ## core/src/main/java/org/apache/iceberg/ContentFileParser.java: ## @@ -48,6 +48,97 @@ class ContentFileParser { private ContentFileParser() {} + pub

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
ConeyLiu commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2502609743 Thanks for the contributions. Would it be better to handle the null value instead of skipping it? Could you also add a unit test to cover these changes? cc @stevenzwu @pvary

Re: [PR] Core, Spark3.5: Fix tests failure due to timeout [iceberg]

2024-11-26 Thread via GitHub
manuzhang commented on code in PR #11654: URL: https://github.com/apache/iceberg/pull/11654#discussion_r1859767751 ## core/src/test/java/org/apache/iceberg/hadoop/TestHadoopCommits.java: ## @@ -446,7 +446,7 @@ public void testConcurrentFastAppends(@TempDir File dir) throws Exce

[PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
Guosmilesmile opened a new pull request, #11662: URL: https://github.com/apache/iceberg/pull/11662 When configuring the distribution mode to RANGE, if the partition field in the data contains null values, it will cause the SortKey serialization to fail, resulting in the job continuously res

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-11-26 Thread via GitHub
wypoon commented on PR #11661: URL: https://github.com/apache/iceberg/pull/11661#issuecomment-2502493531 @nastra can you please review this? My impetus for the refactor is that I want to implement Parquet page-skipping, and with the refactor, a core change in logic can be applied in one

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-11-26 Thread via GitHub
wypoon commented on code in PR #11661: URL: https://github.com/apache/iceberg/pull/11661#discussion_r1859480431 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java: ## @@ -46,122 +47,217 @@ public VectorizedParquetDefini

Re: [PR] Spark: Read DVs when reading from .position_deletes table [iceberg]

2024-11-26 Thread via GitHub
aokolnychyi commented on code in PR #11657: URL: https://github.com/apache/iceberg/pull/11657#discussion_r1859474395 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/DVIterable.java: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Spark: Read DVs when reading from .position_deletes table [iceberg]

2024-11-26 Thread via GitHub
aokolnychyi commented on code in PR #11657: URL: https://github.com/apache/iceberg/pull/11657#discussion_r1859473563 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/DVIterable.java: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [I] Support for loading different hive-metastore versions at Runtime [iceberg]

2024-11-26 Thread via GitHub
github-actions[bot] commented on issue #10401: URL: https://github.com/apache/iceberg/issues/10401#issuecomment-2502320652 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] SparkSessionCatalog with JDBC catalog: SHOW TABLES IN ... returns error but table exists in JDBC catalog [iceberg]

2024-11-26 Thread via GitHub
github-actions[bot] commented on issue #10003: URL: https://github.com/apache/iceberg/issues/10003#issuecomment-2502320515 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-11-26 Thread via GitHub
aokolnychyi commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1859414335 ## core/src/main/java/org/apache/iceberg/SerializableTable.java: ## @@ -143,6 +143,10 @@ protected Table newTable(TableOperations ops, String tableName) { r

Re: [PR] Spark : Derive Stats From Manifest on the Fly [iceberg]

2024-11-26 Thread via GitHub
guykhazma commented on PR #11615: URL: https://github.com/apache/iceberg/pull/11615#issuecomment-2502221330 @huaxingao yes, it is possible to reuse the logic from the aggregate pushdown by reusing the AggregateEvaluator instead of the current code to aggregate from the manifests. Something

Re: [PR] API, Core: Add formatVersion() to Table [iceberg]

2024-11-26 Thread via GitHub
aokolnychyi commented on PR #11587: URL: https://github.com/apache/iceberg/pull/11587#issuecomment-2502200150 I think I'd prefer #11620 for now, not a strong opinion, though.

Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-11-26 Thread via GitHub
aokolnychyi commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1859417454 ## core/src/main/java/org/apache/iceberg/TableUtil.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859381182 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -833,7 +833,17 @@ public List apply(TableMetadata base, Snapshot snapshot) { fil

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859372481 ## core/src/test/java/org/apache/iceberg/TestRewriteFiles.java: ## @@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() { assertThat(list

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859371399 ## core/src/test/java/org/apache/iceberg/TestRewriteFiles.java: ## @@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() { assertThat(list

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859362907 ## core/src/test/java/org/apache/iceberg/TestRewriteFiles.java: ## @@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() { assertThat(list

Re: [PR] Spark: Write DVs for V3 MoR tables [iceberg]

2024-11-26 Thread via GitHub
aokolnychyi commented on code in PR #11561: URL: https://github.com/apache/iceberg/pull/11561#discussion_r1859282133 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java: ## @@ -168,7 +168,7 @@ protected Map rewritableDeletes() { for (S

[PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2024-11-26 Thread via GitHub
HonahX opened a new pull request, #11660: URL: https://github.com/apache/iceberg/pull/11660 This PR introduces a new section, "Snapshot Summary", in the table spec under [Snapshots](https://iceberg.apache.org/spec/#snapshots) to document optional fields in the snapshot summary, including me

[I] Document Snapshot Summary Optional Fields for Standardization [iceberg]

2024-11-26 Thread via GitHub
HonahX opened a new issue, #11659: URL: https://github.com/apache/iceberg/issues/11659 ### Proposed Change The proposal introduces a new section in the table spec under [Snapshots](https://iceberg.apache.org/spec/#snapshots) to document optional fields in the snapshot summary, includ

[PR] Bump pydantic from 2.10.1 to 2.10.2 [iceberg-python]

2024-11-26 Thread via GitHub
dependabot[bot] opened a new pull request, #1382: URL: https://github.com/apache/iceberg-python/pull/1382 Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.10.1 to 2.10.2. Release notes Sourced from [pydantic's releases](https://github.com/pydantic/pydantic/releases).

[PR] Bump pyarrow from 18.0.0 to 18.1.0 [iceberg-python]

2024-11-26 Thread via GitHub
dependabot[bot] opened a new pull request, #1381: URL: https://github.com/apache/iceberg-python/pull/1381 Bumps [pyarrow](https://github.com/apache/arrow) from 18.0.0 to 18.1.0. Release notes Sourced from [pyarrow's releases](https://github.com/apache/arrow/releases). Apache A

Re: [PR] Flink: Add RowConverter for Iceberg Source [iceberg]

2024-11-26 Thread via GitHub
stevenzwu commented on code in PR #11301: URL: https://github.com/apache/iceberg/pull/11301#discussion_r1859281089 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceBoundedRow.java: ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859261982 ## core/src/test/java/org/apache/iceberg/TestBase.java: ## @@ -108,6 +108,14 @@ public class TestBase { .withPartitionPath("data_bucket=1") // easy way to s

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859257408 ## core/src/test/java/org/apache/iceberg/TestRewriteFiles.java: ## @@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() { assertThat(list

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859256224 ## core/src/test/java/org/apache/iceberg/TestRewriteFiles.java: ## @@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() { assertThat(list

Re: [PR] Core/RewriteFiles: Duplicate Data Bug - Fixed dropping delete files that are still required [iceberg]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #10962: URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859255414 ## core/src/test/java/org/apache/iceberg/TestRewriteFiles.java: ## @@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() { assertThat(list

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2501946144 @tusharchou Thanks. I was noodling on this, and instead of having a `.to_arrow()`, we could also have a `.count()` that will return the number of rows that match the predicate.
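
As a rough illustration of counting rows from metadata alone, the sketch below sums per-file row counts when a file's value range lies entirely inside the predicate, and sets aside files that only partly overlap. It is not PyIceberg's API or the design proposed in the issue: `FileStats`, its fields, and `count_at_least` are hypothetical names, and the single `value >= threshold` predicate is an assumption chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    # Hypothetical per-file metadata: row count plus min/max of one column.
    record_count: int
    lower: int
    upper: int


def count_at_least(files, threshold):
    """Count rows with value >= threshold, using metadata where possible.

    Returns (rows counted from metadata alone, files that still need a scan).
    """
    counted = 0
    needs_scan = []
    for f in files:
        if f.lower >= threshold:
            counted += f.record_count  # every row in the file matches
        elif f.upper < threshold:
            continue                   # no row in the file matches
        else:
            needs_scan.append(f)       # partial overlap: metadata is not enough
    return counted, needs_scan
```

Only the files returned in the second element would have to be opened; the rest are resolved without reading any data.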

Re: [PR] Extend bugfix report [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu commented on PR #1380: URL: https://github.com/apache/iceberg-python/pull/1380#issuecomment-2501935264 👍 ![Screenshot 2024-11-26 at 1 06 16  PM](https://github.com/user-attachments/assets/d26931a3-563c-4c46-9ecd-18b1e5a1f731)

Re: [PR] Extend bugfix report [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu merged PR #1380: URL: https://github.com/apache/iceberg-python/pull/1380

Re: [PR] Update `upload-artifact` to use v4 [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu merged PR #1371: URL: https://github.com/apache/iceberg-python/pull/1371

Re: [I] Renamed column returns null values from 'appended' Parquet file not originally created by Iceberg [iceberg]

2024-11-26 Thread via GitHub
pedorro closed issue #11650: Renamed column returns null values from 'appended' Parquet file not originally created by Iceberg URL: https://github.com/apache/iceberg/issues/11650

Re: [I] Renamed column returns null values from 'appended' Parquet file not originally created by Iceberg [iceberg]

2024-11-26 Thread via GitHub
pedorro commented on issue #11650: URL: https://github.com/apache/iceberg/issues/11650#issuecomment-2501897179 Closing this ticket as apparently not an issue, but rather ignorant-user-error.

[PR] Extend bugfix report [iceberg-python]

2024-11-26 Thread via GitHub
Fokko opened a new pull request, #1380: URL: https://github.com/apache/iceberg-python/pull/1380 I like this part from the Iceberg pull-request 👍

Re: [PR] Add Python version 3.13 to test matrix. [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on PR #1377: URL: https://github.com/apache/iceberg-python/pull/1377#issuecomment-2501858847 @JE-Chen Thanks for checking. It looks like PyO3 is already at 3.13: https://github.com/PyO3/pyo3/issues/4636 I was hoping that maybe bumping the Poetry build might fix it https://gi

Re: [PR] Bump Poetry to 1.8.4 [iceberg-python]

2024-11-26 Thread via GitHub
Fokko merged PR #1379: URL: https://github.com/apache/iceberg-python/pull/1379

Re: [PR] Update `upload-artifact` to use v4 [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on PR #1371: URL: https://github.com/apache/iceberg-python/pull/1371#issuecomment-2501851589 @kevinjqliu Ah I see, I don't think that we have _fully_ reproducible builds. For example, if a timestamp in some wheel is different, then it would yield another hash. I think it is
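
The point about timestamps and hashes can be shown with a small sketch that is not taken from the thread. It builds two in-memory zip archives (wheels are zip files) with identical contents but different embedded timestamps and compares their SHA-256 digests. The file name, contents, and dates are made up for illustration.

```python
import hashlib
import io
import zipfile


def build_archive(date_time):
    """Build an in-memory zip holding one file stamped with date_time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Hypothetical member name and contents; only the timestamp varies.
        info = zipfile.ZipInfo("pkg/__init__.py", date_time=date_time)
        zf.writestr(info, b"VERSION = '0.0.0'\n")
    return buf.getvalue()


def digest(data):
    """Return the SHA-256 hex digest of the archive bytes."""
    return hashlib.sha256(data).hexdigest()


first = build_archive((2024, 11, 26, 12, 0, 0))
second = build_archive((2024, 11, 26, 12, 0, 2))  # same bytes, later timestamp
same = build_archive((2024, 11, 26, 12, 0, 0))    # identical inputs

print(digest(first) == digest(second))  # timestamps differ, so digests differ
print(digest(first) == digest(same))    # identical inputs give identical digests
```

Pinning the timestamps written into the archive is one common way to make such digests stable across builds.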

Re: [I] Newly created table does not detect commit failures [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on issue #1366: URL: https://github.com/apache/iceberg-python/issues/1366#issuecomment-2501843565 Here's the logic: https://github.com/apache/iceberg-python/blob/1e9bdc21d95d2702f0777b21c61409262d1e3052/pyiceberg/table/update/snapshot.py#L271-L279 We always set

Re: [I] Renamed column returns null values from 'appended' Parquet file not originally created by Iceberg [iceberg]

2024-11-26 Thread via GitHub
pedorro commented on issue #11650: URL: https://github.com/apache/iceberg/issues/11650#issuecomment-2501840264 Well dad-gum! Setting that table property appears to have resolved it. I'm quite new to Iceberg in general, and so was unaware of this property. I really appreciate you taking a m

Re: [I] Newly created table does not detect commit failures [iceberg-python]

2024-11-26 Thread via GitHub
Fokko commented on issue #1366: URL: https://github.com/apache/iceberg-python/issues/1366#issuecomment-2501830533 As a side note, we don't have a proper retry strategy in Iceberg yet. Two appends should not cause a conflict right away because they don't interfere. Instead, what should happen

Re: [PR] Replace use of deprecated methods [iceberg]

2024-11-26 Thread via GitHub
jbonofre commented on PR #11658: URL: https://github.com/apache/iceberg/pull/11658#issuecomment-2501796971 @pvary I started some cleanups in Flink tests.

[PR] Replace use of deprecated methods [iceberg]

2024-11-26 Thread via GitHub
jbonofre opened a new pull request, #11658: URL: https://github.com/apache/iceberg/pull/11658 This PR replaces the use of deprecated methods in Flink (1.20) tests.

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-11-26 Thread via GitHub
tusharchou commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2501709107 # RCA Hi @Visorgood, the behavior expected here is a simple partition push-down implementation in DuckDB, which this PR solves for: https://github.com/duckdb/duckd

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-26 Thread via GitHub
kevinjqliu commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1859082300 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } +

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-26 Thread via GitHub
kevinjqliu commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1859082627 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } +

Re: [I] Create table format version constants [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu commented on issue #851: URL: https://github.com/apache/iceberg-python/issues/851#issuecomment-2501696638 @willcollins10 assigned to you

Re: [PR] Update `upload-artifact` to use v4 [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu commented on PR #1371: URL: https://github.com/apache/iceberg-python/pull/1371#issuecomment-2501640410 I was comparing between running the action on [main](https://github.com/kevinjqliu/iceberg-python/actions/runs/12036359688) versus [this branch](https://github.com/kevinjqliu/i

Re: [I] Newly created table does not detect commit failures [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu commented on issue #1366: URL: https://github.com/apache/iceberg-python/issues/1366#issuecomment-2501614001 @HaraldVanWoerkom if you have a reproducible process, can you check if this occurs with another catalog? For example, this one that we use for integration tests https:

Re: [PR] API, Core: Add default value APIs and Avro implementation [iceberg]

2024-11-26 Thread via GitHub
emkornfield commented on PR #9502: URL: https://github.com/apache/iceberg/pull/9502#issuecomment-2501606427 Just wanted to ask does this only apply to Avro or does adding it to AvroGenericReader also cover other file types ORC and Parquet? -- This is an automated message from the Apache G

Re: [PR] Spec: add variant type [iceberg]

2024-11-26 Thread via GitHub
emkornfield commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1858978402 ## format/spec.md: ## @@ -182,6 +182,21 @@ A **`list`** is a collection of values with some element type. The element field A **`map`** is a collection of key

Re: [PR] Spec: add variant type [iceberg]

2024-11-26 Thread via GitHub
emkornfield commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1858976684 ## format/spec.md: ## @@ -178,6 +178,21 @@ A **`list`** is a collection of values with some element type. The element field A **`map`** is a collection of key

Re: [I] table_exists error raisng 400 error code [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu commented on issue #1378: URL: https://github.com/apache/iceberg-python/issues/1378#issuecomment-2501517247 Hi @dongsupkim-onepredict, thanks for raising this issue. Can you also include the Python code that causes this error? -- This is an automated message from the Apache Git

Re: [I] SparkExecutorCache causes slowness of RewriteDataFilesSparkAction [iceberg]

2024-11-26 Thread via GitHub
singhpk234 commented on issue #11648: URL: https://github.com/apache/iceberg/issues/11648#issuecomment-2501506598 @davseitsev can you please also try setting to true ? ``` spark.sql.iceberg.executor-cache.locality.enabled ``` please ref : https://github.com/apache/iceberg/pu

Re: [PR] [release] Pyiceberg 0.8.1 [iceberg-python]

2024-11-26 Thread via GitHub
kevinjqliu commented on PR #1369: URL: https://github.com/apache/iceberg-python/pull/1369#issuecomment-2501494494 thanks @fokko cherry-picked #1373 as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] Renamed column returns null values from 'appended' Parquet file not originally created by Iceberg [iceberg]

2024-11-26 Thread via GitHub
singhpk234 commented on issue #11650: URL: https://github.com/apache/iceberg/issues/11650#issuecomment-2501447829 This is an interesting observation can you please help with the following : [1] is this prop being set : schema.name-mapping.default in tbl properties if not can you set it ?

Re: [PR] Core, Spark3.5: Fix tests failure due to timeout [iceberg]

2024-11-26 Thread via GitHub
nastra commented on code in PR #11654: URL: https://github.com/apache/iceberg/pull/11654#discussion_r1858896301 ## core/src/test/java/org/apache/iceberg/hadoop/TestHadoopCommits.java: ## @@ -446,7 +446,7 @@ public void testConcurrentFastAppends(@TempDir File dir) throws Excepti

Re: [I] How an application can communicate with the Iceberg REST catalog like Dremio Arctic (Nessie), Snowflake’s Polaris , Gravitino [iceberg]

2024-11-26 Thread via GitHub
ajantha-bhat commented on issue #11649: URL: https://github.com/apache/iceberg/issues/11649#issuecomment-2501250605 Duplicate of https://github.com/apache/iceberg/issues/11655 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [I] How an application can communicate with the Iceberg REST catalog like Dremio Arctic (Nessie), Snowflake’s Polaris , Gravitino [iceberg]

2024-11-26 Thread via GitHub
ajantha-bhat closed issue #11649: How an application can communicate with the Iceberg REST catalog like Dremio Arctic (Nessie), Snowflake’s Polaris , Gravitino URL: https://github.com/apache/iceberg/issues/11649 -- This is an automated message from the Apache Git Service. To respond to the m

Re: [I] Interaction with the Iceberg REST catalog like Dremio Arctic (Nessie), Snowflake’s Polaris , Gravitino [iceberg]

2024-11-26 Thread via GitHub
ajantha-bhat commented on issue #11655: URL: https://github.com/apache/iceberg/issues/11655#issuecomment-2501247204 Since you are using spark engine, spark session can be configured to use Iceberg catalogs. Just make sure catalog type is `rest`. https://iceberg.apache.org/docs/nightl
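
For readers following the reply above, here is a minimal sketch of what "catalog type is `rest`" might look like as Spark properties. The catalog name `my_catalog` and the URI `http://localhost:8181` are placeholders invented for illustration and do not come from the thread; the helper `rest_catalog_conf` is likewise hypothetical. The property keys follow the `spark.sql.catalog.<name>` pattern used by Iceberg's Spark integration.

```python
def rest_catalog_conf(catalog_name: str, uri: str) -> dict[str, str]:
    """Build Spark properties for an Iceberg catalog backed by a REST server.

    Hypothetical helper: catalog_name and uri are supplied by the caller.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Register the catalog implementation under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Select the REST catalog type, as suggested in the reply.
        f"{prefix}.type": "rest",
        # Endpoint of the REST catalog service (placeholder value below).
        f"{prefix}.uri": uri,
    }


# Placeholder name and URI, assumed for the example only.
conf = rest_catalog_conf("my_catalog", "http://localhost:8181")

# Render the properties as spark-submit style flags.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The same key/value pairs could instead be passed to a Spark session builder; whether a given service (Nessie, Polaris, Gravitino) needs additional properties such as credentials depends on that service and is not covered here.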

Re: [I] Interaction with the Iceberg REST catalog like Dremio Arctic (Nessie), Snowflake’s Polaris , Gravitino [iceberg]

2024-11-26 Thread via GitHub
ajantha-bhat closed issue #11655: Interaction with the Iceberg REST catalog like Dremio Arctic (Nessie), Snowflake’s Polaris , Gravitino URL: https://github.com/apache/iceberg/issues/11655 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Flink: Test both "new" Flink Avro planned reader and "deprecated" Avro reader [iceberg]

2024-11-26 Thread via GitHub
pvary commented on PR #11430: URL: https://github.com/apache/iceberg/pull/11430#issuecomment-2501235249 Thanks @jbonofre for the PR and @nastra for the review. @jbonofre: Please port the changes to Flink 1.19/1.18 too. Thanks, Peter -- This is an automated message from the Apa

Re: [PR] Flink: Test both "new" Flink Avro planned reader and "deprecated" Avro reader [iceberg]

2024-11-26 Thread via GitHub
pvary merged PR #11430: URL: https://github.com/apache/iceberg/pull/11430 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apa

Re: [PR] Parquet: Bump to Apache Parquet 1.15.0 [iceberg]

2024-11-26 Thread via GitHub
Fokko commented on PR #11656: URL: https://github.com/apache/iceberg/pull/11656#issuecomment-2501216005 Seems that JDK22 is not yet supported :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] Parquet: Bump to Apache Parquet 1.15.0 [iceberg]

2024-11-26 Thread via GitHub
Fokko commented on PR #11656: URL: https://github.com/apache/iceberg/pull/11656#issuecomment-2501152732 Yes, looks like we need to drop JDK 11 first before we can upgrade to the newer baseline version: https://github.com/palantir/gradle-baseline/releases/tag/6.0.0 -- This is an automated

Re: [PR] Parquet: Bump to Apache Parquet 1.15.0 [iceberg]

2024-11-26 Thread via GitHub
Fokko commented on PR #11656: URL: https://github.com/apache/iceberg/pull/11656#issuecomment-2501142678 @jbonofre It looks like the palantir baseline plugin doesn't know how to handle Java 22 multi-source packages. Let me take a look first. -- This is an automated message from the Apache

Re: [PR] Parquet: Bump to Apache Parquet 1.15.0 [iceberg]

2024-11-26 Thread via GitHub
jbonofre commented on PR #11656: URL: https://github.com/apache/iceberg/pull/11656#issuecomment-2501078210 @Fokko it seems that jackson is shaded two times with Parquet 1.15. Do you want me to investigate ? -- This is an automated message from the Apache Git Service. To respond to the mes
