Re: [PR] Kafka Connect: Record converters [iceberg]

2024-12-02 Thread via GitHub
anmol commented on code in PR #9641: URL: https://github.com/apache/iceberg/pull/9641#discussion_r1867182926 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/IcebergWriter.java: ## @@ -52,20 +51,22 @@ public IcebergWriter(Table table, String tableName

Re: [I] Iceberg Kafka-Connect runtime not published as part of 1.7.0 release? [iceberg]

2024-12-02 Thread via GitHub
Fokko commented on issue #11685: URL: https://github.com/apache/iceberg/issues/11685#issuecomment-2513717563 Let's ask @bryanck -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867134750 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewriteExecutor.java: ## @@ -0,0 +1,257 @@ +/* + * Licensed to the Apache Softw

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867148777 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewritePlanner.java: ## @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867146642 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewritePlanner.java: ## @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867144630 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewritePlanner.java: ## @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867140918 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewritePlanner.java: ## @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867130953 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewriteCommitter.java: ## @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Soft

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867119084 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DeleteFilesProcessor.java: ## @@ -40,17 +41,17 @@ public class DeleteFilesProcessor ext

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867125168 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewriteCommitter.java: ## @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Soft

Re: [PR] Flink: Maintenance - RewriteDataFiles [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11497: URL: https://github.com/apache/iceberg/pull/11497#discussion_r1867122626 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/api/RewriteDataFiles.java: ## @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Flink: Add RowConverter for Iceberg Source [iceberg]

2024-12-02 Thread via GitHub
abharath9 commented on code in PR #11301: URL: https://github.com/apache/iceberg/pull/11301#discussion_r1867072917 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceBoundedGenericRecord.java: ## @@ -18,112 +18,77 @@ */ package org.apache.iceb

Re: [PR] fix: equality delete writer field id project [iceberg-rust]

2024-12-02 Thread via GitHub
ZENOTME commented on PR #751: URL: https://github.com/apache/iceberg-rust/pull/751#issuecomment-2513577786 cc @liurenjie1024

[PR] fix: equality delete writer field id project [iceberg-rust]

2024-12-02 Thread via GitHub
ZENOTME opened a new pull request, #751: URL: https://github.com/apache/iceberg-rust/pull/751 1. I found that the definition of a primitive type differs between Arrow and Iceberg, which causes the condition in the equality delete writer to be wrong. This PR fixes it and adds a test. 2. Also fi

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
gaborkaszab commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865947614 ## include/CMakeLists.txt: ## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: I gave this a second thought: I think

Re: [I] Iceberg Kafka-Connect runtime not published as part of 1.7.0 release? [iceberg]

2024-12-02 Thread via GitHub
manuzhang commented on issue #11685: URL: https://github.com/apache/iceberg/issues/11685#issuecomment-2513464279 I believe it's on purpose according to the [installation guide](https://iceberg.apache.org/docs/nightly/kafka-connect/#installation)

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-12-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1866324210 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -1108,6 +1108,25 @@ public Builder setDefaultPartitionSpec(int specId) { return thi

Re: [PR] Added force virtual addressing configuration for S3 [iceberg-python]

2024-12-02 Thread via GitHub
helmiazizm commented on code in PR #1392: URL: https://github.com/apache/iceberg-python/pull/1392#discussion_r1866939185 ## pyiceberg/io/pyarrow.py: ## @@ -350,7 +351,7 @@ def parse_location(location: str) -> Tuple[str, str, str]: return uri.scheme, uri.netloc, f"{u

Re: [PR] Kafka Connect: Add mechanisms for routing records by topic name [iceberg]

2024-12-02 Thread via GitHub
bryanck commented on PR #11623: URL: https://github.com/apache/iceberg/pull/11623#issuecomment-2513413707 I don't feel we need a new `RecordRouter` abstraction to implement this feature, which introduces complexity. `SinkWriter.extractRouteValue()` could be enhanced to extract the source to

Re: [PR] feat: support position delete writer [iceberg-rust]

2024-12-02 Thread via GitHub
ZENOTME commented on PR #704: URL: https://github.com/apache/iceberg-rust/pull/704#issuecomment-2513388680 I think we can resolve #741 first before this PR.

Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]

2024-12-02 Thread via GitHub
aihuaxu commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1866700545 ## core/src/main/java/org/apache/iceberg/variants/VariantArray.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [PR] Kafka Connect: Add mechanisms for routing records by topic name [iceberg]

2024-12-02 Thread via GitHub
mun1r0b0t commented on PR #11623: URL: https://github.com/apache/iceberg/pull/11623#issuecomment-2513372539 I made the changes as we discussed on Slack. Instead of using new keys for the topic based configuration, I am overloading the existing `route-regex` key to match against the topic.
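The topic-based routing discussed in this thread can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the connector's code: `route_by_topic` and the table-to-regex mapping are invented names standing in for per-table `route-regex` settings, and the sketch assumes a record goes to every table whose pattern fully matches the source topic name.

```python
import re
from typing import Dict, List


def route_by_topic(topic: str, route_regexes: Dict[str, str]) -> List[str]:
    """Return the tables whose route regex fully matches the topic name.

    Hypothetical helper: the table -> regex mapping stands in for per-table
    `route-regex` settings, and full-match semantics are an assumption.
    """
    return sorted(
        table
        for table, pattern in route_regexes.items()
        if re.fullmatch(pattern, topic)
    )


# Example: records from topic "orders.eu" are routed only to "db.orders".
routes = {"db.orders": r"orders\..*", "db.users": r"users\..*"}
print(route_by_topic("orders.eu", routes))  # ['db.orders']
```

Full matching (rather than a substring search) keeps a pattern like `orders` from accidentally capturing a topic such as `orders-archive`; whether the real connector matches this way is worth checking against its source.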

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2513346295 Thanks @flyrain for reviewing and merging the PR! Also thanks @singhpk234 for reviewing!

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
flyrain merged PR #11551: URL: https://github.com/apache/iceberg/pull/11551

Re: [PR] GCP: Implement SupportsRecoveryOperations for GCSFileIO [iceberg]

2024-12-02 Thread via GitHub
mrcnc commented on code in PR #11565: URL: https://github.com/apache/iceberg/pull/11565#discussion_r1866849501 ## gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSFileIO.java: ## @@ -242,4 +250,116 @@ private void internalDeleteFiles(Stream blobIdsToDelete) { Streams.stream

Re: [PR] Core: Support aggregated basic stats in partition summary [iceberg]

2024-12-02 Thread via GitHub
ajantha-bhat commented on PR #11669: URL: https://github.com/apache/iceberg/pull/11669#issuecomment-2511694995 Yes, it is still active. But it is not getting enough reviews. I am finding it very hard to get reviews. https://github.com/apache/iceberg/pull/11216 is the last PR that is n

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-02 Thread via GitHub
wypoon commented on PR #11661: URL: https://github.com/apache/iceberg/pull/11661#issuecomment-2513299002 @nastra thank you for reviewing this. I have done some renaming as you suggested.

Re: [I] Is there any way on Flink to read newly appended data only (NOT in current Iceberg table snapshot)? [iceberg]

2024-12-02 Thread via GitHub
github-actions[bot] closed issue #9955: Is there any way on Flink to read newly appended data only (NOT in current Iceberg table snapshot)? URL: https://github.com/apache/iceberg/issues/9955

Re: [I] Improve remove_orphan_files performance by using "inventory listing" [iceberg]

2024-12-02 Thread via GitHub
github-actions[bot] commented on issue #10426: URL: https://github.com/apache/iceberg/issues/10426#issuecomment-2513245082 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur
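The idea behind using an inventory listing for orphan-file removal can be sketched as a set difference: instead of listing the storage prefix directly, compare an inventory report against the set of files the table metadata still references, and only consider files older than a cutoff (similar in spirit to the `older_than` argument of the `remove_orphan_files` procedure). The sketch is hypothetical: `find_orphans` and the `(path, last_modified)` inventory shape are assumptions, not an Iceberg API.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Set, Tuple


def find_orphans(
    inventory: Iterable[Tuple[str, datetime]],
    referenced: Set[str],
    older_than: datetime,
) -> List[str]:
    """Return inventory paths that no metadata references and that are old enough.

    Hypothetical helper: `inventory` stands in for rows of a storage
    inventory report, `referenced` for paths reachable from table metadata.
    """
    return sorted(
        path
        for path, last_modified in inventory
        if path not in referenced and last_modified < older_than
    )


now = datetime(2024, 12, 2)
inventory = [
    ("s3://bucket/t/data/a.parquet", now - timedelta(days=10)),
    ("s3://bucket/t/data/b.parquet", now - timedelta(days=10)),
    ("s3://bucket/t/data/c.parquet", now - timedelta(hours=1)),
]
referenced = {"s3://bucket/t/data/a.parquet"}
print(find_orphans(inventory, referenced, now - timedelta(days=3)))
# ['s3://bucket/t/data/b.parquet']
```

The age cutoff matters because a file written by an in-progress commit is not yet referenced by metadata and would otherwise look orphaned.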

Re: [I] why spark ddl rename iceberg table name not change location? does it matter? [iceberg]

2024-12-02 Thread via GitHub
github-actions[bot] commented on issue #10436: URL: https://github.com/apache/iceberg/issues/10436#issuecomment-2513245151 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] Is there any way on Flink to read newly appended data only (NOT in current Iceberg table snapshot)? [iceberg]

2024-12-02 Thread via GitHub
github-actions[bot] commented on issue #9955: URL: https://github.com/apache/iceberg/issues/9955#issuecomment-2513244966 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Bump `HiveCatalog` hive-metastore dependency to Hive 4 [iceberg]

2024-12-02 Thread via GitHub
github-actions[bot] commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2513245119 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
raulcd commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1866022415 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: What is `api` in this context? Are these the public header files to include?

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-02 Thread via GitHub
wypoon commented on code in PR #11661: URL: https://github.com/apache/iceberg/pull/11661#discussion_r1866764798 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java: ## @@ -46,122 +47,217 @@ public VectorizedParquetDefini

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-02 Thread via GitHub
wypoon commented on code in PR #11661: URL: https://github.com/apache/iceberg/pull/11661#discussion_r1866755885 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java: ## @@ -46,122 +47,217 @@ public VectorizedParquetDefini

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-02 Thread via GitHub
wypoon commented on code in PR #11661: URL: https://github.com/apache/iceberg/pull/11661#discussion_r1866757589 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java: ## @@ -46,122 +47,217 @@ public VectorizedParquetDefini

Re: [PR] GCP: Implement SupportsRecoveryOperations for GCSFileIO [iceberg]

2024-12-02 Thread via GitHub
mrcnc commented on code in PR #11565: URL: https://github.com/apache/iceberg/pull/11565#discussion_r1866755677 ## gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSFileIO.java: ## @@ -242,4 +250,116 @@ private void internalDeleteFiles(Stream blobIdsToDelete) { Streams.stream

Re: [PR] Fix when reading struct-type data without an id in iceberg-parquet [iceberg]

2024-12-02 Thread via GitHub
nastra commented on PR #11378: URL: https://github.com/apache/iceberg/pull/11378#issuecomment-2512158794 @Fokko could you review this one please?

Re: [PR] Spark-3.5: make `where` sql case sensitive setting alterable in rewrite data files procedure [iceberg]

2024-12-02 Thread via GitHub
szehon-ho merged PR #11439: URL: https://github.com/apache/iceberg/pull/11439

Re: [PR] Spark-3.5: make `where` sql case sensitive setting alterable in rewrite data files procedure [iceberg]

2024-12-02 Thread via GitHub
szehon-ho commented on PR #11439: URL: https://github.com/apache/iceberg/pull/11439#issuecomment-2513022061 Thanks @ludlows, and also @huaxingao, @anuragmantri, and @singhpk234 for reviews.

[I] Getting ValidationException Found conflicting files that can contain records matching true (concurrent writers) [iceberg]

2024-12-02 Thread via GitHub
paul-bormans-pcgw opened a new issue, #11687: URL: https://github.com/apache/iceberg/issues/11687 ### Query engine 1. PyIceberg 2. Trino ### Question I'm running a test (on docker-compose) where new data is appended (FastAppend) every +/- 1 second while on the other e

Re: [PR] Spark: Read DVs when reading from .position_deletes table [iceberg]

2024-12-02 Thread via GitHub
nastra commented on code in PR #11657: URL: https://github.com/apache/iceberg/pull/11657#discussion_r1866226099 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/DVIterable.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Document procedure for stats collection [iceberg]

2024-12-02 Thread via GitHub
szehon-ho commented on code in PR #11606: URL: https://github.com/apache/iceberg/pull/11606#discussion_r1866644046 ## docs/docs/spark-procedures.md: ## @@ -936,3 +936,40 @@ as an `UPDATE_AFTER` image, resulting in the following pre/post update images: |-||-

Re: [I] Add partition expire for Iceberg [iceberg]

2024-12-02 Thread via GitHub
nastra commented on issue #11686: URL: https://github.com/apache/iceberg/issues/11686#issuecomment-2512032148 This is being handled by https://github.com/apache/iceberg/pull/10755

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865982169 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: Personally I'm not in favor of splitting this repo into multiple libraries, which would be overkill. Originally I

Re: [PR] Core: Support aggregated basic stats in partition summary [iceberg]

2024-12-02 Thread via GitHub
ajantha-bhat commented on PR #11669: URL: https://github.com/apache/iceberg/pull/11669#issuecomment-2512123092 @deniskuzZ: Could you please comment on my last PR that this feature would be helpful for Hive and that you are looking for it? It might help it get more attention for review.

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1866008092 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: Sounds good! It is still unclear what it will look like eventually. I'm adding third-party libraries (arrow, a

Re: [PR] Core, Spark: Refactor RewriteFileGroup planner to core [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11513: URL: https://github.com/apache/iceberg/pull/11513#discussion_r1865930536 ## core/src/main/java/org/apache/iceberg/actions/RewriteFileGroup.java: ## @@ -31,26 +31,26 @@ import org.apache.iceberg.util.DataFileSet; /** - * Container class r

Re: [PR] Create publish-docker.yml [iceberg]

2024-12-02 Thread via GitHub
sungwy commented on code in PR #11632: URL: https://github.com/apache/iceberg/pull/11632#discussion_r1865927839 ## .github/workflows/publish-docker.yml: ## @@ -0,0 +1,51 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865972561 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: I'm not sure if I understand it correctly. Do you mean that the `api` folder stores all public headers? Then e

[PR] Align argument name with doc comment [iceberg-rust]

2024-12-02 Thread via GitHub
SergeiPatiakin opened a new pull request, #750: URL: https://github.com/apache/iceberg-rust/pull/750 Method TableMetadataBuilder::new_from_metadata was introduced in https://github.com/apache/iceberg-rust/pull/587 This PR aligns the method argument names with the method's documentatio

Re: [PR] Count rows as a metadata only operation [iceberg-python]

2024-12-02 Thread via GitHub
jayceslesar commented on PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388#issuecomment-2512673626 Question: Does it make sense to expose this as the `__len__` dunder method, because Python? It would just return `self.count()`.
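A minimal sketch of the suggestion, assuming a hypothetical `CountableScan` class that already has a `count()` method (the real PyIceberg class and its counting logic are not shown here): `__len__` simply delegates to `count()`.

```python
from typing import Iterable


class CountableScan:
    """Hypothetical stand-in for a scan object that exposes count()."""

    def __init__(self, record_counts: Iterable[int]) -> None:
        # Per-file record counts, e.g. as might be read from manifest metadata.
        self._record_counts = list(record_counts)

    def count(self) -> int:
        # Metadata-only style count: sum the per-file record counts.
        return sum(self._record_counts)

    def __len__(self) -> int:
        # len(scan) delegates to count(); Python requires a non-negative int here.
        return self.count()


scan = CountableScan([10, 5])
print(len(scan))  # 15
```

One design consequence worth weighing: when a class defines `__len__` but not `__bool__`, Python uses `__len__` for truthiness, so `if scan:` would trigger a count, and an empty scan would be falsy.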

Re: [PR] Feature: Write to branches [iceberg-python]

2024-12-02 Thread via GitHub
kevinjqliu commented on code in PR #941: URL: https://github.com/apache/iceberg-python/pull/941#discussion_r1866354514 ## pyiceberg/table/update/__init__.py: ## @@ -609,11 +609,14 @@ class AssertRefSnapshotId(ValidatableTableRequirement): type: Literal["assert-ref-snapsho

Re: [PR] Feature: Write to branches [iceberg-python]

2024-12-02 Thread via GitHub
kevinjqliu commented on code in PR #941: URL: https://github.com/apache/iceberg-python/pull/941#discussion_r1866358508 ## tests/table/test_init.py: ## @@ -982,28 +982,43 @@ def test_assert_table_uuid(table_v2: Table) -> None: def test_assert_ref_snapshot_id(table_v2: Table) -

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2512632326 @flyrain Thanks for the quick reply. I will have a follow-up PR for this.

Re: [I] `catalog.load_table` raises Invalid JSON error [iceberg-python]

2024-12-02 Thread via GitHub
sandcobainer commented on issue #1328: URL: https://github.com/apache/iceberg-python/issues/1328#issuecomment-2512626122 @kevinjqliu I tried this snippet by pointing the metadata location directly at the S3 URI, and the error is the same. Does this mean it's an S3 access issue? ```File

Re: [I] [Feature] Provide Nightly Build to PyPi [iceberg-python]

2024-12-02 Thread via GitHub
kevinjqliu commented on issue #872: URL: https://github.com/apache/iceberg-python/issues/872#issuecomment-2512598368 FYI https://lists.apache.org/thread/oowhcfwv3fcjzdzm76tbn99k5q84mr75 One step closer to nightly build

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
gaborkaszab commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865992819 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: OK, what if we start with api/, core/, puffin/, and example/, and then we'll see if there is a need for anyth

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
gaborkaszab commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865953000 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: In my opinion we'd need at least api/ and core/ (I'd prefer core/ over libiceberg just to be in line with

Re: [PR] Spark: Remove extra columns for ColumnBatch [iceberg]

2024-12-02 Thread via GitHub
huaxingao commented on PR #11551: URL: https://github.com/apache/iceberg/pull/11551#issuecomment-2512581027 @flyrain I thought this over. The `missingIds` could be from [`ROW_POSITION.fieldId()`](https://github.com/apache/iceberg/blob/main/data/src/main/java/org/apache/iceberg/data/DeleteFilte

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-12-02 Thread via GitHub
deniskuzZ commented on PR #11216: URL: https://github.com/apache/iceberg/pull/11216#issuecomment-2512537342 Thanks @ajantha-bhat for your work on partition stats support in Iceberg! That could be reused in Hive as a building block for https://github.com/apache/hive/pull/5498

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1866036591 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: What about this? ``` iceberg-cpp/ ├── api/ │ ├── table.h │ └── puffin.h ├── example/ │

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-12-02 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1866416964 ## hive-metastore/src/test/java/org/apache/iceberg/hive/HiveTableTest.java: ## @@ -386,6 +386,12 @@ public void testHiveTableAndIcebergTableWithSameName(TableTyp

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-12-02 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1866414354 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } +

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-12-02 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1866412255 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } +

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-12-02 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1866408953 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } +

Re: [PR] Added force virtual addressing configuration for S3 [iceberg-python]

2024-12-02 Thread via GitHub
kevinjqliu commented on PR #1392: URL: https://github.com/apache/iceberg-python/pull/1392#issuecomment-2512403443 Reopened #21 and added a few comments on this PR. Thanks for the contribution!

Re: [PR] Added force virtual addressing configuration for S3 [iceberg-python]

2024-12-02 Thread via GitHub
kevinjqliu commented on code in PR #1392: URL: https://github.com/apache/iceberg-python/pull/1392#discussion_r1866397952 ## pyiceberg/io/pyarrow.py: ## @@ -350,7 +351,7 @@ def parse_location(location: str) -> Tuple[str, str, str]: return uri.scheme, uri.netloc, f"{u

[I] Support virtual addressing style in PyArrowFileIO [iceberg-python]

2024-12-02 Thread via GitHub
Fokko opened a new issue, #21: URL: https://github.com/apache/iceberg-python/issues/21 ### Feature Request / Improvement Migrated from the old repository https://github.com/apache/iceberg/issues/7219
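For context on what "virtual addressing" refers to: S3 supports path-style URLs, where the bucket is the first path segment, and virtual-hosted-style URLs, where the bucket is part of the hostname. The sketch below only illustrates the two URL shapes for AWS endpoints; `s3_object_url` is a hypothetical helper, not part of PyIceberg or PyArrow.

```python
def s3_object_url(bucket: str, key: str, region: str, virtual: bool) -> str:
    """Build an S3 object URL in virtual-hosted or path style (hypothetical helper)."""
    host = f"s3.{region}.amazonaws.com"
    key = key.lstrip("/")
    if virtual:
        # Virtual-hosted style: the bucket name becomes part of the hostname.
        return f"https://{bucket}.{host}/{key}"
    # Path style: the bucket name is the first path segment.
    return f"https://{host}/{bucket}/{key}"


print(s3_object_url("warehouse", "db/t/data/a.parquet", "us-east-1", virtual=True))
# https://warehouse.s3.us-east-1.amazonaws.com/db/t/data/a.parquet
```

Some S3-compatible object stores accept only one of the two styles, which is presumably why a setting that forces virtual addressing is useful.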

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-12-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1866330705 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -1108,6 +1108,25 @@ public Builder setDefaultPartitionSpec(int specId) { return thi

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-12-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1866348249 ## core/src/main/java/org/apache/iceberg/RESTFileScanTaskParser.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Feature: Write to branches [iceberg-python]

2024-12-02 Thread via GitHub
kevinjqliu commented on PR #941: URL: https://github.com/apache/iceberg-python/pull/941#issuecomment-2512331585 @vinjai coming back to this after the 0.8.1 release :) Feel free to tag me again once the comments above are addressed. Thanks again for the contribution!

Re: [PR] Feature: Write to branches [iceberg-python]

2024-12-02 Thread via GitHub
kevinjqliu commented on code in PR #941: URL: https://github.com/apache/iceberg-python/pull/941#discussion_r1866359013 ## tests/table/test_init.py: ## @@ -982,28 +982,43 @@ def test_assert_table_uuid(table_v2: Table) -> None: def test_assert_ref_snapshot_id(table_v2: Table) -

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-12-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1866315984 ## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java: ## @@ -118,4 +118,16 @@ public interface ExpireSnapshots extends PendingUpdate> { * @retur
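As a rough illustration of what removing unused partition specs involves, assuming (as the option name suggests) that a spec is unused when no manifest reachable from the retained snapshots was written with it, and that the table's current default spec is always kept. `unused_spec_ids` and its arguments are hypothetical; this is not the `ExpireSnapshots` implementation.

```python
from typing import Iterable, List


def unused_spec_ids(
    all_spec_ids: Iterable[int],
    default_spec_id: int,
    retained_manifest_spec_ids: Iterable[int],
) -> List[int]:
    """Return partition spec IDs that can be dropped (hypothetical helper).

    A spec is kept if any retained manifest uses it or if it is the
    table's default spec; every other spec is reported as unused.
    """
    in_use = set(retained_manifest_spec_ids) | {default_spec_id}
    return sorted(set(all_spec_ids) - in_use)


# Specs 0..3 exist; spec 3 is the default and retained manifests use 1 and 3.
print(unused_spec_ids([0, 1, 2, 3], 3, [1, 3, 3]))  # [0, 2]
```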

Re: [I] Variant Data Type Support [iceberg]

2024-12-02 Thread via GitHub
aihuaxu commented on issue #10392: URL: https://github.com/apache/iceberg/issues/10392#issuecomment-2512272807 Hi folks. We are actively working on the Variant support. I have made a POC with [Spark implementation](https://github.com/apache/iceberg/pull/11201) and currently we are working o

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
raulcd commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1866034166 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: OK, I just read the other comment. No strong opinion on calling it `include` or `api`; I've seen it called both

Re: [PR] Core, Spark: Refactor RewriteFileGroup planner to core [iceberg]

2024-12-02 Thread via GitHub
pvary commented on code in PR #11513: URL: https://github.com/apache/iceberg/pull/11513#discussion_r1865929273 ## core/src/main/java/org/apache/iceberg/actions/RewriteFilePlan.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [I] Can't seem to read a table (unexpected error, crashes JVM processes) [iceberg-python]

2024-12-02 Thread via GitHub
shanielh closed issue #1390: Can't seem to read a table (unexpected error, crashes JVM processes) URL: https://github.com/apache/iceberg-python/issues/1390

Re: [PR] Spark : Derive Stats From Manifest on the Fly [iceberg]

2024-12-02 Thread via GitHub
guykhazma commented on PR #11615: URL: https://github.com/apache/iceberg/pull/11615#issuecomment-2511446282 @RussellSpitzer @huaxingao Just a friendly reminder to review the changes when you have a chance. Thanks!

[I] Add partition expire for Iceberg [iceberg]

2024-12-02 Thread via GitHub
zlzhang0122 opened a new issue, #11686: URL: https://github.com/apache/iceberg/issues/11686 ### Feature Request / Improvement Currently, when we write a partition to Iceberg, the partition will exist permanently unless we manually drop it. Sometimes we may want to exp

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511169797 @mxm Thank you very much for your suggestions. I have made the necessary modifications, and I appreciate you taking the time out of your busy schedule to review it again. I am ver

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511130152 @mxm Thank you very much for your suggestions. I have made the necessary modifications, and I appreciate you taking the time out of your busy schedule to review it again. I am ver

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865575644 ## CMakeLists.txt: ## @@ -0,0 +1,53 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# dis

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
raulcd commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r186410 ## CMakeLists.txt: ## @@ -0,0 +1,53 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# dis

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
mxm commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1865523100 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -310,10 +366,16 @@ public void writeSnapshot(DataOutputView out) th

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
mxm commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511041771 Thanks for the update @Guosmilesmile! Unfortunately, we don't have a way to encode the serializer version for all serializers, so a best-effort approach to retry with a different serializer

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-02 Thread via GitHub
nastra commented on code in PR #11661: URL: https://github.com/apache/iceberg/pull/11661#discussion_r1865502617 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java: ## @@ -46,122 +47,217 @@ public VectorizedParquetDefini

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-02 Thread via GitHub
nastra commented on code in PR #11661: URL: https://github.com/apache/iceberg/pull/11661#discussion_r1865498503 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java: ## @@ -46,122 +47,217 @@ public VectorizedParquetDefini

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-02 Thread via GitHub
nastra commented on code in PR #11661: URL: https://github.com/apache/iceberg/pull/11661#discussion_r1865497835 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java: ## @@ -46,122 +47,217 @@ public VectorizedParquetDefini

[I] Iceberg Kafka-Connect runtime not published as part of 1.7.0 release? [iceberg]

2024-12-02 Thread via GitHub
thjaeckle opened a new issue, #11685: URL: https://github.com/apache/iceberg/issues/11685 ### Query engine _No response_ ### Question Hello Iceberg community. I saw that in the recent `1.7.0` release you moved the Kafka-Connect Iceberg sink to this project, which

Re: [PR] Docs: Default value of table level distribution-mode should be not set [iceberg]

2024-12-02 Thread via GitHub
manuzhang commented on code in PR #11663: URL: https://github.com/apache/iceberg/pull/11663#discussion_r1865444837 ## docs/docs/configuration.md: ## @@ -67,10 +67,10 @@ Iceberg tables support table properties to configure table behavior, like the de | write.metadata.metrics.co

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865421713 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review Comment: Please see my comments above. I think this relat

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865415441 ## src/demo.cc: ## @@ -0,0 +1,26 @@ +/* Review Comment: Do we need more libraries other than `libiceberg` (which is the core library in your structure)? In my mind, `

Re: [PR] Add basic CMake support for the iceberg library [iceberg-cpp]

2024-12-02 Thread via GitHub
wgtmac commented on code in PR #3: URL: https://github.com/apache/iceberg-cpp/pull/3#discussion_r1865407281 ## include/CMakeLists.txt: ## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: The `include` folder is for public header