Re: [I] Add overwrite support for `add_files` [iceberg-python]

2024-06-10 Thread via GitHub
enkidulan commented on issue #809: URL: https://github.com/apache/iceberg-python/issues/809#issuecomment-2159931547 I've started working on adding the feature in https://github.com/apache/iceberg-python/pull/810. It is a work in progress, but it should sufficiently demonstrate the idea. No

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1634222468 ## .github/workflows/delta-conversion-ci.yml: ## @@ -68,6 +68,7 @@ jobs: delta-conversion-scala-2-12-tests: runs-on: ubuntu-22.04 strategy: + fail-f

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1634222468 ## .github/workflows/delta-conversion-ci.yml: ## @@ -68,6 +68,7 @@ jobs: delta-conversion-scala-2-12-tests: runs-on: ubuntu-22.04 strategy: + fail-f

Re: [PR] Hive: Return new scan after applying column project parameter [iceberg]

2024-06-10 Thread via GitHub
zhangbutao commented on code in PR #10449: URL: https://github.com/apache/iceberg/pull/10449#discussion_r1634220405 ## mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java: ## @@ -234,6 +234,8 @@ public void testProjection() throws Exception { Schema projec

Re: [PR] Add accessor for Schema identifier_field_ids [iceberg-rust]

2024-06-10 Thread via GitHub
liurenjie1024 commented on PR #388: URL: https://github.com/apache/iceberg-rust/pull/388#issuecomment-2159754189 Clippy check failed. cc @c-thiel Do you mind to fix it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[I] Flaky test due to failing to delete temp directory [iceberg]

2024-06-10 Thread via GitHub
manuzhang opened a new issue, #10480: URL: https://github.com/apache/iceberg/issues/10480 ### Apache Iceberg version main (development) ### Query engine None ### Please describe the bug 🐞 https://github.com/apache/iceberg/actions/runs/9443030987/job/26006092

Re: [PR] Support snapshot management operations like creating tags by adding `ManageSnapshots` API [iceberg-python]

2024-06-10 Thread via GitHub
chinmay-bhat commented on PR #728: URL: https://github.com/apache/iceberg-python/pull/728#issuecomment-2159661781 Removed AssertTableUUID from PR and rebased onto latest main! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] Support snapshot management operations like creating tags by adding `ManageSnapshots` API [iceberg-python]

2024-06-10 Thread via GitHub
chinmay-bhat commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1634086426 ## pyiceberg/table/__init__.py: ## @@ -340,6 +340,34 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = pr

Re: [PR] Docs: Add flinkVersion and flinkVersionMajor instead of hardcode [iceberg]

2024-06-10 Thread via GitHub
manuzhang commented on PR #10463: URL: https://github.com/apache/iceberg/pull/10463#issuecomment-2159626243 @pvary Links to javadocs, where `icebergVersion` is used, were previously ignored. Anyway, the linkcheck has been removed in #10404 -- This is an automated message from the Apac

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-06-10 Thread via GitHub
rodmeneses commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1634032064 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java: ## @@ -140,8 +141,7 @@ public void testCheckAndGetEqualityFieldIds()

Re: [PR] docs: Backport fixes for broken links [iceberg]

2024-06-10 Thread via GitHub
amogh-jahagirdar merged PR #10439: URL: https://github.com/apache/iceberg/pull/10439 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@

Re: [PR] docs: Backport fixes for broken links [iceberg]

2024-06-10 Thread via GitHub
amogh-jahagirdar commented on PR #10439: URL: https://github.com/apache/iceberg/pull/10439#issuecomment-2159594019 Thanks @Fokko ! Thanks @manuzhang and @szehon-ho for reviewing as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-06-10 Thread via GitHub
rodmeneses commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1634032064 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java: ## @@ -140,8 +141,7 @@ public void testCheckAndGetEqualityFieldIds()

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1634010587 ## api/src/main/java/org/apache/iceberg/types/AssignIds.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more co

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1634009661 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1634009506 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1634008905 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -355,6 +355,17 @@ public static Schema reassignOrRefreshIds( return new Schema(struct.fields

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-06-10 Thread via GitHub
rodmeneses commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1634003009 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/SinkTestUtil.java: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Bump griffe from 0.45.2 to 0.45.3 [iceberg-python]

2024-06-10 Thread via GitHub
HonahX merged PR #806: URL: https://github.com/apache/iceberg-python/pull/806 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg

Re: [PR] Bump pypa/cibuildwheel from 2.18.1 to 2.19.0 [iceberg-python]

2024-06-10 Thread via GitHub
HonahX merged PR #805: URL: https://github.com/apache/iceberg-python/pull/805 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633990490 ## .github/workflows/delta-conversion-ci.yml: ## @@ -68,6 +68,7 @@ jobs: delta-conversion-scala-2-12-tests: runs-on: ubuntu-22.04 strategy: + fail

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633990025 ## .github/workflows/flink-ci.yml: ## @@ -70,8 +70,9 @@ jobs: flink-scala-2-12-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false ma

[PR] Bump griffe from 0.45.2 to 0.45.3 [iceberg-python]

2024-06-10 Thread via GitHub
dependabot[bot] opened a new pull request, #806: URL: https://github.com/apache/iceberg-python/pull/806 Bumps [griffe](https://github.com/mkdocstrings/griffe) from 0.45.2 to 0.45.3. Release notes Sourced from https://github.com/mkdocstrings/griffe/releases";>griffe's releases.

[PR] Bump pypa/cibuildwheel from 2.18.1 to 2.19.0 [iceberg-python]

2024-06-10 Thread via GitHub
dependabot[bot] opened a new pull request, #805: URL: https://github.com/apache/iceberg-python/pull/805 Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.18.1 to 2.19.0. Release notes Sourced from https://github.com/pypa/cibuildwheel/releases";>pypa/cibuildwhee

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on PR #10472: URL: https://github.com/apache/iceberg/pull/10472#issuecomment-2159421290 No problem, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633910949 ## core/src/main/java/org/apache/iceberg/ManifestReader.java: ## @@ -114,7 +114,7 @@ protected ManifestReader( this.spec = readPartitionSpec(file); }

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633910150 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } + /**

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633901791 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } +

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633852261 ## core/src/main/java/org/apache/iceberg/ManifestReader.java: ## @@ -114,7 +114,7 @@ protected ManifestReader( this.spec = readPartitionSpec(file);

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633849778 ## api/src/main/java/org/apache/iceberg/types/Types.java: ## @@ -475,6 +475,10 @@ public NestedField asRequired() { return new NestedField(false, id, na

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633849167 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -355,6 +355,17 @@ public static Schema reassignOrRefreshIds( return new Schema(struct.f

Re: [I] Upcasting and Downcasting inconsistencies with PyArrow Schema [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on issue #791: URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159307590 Gotcha - thank you for the explanation @Fokko I didn't think of how using a large_binary could actually improve the performance because the data is grouped together into large bu

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633842060 ## api/src/main/java/org/apache/iceberg/types/AssignIds.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [I] Upcasting and Downcasting inconsistencies with PyArrow Schema [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on issue #791: URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159294140 I agree that you cannot write a single field of 2GB+ to a parquet file. In that case, Parquet is probably not the best way of storing such a big blob. The difference between how

Re: [PR] Support snapshot management operations like creating tags by adding `ManageSnapshots` API [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1633831929 ## pyiceberg/table/__init__.py: ## @@ -340,6 +340,34 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = properti

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #787: URL: https://github.com/apache/iceberg-python/pull/787#issuecomment-2159288156 @chinmay-bhat Looks like the CI is still a bit sad :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159287595 No problem!!! Thank you for the review. Much appreciated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159279193 > > > I think we can overlay the hash distribution above the ranges, > > > > > > Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distri

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159270792 Thanks @syun64 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko merged PR #804: URL: https://github.com/apache/iceberg-python/pull/804 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159264979 @syun64 Ah, you're right. For some reason I thought the table-uuid would be refreshed when a table was being updated, but this isn't the case: > A UUID that identifies the table

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633802509 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633800594 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633799253 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
findepi closed pull request #10472: Add .java-version URL: https://github.com/apache/iceberg/pull/10472 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
findepi commented on PR #10472: URL: https://github.com/apache/iceberg/pull/10472#issuecomment-2159226722 I totally forgot about https://github.com/apache/iceberg/pull/8300. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159222857 > > I think we can overlay the hash distribution above the ranges, > > Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distribution requires st

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on code in PR #787: URL: https://github.com/apache/iceberg-python/pull/787#discussion_r1632922261 ## pyiceberg/table/snapshots.py: ## @@ -228,6 +229,13 @@ def __eq__(self, other: Any) -> bool: ) +@lru_cache Review Comment: Agree the default

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159128257 > This will increase the number of conflicts while there shouldn't be an issue with adding a snapshot and updating the sort order at the same time. My impression was that Assert

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633719086 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159110376 I also noticed this yesterday. @danielcweeks do you know why this is? This will increase the number of conflicts while there shouldn't be an issue with adding a snapshot and upda

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633711249 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633711249 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633709924 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] Add support for orc format [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on PR #790: URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2159094452 Thanks, @HonahX for the feedback, I will consider all this while moving forward! -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #787: URL: https://github.com/apache/iceberg-python/pull/787#issuecomment-2159087897 Being able to configure (and also disable) the cachine would be a very nice touch Op ma 10 jun 2024 om 09:52 schreef Chinmay Bhat ***@***.***> > ***@***. commented o

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633695272 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } +

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633693407 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } +

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on PR #10020: URL: https://github.com/apache/iceberg/pull/10020#issuecomment-2159056511 Rebased -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633681035 ## .palantir/revapi.yml: ## @@ -1018,6 +1018,12 @@ acceptedBreaks: old: "method void org.apache.iceberg.PositionDeletesTable.PositionDeletesBatchScan::(org.

Re: [I] field-id in avro schema is missing [iceberg-rust]

2024-06-10 Thread via GitHub
sdd commented on issue #131: URL: https://github.com/apache/iceberg-rust/issues/131#issuecomment-2159028678 Thanks, @ZENOTME :-) https://github.com/apache/avro/pull/2650 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [I] field-id in avro schema is missing [iceberg-rust]

2024-06-10 Thread via GitHub
sdd commented on issue #131: URL: https://github.com/apache/iceberg-rust/issues/131#issuecomment-2159026426 This appears to have been resolved: https://issues.apache.org/jira/browse/AVRO-3920 -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159001144 > I think we can overlay the hash distribution above the ranges, Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distribution requires stat

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2158978912 > > * What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure that the records with the same id go to the same subtask, so we can cr

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on PR #10472: URL: https://github.com/apache/iceberg/pull/10472#issuecomment-2158956729 Related discussion at : https://github.com/apache/iceberg/pull/8300 As a jenv user, I am not in favor of this. There is no right version here (depends on what engines you are

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
singhpk234 commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633621743 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
singhpk234 commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633621743 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-10 Thread via GitHub
s-akhtar-baig commented on PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#issuecomment-2158936026 @liurenjie1024 @marvinlanhenke @sdd, thank you for reviewing these changes and for your feedback! Greatly appreciated! I have pushed the required changes. Please let me kno

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633577198 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

[I] Iceberg tables creation using Spark2.4 [iceberg]

2024-06-10 Thread via GitHub
trsree opened a new issue, #10479: URL: https://github.com/apache/iceberg/issues/10479 ### Apache Iceberg version 1.2.1 ### Query engine Spark ### Please describe the bug 🐞 I am trying to integrate iceberg with spark 2.4 to create iceberg tables. what is th

Re: [PR] Support building with Java 21 [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10474: URL: https://github.com/apache/iceberg/pull/10474#discussion_r1633492995 ## gradle/wrapper/gradle-wrapper.properties: ## @@ -1,8 +1,8 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists # checksum was taken from https://

Re: [PR] Support snapshot management operations like creating tags by adding `ManageSnapshots` API [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1633484372 ## pyiceberg/table/__init__.py: ## @@ -340,6 +340,34 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = properti

Re: [PR] Use all known Scala versions when -DallVersions set [iceberg]

2024-06-10 Thread via GitHub
findepi commented on PR #10478: URL: https://github.com/apache/iceberg/pull/10478#issuecomment-2158732077 Can these interfere with https://github.com/apache/iceberg/blob/75b3a052a739f85f28fa3e43ee29456aa01da066/build.gradle#L1040-L1066 ? -- This is an automated message from the Apache Gi

Re: [PR] Build: Add -DallVersions property that exposes all component versions [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #6167: URL: https://github.com/apache/iceberg/pull/6167#discussion_r1633430132 ## settings.gradle: ## @@ -52,6 +52,12 @@ project(':nessie').name = 'iceberg-nessie' project(':gcp').name = 'iceberg-gcp' project(':dell').name = 'iceberg-dell' +if

Re: [PR] S3 InputsStream: Reopen connection on Connection Reset [iceberg]

2024-06-10 Thread via GitHub
amogh-jahagirdar commented on PR #10470: URL: https://github.com/apache/iceberg/pull/10470#issuecomment-2158614362 @shanielh Sorry about missing to link the issue on #10433, that's totally my bad. As @singhpk234 said, that PR should address the issue you're seeing here. I'll be revisiting t

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2158524375 > * What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure that the records with the same id go to the same subtask, so we can

Re: [PR] Update Gradle to 8.8 [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10476: URL: https://github.com/apache/iceberg/pull/10476#discussion_r1633343421 ## .github/workflows/java-ci.yml: ## @@ -90,20 +90,26 @@ jobs: build-checks: runs-on: ubuntu-22.04 +strategy: + matrix: +jvm: [8, 11, 17]

Re: [I] refactor: Remove `async_trait` in `Catalog` trait. [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo commented on issue #139: URL: https://github.com/apache/iceberg-rust/issues/139#issuecomment-2158450268 I personally do not favor adopting `async in trait` for the `Catalog` API because it compromises object safety. Users will no longer be able to use `Box`. All APIs related to `Cata

Re: [PR] chore(deps): Update arrow-arith requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #395: chore(deps): Update arrow-arith requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] chore(deps): Update arrow-array requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #394: URL: https://github.com/apache/iceberg-rust/pull/394#issuecomment-2158432190 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update parquet requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #396: URL: https://github.com/apache/iceberg-rust/pull/396#issuecomment-2158432546 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update arrow-ord requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #397: URL: https://github.com/apache/iceberg-rust/pull/397#issuecomment-2158432702 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update arrow-array requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #394: chore(deps): Update arrow-array requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/394 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] chore(deps): Update arrow-arith requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #395: URL: https://github.com/apache/iceberg-rust/pull/395#issuecomment-2158432414 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update arrow-ord requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #397: chore(deps): Update arrow-ord requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/397 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] chore(deps): Update parquet requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #396: chore(deps): Update parquet requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/396 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] chore(deps): Update arrow-schema requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #393: URL: https://github.com/apache/iceberg-rust/pull/393#issuecomment-2158431722 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update arrow-schema requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #393: chore(deps): Update arrow-schema requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/393 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[I] Bump arrow related crates to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo opened a new issue, #399: URL: https://github.com/apache/iceberg-rust/issues/399 arrow-rs has releated 52, it's better for us to keep updated. --- This issue closed: - https://github.com/apache/iceberg-rust/pull/393 - https://github.com/apache/iceberg-rust/pull/

Re: [PR] Clarify hash specification for date and boolean [iceberg]

2024-06-10 Thread via GitHub
findepi closed pull request #5774: Clarify hash specification for date and boolean URL: https://github.com/apache/iceberg/pull/5774 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10472: URL: https://github.com/apache/iceberg/pull/10472#discussion_r1633221023 ## .java-version: ## @@ -0,0 +1 @@ +17 Review Comment: thanks @pvary for your comment! this matters only for people who use specific tooling to select java

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #10472: URL: https://github.com/apache/iceberg/pull/10472#discussion_r1633191092 ## .java-version: ## @@ -0,0 +1 @@ +17 Review Comment: Isn't it better to use the lowest supported version, so the issues are shown immediately? What if someone

Re: [I] Implement `ExpressionEvaluator` [iceberg-rust]

2024-06-10 Thread via GitHub
liurenjie1024 closed issue #358: Implement `ExpressionEvaluator` URL: https://github.com/apache/iceberg-rust/issues/358 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] feat: add `ExpressionEvaluator` [iceberg-rust]

2024-06-10 Thread via GitHub
liurenjie1024 merged PR #363: URL: https://github.com/apache/iceberg-rust/pull/363 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-10 Thread via GitHub
liurenjie1024 commented on code in PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#discussion_r1633057721 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -103,98 +106,245 @@ impl BoundPredicateVisitor for ManifestFilterVisitor<'_> { refer

Re: [PR] feat: support lower_bound&&upper_bound for parquet writer [iceberg-rust]

2024-06-10 Thread via GitHub
ZENOTME commented on PR #383: URL: https://github.com/apache/iceberg-rust/pull/383#issuecomment-2157957733 Thanks for your review! I have refined the code, please let me know if there is something that still can be improved. @sdd -- This is an automated message from the Apache Git Servic

Re: [PR] feat: support lower_bound&&upper_bound for parquet writer [iceberg-rust]

2024-06-10 Thread via GitHub
ZENOTME commented on code in PR #383: URL: https://github.com/apache/iceberg-rust/pull/383#discussion_r1633012703 ## crates/iceberg/src/writer/file_writer/parquet_writer.rs: ## @@ -78,106 +89,407 @@ impl FileWriterBuilder for ParquetWr type R = ParquetWriter; async

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on code in PR #787: URL: https://github.com/apache/iceberg-python/pull/787#discussion_r1632922261 ## pyiceberg/table/snapshots.py: ## @@ -228,6 +229,13 @@ def __eq__(self, other: Any) -> bool: ) +@lru_cache Review Comment: Agree the default

[PR] Support building with Java 21 [iceberg]

2024-06-10 Thread via GitHub
findepi opened a new pull request, #10474: URL: https://github.com/apache/iceberg/pull/10474 This is currently the latest Java LTS version, so we should support building with it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] Docs: Add flinkVersion and flinkVersionMajor instead of hardcode [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10463: URL: https://github.com/apache/iceberg/pull/10463#issuecomment-2157713201 @manuzhang: Any idea why the tests are failing on the new values (`flinkVersion`, `flinkVersionMajor`), but they are not failing on `icebergVersion`? -- This is an automated message fr

  1   2   >