Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on PR #787: URL: https://github.com/apache/iceberg-python/pull/787#issuecomment-2157521431 @chinmay-bhat @Fokko Sorry for the last-minute feedback, but wouldn't it be better if we explicitly pass (maxsize=128) for the lru_cache annotation, When a new call comes in, t

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on code in PR #787: URL: https://github.com/apache/iceberg-python/pull/787#discussion_r1632704500 ## pyiceberg/table/snapshots.py: ## @@ -228,6 +229,13 @@ def __eq__(self, other: Any) -> bool: ) +@lru_cache Review Comment: what are your thou

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-10 Thread via GitHub
pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2157551451 Here is how the different deletes work: - EQ-DELETE - removes all occurrences of the record with the given id BEFORE the snapshot - POS-DELETE - removes a given row from the giv

Re: [PR] Flink 1.17: Supports batch queries using time ranges [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #7362: URL: https://github.com/apache/iceberg/pull/7362#discussion_r1632734377 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -291,6 +291,14 @@ public Builder startSnapshotTimestamp(Long newStartSnapsho

Re: [PR] Flink 1.17: Supports batch queries using time ranges [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #7362: URL: https://github.com/apache/iceberg/pull/7362#discussion_r1632737707 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceBoundedSql.java: ## @@ -71,6 +75,16 @@ protected List run( throws Exception {

Re: [PR] Flink 1.17: Supports batch queries using time ranges [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #7362: URL: https://github.com/apache/iceberg/pull/7362#discussion_r1632739078 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamScanSql.java: ## @@ -427,6 +427,7 @@ public void testConsumeFromStartTag() throws Exception {

Re: [PR] Flink 1.17: Supports batch queries using time ranges [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #7362: URL: https://github.com/apache/iceberg/pull/7362#issuecomment-2157568620 Welcome back @hililiwei! Went through the PR and left a few comments. Would you mind checking the test failures? -- This is an automated message from the Apache Git Service. To respond

Re: [PR] Hive: Return new scan after applying column project parameter [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #10449: URL: https://github.com/apache/iceberg/pull/10449#discussion_r1632746388 ## mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java: ## @@ -234,6 +234,8 @@ public void testProjection() throws Exception { Schema projection

Re: [I] flink + iceberg can't read realtime data from glue + s3 data with PRIMARY KEY unique table [iceberg]

2024-06-10 Thread via GitHub
pvary closed issue #10453: flink + iceberg can't read realtime data from glue + s3 data with PRIMARY KEY unique table URL: https://github.com/apache/iceberg/issues/10453 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [I] flink + iceberg can't read realtime data from glue + s3 data with PRIMARY KEY unique table [iceberg]

2024-06-10 Thread via GitHub
pvary commented on issue #10453: URL: https://github.com/apache/iceberg/issues/10453#issuecomment-2157575704 Closing the issue, as the question has been answered. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-10 Thread via GitHub
sdd commented on code in PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#discussion_r1632753331 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -103,98 +106,245 @@ impl BoundPredicateVisitor for ManifestFilterVisitor<'_> { reference: &Bou

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-10 Thread via GitHub
sdd commented on code in PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#discussion_r1632753331 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -103,98 +106,245 @@ impl BoundPredicateVisitor for ManifestFilterVisitor<'_> { reference: &Bou

Re: [PR] Add support for orc format [iceberg-python]

2024-06-10 Thread via GitHub
HonahX commented on PR #790: URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2157590281 Hi @MehulBatra. Thanks for taking this! It looks like a great start. > I believe we need to make a change in this write_file method to support ORC writes, as the link goes

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
chinmay-bhat commented on code in PR #787: URL: https://github.com/apache/iceberg-python/pull/787#discussion_r1632758206 ## pyiceberg/table/snapshots.py: ## @@ -228,6 +229,13 @@ def __eq__(self, other: Any) -> bool: ) +@lru_cache Review Comment: As [the default

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
chinmay-bhat commented on PR #787: URL: https://github.com/apache/iceberg-python/pull/787#issuecomment-2157626790 I'm not sure why `test_duckdb_url_import` continues to fail with the same error. I've rebased onto latest main branch, and the test runs locally. I also don't see this issue in

Re: [PR] Core: Use Failsafe in ClientPoolImpl retry logic [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10458: URL: https://github.com/apache/iceberg/pull/10458#issuecomment-2157634782 @amogh-jahagirdar: Is there a way to avoid changing the type of the exception thrown? -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-10 Thread via GitHub
zhongqishang commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2157653011 @pvary Yes, the `01930` file is a pos delete file, but the file path is only contain the data file `01928`. -- This is an automated message from the Apache Git Service. To

Re: [PR] Kafka Connect: Add kerberos authentication option [iceberg]

2024-06-10 Thread via GitHub
Dawnpool commented on PR #10173: URL: https://github.com/apache/iceberg/pull/10173#issuecomment-2157660809 @bryanck Hi, I am mentioning you in case you have forgot this PR. Please take a look and let me know if there is anything more to fix. -- This is an automated message from the Ap

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1632808824 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Seriali

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1632828545 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Seriali

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2157707270 @stevenzwu: Unrelated question, but come up when I have reading the PR: - What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure

Re: [PR] Docs: Add flinkVersion and flinkVersionMajor instead of hardcode [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10463: URL: https://github.com/apache/iceberg/pull/10463#issuecomment-2157713201 @manuzhang: Any idea why the tests are failing on the new values (`flinkVersion`, `flinkVersionMajor`), but they are not failing on `icebergVersion`? -- This is an automated message fr

[PR] Support building with Java 21 [iceberg]

2024-06-10 Thread via GitHub
findepi opened a new pull request, #10474: URL: https://github.com/apache/iceberg/pull/10474 This is currently the latest Java LTS version, so we should support building with it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on code in PR #787: URL: https://github.com/apache/iceberg-python/pull/787#discussion_r1632922261 ## pyiceberg/table/snapshots.py: ## @@ -228,6 +229,13 @@ def __eq__(self, other: Any) -> bool: ) +@lru_cache Review Comment: Agree the default

Re: [PR] feat: support lower_bound&&upper_bound for parquet writer [iceberg-rust]

2024-06-10 Thread via GitHub
ZENOTME commented on code in PR #383: URL: https://github.com/apache/iceberg-rust/pull/383#discussion_r1633012703 ## crates/iceberg/src/writer/file_writer/parquet_writer.rs: ## @@ -78,106 +89,407 @@ impl FileWriterBuilder for ParquetWr type R = ParquetWriter; async

Re: [PR] feat: support lower_bound&&upper_bound for parquet writer [iceberg-rust]

2024-06-10 Thread via GitHub
ZENOTME commented on PR #383: URL: https://github.com/apache/iceberg-rust/pull/383#issuecomment-2157957733 Thanks for your review! I have refined the code, please let me know if there is something that still can be improved. @sdd -- This is an automated message from the Apache Git Servic

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-10 Thread via GitHub
liurenjie1024 commented on code in PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#discussion_r1633057721 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -103,98 +106,245 @@ impl BoundPredicateVisitor for ManifestFilterVisitor<'_> { refer

Re: [I] Implement `ExpressionEvaluator` [iceberg-rust]

2024-06-10 Thread via GitHub
liurenjie1024 closed issue #358: Implement `ExpressionEvaluator` URL: https://github.com/apache/iceberg-rust/issues/358 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] feat: add `ExpressionEvaluator` [iceberg-rust]

2024-06-10 Thread via GitHub
liurenjie1024 merged PR #363: URL: https://github.com/apache/iceberg-rust/pull/363 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #10472: URL: https://github.com/apache/iceberg/pull/10472#discussion_r1633191092 ## .java-version: ## @@ -0,0 +1 @@ +17 Review Comment: Isn't it better to use the lowest supported version, so the issues are shown immediately? What if someone

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10472: URL: https://github.com/apache/iceberg/pull/10472#discussion_r1633221023 ## .java-version: ## @@ -0,0 +1 @@ +17 Review Comment: thanks @pvary for your comment! this matters only for people who use specific tooling to select java

Re: [PR] Clarify hash specification for date and boolean [iceberg]

2024-06-10 Thread via GitHub
findepi closed pull request #5774: Clarify hash specification for date and boolean URL: https://github.com/apache/iceberg/pull/5774 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[I] Bump arrow related crates to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo opened a new issue, #399: URL: https://github.com/apache/iceberg-rust/issues/399 arrow-rs has releated 52, it's better for us to keep updated. --- This issue closed: - https://github.com/apache/iceberg-rust/pull/393 - https://github.com/apache/iceberg-rust/pull/

Re: [PR] chore(deps): Update arrow-schema requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #393: chore(deps): Update arrow-schema requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/393 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] chore(deps): Update arrow-schema requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #393: URL: https://github.com/apache/iceberg-rust/pull/393#issuecomment-2158431722 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [I] refactor: Remove `async_trait` in `Catalog` trait. [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo commented on issue #139: URL: https://github.com/apache/iceberg-rust/issues/139#issuecomment-2158450268 I personally do not favor adopting `async in trait` for the `Catalog` API because it compromises object safety. Users will no longer be able to use `Box`. All APIs related to `Cata

Re: [PR] chore(deps): Update arrow-arith requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #395: chore(deps): Update arrow-arith requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] chore(deps): Update arrow-array requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #394: URL: https://github.com/apache/iceberg-rust/pull/394#issuecomment-2158432190 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update parquet requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #396: URL: https://github.com/apache/iceberg-rust/pull/396#issuecomment-2158432546 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update arrow-ord requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #397: URL: https://github.com/apache/iceberg-rust/pull/397#issuecomment-2158432702 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update arrow-array requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #394: chore(deps): Update arrow-array requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/394 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] chore(deps): Update arrow-arith requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
dependabot[bot] commented on PR #395: URL: https://github.com/apache/iceberg-rust/pull/395#issuecomment-2158432414 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] chore(deps): Update arrow-ord requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #397: chore(deps): Update arrow-ord requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/397 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] chore(deps): Update parquet requirement from 51 to 52 [iceberg-rust]

2024-06-10 Thread via GitHub
Xuanwo closed pull request #396: chore(deps): Update parquet requirement from 51 to 52 URL: https://github.com/apache/iceberg-rust/pull/396 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Update Gradle to 8.8 [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10476: URL: https://github.com/apache/iceberg/pull/10476#discussion_r1633343421 ## .github/workflows/java-ci.yml: ## @@ -90,20 +90,26 @@ jobs: build-checks: runs-on: ubuntu-22.04 +strategy: + matrix: +jvm: [8, 11, 17]

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2158524375 > * What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure that the records with the same id go to the same subtask, so we can

Re: [PR] S3 InputsStream: Reopen connection on Connection Reset [iceberg]

2024-06-10 Thread via GitHub
amogh-jahagirdar commented on PR #10470: URL: https://github.com/apache/iceberg/pull/10470#issuecomment-2158614362 @shanielh Sorry about missing to link the issue on #10433, that's totally my bad. As @singhpk234 said, that PR should address the issue you're seeing here. I'll be revisiting t

Re: [PR] Build: Add -DallVersions property that exposes all component versions [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #6167: URL: https://github.com/apache/iceberg/pull/6167#discussion_r1633430132 ## settings.gradle: ## @@ -52,6 +52,12 @@ project(':nessie').name = 'iceberg-nessie' project(':gcp').name = 'iceberg-gcp' project(':dell').name = 'iceberg-dell' +if

Re: [PR] Use all known Scala versions when -DallVersions set [iceberg]

2024-06-10 Thread via GitHub
findepi commented on PR #10478: URL: https://github.com/apache/iceberg/pull/10478#issuecomment-2158732077 Can these interfere with https://github.com/apache/iceberg/blob/75b3a052a739f85f28fa3e43ee29456aa01da066/build.gradle#L1040-L1066 ? -- This is an automated message from the Apache Gi

Re: [PR] Support snapshot management operations like creating tags by adding `ManageSnapshots` API [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1633484372 ## pyiceberg/table/__init__.py: ## @@ -340,6 +340,34 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = properti

Re: [PR] Support building with Java 21 [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10474: URL: https://github.com/apache/iceberg/pull/10474#discussion_r1633492995 ## gradle/wrapper/gradle-wrapper.properties: ## @@ -1,8 +1,8 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists # checksum was taken from https://

[I] Iceberg tables creation using Spark2.4 [iceberg]

2024-06-10 Thread via GitHub
trsree opened a new issue, #10479: URL: https://github.com/apache/iceberg/issues/10479 ### Apache Iceberg version 1.2.1 ### Query engine Spark ### Please describe the bug 🐞 I am trying to integrate iceberg with spark 2.4 to create iceberg tables. what is th

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633577198 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-10 Thread via GitHub
s-akhtar-baig commented on PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#issuecomment-2158936026 @liurenjie1024 @marvinlanhenke @sdd, thank you for reviewing these changes and for your feedback! Greatly appreciated! I have pushed the required changes. Please let me kno

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
singhpk234 commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633621743 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
singhpk234 commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633621743 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on PR #10472: URL: https://github.com/apache/iceberg/pull/10472#issuecomment-2158956729 Related discussion at : https://github.com/apache/iceberg/pull/8300 As a jenv user, I am not in favor of this. There is no right version here (depends on what engines you are

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2158978912 > > * What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure that the records with the same id go to the same subtask, so we can cr

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159001144 > I think we can overlay the hash distribution above the ranges, Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distribution requires stat

Re: [I] field-id in avro schema is missing [iceberg-rust]

2024-06-10 Thread via GitHub
sdd commented on issue #131: URL: https://github.com/apache/iceberg-rust/issues/131#issuecomment-2159026426 This appears to have been resolved: https://issues.apache.org/jira/browse/AVRO-3920 -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [I] field-id in avro schema is missing [iceberg-rust]

2024-06-10 Thread via GitHub
sdd commented on issue #131: URL: https://github.com/apache/iceberg-rust/issues/131#issuecomment-2159028678 Thanks, @ZENOTME :-) https://github.com/apache/avro/pull/2650 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633681035 ## .palantir/revapi.yml: ## @@ -1018,6 +1018,12 @@ acceptedBreaks: old: "method void org.apache.iceberg.PositionDeletesTable.PositionDeletesBatchScan::(org.

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on PR #10020: URL: https://github.com/apache/iceberg/pull/10020#issuecomment-2159056511 Rebased -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633693407 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } +

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633695272 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } +

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #787: URL: https://github.com/apache/iceberg-python/pull/787#issuecomment-2159087897 Being able to configure (and also disable) the cachine would be a very nice touch Op ma 10 jun 2024 om 09:52 schreef Chinmay Bhat ***@***.***> > ***@***. commented o

Re: [PR] Add support for orc format [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on PR #790: URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2159094452 Thanks, @HonahX for the feedback, I will consider all this while moving forward! -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633709924 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633711249 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633711249 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159110376 I also noticed this yesterday. @danielcweeks do you know why this is? This will increase the number of conflicts while there shouldn't be an issue with adding a snapshot and upda

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633719086 ## api/src/main/java/org/apache/iceberg/Schema.java: ## @@ -507,4 +538,49 @@ public String toString() { .map(this::identifierFieldToString)

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159128257 > This will increase the number of conflicts while there shouldn't be an issue with adding a snapshot and updating the sort order at the same time. My impression was that Assert

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
MehulBatra commented on code in PR #787: URL: https://github.com/apache/iceberg-python/pull/787#discussion_r1632922261 ## pyiceberg/table/snapshots.py: ## @@ -228,6 +229,13 @@ def __eq__(self, other: Any) -> bool: ) +@lru_cache Review Comment: Agree the default

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159222857 > > I think we can overlay the hash distribution above the ranges, > > Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distribution requires st

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
findepi closed pull request #10472: Add .java-version URL: https://github.com/apache/iceberg/pull/10472 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
findepi commented on PR #10472: URL: https://github.com/apache/iceberg/pull/10472#issuecomment-2159226722 I totally forgot about https://github.com/apache/iceberg/pull/8300. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633799253 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633800594 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
findepi commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633802509 ## .github/workflows/spark-ci.yml: ## @@ -101,8 +102,9 @@ jobs: spark-3x-scala-2-13-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159264979 @syun64 Ah, you're right. For some reason I thought the table-uuid would be refreshed when a table was being updated, but this isn't the case: > A UUID that identifies the table

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko merged PR #804: URL: https://github.com/apache/iceberg-python/pull/804 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159270792 Thanks @syun64 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159279193 > > > I think we can overlay the hash distribution above the ranges, > > > > > > Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distri

Re: [PR] AssertTableUUID on all Transactions except CreateTableTransaction [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on PR #804: URL: https://github.com/apache/iceberg-python/pull/804#issuecomment-2159287595 No problem!!! Thank you for the review. Much appreciated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on PR #787: URL: https://github.com/apache/iceberg-python/pull/787#issuecomment-2159288156 @chinmay-bhat Looks like the CI is still a bit sad :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] Support snapshot management operations like creating tags by adding `ManageSnapshots` API [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1633831929 ## pyiceberg/table/__init__.py: ## @@ -340,6 +340,34 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = properti

Re: [I] Upcasting and Downcasting inconsistencies with PyArrow Schema [iceberg-python]

2024-06-10 Thread via GitHub
Fokko commented on issue #791: URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159294140 I agree that you cannot write a single field of 2GB+ to a parquet file. In that case, Parquet is probably not the best way of storing such a big blob. The difference between how

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633842060 ## api/src/main/java/org/apache/iceberg/types/AssignIds.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [I] Upcasting and Downcasting inconsistencies with PyArrow Schema [iceberg-python]

2024-06-10 Thread via GitHub
syun64 commented on issue #791: URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159307590 Gotcha - thank you for the explanation @Fokko I didn't think of how using a large_binary could actually improve the performance because the data is grouped together into large bu

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633849167 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -355,6 +355,17 @@ public static Schema reassignOrRefreshIds( return new Schema(struct.f

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633849778 ## api/src/main/java/org/apache/iceberg/types/Types.java: ## @@ -475,6 +475,10 @@ public NestedField asRequired() { return new NestedField(false, id, na

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633852261 ## core/src/main/java/org/apache/iceberg/ManifestReader.java: ## @@ -114,7 +114,7 @@ protected ManifestReader( this.spec = readPartitionSpec(file);

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
RussellSpitzer commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633901791 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } +

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633910150 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -140,6 +142,30 @@ public StructType partitionType() { return lazyPartitionType; } + /**

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1633910949 ## core/src/main/java/org/apache/iceberg/ManifestReader.java: ## @@ -114,7 +114,7 @@ protected ManifestReader( this.spec = readPartitionSpec(file); }

Re: [PR] Add .java-version [iceberg]

2024-06-10 Thread via GitHub
szehon-ho commented on PR #10472: URL: https://github.com/apache/iceberg/pull/10472#issuecomment-2159421290 No problem, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[PR] Bump pypa/cibuildwheel from 2.18.1 to 2.19.0 [iceberg-python]

2024-06-10 Thread via GitHub
dependabot[bot] opened a new pull request, #805: URL: https://github.com/apache/iceberg-python/pull/805 Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.18.1 to 2.19.0. Release notes Sourced from https://github.com/pypa/cibuildwheel/releases";>pypa/cibuildwhee

[PR] Bump griffe from 0.45.2 to 0.45.3 [iceberg-python]

2024-06-10 Thread via GitHub
dependabot[bot] opened a new pull request, #806: URL: https://github.com/apache/iceberg-python/pull/806 Bumps [griffe](https://github.com/mkdocstrings/griffe) from 0.45.2 to 0.45.3. Release notes Sourced from https://github.com/mkdocstrings/griffe/releases";>griffe's releases.

Re: [PR] Run Flink and Spark3 tests on Java 17 too [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10477: URL: https://github.com/apache/iceberg/pull/10477#discussion_r1633990025 ## .github/workflows/flink-ci.yml: ## @@ -70,8 +70,9 @@ jobs: flink-scala-2-12-tests: runs-on: ubuntu-22.04 strategy: + fail-fast: false ma

  1   2   >