Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-17 Thread via GitHub
Xuanwo commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1363344838 ## crates/iceberg/src/avro/schema.rs: ## @@ -96,7 +98,13 @@ impl SchemaVisitor for SchemaToAvroSchema { _struct: &StructType, Review Comment: Understood. T

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-17 Thread via GitHub
nastra commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1767781293 > Additionally, Iceberg has a UUID type, which seems to be supported in Spark but not in Flink: https://github.com/apache/iceberg/pull/7496 So Spark itself doesn't support UUIDs as a

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-17 Thread via GitHub
fengjiajie commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1767764825 > @fengjiajie: Checked the codepath for the Spark readers and I have 2 questions: > > * What about ORC and Avro files? Don't we have the same issue there? > * Would it worth t

Re: [PR] [1.4.x] AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) [iceberg]

2023-10-17 Thread via GitHub
nastra commented on PR #8843: URL: https://github.com/apache/iceberg/pull/8843#issuecomment-176772 Thanks everyone for the feedback. I'll go ahead and include this in the 1.4.1 RC -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [1.4.x] AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) [iceberg]

2023-10-17 Thread via GitHub
nastra merged PR #8843: URL: https://github.com/apache/iceberg/pull/8843

Re: [PR] [1.4.x] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
nastra merged PR #8861: URL: https://github.com/apache/iceberg/pull/8861

[PR] fix: Add catalog pages to docs. [iceberg-docs]

2023-10-17 Thread via GitHub
liurenjie1024 opened a new pull request, #284: URL: https://github.com/apache/iceberg-docs/pull/284 Close https://github.com/apache/iceberg/issues/8850

Re: [PR] Flink 1.17: Use awaitility instead of Thread.sleep() [iceberg]

2023-10-17 Thread via GitHub
nastra commented on PR #8852: URL: https://github.com/apache/iceberg/pull/8852#issuecomment-1767711923 thanks for reviewing @amogh-jahagirdar and @nk1506. @nk1506 would you be interested in backporting this to older Flink versions?

Re: [PR] Flink 1.17: Use awaitility instead of Thread.sleep() [iceberg]

2023-10-17 Thread via GitHub
nastra merged PR #8852: URL: https://github.com/apache/iceberg/pull/8852

Re: [PR] fix: avro bytes test for Literal [iceberg-rust]

2023-10-17 Thread via GitHub
liurenjie1024 commented on PR #80: URL: https://github.com/apache/iceberg-rust/pull/80#issuecomment-1767686793 > The PR looks good to me overall, but I'm slightly concerned about the use of `Intointo(literal)`. Will our users have to write code in this manner? Or does it occur internall

Re: [I] Distributed execution of DeleteReachableFilesSparkAction [iceberg]

2023-10-17 Thread via GitHub
tmnd1991 commented on issue #8862: URL: https://github.com/apache/iceberg/issues/8862#issuecomment-1767681093 Hi @RussellSpitzer , yes I mean having the delete operations distributed. I guess it’s difficult because you don’t know how many spark executor cores you might have in any given mom

Re: [PR] Nessie: retain authorship information when creating a namespace [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on PR #8857: URL: https://github.com/apache/iceberg/pull/8857#issuecomment-1767680941 > It also switches to Nessie API V2 for the commit operation. I didn't see this change (mentioned in PR description). It currently works for both v1 and v2 right?

Re: [PR] Nessie: retain authorship information when creating a namespace [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on PR #8857: URL: https://github.com/apache/iceberg/pull/8857#issuecomment-1767679787 cc: @snazy, @dimas-b

Re: [PR] Nessie: retain authorship information when creating a namespace [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1363197313 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,23 +181,38 @@ public IcebergTable table(TableIdentifier tableIdentifier) {

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-17 Thread via GitHub
liurenjie1024 commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1363213683 ## crates/iceberg/src/avro/schema.rs: ## @@ -96,7 +98,13 @@ impl SchemaVisitor for SchemaToAvroSchema { _struct: &StructType, Review Comment: The `_

Re: [PR] fix: avro bytes test for Literal [iceberg-rust]

2023-10-17 Thread via GitHub
ZENOTME commented on code in PR #80: URL: https://github.com/apache/iceberg-rust/pull/80#discussion_r1363200948 ## crates/iceberg/src/spec/values.rs: ## @@ -995,7 +995,7 @@ mod tests { assert_eq!(literal, expected_literal); let mut writer = apache_avro::Write

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-17 Thread via GitHub
pvary commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1767664345 @fengjiajie: Checked the codepath for the Spark readers and I have 2 questions: - What about ORC and Avro files? Don't we have the same issue there? - Would it worth to add the same fi

Re: [PR] fix: avro bytes test for Literal [iceberg-rust]

2023-10-17 Thread via GitHub
Xuanwo commented on PR #80: URL: https://github.com/apache/iceberg-rust/pull/80#issuecomment-1767645545 The PR looks good to me overall, but I'm slightly concerned about the use of `Intointo(literal)`. Will our users have to write code in this manner? Or does it occur internally? --

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-17 Thread via GitHub
Xuanwo commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1363171863 ## crates/iceberg/src/avro/schema.rs: ## @@ -96,7 +98,13 @@ impl SchemaVisitor for SchemaToAvroSchema { _struct: &StructType, results: Vec, ) ->

[I] java.lang.IllegalArgumentException: requirement failed while read migrated parquet table [iceberg]

2023-10-17 Thread via GitHub
camper42 opened a new issue, #8863: URL: https://github.com/apache/iceberg/issues/8863 ### Apache Iceberg version 1.4.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 We have some irrationally partitioned parquet table, which have 3 le

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-17 Thread via GitHub
zhangminglei commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1363068688 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software

Re: [I] Query fails when executed without filter i.e. aggregate pushdown [iceberg]

2023-10-17 Thread via GitHub
huaxingao commented on issue #8859: URL: https://github.com/apache/iceberg/issues/8859#issuecomment-1767476206 @atifiu Do you have a program that can reproduce the issue?

[PR] Build: Bump urllib3 from 1.26.17 to 1.26.18 [iceberg-python]

2023-10-17 Thread via GitHub
dependabot[bot] opened a new pull request, #84: URL: https://github.com/apache/iceberg-python/pull/84 Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.17 to 1.26.18. Release notes Sourced from [urllib3's releases](https://github.com/urllib3/urllib3/releases). 1.2

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-17 Thread via GitHub
barronw commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1362965809 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -940,4 +1025,108 @@ mod test { r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe08b-81

Re: [I] Add checks in create/alter table if table location conflicts [iceberg]

2023-10-17 Thread via GitHub
github-actions[bot] commented on issue #7238: URL: https://github.com/apache/iceberg/issues/7238#issuecomment-1767384670 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] should we throw the exception when we have no privilege to access the directory in findVersion method [iceberg]

2023-10-17 Thread via GitHub
github-actions[bot] closed issue #7285: should we throw the exception when we have no privilege to access the directory in findVersion method URL: https://github.com/apache/iceberg/issues/7285

Re: [I] should we throw the exception when we have no privilege to access the directory in findVersion method [iceberg]

2023-10-17 Thread via GitHub
github-actions[bot] commented on issue #7285: URL: https://github.com/apache/iceberg/issues/7285#issuecomment-1767384651 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] column define by NestedField.required for orc can still insert null [iceberg]

2023-10-17 Thread via GitHub
github-actions[bot] closed issue #7288: column define by NestedField.required for orc can still insert null URL: https://github.com/apache/iceberg/issues/7288

Re: [I] column define by NestedField.required for orc can still insert null [iceberg]

2023-10-17 Thread via GitHub
github-actions[bot] commented on issue #7288: URL: https://github.com/apache/iceberg/issues/7288#issuecomment-1767384603 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Distributed execution of DeleteReachableFilesSparkAction [iceberg]

2023-10-17 Thread via GitHub
RussellSpitzer commented on issue #8862: URL: https://github.com/apache/iceberg/issues/8862#issuecomment-1767364918 It is distributed in file discovery. Do you mean have the deletes distributed? Previously we didn't want to do this because it's very difficult to control parallelism of delet

Re: [I] DeleteOrphanFiles or ExpireSnapshots outofmemory [iceberg]

2023-10-17 Thread via GitHub
RussellSpitzer commented on issue #3703: URL: https://github.com/apache/iceberg/issues/3703#issuecomment-1767362201 Not for broadcast, for that you just need to disable broadcast join in spark

Re: [I] DeleteOrphanFiles or ExpireSnapshots outofmemory [iceberg]

2023-10-17 Thread via GitHub
RLashofRegas commented on issue #3703: URL: https://github.com/apache/iceberg/issues/3703#issuecomment-1767331436 @RussellSpitzer You mentioned you were working on a patch that might affect this. Should I expect the issue I mentioned above to go away if we upgraded to a more recent version

Re: [PR] Make to_arrow function capable of handling parquet files with sanitized name due to Avro restirction [iceberg-python]

2023-10-17 Thread via GitHub
puchengy commented on PR #83: URL: https://github.com/apache/iceberg-python/pull/83#issuecomment-1767128295 @Fokko Can I get a review? Thanks

Re: [PR] Spark 3.5: Fix specific field values treated as unequal while comparing rows for carry-over removal [iceberg]

2023-10-17 Thread via GitHub
flyrain commented on PR #8799: URL: https://github.com/apache/iceberg/pull/8799#issuecomment-1767090521 Hi @rdblue, yes, we don't have to include it in 1.4.1 if it is behavior change. I'd consider it more a bug fix though. But open for ideas.

[PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar opened a new pull request, #8861: URL: https://github.com/apache/iceberg/pull/8861 Cherry pick #8860 onto the 1.4.x branch for 1.4.1 release

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
rdblue commented on PR #8860: URL: https://github.com/apache/iceberg/pull/8860#issuecomment-1767011681 Merged. Thanks for getting this read, @amogh-jahagirdar! For context, this catches bad metadata written by 1.4.0 and ignores it. This is needed if tables have bad split offsets in or

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
rdblue merged PR #8860: URL: https://github.com/apache/iceberg/pull/8860

Re: [PR] Spark 3.5: Fix specific field values treated as unequal while comparing rows for carry-over removal [iceberg]

2023-10-17 Thread via GitHub
rdblue commented on PR #8799: URL: https://github.com/apache/iceberg/pull/8799#issuecomment-1767003787 What's the argument for including this in 1.4.1? It seems like a behavior change that isn't fixing a regression.

Re: [PR] [1.4.x] AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) [iceberg]

2023-10-17 Thread via GitHub
singhpk234 commented on PR #8843: URL: https://github.com/apache/iceberg/pull/8843#issuecomment-1767000250 > It seems like a behavior change that would only be safe if the behavior is always the same when creating multiple credentials providers. checked the DEFAULT_CREDENTIALS_PROVIDE

Re: [PR] Spec: Add section on `null_value_counts` [iceberg]

2023-10-17 Thread via GitHub
Fokko commented on code in PR #8611: URL: https://github.com/apache/iceberg/pull/8611#discussion_r1362621678 ## format/spec.md: ## @@ -450,6 +451,48 @@ Notes: 2. For `float` and `double`, the value `-0.0` must precede `+0.0`, as in the IEEE 754 `totalOrder` predicate. NaNs are

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
singhpk234 commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362614788 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,16 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { +

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
singhpk234 commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362607781 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,16 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { +

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
singhpk234 commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362607781 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,16 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { +

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362567101 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,16 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362567101 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,16 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362567101 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,16 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Spark 3.5: Fix specific field values treated as unequal while comparing rows for carry-over removal [iceberg]

2023-10-17 Thread via GitHub
flyrain commented on PR #8799: URL: https://github.com/apache/iceberg/pull/8799#issuecomment-1766912688 cc @aokolnychyi @RussellSpitzer

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-17 Thread via GitHub
stevenzwu commented on PR #8803: URL: https://github.com/apache/iceberg/pull/8803#issuecomment-1766844420 @pvary I think we probably want to push the `copyStatsForColumns` down to ManifestReader. https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestReade

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
singhpk234 commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362493799 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,16 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { +

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362490912 ## core/src/test/java/org/apache/iceberg/TableTestBase.java: ## @@ -110,7 +110,7 @@ public class TableTestBase { static final DataFile FILE_C = DataF

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362487232 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,12 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] [1.4.x] AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) [iceberg]

2023-10-17 Thread via GitHub
stevenzwu commented on PR #8843: URL: https://github.com/apache/iceberg/pull/8843#issuecomment-1766827281 > would only be safe if the behavior is always the same when creating multiple credentials providers. @rdblue I think it is a safe change from a static global singleton to creati

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362487232 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,12 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362467796 ## core/src/test/java/org/apache/iceberg/TableTestBase.java: ## @@ -110,7 +110,7 @@ public class TableTestBase { static final DataFile FILE_C = DataF

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362467796 ## core/src/test/java/org/apache/iceberg/TableTestBase.java: ## @@ -110,7 +110,7 @@ public class TableTestBase { static final DataFile FILE_C = DataF

Re: [PR] [1.4.x] AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) [iceberg]

2023-10-17 Thread via GitHub
nastra commented on PR #8843: URL: https://github.com/apache/iceberg/pull/8843#issuecomment-1766801596 > @nastra and @singhpk234, is this safe for a patch release? It seems like a behavior change that would only be safe if the behavior is always the same when creating multiple credentials p

Re: [I] Replace Thread.sleep() usage in test code with Awaitility [iceberg]

2023-10-17 Thread via GitHub
nastra commented on issue #7154: URL: https://github.com/apache/iceberg/issues/7154#issuecomment-1766792091 I've opened https://github.com/apache/iceberg/pull/8853 and https://github.com/apache/iceberg/pull/8852 to give an idea about places that are good candidates to replace with Awaitilit

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362445708 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,12 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8804: URL: https://github.com/apache/iceberg/pull/8804#discussion_r1362441102 ## aws/src/integration/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbLockManager.java: ## @@ -141,11 +142,8 @@ public void testAcquireSingleProcess() throws Exception

Re: [PR] Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8804: URL: https://github.com/apache/iceberg/pull/8804#discussion_r1362431071 ## aws/src/integration/java/org/apache/iceberg/aws/TestAssumeRoleAwsClientFactory.java: ## @@ -189,7 +192,17 @@ public void testAssumeRoleS3FileIO() throws Exception {

Re: [PR] Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8804: URL: https://github.com/apache/iceberg/pull/8804#discussion_r1362429576 ## aws/src/integration/java/org/apache/iceberg/aws/TestAssumeRoleAwsClientFactory.java: ## @@ -189,7 +192,17 @@ public void testAssumeRoleS3FileIO() throws Exception {

Re: [PR] Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8804: URL: https://github.com/apache/iceberg/pull/8804#discussion_r1362427190 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -62,6 +70,54 @@ public static long waitUntilAfter(long timestampMillis) { return current; } +

Re: [PR] Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8804: URL: https://github.com/apache/iceberg/pull/8804#discussion_r1362428279 ## api/src/test/java/org/apache/iceberg/metrics/TestDefaultTimer.java: ## @@ -101,14 +103,7 @@ public void closeableTimer() throws InterruptedException { @Test pub

Re: [PR] [1.4.x] Flink: Reverting the default custom partitioner for bucket column (#8848) [iceberg]

2023-10-17 Thread via GitHub
nastra merged PR #8858: URL: https://github.com/apache/iceberg/pull/8858

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362421006 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,12 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362421006 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,12 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
bryanck commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362419546 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,12 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { +//

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
bryanck commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362417563 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,12 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { +//

Re: [PR] Python: Add support for Python 3.12 [iceberg-python]

2023-10-17 Thread via GitHub
steinsgateted commented on PR #35: URL: https://github.com/apache/iceberg-python/pull/35#issuecomment-1766760247 @jayceslesar Thank you for the information

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1362413695 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362408168 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,11 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets()

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
bryanck commented on code in PR #8860: URL: https://github.com/apache/iceberg/pull/8860#discussion_r1362407399 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -460,6 +460,11 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { +//

Re: [PR] Core: Ignore split offsets when the last split offset is past the file length [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on PR #8860: URL: https://github.com/apache/iceberg/pull/8860#issuecomment-1766749771 cc @bryanck

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-17 Thread via GitHub
stevenzwu commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1362401859 ## api/src/main/java/org/apache/iceberg/ContentFile.java: ## @@ -177,4 +191,26 @@ default Long fileSequenceNumber() { default F copy(boolean withStats) { retu

Re: [PR] [1.4.x] Core: Do not use a lazy split offset list in manifests (#8834) [iceberg]

2023-10-17 Thread via GitHub
rdblue merged PR #8845: URL: https://github.com/apache/iceberg/pull/8845

Re: [I] Flink: revert the automatic application of custom partitioner for bucketing column with hash distribution [iceberg]

2023-10-17 Thread via GitHub
nastra closed issue #8847: Flink: revert the automatic application of custom partitioner for bucketing column with hash distribution URL: https://github.com/apache/iceberg/issues/8847

Re: [I] Flink: revert the automatic application of custom partitioner for bucketing column with hash distribution [iceberg]

2023-10-17 Thread via GitHub
nastra commented on issue #8847: URL: https://github.com/apache/iceberg/issues/8847#issuecomment-1766722846 Closing this as #8848 has been merged to main and I backported it to 1.4.x in https://github.com/apache/iceberg/pull/8858

[PR] Flink: Reverting the default custom partitioner for bucket column (#8848) [iceberg]

2023-10-17 Thread via GitHub
nastra opened a new pull request, #8858: URL: https://github.com/apache/iceberg/pull/8858 This backports #8848 to 1.4.x

Re: [PR] Flink: Reverting the default custom partitioner for bucket column [iceberg]

2023-10-17 Thread via GitHub
stevenzwu merged PR #8848: URL: https://github.com/apache/iceberg/pull/8848

Re: [PR] Flink 1.17: Use awaitility instead of Thread.sleep() [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8852: URL: https://github.com/apache/iceberg/pull/8852#discussion_r1362375826 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java: ## @@ -111,14 +113,19 @@ public void testConsumeWithoutStartSnapsho

Re: [PR] Flink 1.17: Use awaitility instead of Thread.sleep() [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8852: URL: https://github.com/apache/iceberg/pull/8852#discussion_r1362315314 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceFailover.java: ## @@ -98,9 +98,9 @@ protected List generateRecords(int numRecords, lo

[PR] Nessie: retain authorship information when creating a namespace [iceberg]

2023-10-17 Thread via GitHub
adutra opened a new pull request, #8857: URL: https://github.com/apache/iceberg/pull/8857 This change enhances the process of creating new namespaces by retaining commit authorship information when committing the new namespace. It also switches to Nessie API V2 for the commit operatio

Re: [PR] API, Core: Add uuid() to View [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8851: URL: https://github.com/apache/iceberg/pull/8851#discussion_r1362309604 ## core/src/main/java/org/apache/iceberg/view/BaseView.java: ## @@ -97,4 +98,9 @@ public ReplaceViewVersion replaceVersion() { public UpdateLocation updateLocation()

Re: [PR] Spark 3.5: Use Awaitility instead of Thread.sleep() [iceberg]

2023-10-17 Thread via GitHub
nastra merged PR #8853: URL: https://github.com/apache/iceberg/pull/8853

Re: [PR] Spark 3.5: Use Awaitility instead of Thread.sleep() [iceberg]

2023-10-17 Thread via GitHub
nastra commented on code in PR #8853: URL: https://github.com/apache/iceberg/pull/8853#discussion_r1362301067 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/SparkSQLExecutionHelper.java: ## @@ -42,28 +45,26 @@ public static String lastExecutedMetricValue(Spark

[I] Improve `All` Metadata Tables with Snapshot Information [iceberg]

2023-10-17 Thread via GitHub
RussellSpitzer opened a new issue, #8856: URL: https://github.com/apache/iceberg/issues/8856 ### Feature Request / Improvement Currently, the "all" versions of metadata tables have exactly the same schema as their non-"all" versions. This is actually not very useful if you are attempting to l

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-17 Thread via GitHub
Fokko commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1362161257 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`array` of

Re: [PR] API, Core: Add uuid() to View [iceberg]

2023-10-17 Thread via GitHub
nk1506 commented on code in PR #8851: URL: https://github.com/apache/iceberg/pull/8851#discussion_r1362126880 ## core/src/main/java/org/apache/iceberg/view/BaseView.java: ## @@ -97,4 +98,9 @@ public ReplaceViewVersion replaceVersion() { public UpdateLocation updateLocation()

Re: [PR] Spark: Fix Fast forward procedure output for non-main branches [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar closed pull request #8854: Spark: Fix Fast forward procedure output for non-main branches URL: https://github.com/apache/iceberg/pull/8854

Re: [PR] Flink 1.17: Use awaitility instead of Thread.sleep() [iceberg]

2023-10-17 Thread via GitHub
nk1506 commented on code in PR #8852: URL: https://github.com/apache/iceberg/pull/8852#discussion_r1362111563 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java: ## @@ -111,14 +113,19 @@ public void testConsumeWithoutStartSnapsho

Re: [PR] Spark: Fix Fast forward procedure output for non-main branches [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on PR #8854: URL: https://github.com/apache/iceberg/pull/8854#issuecomment-1766410898 logged the flaky test: #8855

Re: [PR] API, Core: Add uuid() to View [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on code in PR #8851: URL: https://github.com/apache/iceberg/pull/8851#discussion_r1362101911 ## api/src/main/java/org/apache/iceberg/view/View.java: ## @@ -111,4 +112,13 @@ default ReplaceViewVersion replaceVersion() { default UpdateLocation updateLocat

[I] Flaky test: TestSparkReaderDeletes.testEqualityDeleteWithDeletedColumn [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat opened a new issue, #8855: URL: https://github.com/apache/iceberg/issues/8855 TestSparkReaderDeletes > [format = orc, vectorized = false, planningMode = DISTRIBUTED] > testEqualityDeleteWithDeletedColumn PR:8854 Build: https://github.com/apache/iceberg/actions/run

Re: [PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-17 Thread via GitHub
ZENOTME commented on code in PR #79: URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1362079101 ## crates/iceberg/src/spec/manifest.rs: ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-10-17 Thread via GitHub
rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1766378641 Yes it would!

Re: [PR] Spark: Fix Fast forward procedure output for non-main branches [iceberg]

2023-10-17 Thread via GitHub
rakesh-das08 commented on code in PR #8854: URL: https://github.com/apache/iceberg/pull/8854#discussion_r1362071081 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/FastForwardBranchProcedure.java: ## @@ -77,9 +77,9 @@ public InternalRow[] call(InternalRow a

Re: [PR] Spark: Fix Fast forward procedure output for non-main branches [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on code in PR #8854: URL: https://github.com/apache/iceberg/pull/8854#discussion_r1362069529 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/FastForwardBranchProcedure.java: ## @@ -77,9 +77,9 @@ public InternalRow[] call(InternalRow a

Re: [PR] Spark: Fix Fast forward procedure output for non-main branches [iceberg]

2023-10-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #8854: URL: https://github.com/apache/iceberg/pull/8854#discussion_r1362063516 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/FastForwardBranchProcedure.java: ## @@ -77,9 +77,9 @@ public InternalRow[] call(InternalR

Re: [PR] Spark: Fix Fast forward procedure output for non-main branches [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on code in PR #8854: URL: https://github.com/apache/iceberg/pull/8854#discussion_r1362051079 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/FastForwardBranchProcedure.java: ## @@ -77,9 +77,9 @@ public InternalRow[] call(InternalRow a

Re: [PR] Spark: Fix Fast forward procedure output for non-main branches [iceberg]

2023-10-17 Thread via GitHub
ajantha-bhat commented on code in PR #8854: URL: https://github.com/apache/iceberg/pull/8854#discussion_r1362035790 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestFastForwardBranchProcedure.java: ## @@ -188,4 +188,38 @@ public void testInval
