Re: [PR] Fix: Set Numpy Version Upper Limit `numpy = { version = "^1.22.4", optional = true }` [iceberg-python]

2024-07-22 Thread via GitHub
Fokko commented on code in PR #951: URL: https://github.com/apache/iceberg-python/pull/951#discussion_r1686074065 ## pyproject.toml: ## @@ -61,7 +61,10 @@ tenacity = ">=8.2.3,<9.0.0" pyarrow = { version = ">=9.0.0,<18.0.0", optional = true } pandas = { version = ">=1.0.0,<3.0.

Re: [PR] Build: Bump mkdocs-material from 9.5.28 to 9.5.29 [iceberg]

2024-07-22 Thread via GitHub
Fokko merged PR #10734: URL: https://github.com/apache/iceberg/pull/10734 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apa

Re: [PR] Build: Bump org.roaringbitmap:RoaringBitmap from 1.2.0 to 1.2.1 [iceberg]

2024-07-22 Thread via GitHub
Fokko merged PR #10733: URL: https://github.com/apache/iceberg/pull/10733

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.26.20 to 2.26.21 [iceberg]

2024-07-22 Thread via GitHub
Fokko merged PR #10729: URL: https://github.com/apache/iceberg/pull/10729

Re: [PR] Build: Bump io.netty:netty-buffer from 4.1.111.Final to 4.1.112.Final [iceberg]

2024-07-22 Thread via GitHub
Fokko merged PR #10726: URL: https://github.com/apache/iceberg/pull/10726

Re: [PR] Build: Bump orc from 1.9.3 to 1.9.4 [iceberg]

2024-07-22 Thread via GitHub
Fokko merged PR #10728: URL: https://github.com/apache/iceberg/pull/10728

Re: [PR] Fix: Set Numpy Version Upper Limit `numpy = { version = "^1.22.4", optional = true }` [iceberg-python]

2024-07-22 Thread via GitHub
HonahX merged PR #951: URL: https://github.com/apache/iceberg-python/pull/951

Re: [PR] Bump setuptools from 69.1.0 to 70.0.0 [iceberg-python]

2024-07-22 Thread via GitHub
dependabot[bot] commented on PR #930: URL: https://github.com/apache/iceberg-python/pull/930#issuecomment-2242293691 Looks like setuptools is up-to-date now, so this is no longer needed.

Re: [PR] Bump setuptools from 69.1.0 to 70.0.0 [iceberg-python]

2024-07-22 Thread via GitHub
dependabot[bot] closed pull request #930: Bump setuptools from 69.1.0 to 70.0.0 URL: https://github.com/apache/iceberg-python/pull/930

Re: [PR] Build: Bump com.google.errorprone:error_prone_annotations from 2.28.0 to 2.29.2 [iceberg]

2024-07-22 Thread via GitHub
nastra merged PR #10731: URL: https://github.com/apache/iceberg/pull/10731

Re: [PR] Flink: Migrate remaining classes to JUnit5 [iceberg]

2024-07-22 Thread via GitHub
tomtongue commented on code in PR #10684: URL: https://github.com/apache/iceberg/pull/10684#discussion_r1686114529 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/TestCompressionSettings.java: ## @@ -91,19 +94,36 @@ public void testCompressionAvro() throws Excep

Re: [PR] Flink: Migrate remaining classes to JUnit5 [iceberg]

2024-07-22 Thread via GitHub
tomtongue commented on code in PR #10684: URL: https://github.com/apache/iceberg/pull/10684#discussion_r1686114970 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/TestCompressionSettings.java: ## @@ -91,19 +94,36 @@ public void testCompressionAvro() throws Excep

Re: [PR] Flink: Migrate remaining classes to JUnit5 [iceberg]

2024-07-22 Thread via GitHub
nastra merged PR #10684: URL: https://github.com/apache/iceberg/pull/10684

Re: [PR] OpenAPI: Standardize credentials in loadTable/loadView responses [iceberg]

2024-07-22 Thread via GitHub
nastra commented on code in PR #10722: URL: https://github.com/apache/iceberg/pull/10722#discussion_r1686170249 ## open-api/rest-catalog-open-api.py: ## @@ -441,6 +441,30 @@ class AssertViewUUID(BaseModel): uuid: str +class AzureCredentials(BaseModel): +account_name

Re: [I] Unable to use GlueCatalog in flink environments without hadoop [iceberg]

2024-07-22 Thread via GitHub
RoeeDev commented on issue #3044: URL: https://github.com/apache/iceberg/issues/3044#issuecomment-2242394813 @mgmarino - I am also trying to use your solution in my pyFlink application running on managed Flink, but the only thing I can't understand yet is - how do I incorporate the `HadoopU

Re: [PR] Flink: Migrate remaining classes to JUnit5 [iceberg]

2024-07-22 Thread via GitHub
tomtongue commented on PR #10684: URL: https://github.com/apache/iceberg/pull/10684#issuecomment-2242396006 Thanks for the review! Will submit a backport PR, then the Flink migration will be complete.

Re: [PR] OpenAPI: Standardize credentials in loadTable/loadView responses [iceberg]

2024-07-22 Thread via GitHub
nastra commented on code in PR #10722: URL: https://github.com/apache/iceberg/pull/10722#discussion_r1686287333 ## open-api/rest-catalog-open-api.yaml: ## @@ -2747,6 +2747,54 @@ components: uuid: type: string +AzureCredentials: Review Comment: there

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-22 Thread via GitHub
jeesou commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1686321521 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchGenerator.java: ## @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [I] Serialization of the org.apache.iceberg.io.WriteResult class. [iceberg]

2024-07-22 Thread via GitHub
pvary commented on issue #10710: URL: https://github.com/apache/iceberg/issues/10710#issuecomment-2242675737 With a well configured IcebergSink, the number of `WriteResults` is quite low compared to the number of records, so we did not spend the resources on writing the serializer/deserialize

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-22 Thread via GitHub
pvary commented on code in PR #10548: URL: https://github.com/apache/iceberg/pull/10548#discussion_r1686382506 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSpecExecSupport.java: ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-22 Thread via GitHub
pvary commented on code in PR #10548: URL: https://github.com/apache/iceberg/pull/10548#discussion_r1686382902 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSpecExecSupport.java: ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-22 Thread via GitHub
pvary commented on PR #10548: URL: https://github.com/apache/iceberg/pull/10548#issuecomment-2242713133 @venkata91: Nicely done, just a few small nits, and then I am fine with the change. I will give @stevenzwu a few days, if he wants to chime in. If not, then we could move forward.

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686399659 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -136,30 +169,33 @@ private boolean checkTasks() { } } - return !clos

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686401382 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -136,30 +169,33 @@ private boolean checkTasks() { } } - return !clos

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686408402 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -20,84 +20,117 @@ import java.io.Closeable; import java.io.IOException; +import java.io

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686424317 ## core/src/test/java/org/apache/iceberg/util/TestParallelIterable.java: ## @@ -133,6 +140,47 @@ public CloseableIterator iterator() { .untilAsserted(() -> a

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-22 Thread via GitHub
pvary commented on code in PR #10548: URL: https://github.com/apache/iceberg/pull/10548#discussion_r1686427190 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSpecExecSupport.java: ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundat

[I] Query on nested struct field with PyIceberg? [iceberg-python]

2024-07-22 Thread via GitHub
cfrancois7 opened a new issue, #953: URL: https://github.com/apache/iceberg-python/issues/953 ### Question I'm looking for a tutorial on how to query one subfield of a struct field. I searched all over the internet but failed to find a way to do it simply with pyiceberg. To make i

Re: [I] Remaining Kafka Connect sink tasks [iceberg]

2024-07-22 Thread via GitHub
ajantha-bhat commented on issue #10740: URL: https://github.com/apache/iceberg/issues/10740#issuecomment-2242792614 I think it is better to have individual issue trackers for each feature (with `KAFAKACONNECT` label). So, we can close them when each task is completed.

Re: [I] Detecting duplicates in the Flink Data Stream API [iceberg]

2024-07-22 Thread via GitHub
lkokhreidze commented on issue #10683: URL: https://github.com/apache/iceberg/issues/10683#issuecomment-2242798889 Hi @pvary, thanks for the reply. I do not know the internals of Paimon, or whether it inserts both rows. But from the reader's perspective, only the first row will be visible. Behav

Re: [PR] Flink-1.19: Fix the file offset mismatch when Flink reader first seek… [iceberg]

2024-07-22 Thread via GitHub
pvary commented on PR #10567: URL: https://github.com/apache/iceberg/pull/10567#issuecomment-2242800689 @zhongyujiang: Do I understand correctly that the issue happens when the following conditions are met: - We have at least 3 FileScanTasks (FS1, FS2, FS3) to read - We have a filter

Re: [PR] Flink: Fix duplicate data in Flink's upsert writer for format V2 [iceberg]

2024-07-22 Thread via GitHub
zhongqishang commented on code in PR #10526: URL: https://github.com/apache/iceberg/pull/10526#discussion_r1686446191 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -426,30 +425,44 @@ private void commitOperation( } @Ov

Re: [PR] Kafka Connect: Runtime distribution and integration tests [iceberg]

2024-07-22 Thread via GitHub
ajantha-bhat commented on PR #10739: URL: https://github.com/apache/iceberg/pull/10739#issuecomment-2242818777 > This adds building of runtime distributions plus some integration tests Can we please split into two PRs. one for adding runtime distributions and one for integration tests

Re: [I] Detecting duplicates in the Flink Data Stream API [iceberg]

2024-07-22 Thread via GitHub
pvary commented on issue #10683: URL: https://github.com/apache/iceberg/issues/10683#issuecomment-2242830276 Thanks @lkokhreidze! Currently there is no such thing in the Flink Iceberg Sink. You need to build your own operator for it.

Re: [PR] Kafka Connect: Runtime distribution and integration tests [iceberg]

2024-07-22 Thread via GitHub
bryanck commented on PR #10739: URL: https://github.com/apache/iceberg/pull/10739#issuecomment-2242960006 The integration tests depend on the runtime.

Re: [PR] Fix: Set Numpy Version Upper Limit `numpy = { version = "^1.22.4", optional = true }` [iceberg-python]

2024-07-22 Thread via GitHub
syun64 commented on code in PR #951: URL: https://github.com/apache/iceberg-python/pull/951#discussion_r1686558332 ## pyproject.toml: ## @@ -61,7 +61,10 @@ tenacity = ">=8.2.3,<9.0.0" pyarrow = { version = ">=9.0.0,<18.0.0", optional = true } pandas = { version = ">=1.0.0,<3.0

Re: [PR] Fix: Set Numpy Version Upper Limit `numpy = { version = "^1.22.4", optional = true }` [iceberg-python]

2024-07-22 Thread via GitHub
Fokko commented on code in PR #951: URL: https://github.com/apache/iceberg-python/pull/951#discussion_r1686567665 ## pyproject.toml: ## @@ -61,7 +61,10 @@ tenacity = ">=8.2.3,<9.0.0" pyarrow = { version = ">=9.0.0,<18.0.0", optional = true } pandas = { version = ">=1.0.0,<3.0.

Re: [PR] Fix: Set Numpy Version Upper Limit `numpy = { version = "^1.22.4", optional = true }` [iceberg-python]

2024-07-22 Thread via GitHub
syun64 commented on code in PR #951: URL: https://github.com/apache/iceberg-python/pull/951#discussion_r1686574599 ## pyproject.toml: ## @@ -61,7 +61,10 @@ tenacity = ">=8.2.3,<9.0.0" pyarrow = { version = ">=9.0.0,<18.0.0", optional = true } pandas = { version = ">=1.0.0,<3.0
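
For readers unfamiliar with the caret syntax in the PR title: in Poetry's dependency specification, `^1.22.4` means "at least 1.22.4, but below the next release that bumps the leftmost non-zero component", which is how it sets an upper limit below NumPy 2.0.0. The following is a minimal sketch of that rule only, not Poetry's own parser; `caret_bounds` and `allows` are hypothetical helper names, and it assumes plain dotted numeric versions with at least one non-zero component.

```python
def caret_bounds(spec: str) -> tuple[str, str]:
    """Expand a caret constraint into (inclusive lower, exclusive upper).

    Illustrative helper (hypothetical name), assuming purely numeric,
    dot-separated versions such as "1.22.4".
    """
    if not spec.startswith("^"):
        raise ValueError("expected a caret constraint such as '^1.22.4'")
    parts = [int(p) for p in spec[1:].split(".")]
    for i, value in enumerate(parts):
        if value != 0:
            # Bump the leftmost non-zero component and zero out the rest.
            upper = parts[:i] + [value + 1] + [0] * (len(parts) - i - 1)
            break
    else:
        raise ValueError("all-zero versions are out of scope for this sketch")
    return spec[1:], ".".join(str(p) for p in upper)


def allows(spec: str, version: str) -> bool:
    """Return True if `version` falls inside the caret range of `spec`."""
    def as_tuple(v: str) -> tuple[int, ...]:
        return tuple(int(p) for p in v.split("."))

    lower, upper = caret_bounds(spec)
    return as_tuple(lower) <= as_tuple(version) < as_tuple(upper)


print(caret_bounds("^1.22.4"))
print(allows("^1.22.4", "1.26.4"), allows("^1.22.4", "2.0.0"))
```

Under this rule the constraint in the title accepts any 1.x release from 1.22.4 onward and rejects 2.0.0.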

Re: [PR] Kafka Connect: Runtime distribution and integration tests [iceberg]

2024-07-22 Thread via GitHub
ajantha-bhat commented on PR #10739: URL: https://github.com/apache/iceberg/pull/10739#issuecomment-2243015462 > The integration tests depend on the runtime. I know. We can review and merge the runtime PR first and then rebase the integration test PR?

Re: [PR] Kafka Connect: Runtime distribution and integration tests [iceberg]

2024-07-22 Thread via GitHub
bryanck commented on PR #10739: URL: https://github.com/apache/iceberg/pull/10739#issuecomment-2243024192 The integration tests are testing the runtime, so I feel they belong together. Also, the runtime build is just the `build.gradle` so this seems unnecessary to me.

Re: [PR] Flink-1.19: Fix the file offset mismatch when Flink reader first seek… [iceberg]

2024-07-22 Thread via GitHub
zhongyujiang commented on PR #10567: URL: https://github.com/apache/iceberg/pull/10567#issuecomment-2243038965 Hi @pvary Thanks for reviewing. I think the issue here is somewhat different from what you understand. > We have at least 3 FileScanTasks (FS1, FS2, FS3) to read > W

Re: [PR] Flink-1.19: Fix the file offset mismatch when Flink reader first seek… [iceberg]

2024-07-22 Thread via GitHub
zhongyujiang commented on PR #10567: URL: https://github.com/apache/iceberg/pull/10567#issuecomment-2243048745 > I can see 2 ways to fix this: > > Count every file in the fileOffset - even the ones which are skipped. This seems more natural to me, but the state needs to be converted

Re: [PR] Flink-1.19: Fix the file offset mismatch when Flink reader first seek… [iceberg]

2024-07-22 Thread via GitHub
zhongyujiang commented on PR #10567: URL: https://github.com/apache/iceberg/pull/10567#issuecomment-2243074141 > Edit: Would it be possible to create an e2e like unit test to simulate the issue? It might be easier to understand the issue, or debug. Unfortunately, I am unsure how to cr

Re: [I] Publish Iceberg kafka connect runtime to Confluet hub [iceberg]

2024-07-22 Thread via GitHub
jbonofre commented on issue #10745: URL: https://github.com/apache/iceberg/issues/10745#issuecomment-2243116628 It makes sense to me. I would also include the Kafka connect artifact on https://iceberg.apache.org/releases/

Re: [PR] Spec: Clarify time travel implementation in Iceberg [iceberg]

2024-07-22 Thread via GitHub
dimas-b commented on code in PR #8982: URL: https://github.com/apache/iceberg/pull/8982#discussion_r1686695087 ## format/spec.md: ## @@ -1370,3 +1370,16 @@ Writing v2 metadata: * `sort_columns` was removed Note that these requirements apply when writing data to a v2 tabl

Re: [I] Remaining Kafka Connect sink tasks [iceberg]

2024-07-22 Thread via GitHub
bryanck commented on issue #10740: URL: https://github.com/apache/iceberg/issues/10740#issuecomment-2243249790 Thanks @nk1506 that would be great! There definitely are some opportunities. This is just a placeholder issue to note the big ticket items, I'll create individual tasks soon for so

[I] table.delete()/overwrite() with null values in table and with non-null filter will delete null rows [iceberg-python]

2024-07-22 Thread via GitHub
jqin61 opened a new issue, #954: URL: https://github.com/apache/iceberg-python/issues/954 ### Apache Iceberg version None ### Please describe the bug 🐞 Hi I added this test which breaks: ``` def test_delete_overwrite_with_null(session_catalog: RestCatalog) -> None:

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
RussellSpitzer commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686777206 ## core/src/test/java/org/apache/iceberg/util/TestParallelIterable.java: ## @@ -133,6 +140,47 @@ public CloseableIterator iterator() { .untilAsserted(

Re: [I] table.delete()/overwrite() with null values in table and with non-null filter will delete null rows [iceberg-python]

2024-07-22 Thread via GitHub
syun64 commented on issue #954: URL: https://github.com/apache/iceberg-python/issues/954#issuecomment-2243289640 Hi @jqin61 - this looks like a critical issue that should be fixed for the 0.7.0. Thank you very much for flagging this issue and starting to work on the fix!!
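
The behaviour named in the issue title can be illustrated without PyIceberg. The sketch below is an assumption-laden illustration, not a diagnosis of the PyIceberg code and not the test from the issue (which is truncated in this listing): it assumes an equality filter on a non-null literal and SQL-style three-valued logic, and all function names are hypothetical. It shows one way null rows can disappear: if a delete is implemented by keeping the rows that match the negated filter, a null comparison evaluates to unknown and the row is dropped along with the real matches.

```python
from typing import Optional


def eq(value: Optional[str], literal: str) -> Optional[bool]:
    """SQL-style equality: comparing NULL with anything yields unknown (None)."""
    return None if value is None else value == literal


def negate(result: Optional[bool]) -> Optional[bool]:
    """SQL-style NOT: unknown stays unknown."""
    return None if result is None else not result


def delete_by_keeping_complement(values: list, literal: str) -> list:
    """Pitfall: keep rows where NOT(value == literal) is True, so NULL rows vanish."""
    return [v for v in values if negate(eq(v, literal)) is True]


def delete_matching_only(values: list, literal: str) -> list:
    """Expected: drop only rows where value == literal is definitely True."""
    return [v for v in values if eq(v, literal) is not True]


rows = ["a", None, "b"]
print(delete_by_keeping_complement(rows, "a"))  # ['b']
print(delete_matching_only(rows, "a"))          # [None, 'b']
```

The second function reflects what the issue title implies is expected: a filter on a non-null value leaves null rows in place.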

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-07-22 Thread via GitHub
RussellSpitzer commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1686808627 ## .palantir/revapi.yml: ## @@ -874,6 +874,10 @@ acceptedBreaks: justification: "Static utility class - should not have public constructor" "1.4.0":

Re: [I] Formal verification discovers potential consistency issue [iceberg]

2024-07-22 Thread via GitHub
Vanlightly commented on issue #10720: URL: https://github.com/apache/iceberg/issues/10720#issuecomment-2243351688 @amogh-jahagirdar I don't see a way of running a delete operation and specifying the VALIDATE_FROM_SNAPSHOT option. The Spark dataframe API allows me to set the option but doesn

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-07-22 Thread via GitHub
RussellSpitzer commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1686819845 ## core/src/main/java/org/apache/iceberg/AllManifestsTableTaskParser.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Api, Build: Fix typo in comments in `Table` and `gradlew` [iceberg]

2024-07-22 Thread via GitHub
amogh-jahagirdar merged PR #10744: URL: https://github.com/apache/iceberg/pull/10744

Re: [PR] UpdatePartitionSpec: Added ability to not set the new partition spec as default [iceberg]

2024-07-22 Thread via GitHub
RussellSpitzer commented on PR #10736: URL: https://github.com/apache/iceberg/pull/10736#issuecomment-2243378858 My big question here is what is the value of adding the Spec before you are ready to write data?

Re: [PR] Core: Add DataFiles builder API to enable users to specify their own custom conversion logic for string partition values [iceberg]

2024-07-22 Thread via GitHub
RussellSpitzer commented on code in PR #10724: URL: https://github.com/apache/iceberg/pull/10724#discussion_r1686839951 ## core/src/main/java/org/apache/iceberg/DataFiles.java: ## @@ -259,11 +268,19 @@ public Builder withFileSizeInBytes(long newFileSizeInBytes) { }

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686847688 ## core/src/test/java/org/apache/iceberg/util/TestParallelIterable.java: ## @@ -133,6 +140,47 @@ public CloseableIterator iterator() { .untilAsserted(() -> a

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
RussellSpitzer commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686853441 ## core/src/test/java/org/apache/iceberg/util/TestParallelIterable.java: ## @@ -133,6 +140,47 @@ public CloseableIterator iterator() { .untilAsserted(

Re: [I] Remaining Kafka Connect sink tasks [iceberg]

2024-07-22 Thread via GitHub
nk1506 commented on issue #10740: URL: https://github.com/apache/iceberg/issues/10740#issuecomment-2243412771 Thanks @bryanck. By any chance, are we planning to consider adding `upsert` support here too? I saw a few threads on Slack where there was mixed feedback about `upsert` performance

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-22 Thread via GitHub
RussellSpitzer commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1686864538 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Flink: parameterize Flink table source tests to test both old and FLIP-27 source implementations [iceberg]

2024-07-22 Thread via GitHub
stevenzwu merged PR #10741: URL: https://github.com/apache/iceberg/pull/10741

Re: [PR] Flink: parameterize Flink table source tests to test both old and FLIP-27 source implementations [iceberg]

2024-07-22 Thread via GitHub
stevenzwu commented on code in PR #10741: URL: https://github.com/apache/iceberg/pull/10741#discussion_r1686874797 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkSourceConfig.java: ## @@ -46,8 +47,11 @@ public void testFlinkHintConfig() { assert

Re: [PR] Flink: parameterize Flink table source tests to test both old and FLIP-27 source implementations [iceberg]

2024-07-22 Thread via GitHub
stevenzwu commented on PR #10741: URL: https://github.com/apache/iceberg/pull/10741#issuecomment-2243434588 thanks @nastra and @pvary for the review

Re: [PR] Core: Add DataFiles builder API to enable users to specify their own custom conversion logic for string partition values [iceberg]

2024-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #10724: URL: https://github.com/apache/iceberg/pull/10724#discussion_r1686880606 ## core/src/main/java/org/apache/iceberg/DataFiles.java: ## @@ -259,11 +268,19 @@ public Builder withFileSizeInBytes(long newFileSizeInBytes) { }

Re: [PR] UpdatePartitionSpec: Added ability to not set the new partition spec as default [iceberg]

2024-07-22 Thread via GitHub
shanielh commented on PR #10736: URL: https://github.com/apache/iceberg/pull/10736#issuecomment-2243447797 > My big question here is what is the value of adding the Spec before you are ready to write data? I don't think it matters whether I add the spec just before writing to it or p

Re: [I] Remaining Kafka Connect sink tasks [iceberg]

2024-07-22 Thread via GitHub
bryanck commented on issue #10740: URL: https://github.com/apache/iceberg/issues/10740#issuecomment-2243449689 That's the hope. It will likely involve a discussion with the community first, as there are some performance considerations.

Re: [PR] Api, Build: Fix typo in comments in `Table` and `gradlew` [iceberg]

2024-07-22 Thread via GitHub
hantangwangd commented on PR #10744: URL: https://github.com/apache/iceberg/pull/10744#issuecomment-2243530601 @amogh-jahagirdar My pleasure!

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1686972022 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Spec: Clarify time travel implementation in Iceberg [iceberg]

2024-07-22 Thread via GitHub
emkornfield commented on code in PR #8982: URL: https://github.com/apache/iceberg/pull/8982#discussion_r1686982629 ## format/spec.md: ## @@ -1370,3 +1370,16 @@ Writing v2 metadata: * `sort_columns` was removed Note that these requirements apply when writing data to a v2

Re: [PR] Spec: Clarify time travel implementation in Iceberg [iceberg]

2024-07-22 Thread via GitHub
emkornfield commented on code in PR #8982: URL: https://github.com/apache/iceberg/pull/8982#discussion_r1686992661 ## format/spec.md: ## @@ -1370,3 +1370,16 @@ Writing v2 metadata: * `sort_columns` was removed Note that these requirements apply when writing data to a v2

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-22 Thread via GitHub
venkata91 commented on code in PR #10548: URL: https://github.com/apache/iceberg/pull/10548#discussion_r1686993453 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSpecExecSupport.java: ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-22 Thread via GitHub
venkata91 commented on PR #10548: URL: https://github.com/apache/iceberg/pull/10548#issuecomment-2243603565 @pvary should this change be ported to other flink versions like `1.17` and `1.18`?

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-07-22 Thread via GitHub
stevenzwu commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1687013697 ## .palantir/revapi.yml: ## @@ -874,6 +874,10 @@ acceptedBreaks: justification: "Static utility class - should not have public constructor" "1.4.0":

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-07-22 Thread via GitHub
stevenzwu commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1687016293 ## core/src/main/java/org/apache/iceberg/AllManifestsTableTaskParser.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #10711: URL: https://github.com/apache/iceberg/pull/10711#discussion_r1687030827 ## api/src/main/java/org/apache/iceberg/io/SupportsRecoveryOperations.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] API: Add SupportsRecoveryOperations mixin for FileIO [iceberg]

2024-07-22 Thread via GitHub
amogh-jahagirdar commented on PR #10711: URL: https://github.com/apache/iceberg/pull/10711#issuecomment-2243643379 I'll leave this up for a bit, as I carry forward https://github.com/apache/iceberg/pull/10445 maybe it'll be helpful to have more of that implementation in place before we merg

Re: [PR] Kafka Connect: Runtime distribution with integration tests [iceberg]

2024-07-22 Thread via GitHub
danielcweeks commented on code in PR #10739: URL: https://github.com/apache/iceberg/pull/10739#discussion_r1686952595 ## kafka-connect/kafka-connect-runtime/src/main/resources/manifest.json: ## @@ -0,0 +1,47 @@ +{ + "title": "Apache Iceberg Sink Connector", + "name": "iceberg-

Re: [PR] Kafka Connect: Docs on configuring the sink [iceberg]

2024-07-22 Thread via GitHub
danielcweeks commented on code in PR #10746: URL: https://github.com/apache/iceberg/pull/10746#discussion_r1687067850 ## docs/docs/kafka-connect.md: ## @@ -0,0 +1,354 @@ +--- +title: "Kafka Connect" +--- + + +# Kafka Connect + +[Kafka Connect](https://docs.confluent.io/platform/

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1687077960 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -136,30 +169,33 @@ private boolean checkTasks() { } } - return !close

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
rdblue merged PR #10691: URL: https://github.com/apache/iceberg/pull/10691

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on PR #10691: URL: https://github.com/apache/iceberg/pull/10691#issuecomment-2243728470 Thanks, @findepi! Good work finding a solution here.

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1687083709 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -20,84 +20,117 @@ import java.io.Closeable; import java.io.IOException; +import java.io.

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687095160 ## open-api/rest-catalog-open-api.yaml: ## @@ -3809,6 +4150,41 @@ components: } } +# Note that this is a representative example response f

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687097927 ## open-api/rest-catalog-open-api.yaml: ## @@ -541,6 +541,130 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}/nam

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687105937 ## open-api/rest-catalog-open-api.yaml: ## @@ -3647,6 +3786,173 @@ components: type: integer description: "List of equality field IDs" +Pre

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687110606 ## open-api/rest-catalog-open-api.yaml: ## @@ -3647,6 +3818,176 @@ components: type: integer description: "List of equality field IDs" +Pre

Re: [I] Query on nested struct field with PyIceberg? [iceberg-python]

2024-07-22 Thread via GitHub
kevinjqliu commented on issue #953: URL: https://github.com/apache/iceberg-python/issues/953#issuecomment-2243766926 I was able to reproduce this on latest main branch. Example: ``` from pyiceberg.catalog.sql import SqlCatalog import pyarrow as pa schema = pa.schema([

Re: [I] Query on nested struct field with PyIceberg? [iceberg-python]

2024-07-22 Thread via GitHub
kevinjqliu commented on issue #953: URL: https://github.com/apache/iceberg-python/issues/953#issuecomment-2243774010 The issue might be in `_parse_row_filter` function ``` (Pdb) _parse_row_filter("employment = 'Employed'") EqualTo(term=Reference(name='employment'), literal=litera

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687128848 ## open-api/rest-catalog-open-api.yaml: ## @@ -3647,6 +3818,176 @@ components: type: integer description: "List of equality field IDs" +Pre

Re: [I] Query on nested struct field with PyIceberg? [iceberg-python]

2024-07-22 Thread via GitHub
kevinjqliu commented on issue #953: URL: https://github.com/apache/iceberg-python/issues/953#issuecomment-2243789428 Specifically in the parsing code ``` from pyiceberg.expressions.parser import parse parse("employment.status = 'Employed'") # > EqualTo(term=Reference(name='sta

Re: [PR] Flink: handle rescale properly and refactor statistics [iceberg]

2024-07-22 Thread via GitHub
stevenzwu merged PR #10457: URL: https://github.com/apache/iceberg/pull/10457

Re: [PR] Flink: handle rescale properly and refactor statistics [iceberg]

2024-07-22 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2243805443 thanks @pvary for the review

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687132205 ## open-api/rest-catalog-open-api.yaml: ## @@ -3647,6 +3818,176 @@ components: type: integer description: "List of equality field IDs" +Pre

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-07-22 Thread via GitHub
rdblue commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687138776 ## open-api/rest-catalog-open-api.yaml: ## @@ -3647,6 +3818,176 @@ components: type: integer description: "List of equality field IDs" +Pre
