Re: [PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

2024-01-16 Thread via GitHub
mudit-97 commented on PR #9479: URL: https://github.com/apache/iceberg/pull/9479#issuecomment-1893233221 @Fokko @nastra can you please check this once or tag relevant folks if possible -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Spark: Support renaming views [iceberg]

2024-01-16 Thread via GitHub
nastra commented on code in PR #9343: URL: https://github.com/apache/iceberg/pull/9343#discussion_r1453069623 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -635,6 +633,118 @@ private Catalog tableCatalog() { return Sp

Re: [PR] Spark: Support dropping views [iceberg]

2024-01-16 Thread via GitHub
nastra commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1453073856 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/HijackViewCommands.scala: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Softwar

Re: [I] Flaky test: TestSparkReaderDeletes.testEqualityDeleteWithDeletedColumn [iceberg]

2024-01-16 Thread via GitHub
manuzhang commented on issue #8855: URL: https://github.com/apache/iceberg/issues/8855#issuecomment-1893280727 @ajantha-bhat please help review #9445 to fix this test.

Re: [PR] Core, Spark: Migrate tests that depend on ScanTestBase to JUnit5 [iceberg]

2024-01-16 Thread via GitHub
nastra merged PR #9416: URL: https://github.com/apache/iceberg/pull/9416

Re: [PR] Build: Add `iceberg-bom` artifact [iceberg]

2024-01-16 Thread via GitHub
nastra commented on PR #8065: URL: https://github.com/apache/iceberg/pull/8065#issuecomment-1893303979 thanks for getting this done @snazy :100: . Also thanks to @ajantha-bhat and @danielcweeks for reviewing.

Re: [I] Consider publishing a BOM (bill of materials) [iceberg]

2024-01-16 Thread via GitHub
nastra closed issue #7371: Consider publishing a BOM (bill of materials) URL: https://github.com/apache/iceberg/issues/7371

Re: [PR] Build: Add `iceberg-bom` artifact [iceberg]

2024-01-16 Thread via GitHub
nastra merged PR #8065: URL: https://github.com/apache/iceberg/pull/8065

Re: [I] Caused by: java.net.SocketException: Connection reset [iceberg]

2024-01-16 Thread via GitHub
javrasya commented on issue #9444: URL: https://github.com/apache/iceberg/issues/9444#issuecomment-1893312918 I wrote my own `S3FileIO` which uses a custom `S3InputStream` which retries when it hits a socket exception and it is all stable that way. Here is the part I modified with a v
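
The retry-on-reset idea described in this comment can be illustrated with a small sketch. This is not the commenter's `S3InputStream` code (the preview is cut off before it appears) and not Iceberg's API: `RetryingReader`, `open_at`, and `max_retries` are hypothetical names, and Python's built-in `ConnectionResetError` stands in for Java's `SocketException: Connection reset`. The assumption is that the underlying store can reopen a stream at an arbitrary byte offset.

```python
class RetryingReader:
    """Hypothetical wrapper that reopens a stream after a connection reset."""

    def __init__(self, open_at, max_retries=3):
        # open_at is an assumed callable: byte offset -> binary file-like object.
        self._open_at = open_at
        self._max_retries = max_retries
        self._pos = 0  # number of bytes successfully handed to the caller
        self._stream = open_at(0)

    def read(self, n):
        for attempt in range(self._max_retries + 1):
            try:
                data = self._stream.read(n)
                self._pos += len(data)
                return data
            except ConnectionResetError:
                if attempt == self._max_retries:
                    raise  # retries exhausted, surface the error
                # Drop the broken stream and resume from the last good offset.
                self._stream = self._open_at(self._pos)
```

Tracking the offset only after a successful read means a retry never skips or repeats bytes, which is the property a wrapper like this would need to be safe for file formats with footers and offsets.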

Re: [I] access failed from host to iceberg container [iceberg]

2024-01-16 Thread via GitHub
nastra commented on issue #9465: URL: https://github.com/apache/iceberg/issues/9465#issuecomment-1893321474 I think you might be missing the `warehouse` configuration. Here's how it's set for the quickstart example: https://github.com/tabular-io/docker-spark-iceberg/blob/main/spark/spark-de

[PR] Flink 1.18: Create JUnit5 version of TestFlinkScan [iceberg]

2024-01-16 Thread via GitHub
nastra opened a new pull request, #9480: URL: https://github.com/apache/iceberg/pull/9480 This ports the changes that were done for Flink 1.17 in https://github.com/apache/iceberg/pull/9185 and also fixes the current CI failures for Flink 1.18

Re: [PR] Spark: Fix flaky TestSparkReaderDeletes tests due to metric not found [iceberg]

2024-01-16 Thread via GitHub
ajantha-bhat commented on code in PR #9445: URL: https://github.com/apache/iceberg/pull/9445#discussion_r1453194254 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -126,6 +126,7 @@ public static void startMetastoreAndSpark() {

Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]

2024-01-16 Thread via GitHub
nastra commented on issue #9086: URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1893425961 @chinmay-bhat I don't think this is complete yet. There are still plenty of places that use `org.junit.Test` and `@RunWith(Parameterized.class)` within the `iceberg-spark` module

Re: [PR] Spark: Fix flaky TestSparkReaderDeletes tests due to metric not found [iceberg]

2024-01-16 Thread via GitHub
ajantha-bhat commented on PR #9445: URL: https://github.com/apache/iceberg/pull/9445#issuecomment-1893429962 > @ajantha-bhat please help review Can you please elaborate how `spark.ui.liveUpdate.period` is linked to `lastExecutedMetricValue`. Are we suspecting that some other query is

Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]

2024-01-16 Thread via GitHub
chinmay-bhat commented on issue #9086: URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1893438723 I'll raise PRs for the remaining migrations in the coming days.

Re: [PR] Spark: Fix flaky TestSparkReaderDeletes tests due to metric not found [iceberg]

2024-01-16 Thread via GitHub
manuzhang commented on code in PR #9445: URL: https://github.com/apache/iceberg/pull/9445#discussion_r1453220284 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -126,6 +126,7 @@ public static void startMetastoreAndSpark() {

Re: [PR] Spark: propagate snapshot properties for RewriteDataFiles and RewritePositionDeleteFiles [iceberg]

2024-01-16 Thread via GitHub
ajantha-bhat commented on code in PR #9449: URL: https://github.com/apache/iceberg/pull/9449#discussion_r1450170611 ## core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java: ## @@ -36,6 +38,7 @@ public class RewriteDataFilesCommitManager { private f

[PR] Flink 1.16: Create JUnit5 version of TestFlinkScan [iceberg]

2024-01-16 Thread via GitHub
nastra opened a new pull request, #9482: URL: https://github.com/apache/iceberg/pull/9482 This ports the changes that were done for Flink 1.17 in https://github.com/apache/iceberg/pull/9185

Re: [PR] Spark: Support renaming views [iceberg]

2024-01-16 Thread via GitHub
nastra merged PR #9343: URL: https://github.com/apache/iceberg/pull/9343

Re: [PR] Flink 1.18: Create JUnit5 version of TestFlinkScan [iceberg]

2024-01-16 Thread via GitHub
nastra commented on PR #9480: URL: https://github.com/apache/iceberg/pull/9480#issuecomment-1893496257 thanks for reviewing @Fokko. I'll go ahead and merge this, since this is also fixing a CI failure on main with Flink 1.18.

Re: [PR] Flink 1.18: Create JUnit5 version of TestFlinkScan [iceberg]

2024-01-16 Thread via GitHub
nastra merged PR #9480: URL: https://github.com/apache/iceberg/pull/9480

Re: [PR] Spark: Move tests in functions directory to JUnit5 [iceberg]

2024-01-16 Thread via GitHub
nastra commented on PR #9481: URL: https://github.com/apache/iceberg/pull/9481#issuecomment-1893497820 @chinmay-bhat in this case I think we can include this one-liner with other JUnit5 changes ;)

Re: [PR] Spark: propagate snapshot properties for RewriteDataFiles and RewritePositionDeleteFiles [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #9449: URL: https://github.com/apache/iceberg/pull/9449#discussion_r1453279376 ## core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java: ## @@ -36,6 +38,7 @@ public class RewriteDataFilesCommitManager { private fin

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453283842 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453286267 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

[PR] Fix incorrect wget command in Flink documentation [iceberg]

2024-01-16 Thread via GitHub
cxzl25 opened a new pull request, #9483: URL: https://github.com/apache/iceberg/pull/9483 (no comment)

Re: [PR] Spark: Support dropping views [iceberg]

2024-01-16 Thread via GitHub
nastra commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1453292988 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/HijackViewCommands.scala: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] Spark: Support dropping views [iceberg]

2024-01-16 Thread via GitHub
nastra merged PR #9421: URL: https://github.com/apache/iceberg/pull/9421

Re: [PR] Spark: Support dropping views [iceberg]

2024-01-16 Thread via GitHub
nastra commented on PR #9421: URL: https://github.com/apache/iceberg/pull/9421#issuecomment-1893573458 thanks for reviewing @rdblue. I'll go ahead and merge this, since everything should be addressed

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453308229 ## format/spec.md: ## @@ -1149,6 +1195,12 @@ Each sort field in the fields list is stored as an object with the following pro |--- |--- |--- | |**`Sort Field`**|`

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-16 Thread via GitHub
nastra commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1453312170 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewCheck.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundat

[I] Support commit retries [iceberg-python]

2024-01-16 Thread via GitHub
Fokko opened a new issue, #269: URL: https://github.com/apache/iceberg-python/issues/269 ### Feature Request / Improvement Within Iceberg, when a commit fails because of a concurrent operation, we can retry the operation by loading the latest version of the snapshot, and re-apply the
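
The retry behaviour this request describes (reload the latest table state after a conflicting commit, then re-apply the change) can be sketched as an optimistic-concurrency loop. This is a minimal illustration under stated assumptions, not PyIceberg's API: `InMemoryTable`, `CommitConflict`, `commit_with_retries`, and `max_attempts` are hypothetical names, and an in-memory object stands in for a real catalog.

```python
import threading


class CommitConflict(Exception):
    """Raised when the table changed after the snapshot a commit was based on."""


class InMemoryTable:
    """Hypothetical stand-in for a catalog-backed table (not a PyIceberg class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.snapshot_id = 0
        self.rows = []

    def load_latest(self):
        with self._lock:
            return self.snapshot_id, list(self.rows)

    def commit(self, base_snapshot_id, new_rows):
        # Compare-and-swap: succeed only if nobody committed since base_snapshot_id.
        with self._lock:
            if base_snapshot_id != self.snapshot_id:
                raise CommitConflict(
                    f"expected snapshot {base_snapshot_id}, found {self.snapshot_id}"
                )
            self.rows = new_rows
            self.snapshot_id += 1
            return self.snapshot_id


def commit_with_retries(table, apply_change, max_attempts=4):
    """Reload the latest state and re-apply the change after each conflict."""
    for attempt in range(1, max_attempts + 1):
        base_id, rows = table.load_latest()
        try:
            return table.commit(base_id, apply_change(rows))
        except CommitConflict:
            if attempt == max_attempts:
                raise  # give up after the configured number of attempts
```

The change is passed as a function of the current rows rather than as a precomputed result, so each retry recomputes it against the freshly loaded state instead of overwriting the concurrent writer's work.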

Re: [PR] Build: Bump actions/setup-python from 4 to 5 [iceberg]

2024-01-16 Thread via GitHub
Fokko merged PR #9473: URL: https://github.com/apache/iceberg/pull/9473

[I] Support compaction [iceberg-python]

2024-01-16 Thread via GitHub
Fokko opened a new issue, #270: URL: https://github.com/apache/iceberg-python/issues/270 ### Feature Request / Improvement Add support for compaction. This rewrites the existing manifests into a single one, reducing the number of calls to the object store. This should follow the Java

[I] Support writing to a table with sort-order [iceberg-python]

2024-01-16 Thread via GitHub
Fokko opened a new issue, #271: URL: https://github.com/apache/iceberg-python/issues/271 ### Feature Request / Improvement We fail when we see a sort order, it would be great if we could sort+write the data based on the sort-order.

Re: [PR] Write support [iceberg-python]

2024-01-16 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1453353542 ## mkdocs/docs/api.md: ## @@ -175,6 +175,104 @@ static_table = StaticTable.from_metadata( The static-table is considered read-only. +## Write support + +With PyIc

Re: [PR] Rest Catalog Support for a Separate OAuth Server URI [iceberg-python]

2024-01-16 Thread via GitHub
Fokko commented on code in PR #233: URL: https://github.com/apache/iceberg-python/pull/233#discussion_r1453354835 ## pyiceberg/catalog/rest.py: ## @@ -265,15 +266,22 @@ def url(self, endpoint: str, prefixed: bool = True, **kwargs: Any) -> str: return url + endpoint.f

Re: [PR] Rest Catalog Support for a Separate OAuth Server URI [iceberg-python]

2024-01-16 Thread via GitHub
Fokko merged PR #233: URL: https://github.com/apache/iceberg-python/pull/233

Re: [I] Rest Catalog Support for a Separate OAuth Server URI [iceberg-python]

2024-01-16 Thread via GitHub
Fokko closed issue #230: Rest Catalog Support for a Separate OAuth Server URI URL: https://github.com/apache/iceberg-python/issues/230

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-01-16 Thread via GitHub
javrasya commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1893805040 Forget about me saying not knowing the version, I was looking at the code and saw that the version information is put in when enumerator serializes the data. I guess introducing v3 would

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453487893 ## format/spec.md: ## @@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ | Transform name| Description

[PR] Build: Define strict version for Flink / Jackson / Hive2 / Tez 0.8 [iceberg]

2024-01-16 Thread via GitHub
nastra opened a new pull request, #9484: URL: https://github.com/apache/iceberg/pull/9484 I've noticed that we were silently using Flink 1.18.1 because a new patch version was just released, which caused a [test to fail](https://github.com/apache/iceberg/actions/runs/7538529758/job/20520887

Re: [PR] Apply Name mapping, new_schema_for_table [iceberg-python]

2024-01-16 Thread via GitHub
syun64 commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1453543090 ## pyiceberg/table/__init__.py: ## @@ -831,6 +832,13 @@ def history(self) -> List[SnapshotLogEntry]: def update_schema(self, allow_incompatible_changes: bool =

Re: [PR] Apply Name mapping, new_schema_for_table [iceberg-python]

2024-01-16 Thread via GitHub
syun64 commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1453579122 ## pyiceberg/io/pyarrow.py: ## @@ -733,42 +854,178 @@ def _get_field_id(field: pa.Field) -> Optional[int]: ) -class _ConvertToIceberg(PyArrowSchemaVisitor[

Re: [PR] Apply Name mapping, new_schema_for_table [iceberg-python]

2024-01-16 Thread via GitHub
Fokko commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1453593588 ## pyiceberg/io/pyarrow.py: ## @@ -733,42 +854,178 @@ def _get_field_id(field: pa.Field) -> Optional[int]: ) -class _ConvertToIceberg(PyArrowSchemaVisitor[U

Re: [PR] chore(deps): upgrade all the AWS SDK v2 deps, and s3iofs [iceberg-go]

2024-01-16 Thread via GitHub
nastra merged PR #50: URL: https://github.com/apache/iceberg-go/pull/50

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.21.2 to 1.24.1 [iceberg-go]

2024-01-16 Thread via GitHub
dependabot[bot] commented on PR #49: URL: https://github.com/apache/iceberg-go/pull/49#issuecomment-1893992721 Looks like github.com/aws/aws-sdk-go-v2 is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.21.2 to 1.24.1 [iceberg-go]

2024-01-16 Thread via GitHub
dependabot[bot] closed pull request #49: build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.21.2 to 1.24.1 URL: https://github.com/apache/iceberg-go/pull/49

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.19.1 to 1.26.3 [iceberg-go]

2024-01-16 Thread via GitHub
dependabot[bot] commented on PR #48: URL: https://github.com/apache/iceberg-go/pull/48#issuecomment-1893992903 Looks like github.com/aws/aws-sdk-go-v2/config is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.19.1 to 1.26.3 [iceberg-go]

2024-01-16 Thread via GitHub
dependabot[bot] closed pull request #48: build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.19.1 to 1.26.3 URL: https://github.com/apache/iceberg-go/pull/48

Re: [PR] Add SqlCatalog _commit_table support [iceberg-python]

2024-01-16 Thread via GitHub
Fokko commented on code in PR #265: URL: https://github.com/apache/iceberg-python/pull/265#discussion_r1453609341 ## pyiceberg/catalog/sql.py: ## @@ -329,8 +363,66 @@ def _commit_table(self, table_request: CommitTableRequest) -> CommitTableRespons Raises:

Re: [PR] Spark: Support renaming views [iceberg]

2024-01-16 Thread via GitHub
rdblue commented on code in PR #9343: URL: https://github.com/apache/iceberg/pull/9343#discussion_r1453638953 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -635,6 +633,118 @@ private Catalog tableCatalog() { return Sp

[I] [HadoopCatalog]: [HadoopTableOperations]: Commit flow, renameToFinal does not actually check if lock acquired [iceberg]

2024-01-16 Thread via GitHub
N-o-Z opened a new issue, #9485: URL: https://github.com/apache/iceberg/issues/9485 ### Apache Iceberg version 1.4.2 (latest release) ### Query engine None ### Please describe the bug 🐞 The last part of the commit flow, requires writing the new metadata vers

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-16 Thread via GitHub
pvary commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1894100365 @javrasya: The Spark RewriteDataFiles should create a new snapshot in the table. If the query reads this new snapshot, then it should not read the old delete files anymore. If ExpireS

Re: [PR] Flink 1.16: Create JUnit5 version of TestFlinkScan [iceberg]

2024-01-16 Thread via GitHub
nastra commented on PR #9482: URL: https://github.com/apache/iceberg/pull/9482#issuecomment-1894102191 thanks for reviewing @pvary

Re: [PR] Flink 1.16: Create JUnit5 version of TestFlinkScan [iceberg]

2024-01-16 Thread via GitHub
nastra merged PR #9482: URL: https://github.com/apache/iceberg/pull/9482

Re: [I] Caused by: java.net.SocketException: Connection reset [iceberg]

2024-01-16 Thread via GitHub
pvary commented on issue #9444: URL: https://github.com/apache/iceberg/issues/9444#issuecomment-1894119990 @javrasya: Why isn't this issue happening outside Flink? Isn't this a more general S3 issue?

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-16 Thread via GitHub
gjacoby126 commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r145376 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -19,15 +19,31 @@ package org.apache.iceberg.flink.util; import org.apac

Re: [PR] Parquet: Deprecate readSupport and callInit in the read builder [iceberg]

2024-01-16 Thread via GitHub
nastra merged PR #9325: URL: https://github.com/apache/iceberg/pull/9325

Re: [PR] Apply Name mapping, new_schema_for_table [iceberg-python]

2024-01-16 Thread via GitHub
syun64 commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1453725520 ## pyiceberg/io/pyarrow.py: ## @@ -733,42 +854,178 @@ def _get_field_id(field: pa.Field) -> Optional[int]: ) -class _ConvertToIceberg(PyArrowSchemaVisitor[

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-16 Thread via GitHub
nastra commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1453732220 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -19,15 +19,31 @@ package org.apache.iceberg.flink.util; import org.apache.f

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-01-16 Thread via GitHub
pvary commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1894176952 I would prefer creating a well balanced writeLongUTF solution which would be a candidate to get into the Flink code later. I prefer your solution where you write out the length in a l

Re: [PR] Flink: Remove reading of the data files to fix flakiness [iceberg]

2024-01-16 Thread via GitHub
stevenzwu merged PR #9451: URL: https://github.com/apache/iceberg/pull/9451

Re: [PR] Flink: Remove reading of the data files to fix flakiness [iceberg]

2024-01-16 Thread via GitHub
stevenzwu commented on PR #9451: URL: https://github.com/apache/iceberg/pull/9451#issuecomment-1894187395 @pvary thx for the explanation. let's give it a try then. if it still doesn't fix it, let's disable/ignore this test for now.

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-16 Thread via GitHub
pvary commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1453743735 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -19,15 +19,31 @@ package org.apache.iceberg.flink.util; import org.apache.fl

Re: [PR] Flink: Remove reading of the data files to fix flakiness [iceberg]

2024-01-16 Thread via GitHub
pvary commented on PR #9451: URL: https://github.com/apache/iceberg/pull/9451#issuecomment-1894189131 @stevenzwu: Ahh.. I forgot to merge. Thanks for finding and merging this!

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-16 Thread via GitHub
stevenzwu commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1453749503 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -19,15 +19,31 @@ package org.apache.iceberg.flink.util; import org.apach

Re: [PR] Build: Bump mkdocs-material from 9.5.3 to 9.5.4 [iceberg-python]

2024-01-16 Thread via GitHub
HonahX commented on PR #267: URL: https://github.com/apache/iceberg-python/pull/267#issuecomment-1894209535 @dependabot rebase

Re: [I] Support commit retries [iceberg-python]

2024-01-16 Thread via GitHub
nicor88 commented on issue #269: URL: https://github.com/apache/iceberg-python/issues/269#issuecomment-1894219977 Few suggestions on this feature. It will be good to have control of the amount of retries and the retry strategy. After trying out a few retries libraries I found [tenacity](htt

Re: [PR] Build: Bump mkdocs-material from 9.5.3 to 9.5.4 [iceberg-python]

2024-01-16 Thread via GitHub
HonahX merged PR #267: URL: https://github.com/apache/iceberg-python/pull/267

Re: [PR] Avro data encryption [iceberg]

2024-01-16 Thread via GitHub
rdblue merged PR #9436: URL: https://github.com/apache/iceberg/pull/9436

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-01-16 Thread via GitHub
bryanck commented on PR #9466: URL: https://github.com/apache/iceberg/pull/9466#issuecomment-1894239101 Sure, thanks, I'll scale down this PR.

[PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-16 Thread via GitHub
jbonofre opened a new pull request, #9487: URL: https://github.com/apache/iceberg/pull/9487 Close #8697

Re: [PR] Table Metadata Update: Support SetPropertiesUpdate and RemovePropertiesUpdate [iceberg-python]

2024-01-16 Thread via GitHub
HonahX merged PR #266: URL: https://github.com/apache/iceberg-python/pull/266

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-16 Thread via GitHub
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1894252812 Few notes about this PR: 1. In `JdbcUtil`, I used methods to generate the SQL statement for table and view as it's very similar. I used a `boolean` to define if the generated SQL stat

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-16 Thread via GitHub
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1894253318 @ajantha-bhat @nk1506 if you guys want to take a look :) Thanks !

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1894272954 I gave this PR a round of testing on the cluster and it seems to work as expected.

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi merged PR #8755: URL: https://github.com/apache/iceberg/pull/8755

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1894289015 Thanks for reviewing, @szehon-ho @RussellSpitzer!

Re: [PR] Docs: Enhance documentation on identifier fields [iceberg]

2024-01-16 Thread via GitHub
rdblue merged PR #9478: URL: https://github.com/apache/iceberg/pull/9478

Re: [PR] Docs: Enhance documentation on identifier fields [iceberg]

2024-01-16 Thread via GitHub
rdblue commented on PR #9478: URL: https://github.com/apache/iceberg/pull/9478#issuecomment-1894296404 Thanks, @manuzhang!

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-16 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1453833185 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala: ## @@ -40,6 +42,11 @@ case class RewriteViewCommands(spar

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-16 Thread via GitHub
gjacoby126 commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1453839867 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -19,15 +19,31 @@ package org.apache.iceberg.flink.util; import org.apac

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-16 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1453840484 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewCheck.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundat

Re: [I] Caused by: java.net.SocketException: Connection reset [iceberg]

2024-01-16 Thread via GitHub
javrasya commented on issue #9444: URL: https://github.com/apache/iceberg/issues/9444#issuecomment-1894309851 No idea tbh. It is very hard to address. This does not even happen when I run it on a standalone Flink cluster running on my local. This happens when my app runs on AWS Managed Flin

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-16 Thread via GitHub
gjacoby126 commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1453844146 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -19,15 +19,31 @@ package org.apache.iceberg.flink.util; import org.apac

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-01-16 Thread via GitHub
javrasya commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1894338719 I was also leaning toward that one @pvary. But what benefit do you think it would bring to write out the data in 65k-sized byte arrays? DataOutputSerializer from Flink also writes it into a

Re: [PR] Spark: propagate snapshot properties for RewriteDataFiles and RewritePositionDeleteFiles [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on PR #9449: URL: https://github.com/apache/iceberg/pull/9449#issuecomment-1894362995 I will take a look this week.

Re: [PR] Spark 3.5: Support specifying filter in RewriteManifestsProcedure [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on PR #9447: URL: https://github.com/apache/iceberg/pull/9447#issuecomment-1894364112 Will review this week, thanks for pinging, @bknbkn!

[I] Cannot write nullable values to non-null column in the Iceberg Table [iceberg]

2024-01-16 Thread via GitHub
abharath9 opened a new issue, #9488: URL: https://github.com/apache/iceberg/issues/9488 The following error is thrown when trying to insert data into an Iceberg table with not-null column constraints. **_Cannot write nullable values to non-null column 'id' exception_** Here is a sa

Re: [PR] Flink: Upgrade Flink version from 1.18 to 1.18.1 [iceberg]

2024-01-16 Thread via GitHub
stevenzwu commented on PR #9486: URL: https://github.com/apache/iceberg/pull/9486#issuecomment-1894493940 thanks @rodmeneses

Re: [PR] Flink: Upgrade Flink version from 1.18 to 1.18.1 [iceberg]

2024-01-16 Thread via GitHub
stevenzwu merged PR #9486: URL: https://github.com/apache/iceberg/pull/9486

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-16 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1454030078 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala: ## @@ -40,6 +42,11 @@ case class RewriteViewCommands(spar

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-16 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1454087181 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewCheck.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-16 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1454089362 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewCheck.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #9437: URL: https://github.com/apache/iceberg/pull/9437#discussion_r1453851019 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table table)

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #9437: URL: https://github.com/apache/iceberg/pull/9437#discussion_r1454102284 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java: ## @@ -150,6 +154,21 @@ protected Dataset contentFileDS(Table table, Set s
