Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-21 Thread via GitHub
mgmarino commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1461464809 ## pyiceberg/catalog/glue.py: ## @@ -84,19 +110,105 @@ def _construct_parameters( return new_parameters +def _type_to_glue_type_string(input_type: Iceber

Re: [PR] Arrow, AWS, Core: Remove deprecated code for 1.5.0 release [iceberg]

2024-01-21 Thread via GitHub
nastra merged PR #9505: URL: https://github.com/apache/iceberg/pull/9505 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apac

Re: [PR] Core: Cleanup assertion messages in partition spec tests [iceberg]

2024-01-21 Thread via GitHub
nastra merged PR #9528: URL: https://github.com/apache/iceberg/pull/9528

Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-21 Thread via GitHub
HonahX commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1461380205 ## pyiceberg/table/__init__.py: ## @@ -504,6 +504,12 @@ def _(update: AddSchemaUpdate, base_metadata: TableMetadata, context: _TableMeta if update.last_column

Re: [PR] Infra: Increase operations-per-run in stale action to 100 [iceberg]

2024-01-21 Thread via GitHub
nastra merged PR #9529: URL: https://github.com/apache/iceberg/pull/9529

Re: [PR] Build: Bump nessie from 0.76.2 to 0.76.3 [iceberg]

2024-01-21 Thread via GitHub
nastra merged PR #9537: URL: https://github.com/apache/iceberg/pull/9537

Re: [I] Speeding up rewrite_data_files encountered concurrent write issue. [iceberg]

2024-01-21 Thread via GitHub
manuzhang commented on issue #9521: URL: https://github.com/apache/iceberg/issues/9521#issuecomment-1903388036 @a8356555 yes, there could be conflicts from concurrent commit from multiple file groups with partial progress enabled. Usually, they will succeed eventually on retry.

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-21 Thread via GitHub
pvary commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1461387058 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java: ## @@ -367,6 +382,8 @@ public void testSpecificSnapshotTimestamp() th

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-21 Thread via GitHub
ajantha-bhat commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1461385514 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(Snapsh

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-21 Thread via GitHub
pvary commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1461383936 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/MiniClusterResource.java: ## @@ -50,4 +51,18 @@ public static MiniClusterWithClientResource createWithClasslo

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-21 Thread via GitHub
pvary commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1461383458 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/assigner/SplitAssigner.java: ## @@ -115,4 +115,7 @@ default void onCompletedSplits(Collection complete

Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]

2024-01-21 Thread via GitHub
HonahX commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1903304501 Hi @kevinjqliu. In Pyiceberg, the `update_schema()...commit()` increments the schema id: https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-21 Thread via GitHub
zinking commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1461368374 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(SnapshotPro

Re: [PR] Infra: Increase operations-per-run in stale action to 100 [iceberg]

2024-01-21 Thread via GitHub
ajantha-bhat commented on PR #9529: URL: https://github.com/apache/iceberg/pull/9529#issuecomment-1903271987 > @ajantha-bhat @nastra we also need to increase the rate limit to clean up stale issues. Yep. This is what I asked in the previous PR. I don't have an issue with bumpi

Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1461359249 ## tests/catalog/test_base.py: ## @@ -585,8 +397,10 @@ def test_commit_table(catalog: InMemoryCatalog) -> None: # Then assert response.metadata.tabl

Re: [PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

2024-01-21 Thread via GitHub
mudit-97 commented on PR #9479: URL: https://github.com/apache/iceberg/pull/9479#issuecomment-1903266200 @ajantha-bhat thanks for the help, I will also join the channel

Re: [PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

2024-01-21 Thread via GitHub
ajantha-bhat commented on PR #9479: URL: https://github.com/apache/iceberg/pull/9479#issuecomment-1903265375 @mudit-97: I did ask the question on iceberg slack (dev channel), feel free to join the channel. @amogh-jahagirdar pointed me to the historical PR for the same (https://github.co

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-21 Thread via GitHub
zinking commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1461354262 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(SnapshotPro

Re: [PR] chore(deps): Update env_logger requirement from 0.10.0 to 0.11.0 [iceberg-rust]

2024-01-21 Thread via GitHub
liurenjie1024 commented on PR #170: URL: https://github.com/apache/iceberg-rust/pull/170#issuecomment-1903197814 CC @Fokko

Re: [PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

2024-01-21 Thread via GitHub
mudit-97 commented on PR #9479: URL: https://github.com/apache/iceberg/pull/9479#issuecomment-1903196974 > I spent little time and understood this PR. > > Basically you want to enable record level filtering (and you have observed the benefits with this POC PR) for vector reader instea

Re: [PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

2024-01-21 Thread via GitHub
mudit-97 commented on code in PR #9479: URL: https://github.com/apache/iceberg/pull/9479#discussion_r1461333593 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -104,19 +104,21 @@ import org.apache.parquet.column.ParquetProperties.WriterVersion; import o

Re: [PR] Infra: Increase operations-per-run in stale action to 100 [iceberg]

2024-01-21 Thread via GitHub
manuzhang commented on PR #9529: URL: https://github.com/apache/iceberg/pull/9529#issuecomment-1903018966 @ajantha-bhat @nastra we also need to increase the rate limit to clean up stale issues.

Re: [I] [doc] Word `Gague ` spelled incorrectly in https://iceberg.apache.org/docs/latest/flink-writes/#metrics [iceberg]

2024-01-21 Thread via GitHub
amogh-jahagirdar closed issue #9527: [doc] Word `Gague ` spelled incorrectly in https://iceberg.apache.org/docs/latest/flink-writes/#metrics URL: https://github.com/apache/iceberg/issues/9527

Re: [PR] Docs: Correct the spelling of "gauge" [iceberg]

2024-01-21 Thread via GitHub
amogh-jahagirdar merged PR #9543: URL: https://github.com/apache/iceberg/pull/9543

Re: [PR] Docs: Corrects the spelling of "gauge" [iceberg]

2024-01-21 Thread via GitHub
Aireed commented on PR #9543: URL: https://github.com/apache/iceberg/pull/9543#issuecomment-1902885886 cc @amogh-jahagirdar

[PR] implement hive catalog `_commit_table` [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu opened a new pull request, #294: URL: https://github.com/apache/iceberg-python/pull/294 https://github.com/apache/iceberg-python/issues/275

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-21 Thread via GitHub
HonahX commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1902879704 @mgmarino @nicor88 Thanks for your input. > Did you try to evolve the table schema and see if the changes are properly updated in glue and usable in Athena? I did a simp

Re: [PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

2024-01-21 Thread via GitHub
ajantha-bhat commented on code in PR #9479: URL: https://github.com/apache/iceberg/pull/9479#discussion_r1461265842 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -1160,8 +1162,37 @@ public CloseableIterable build() { optionsBuilder.withDecry

Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1461251541 ## tests/catalog/test_base.py: ## @@ -16,243 +16,41 @@ # under the License. # pylint:disable=redefined-outer-name -from typing import ( -Dict, -Lis

Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1461251110 ## pyiceberg/catalog/in_memory.py: ## @@ -0,0 +1,222 @@ +import uuid +from typing import ( +Dict, +List, +Optional, +Set, +Union, +) + +from

Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1461250937 ## pyiceberg/catalog/in_memory.py: ## @@ -0,0 +1,222 @@ +import uuid +from typing import ( +Dict, +List, +Optional, +Set, +Union, +) + +from

Re: [I] moto server port conflicts on macOS [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu closed issue #291: moto server port conflicts on macOS URL: https://github.com/apache/iceberg-python/issues/291

Re: [PR] Fix moto server port conflict [iceberg-python]

2024-01-21 Thread via GitHub
HonahX merged PR #292: URL: https://github.com/apache/iceberg-python/pull/292

[PR] Fix moto server port conflict [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu opened a new pull request, #292: URL: https://github.com/apache/iceberg-python/pull/292 This PR changes the default moto server port from `5000` to `5001`. Port 5000 is used by AirPlay Receiver on MacOS. The `is_port_in_use` function is used to check if the port is availab
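A port-availability check of the kind this PR describes could look roughly like the sketch below. It is only an illustration, not the code from the PR: the name `is_port_in_use` comes from the message above, but its signature and body here are assumptions, and `pick_port` is a hypothetical helper added to show how such a check might be used.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port.

    Assumed signature; the PR's actual implementation may differ.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so it reports a closed port without raising an exception.
        return sock.connect_ex((host, port)) == 0


def pick_port(preferred: int = 5001, attempts: int = 10) -> int:
    """Hypothetical helper: first port at or after `preferred` with no listener."""
    for port in range(preferred, preferred + attempts):
        if not is_port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {preferred}..{preferred + attempts - 1}")
```

A test fixture could call `pick_port()` before starting the mock server so that a busy port (such as 5000 when AirPlay Receiver is running) is skipped. The check is inherently racy, since another process can take the port between the check and the server start.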

[I] moto server port conflicts on macOS [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu opened a new issue, #291: URL: https://github.com/apache/iceberg-python/issues/291 ### Apache Iceberg version None ### Please describe the bug 🐞 Moto server is used to mock S3 client calls MacOS uses port 5000 for AirPlay Receiver which causes port confl

[I] `schema_id` not incremented during schema evolution [iceberg-python]

2024-01-21 Thread via GitHub
kevinjqliu opened a new issue, #290: URL: https://github.com/apache/iceberg-python/issues/290 ### Apache Iceberg version 0.5.0 (latest release) ### Please describe the bug 🐞 When updating the schema of an iceberg table (such as adding a column), the `schema_id` should be

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-21 Thread via GitHub
nicor88 commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1902724675 Thanks @mgmarino/@HonahX something else that comes to mind when working with glue/ Athena (valid for other engines too). Did you try to evolve the table schema and see if the chang

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-01-21 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1460954285 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/RecordWriter.java: ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-01-21 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1460951937 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/Utilities.java: ## @@ -0,0 +1,254 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-01-21 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1460948481 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/Utilities.java: ## @@ -0,0 +1,254 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-21 Thread via GitHub
mgmarino commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1902635769 > Also, if possible, could you please also add some tests in [integration_test_glue.py](https://github.com/apache/iceberg-python/blob/3085c404c99ba8c5c8856f21a6c8d63a12ca0113/tests/c

Re: [PR] Build: Fix errorprone warning [iceberg]

2024-01-21 Thread via GitHub
ajantha-bhat commented on PR #9531: URL: https://github.com/apache/iceberg/pull/9531#issuecomment-1902623107 `iceberg-azure` Build failed with unrelated error. ``` > Caused by: org.testcontainers.containers.ContainerFetchException: Failed to pull image: mcr.microsoft.co

Re: [PR] Build: Fix errorprone warning [iceberg]

2024-01-21 Thread via GitHub
ajantha-bhat closed pull request #9531: Build: Fix errorprone warning URL: https://github.com/apache/iceberg/pull/9531

[PR] chore(deps): Update env_logger requirement from 0.10.0 to 0.11.0 [iceberg-rust]

2024-01-21 Thread via GitHub
dependabot[bot] opened a new pull request, #170: URL: https://github.com/apache/iceberg-rust/pull/170 Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version. Changelog Sourced from https://github.com/rust-cli/env_logger/blob/m

Re: [I] When using GlueCatalog, Iceberg table comment is not added to Glue table [iceberg]

2024-01-21 Thread via GitHub
d125q closed issue #9542: When using GlueCatalog, Iceberg table comment is not added to Glue table URL: https://github.com/apache/iceberg/issues/9542

Re: [I] When using GlueCatalog, Iceberg table comment is not added to Glue table [iceberg]

2024-01-21 Thread via GitHub
d125q commented on issue #9542: URL: https://github.com/apache/iceberg/issues/9542#issuecomment-1902598142 Sorry, I missed #9530 from yesterday.

[I] When using GlueCatalog, Iceberg table comment is not added to Glue table [iceberg]

2024-01-21 Thread via GitHub
d125q opened a new issue, #9542: URL: https://github.com/apache/iceberg/issues/9542 ### Apache Iceberg version 1.4.3 (latest release) ### Query engine Spark ### Please describe the bug 🐞 Assume `glue_catalog` is configured to use GlueCatalog. Now, execute

Re: [PR] Flink 1.17: Create JUnit5 version of TestFlinkScan [iceberg]

2024-01-21 Thread via GitHub
cgpoh commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1902586061 @nastra, thanks for the backport. Sorry that I'm not able to work on it earlier.

Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]

2024-01-21 Thread via GitHub
HonahX commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1902557930 Thanks for summarizing the approaches and explanation on the concerns. > I’m not convinced that we can assign ids without relying on the position when generating the name

[PR] AWS: Update S3FileIO test to run when CLIENT_FACTORY is not set [iceberg]

2024-01-21 Thread via GitHub
alok123t opened a new pull request, #9541: URL: https://github.com/apache/iceberg/pull/9541 This PR updates the existing test to check `S3FileIO` works even with the `CLIENT_FACTORY` is not set. This update ensures the two tests below work - `testS3FileIOWithAwsClientFactoryImpl` - checks

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.16.14 to 1.16.16 [iceberg-go]

2024-01-21 Thread via GitHub
dependabot[bot] opened a new pull request, #54: URL: https://github.com/apache/iceberg-go/pull/54 Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.16.14 to 1.16.16. Changelog Sourced from https://github.com/aws/aws-sdk-go-v2/blob/v1.16.

Re: [PR] build(deps): bump github.com/wolfeidau/s3iofs from 1.3.0 to 1.4.0 [iceberg-go]

2024-01-21 Thread via GitHub
dependabot[bot] closed pull request #41: build(deps): bump github.com/wolfeidau/s3iofs from 1.3.0 to 1.4.0 URL: https://github.com/apache/iceberg-go/pull/41

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.26.3 to 1.26.5 [iceberg-go]

2024-01-21 Thread via GitHub
dependabot[bot] opened a new pull request, #52: URL: https://github.com/apache/iceberg-go/pull/52 Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.26.3 to 1.26.5. Commits https://github.com/aws/aws-sdk-go-v2/commit/a75d7694eb2709212655ee

Re: [PR] build(deps): bump github.com/wolfeidau/s3iofs from 1.3.0 to 1.4.0 [iceberg-go]

2024-01-21 Thread via GitHub
dependabot[bot] commented on PR #41: URL: https://github.com/apache/iceberg-go/pull/41#issuecomment-1902552318 Superseded by #53.

[PR] build(deps): bump github.com/wolfeidau/s3iofs from 1.5.0 to 1.5.2 [iceberg-go]

2024-01-21 Thread via GitHub
dependabot[bot] opened a new pull request, #53: URL: https://github.com/apache/iceberg-go/pull/53 Bumps [github.com/wolfeidau/s3iofs](https://github.com/wolfeidau/s3iofs) from 1.5.0 to 1.5.2. Release notes Sourced from https://github.com/wolfeidau/s3iofs/releases