Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1623887745 ## pyiceberg/table/__init__.py: ## @@ -340,6 +341,86 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = pr

Re: [PR] Support merge manifests on writes (MergeAppend) [iceberg-python]

2024-06-03 Thread via GitHub
HonahX commented on PR #363: URL: https://github.com/apache/iceberg-python/pull/363#issuecomment-2144484689 Sorry for the long wait. I've fixed the sequence number inheritance issue. Previously some manifest entries incorrectly persisted the `-1` sequence number inherited from a newly construct

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
HonahX commented on code in PR #748: URL: https://github.com/apache/iceberg-python/pull/748#discussion_r1623919765 ## pyiceberg/table/__init__.py: ## @@ -1290,6 +1291,17 @@ def snapshot_by_name(self, name: str) -> Optional[Snapshot]: return self.snapshot_by_id(ref.

[I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
ochanism opened a new issue, #10429: URL: https://github.com/apache/iceberg/issues/10429 ### Query engine _No response_ ### Question https://iceberg.apache.org/docs/1.5.2/configuration/#hadoop-configuration https://github.com/apache/iceberg/assets/12025151/03ee4b10

Re: [I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2144609379 Hey @ochanism Thanks for reaching out. Hive 4.x supports Iceberg out of the box. Previously an external Iceberg dependency was needed, but Hive 4+ ships with Iceberg directly. So

Re: [PR] Open-API: `AssertRefSnapshotId` type should be `branch` or `tag` [iceberg]

2024-06-03 Thread via GitHub
Fokko closed pull request #10423: Open-API: `AssertRefSnapshotId` type should be `branch` or `tag` URL: https://github.com/apache/iceberg/pull/10423 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] Open-API: `AssertRefSnapshotId` type should be `branch` or `tag` [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on PR #10423: URL: https://github.com/apache/iceberg/pull/10423#issuecomment-2144623659 Hehe, you're right @nastra, thanks! The type is not specified. What confused me is that `SnapshotReference` has a type enum with `[branch, tag]`.

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #748: URL: https://github.com/apache/iceberg-python/pull/748#discussion_r1624051739 ## pyiceberg/table/__init__.py: ## @@ -1290,6 +1291,17 @@ def snapshot_by_name(self, name: str) -> Optional[Snapshot]: return self.snapshot_by_id(ref.s

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1623646076 ## pyiceberg/table/__init__.py: ## @@ -340,6 +341,86 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = propertie

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624081189 ## pyiceberg/table/__init__.py: ## @@ -1806,6 +1891,85 @@ def __enter__(self) -> U: return self # type: ignore +class ManageSnapshots(UpdateTableMetada

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624084300 ## pyiceberg/table/__init__.py: ## @@ -1806,6 +1891,85 @@ def __enter__(self) -> U: return self # type: ignore +class ManageSnapshots(UpdateTabl

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624087688 ## pyiceberg/table/__init__.py: ## @@ -340,6 +341,86 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = propertie

[I] For Hive 2.1.1 version support [iceberg]

2024-06-03 Thread via GitHub
liunaijie opened a new issue, #10430: URL: https://github.com/apache/iceberg/issues/10430 ### Query engine _No response_ ### Question Hi team, I am trying to create a partitioned table through Hive 2.1.1, but I find that the created table doesn't have a `partition` column, the partit

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624091349 ## pyiceberg/table/__init__.py: ## @@ -340,6 +341,86 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = propertie

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624090747 ## pyiceberg/table/__init__.py: ## @@ -1806,6 +1891,85 @@ def __enter__(self) -> U: return self # type: ignore +class ManageSnapshots(UpdateTabl

Re: [I] For Hive 2.1.1 version support [iceberg]

2024-06-03 Thread via GitHub
liunaijie commented on issue #10430: URL: https://github.com/apache/iceberg/issues/10430#issuecomment-2144708209 using iceberg 1.5.2

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624109091 ## pyiceberg/table/__init__.py: ## @@ -340,6 +341,86 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = pr

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624116714 ## pyiceberg/table/__init__.py: ## @@ -277,6 +279,50 @@ def __init__(self, table: Table, autocommit: bool = False): self._autocommit = autocommit s

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624117267 ## pyiceberg/table/__init__.py: ## @@ -1806,6 +1891,85 @@ def __enter__(self) -> U: return self # type: ignore +class ManageSnapshots(UpdateTableMetada

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624119801 ## pyiceberg/table/__init__.py: ## @@ -277,6 +279,50 @@ def __init__(self, table: Table, autocommit: bool = False): self._autocommit = autocommit

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624130991 ## pyiceberg/table/__init__.py: ## @@ -340,6 +341,86 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = propertie

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624136971 ## pyiceberg/table/__init__.py: ## @@ -340,6 +340,34 @@ def set_properties(self, properties: Properties = EMPTY_DICT, **kwargs: Any) -> updates = propertie

Re: [I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
ochanism commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2144761773 @Fokko Sorry for my ambiguous question. I'm using Trino as a query engine with hive-metastore catalog. And for the data ingestion (streaming), I developed a JAVA server with

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #728: URL: https://github.com/apache/iceberg-python/pull/728#discussion_r1624138385 ## pyiceberg/table/__init__.py: ## @@ -277,6 +279,50 @@ def __init__(self, table: Table, autocommit: bool = False): self._autocommit = autocommit s

Re: [I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2144767087 @ochanism Thanks for clearing that up, that helps. Can you share the compilation error that you're seeing?

Re: [I] For Hive 2.1.1 version support [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10430: URL: https://github.com/apache/iceberg/issues/10430#issuecomment-2144777340 Hey @liunaijie thanks for reaching out here. Iceberg has a different strategy than Hive when it comes to partitioning. Iceberg expects the partitions to be part of the table itself,

Re: [I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
ochanism commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2144779737 @Fokko This error occurred while initializing hive catalog. ```java var catalog = new HiveCatalog(); catalog.initialize(this.catalogName, this.properties); ```

Re: [I] For Hive 2.1.1 version support [iceberg]

2024-06-03 Thread via GitHub
liunaijie closed issue #10430: For Hive 2.1.1 version support URL: https://github.com/apache/iceberg/issues/10430

Re: [I] For Hive 2.1.1 version support [iceberg]

2024-06-03 Thread via GitHub
liunaijie commented on issue #10430: URL: https://github.com/apache/iceberg/issues/10430#issuecomment-2144871422 @Fokko thanks for your quick response. The different behavior is just weird when I create the table in Hive version 4 and 2.1.1. In Hive 4, I use the same DDL to create the table,

Re: [I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2144887605 I see, the property has been updated since Hive 4: https://github.com/apache/hive/commit/b33b3d3454cc9c65a1879c68679f33f207f21c0e#diff-b7bbe8545a21ec7d7e9cfe40ef66444789e332996aaa9e7f

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2144888988 This would entail removing the whole HadoopCatalog. I think this is something that we can bring up with Iceberg 2.0

Re: [I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
ochanism commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2144917470 Thanks for the information. Do you mean that Hive 4.0 with Iceberg is managed by Hive community? I want to use the latest Iceberg version, but the shaded jar used Iceberg 1.4.3

Re: [I] Make ManifestEntry and ManifestReader.liveEntries() as public [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10425: URL: https://github.com/apache/iceberg/issues/10425#issuecomment-2144916554 I'm reluctant to make it part of the public API. Mostly because you need to have a good understanding of Iceberg to know what you're doing, and probably it is better to go through th

Re: [I] End-of-life Flink 1.16 is still referenced in docs [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10412: URL: https://github.com/apache/iceberg/issues/10412#issuecomment-2144930286 @manuzhang Thanks for bringing this up 🙌 What about making the version configurable, both for Spark and Flink, similar to what we did for [Iceberg and Nessie](https://github.

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on PR #10411: URL: https://github.com/apache/iceberg/pull/10411#issuecomment-2144936472 @bitsondatadev After this PR we need to fix this in the [docs branch](https://github.com/apache/iceberg/tree/docs), right?

Re: [I] Broken links in Spark Writes documentation [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10409: URL: https://github.com/apache/iceberg/issues/10409#issuecomment-2144937841 Thanks for reporting this @gphilipp 🙌 I checked, and the nightly ones are working, so that's good. Looks like @manuzhang is fixing the already published versions.

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
manuzhang commented on PR #10411: URL: https://github.com/apache/iceberg/pull/10411#issuecomment-2144947788 @Fokko This PR is open against the docs branch.

Re: [I] End-of-life Flink 1.16 is still referenced in docs [iceberg]

2024-06-03 Thread via GitHub
manuzhang commented on issue #10412: URL: https://github.com/apache/iceberg/issues/10412#issuecomment-2144951488 Yes, I'm thinking about introducing a DEFAULT_FLINK_VERSION as we do for the code.

Re: [PR] Cache Manifest files [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on PR #787: URL: https://github.com/apache/iceberg-python/pull/787#issuecomment-2144956518 One test is failing CI (test_duckdb_url_import). ``` FAILED tests/integration/test_writes/test_writes.py::test_duckdb_url_import - duckdb.duckdb.IOException: IO Error:

[I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-03 Thread via GitHub
zhongqishang opened a new issue, #10431: URL: https://github.com/apache/iceberg/issues/10431 ### Apache Iceberg version 1.2.1 ### Query engine Flink ### Please describe the bug 🐞 I have a flink upsert job with a checkpoint interval of 5 minutes and an exter

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10411: URL: https://github.com/apache/iceberg/pull/10411#discussion_r1624294761 ## 1.5.2/docs/configuration.md: ## @@ -133,7 +133,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors | clients

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
nastra commented on PR #10411: URL: https://github.com/apache/iceberg/pull/10411#issuecomment-2145011308 @manuzhang did you run this locally and do the links across versions work now correctly?

Re: [PR] Core: Introduce AuthConfig [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10161: URL: https://github.com/apache/iceberg/pull/10161#discussion_r1624310832 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -458,32 +458,11 @@ public static class AuthSession { private static final long MAX_REFRE

Re: [PR] Migrate HadoopCatalog related tests in Flink [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10358: URL: https://github.com/apache/iceberg/pull/10358#discussion_r1624356950 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2.java: ## @@ -18,85 +18,53 @@ */ package org.apache.iceberg.flink.sink; +im

Re: [PR] Migrate HadoopCatalog related tests in Flink [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10358: URL: https://github.com/apache/iceberg/pull/10358#discussion_r1624374238 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/MiniFlinkClusterExtension.java: ## @@ -50,4 +51,17 @@ public static MiniClusterExtension createWithClasslo

Re: [PR] Migrate HadoopCatalog related tests in Flink [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10358: URL: https://github.com/apache/iceberg/pull/10358#discussion_r1624376948 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/reader/ReaderUtil.java: ## @@ -122,4 +123,24 @@ public static CombinedScanTask createCombinedScanTas

Re: [PR] Build: Require approving review [iceberg]

2024-06-03 Thread via GitHub
nastra merged PR #10424: URL: https://github.com/apache/iceberg/pull/10424

Re: [PR] Parquet,API: Consolidate Parquet's TestHelpers into API module [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10428: URL: https://github.com/apache/iceberg/pull/10428#discussion_r1624384991 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -173,6 +176,26 @@ public static void assertSameSchemaMap(Map map1, Map

Re: [I] Does the Iceberg 1.5.2 supports hive-metastore 4.0.0? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2145114470 @ochanism The problem is that Hive is both a query engine and a metastore (catalog in Iceberg). The maintenance of the query engine (the support to read and write Iceberg), is now co

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #10411: URL: https://github.com/apache/iceberg/pull/10411#discussion_r1624425581 ## 1.5.2/docs/configuration.md: ## @@ -133,7 +133,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors | clients

Re: [PR] Parquet,API: Consolidate Parquet's TestHelpers into API module [iceberg]

2024-06-03 Thread via GitHub
advancedxy commented on code in PR #10428: URL: https://github.com/apache/iceberg/pull/10428#discussion_r1624488840 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -173,6 +176,26 @@ public static void assertSameSchemaMap(Map map1, Map

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
bitsondatadev commented on code in PR #10411: URL: https://github.com/apache/iceberg/pull/10411#discussion_r1624498796 ## 1.5.2/docs/configuration.md: ## @@ -133,7 +133,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors | clients

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624502953 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -861,6 +862,19 @@ public static Builder buildFromEmpty() { return new Builder(); } + publ

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624504977 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2655,6 +2656,68 @@ public void testCleanupCleanableExceptionsReplace() { .isInstanc

Re: [PR] Support `Table.to_arrow_batches` to return Iterator[Recordbatch] instead of a fully materialized Arrow Table [iceberg-python]

2024-06-03 Thread via GitHub
syun64 commented on PR #786: URL: https://github.com/apache/iceberg-python/pull/786#issuecomment-2145268428 Thank you for sharing the additional context @djouallah That's interesting! Although we don't use a pyarrow dataset in PyIceberg yet. We use a PyArrow dataset Fragment to creat

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624509499 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2655,6 +2656,68 @@ public void testCleanupCleanableExceptionsReplace() { .isInstanc

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624512760 ## core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java: ## @@ -374,8 +374,7 @@ private static TableMetadata create(TableOperations ops, UpdateTableRequest

Re: [I] Make ManifestEntry and ManifestReader.liveEntries() as public [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on issue #10425: URL: https://github.com/apache/iceberg/issues/10425#issuecomment-2145346311 @pudidic If you want to read the entries wouldn't using the ManifestFiles.read() API be sufficient? Then you could iterate over the data files via the `iterator` API, and

Re: [PR] Parquet: Remove deprecated TestHelpers in parquet module [iceberg]

2024-06-03 Thread via GitHub
nastra merged PR #10428: URL: https://github.com/apache/iceberg/pull/10428

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-06-03 Thread via GitHub
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1623316927 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,124 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}

Re: [PR] Core: Introduce AuthConfig [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar merged PR #10161: URL: https://github.com/apache/iceberg/pull/10161

Re: [PR] Build: Remove links checker [iceberg]

2024-06-03 Thread via GitHub
manuzhang commented on PR #10404: URL: https://github.com/apache/iceberg/pull/10404#issuecomment-2145446188 @Fokko I'm good with leaving in the link checker, but we need to update the README which refers to another [python linkchecker](https://github.com/linkchecker/linkchecker).

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
manuzhang commented on PR #10411: URL: https://github.com/apache/iceberg/pull/10411#issuecomment-2145501680 @nastra As @Fokko said, #9965 wasn't merged in time for the 1.5.x branch and this PR basically back-ports the patch (resolving minor conflicts). I've built the site locally and manually checke

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145510358 @Fokko Sir, I don't quite understand, do you mean we will delete the whole hadoopCatalog? But we have a large number of customers who are using hadoopCatalog.

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145515417 @Fokko HadoopCatalog is working fine after fixing the problems associated with it, so why remove it?

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-03 Thread via GitHub
liurenjie1024 commented on code in PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#discussion_r1624624064 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -221,67 +413,215 @@ impl ManifestFilterVisitor<'_> { let pos = reference.accessor().p

Re: [I] Inconsistent PyArrow Schema Field Metadata on `project_table`: Parquet Field ID [iceberg-python]

2024-06-03 Thread via GitHub
syun64 commented on issue #788: URL: https://github.com/apache/iceberg-python/issues/788#issuecomment-2145537794 Thank you for the input @Fokko - sounds good 👍 I've put up https://github.com/apache/iceberg-python/pull/789 to fix this issue

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145552594 @BsoBird I'm sorry I misinterpreted what you're suggesting. What kind of alternative are you suggesting for the version hint file? One of the requirements of the File System Catalog

Re: [PR] `include_field_ids` flag in `schema_to_pyarrow` [iceberg-python]

2024-06-03 Thread via GitHub
syun64 commented on PR #789: URL: https://github.com/apache/iceberg-python/pull/789#issuecomment-2145637904 Thanks for the review @HonahX ! Could I ask for your help in getting this PR merged?

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on PR #748: URL: https://github.com/apache/iceberg-python/pull/748#issuecomment-2145641873 Thank you for the review @Fokko and @HonahX ! 🚀

Re: [I] Inconsistent PyArrow Schema Field Metadata on `project_table`: Parquet Field ID [iceberg-python]

2024-06-03 Thread via GitHub
Fokko closed issue #788: Inconsistent PyArrow Schema Field Metadata on `project_table`: Parquet Field ID URL: https://github.com/apache/iceberg-python/issues/788

Re: [PR] `include_field_ids` flag in `schema_to_pyarrow` [iceberg-python]

2024-06-03 Thread via GitHub
Fokko merged PR #789: URL: https://github.com/apache/iceberg-python/pull/789

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
HonahX merged PR #748: URL: https://github.com/apache/iceberg-python/pull/748

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145670035 @Fokko We just need to drop the use of the versionHint file, and the hadoopcatalog is now atomic.

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
HonahX commented on PR #748: URL: https://github.com/apache/iceberg-python/pull/748#issuecomment-2145671039 Merged! Thanks @chinmay-bhat for the great work! Thanks @Fokko @syun64 @ndrluis for the review and discussions!

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145678553 @Fokko Because, for the file system catalog, the client's behaviour is currently to write to a temp file with a random ID and then rename the file to complete the commit. This

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145684619 However, whether or not we use a versionHint file, we can find the latest version of the commit by some means. In other words, as long as the metadata file is renamed, the commit i

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145692918 In our production environment, we use hadoopcatalog heavily. After fixing the above issues, it performs very well.

[PR] Add support for orc format [iceberg-python]

2024-06-03 Thread via GitHub
MehulBatra opened a new pull request, #790: URL: https://github.com/apache/iceberg-python/pull/790 (no comment)

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
alonahmias commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145800170 Hi, we would like to contribute to this issue, is it possible?

Re: [I] ORC file format support [iceberg-python]

2024-06-03 Thread via GitHub
MehulBatra commented on issue #20: URL: https://github.com/apache/iceberg-python/issues/20#issuecomment-2145803829 Initial Progress: https://github.com/apache/iceberg-python/pull/790

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-06-03 Thread via GitHub
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1624850309 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,113 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1624777278 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/CollectingSink.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Support `Table.to_arrow_batches` to return Iterator[Recordbatch] instead of a fully materialized Arrow Table [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #786: URL: https://github.com/apache/iceberg-python/pull/786#discussion_r1624861038 ## pyiceberg/io/pyarrow.py: ## @@ -1005,36 +1004,42 @@ def _task_to_table( columns=[col.name for col in file_project_schema.columns], ) -

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
hantangwangd commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624862497 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -861,6 +862,19 @@ public static Builder buildFromEmpty() { return new Builder(); } +

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
hantangwangd commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624863729 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2655,6 +2656,68 @@ public void testCleanupCleanableExceptionsReplace() { .isI

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
hantangwangd commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624864787 ## core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java: ## @@ -374,8 +374,7 @@ private static TableMetadata create(TableOperations ops, UpdateTableR

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145832215 It looks like that [Nessie](https://www.dremio.com/press-releases/dremio-reinforces-ongoing-commitment-to-open-lakehouses-with-new-support-for-apache-iceberg-rest-catalog-specificati

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
dimas-b commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145848086 ATM, Nessie [has Iceberg REST API](https://github.com/projectnessie/nessie/pull/7043) on `main`, but it's not released yet.

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
chayalipy commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145889373 Is there a release date?

Re: [I] Rest Catalog: `catalog.name` should not be part of namespace [iceberg-python]

2024-06-03 Thread via GitHub
c-thiel commented on issue #742: URL: https://github.com/apache/iceberg-python/issues/742#issuecomment-2145900655 @Fokko do you maybe have some thoughts on this? I am happy to prepare a PR, but would like to get some Feedback first.

[PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar opened a new pull request, #10433: URL: https://github.com/apache/iceberg/pull/10433 This is an alternative approach to https://github.com/apache/iceberg/pull/4912/files and https://github.com/apache/iceberg/pull/8221/files#diff-0b632866a3b10fac55c442b08178ec0ac72b3b6008782

[I] Upcasting and Downcasting inconsistencies with PyArrow Schema [iceberg-python]

2024-06-03 Thread via GitHub
syun64 opened a new issue, #791: URL: https://github.com/apache/iceberg-python/issues/791 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 `schema_to_pyarrow` converts BinaryType to `pa.large_binary()` type. This creates inconsistencies wit

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624920900 ## aws/src/test/java/org/apache/iceberg/aws/s3/TestFuzzyS3InputStream.java: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624919847 ## aws/src/test/java/org/apache/iceberg/aws/s3/TestFuzzyS3InputStream.java: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und
