Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-22 Thread via GitHub
Fokko commented on PR #289: URL: https://github.com/apache/iceberg-python/pull/289#issuecomment-1903442816 Thanks for working on this @kevinjqliu. The issue was created a long time ago, before we had the SqlCatalog with sqlite support. Sqlite can also work [in memory](https://www.sqlite.or

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
mgmarino commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1461474013 ## pyiceberg/catalog/glue.py: ## @@ -84,19 +110,105 @@ def _construct_parameters( return new_parameters +def _type_to_glue_type_string(input_type: Iceber

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
mgmarino commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1461475758 ## pyiceberg/catalog/glue.py: ## @@ -84,19 +110,105 @@ def _construct_parameters( return new_parameters +def _type_to_glue_type_string(input_type: Iceber

Re: [PR] implement hive catalog `_commit_table` [iceberg-python]

2024-01-22 Thread via GitHub
Fokko commented on PR #294: URL: https://github.com/apache/iceberg-python/pull/294#issuecomment-1903458521 Thanks for working on this @kevinjqliu 🙌 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] implement hive catalog `_commit_table` [iceberg-python]

2024-01-22 Thread via GitHub
Fokko commented on code in PR #294: URL: https://github.com/apache/iceberg-python/pull/294#discussion_r1461468502 ## pyiceberg/catalog/hive.py: ## @@ -150,6 +150,7 @@ def _construct_hive_storage_descriptor(schema: Schema, location: Optional[str]) PROP_TABLE_TYPE = "table_type"

Re: [PR] implement hive catalog `_commit_table` [iceberg-python]

2024-01-22 Thread via GitHub
Fokko merged PR #294: URL: https://github.com/apache/iceberg-python/pull/294

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
mgmarino commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1903463726 Thanks, @HonahX, @nicor88. Yes, I verified the correct behavior as well (creation of table, schema change, etc.) before pushing, but I am happy to formalize this in an integr

Re: [I] InMemory Catalog [iceberg-python]

2024-01-22 Thread via GitHub
Fokko commented on issue #293: URL: https://github.com/apache/iceberg-python/issues/293#issuecomment-1903462763 Forwarding my comment here as well: https://github.com/apache/iceberg-python/pull/289#issuecomment-1903442816 Maybe we should add this to the documentation of the SqlCatalog

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
nicor88 commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1903491270 Regarding moto - I totally agree with you @mgmarino, it helps to test only API specs, and as the main catalog source of truth is glue (also for athena), mocking athena via moto is not

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
HonahX commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1461512416 ## pyiceberg/catalog/glue.py: ## @@ -84,19 +110,105 @@ def _construct_parameters( return new_parameters +def _type_to_glue_type_string(input_type: IcebergT

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
mgmarino commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1903532176 @HonahX I just realized that the integration tests for glue are not automatically collected/run in CI, so I guess these are just up to "us" to run by hand? That makes it a bit easie

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
HonahX commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1903540111 @mgmarino You are correct. We need to use our own AWS accounts to run them.

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
nicor88 commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1903624202 @mgmarino @HonahX - I was testing this, and after the change I confirm that I can query the table in Athena (I'm still doing some deep dive on why the table is not droppable in athen

Re: [I] Speeding up rewrite_data_files encountered concurrent write issue. [iceberg]

2024-01-22 Thread via GitHub
a8356555 commented on issue #9521: URL: https://github.com/apache/iceberg/issues/9521#issuecomment-1903635357 > @a8356555 yes, there could be conflicts from concurrent commit from multiple file groups with partial progress enabled. Usually, they will succeed eventually on retry. Part

Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]

2024-01-22 Thread via GitHub
mgmarino commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1903703045 Integration tests added in 40ab6e617c96712d5020e06b741fdbbd1963e75a

Re: [PR] Core: Use correct headers for Multi-Table commits [iceberg]

2024-01-22 Thread via GitHub
nastra commented on code in PR #9523: URL: https://github.com/apache/iceberg/pull/9523#discussion_r1461677965 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -1022,11 +1022,17 @@ public void commitTransaction(SessionContext context, List commits)

Re: [I] refactor: Remove support of manifest list format as a list of file paths. [iceberg-rust]

2024-01-22 Thread via GitHub
hiirrxnn commented on issue #158: URL: https://github.com/apache/iceberg-rust/issues/158#issuecomment-1903753977 Could I work on this?

Re: [I] refactor: Remove `async_trait` in `Catalog` trait. [iceberg-rust]

2024-01-22 Thread via GitHub
hiirrxnn commented on issue #139: URL: https://github.com/apache/iceberg-rust/issues/139#issuecomment-1903756750 Could I work on this? Please tell me precisely what is to be done here.

Re: [I] MetricsReporter support close [iceberg]

2024-01-22 Thread via GitHub
huyuanfeng2018 closed issue #9349: MetricsReporter support close URL: https://github.com/apache/iceberg/issues/9349

Re: [PR] Spark: Bump Spark minor versions for 3.3 and 3.4 [iceberg]

2024-01-22 Thread via GitHub
ajantha-bhat commented on PR #9187: URL: https://github.com/apache/iceberg/pull/9187#issuecomment-1903942392 I think the test case was failing for non-Iceberg tables managed by the Spark session catalog. So there should not be an impact for Iceberg tables/users?

Re: [PR] Support force option on RegisterTable procedure [iceberg]

2024-01-22 Thread via GitHub
abfisher0417 commented on PR #5327: URL: https://github.com/apache/iceberg/pull/5327#issuecomment-1904279772 I am also interested in this capability. I have a use case where I create/update Iceberg tables in an isolated cloud environment separate from consumers/readers of those tables. Cons

Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]

2024-01-22 Thread via GitHub
syun64 commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1904289015 > what do we do with the name-mapping created in step 1 after the table is created? Do we just discard it or put it in schema.name-mapping.default? If the later, I think we need

Re: [I] REPLACE TABLE Support [iceberg-python]

2024-01-22 Thread via GitHub
syun64 commented on issue #281: URL: https://github.com/apache/iceberg-python/issues/281#issuecomment-1904310699 In order to reduce duplication of code, would it make sense to combine the job of [TypeUtil.assignFreshIds](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apac

Re: [I] REPLACE TABLE Support [iceberg-python]

2024-01-22 Thread via GitHub
Fokko commented on issue #281: URL: https://github.com/apache/iceberg-python/issues/281#issuecomment-1904323323 @syun64 I started on #284 today. It re-uses the `UpdateSchema` API which already sets the right IDs, and maintains IDs for the existing fields. I don't think that doesn't help us

Re: [I] InMemory Catalog [iceberg-python]

2024-01-22 Thread via GitHub
asheeshgarg commented on issue #293: URL: https://github.com/apache/iceberg-python/issues/293#issuecomment-1904328604 @Fokko Is this https://github.com/apache/iceberg/pull/4518 supported as part of pyiceberg?

Re: [I] When will the 0.6.0 version be released? [iceberg-python]

2024-01-22 Thread via GitHub
asheeshgarg commented on issue #192: URL: https://github.com/apache/iceberg-python/issues/192#issuecomment-1904330526 I am also looking for it. Any update on this will be helpful

[I] Docs: Add clear indicators for required fields in Spark syntax on CREATE TABLE. [iceberg]

2024-01-22 Thread via GitHub
bitsondatadev opened a new issue, #9545: URL: https://github.com/apache/iceberg/issues/9545 > Is there a way to set a table column as an identifier field and required field using Spark DDL? I’m not seeing anything here … https://iceberg.apache.org/docs/latest/spark-ddl/ … which I assume me

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-22 Thread via GitHub
szehon-ho commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1462129885 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(SnapshotP

Re: [PR] Parquet: Add system config for unsafe Parquet ID fallback. [iceberg]

2024-01-22 Thread via GitHub
rdblue commented on code in PR #9324: URL: https://github.com/apache/iceberg/pull/9324#discussion_r1462131129 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -72,6 +72,19 @@ private SystemConfigs() {} 8, Integer::parseUnsignedInt); + /

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-22 Thread via GitHub
ajantha-bhat commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1462140319 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(Snapsh

Re: [I] When will the 0.6.0 version be released? [iceberg-python]

2024-01-22 Thread via GitHub
Fokko commented on issue #192: URL: https://github.com/apache/iceberg-python/issues/192#issuecomment-1904422158 We're almost ready to release, I would like to include one more bugfix that comes with Arrow 15

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-22 Thread via GitHub
szehon-ho commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1904441605 Hi @advancedxy , I'm ok to leave that for the next pr. How about we just keep the notes for PartitionField and SortOrder like? ``` 1. For partition fields with a transform wi

Re: [PR] Revert "Build: Bump org.apache.httpcomponents.client5:httpclient5 (#9260)" [iceberg]

2024-01-22 Thread via GitHub
amogh-jahagirdar merged PR #9544: URL: https://github.com/apache/iceberg/pull/9544

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-22 Thread via GitHub
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1462199036 ## catalog/glue.go: ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE fil

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-22 Thread via GitHub
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1904506328 @nastra @jbonofre the PR is ready for a new round 😄 Thanks!

Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]

2024-01-22 Thread via GitHub
kevinjqliu commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1904513996 Thank you @HonahX In https://github.com/apache/iceberg-python/pull/289 I reimplemented `_commit_table` with `update_table_metadata` and still got the problem above.

Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]

2024-01-22 Thread via GitHub
kevinjqliu commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1904519923 Somewhat related, I noticed that `Schema` class `__eq__` function does not check if the `schema_id`s are equal. See https://github.com/apache/iceberg-python/blob/a56838dc
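
The observation above, an `__eq__` that ignores the schema ID, can be illustrated with a toy class. This is only a sketch: `ToySchema` and its fields are hypothetical and are not PyIceberg's `Schema`; it merely shows how two objects that differ only in their ID compare equal when the ID is left out of the comparison.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical stand-in for a schema; NOT the PyIceberg class."""

    fields: Tuple[str, ...]
    # compare=False excludes schema_id from the generated __eq__,
    # mimicking an equality check that ignores the ID.
    schema_id: int = field(default=0, compare=False)


old = ToySchema(fields=("id", "name"), schema_id=0)
new = ToySchema(fields=("id", "name"), schema_id=1)

# Equality only looks at `fields`, so the two compare equal
# even though their IDs differ.
same = old == new
ids_differ = old.schema_id != new.schema_id
```

Code that decides whether a schema changed by comparing objects like these would see no difference, so a check that also needs to detect ID changes would have to compare the IDs explicitly.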

Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-22 Thread via GitHub
kevinjqliu commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1462219314 ## tests/catalog/test_base.py: ## @@ -572,6 +379,11 @@ def test_commit_table(catalog: InMemoryCatalog) -> None: NestedField(4, "add", LongType()),

Re: [PR] Parquet: Add system config for unsafe Parquet ID fallback. [iceberg]

2024-01-22 Thread via GitHub
rdblue commented on PR #9324: URL: https://github.com/apache/iceberg/pull/9324#issuecomment-1904527162 Merged. Thanks for reviewing, @aokolnychyi and @Fokko!

Re: [PR] Parquet: Add system config for unsafe Parquet ID fallback. [iceberg]

2024-01-22 Thread via GitHub
rdblue merged PR #9324: URL: https://github.com/apache/iceberg/pull/9324

Re: [PR] Parquet: Add system config for unsafe Parquet ID fallback. [iceberg]

2024-01-22 Thread via GitHub
rdblue commented on code in PR #9324: URL: https://github.com/apache/iceberg/pull/9324#discussion_r1462220664 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -1119,27 +1120,29 @@ public CloseableIterable build() { ParquetReadOptions options =

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-22 Thread via GitHub
stevenzwu commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1462236676 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -18,16 +18,44 @@ */ package org.apache.iceberg.flink.util; +import java

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-22 Thread via GitHub
stevenzwu commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1462251282 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java: ## @@ -367,6 +382,8 @@ public void testSpecificSnapshotTimestamp(

Re: [PR] Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable [iceberg]

2024-01-22 Thread via GitHub
szehon-ho commented on PR #9335: URL: https://github.com/apache/iceberg/pull/9335#issuecomment-1904576031 Hi @hsiang-c , we took another look with @RussellSpitzer , it seems the manifests are de-duped on the traversal down, but the entries (referred to by manifests) should not be de-duped.

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-01-22 Thread via GitHub
rdblue commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1462292947 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/PartitionedAppendWriter.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-01-22 Thread via GitHub
rdblue commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1462294450 ## kafka-connect/build.gradle: ## @@ -30,3 +30,30 @@ project(":iceberg-kafka-connect:iceberg-kafka-connect-events") { useJUnitPlatform() } } + +project(":ice

Re: [I] InMemory Catalog [iceberg-python]

2024-01-22 Thread via GitHub
kevinjqliu commented on issue #293: URL: https://github.com/apache/iceberg-python/issues/293#issuecomment-1904641372 @Fokko Didn't know the in-memory sqlite option was available! That's awesome. I was able to read/write using the `SqlCatalog`. Metadata is saved in memory using sqlite
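
The in-memory SQLite option mentioned in this thread can be illustrated with Python's standard `sqlite3` module. This is a sketch of the underlying SQLite feature only, not of PyIceberg's `SqlCatalog`; the table name `catalog_entries`, its columns, and the sample row are hypothetical.

```python
import sqlite3

# ":memory:" asks SQLite for a database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog_entries (name TEXT PRIMARY KEY, location TEXT)"
)
conn.execute(
    "INSERT INTO catalog_entries VALUES (?, ?)",
    ("db.tbl", "/tmp/warehouse/db/tbl"),
)
rows = conn.execute("SELECT name, location FROM catalog_entries").fetchall()
conn.close()

# A second ":memory:" connection gets its own, empty database:
# nothing written through the first connection is visible here.
other = sqlite3.connect(":memory:")
tables = other.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
other.close()
```

Because nothing is written to disk, state kept this way disappears when the connection closes, which suits tests and local experiments rather than shared use.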

Re: [I] When will the 0.6.0 version be released? [iceberg-python]

2024-01-22 Thread via GitHub
asheeshgarg commented on issue #192: URL: https://github.com/apache/iceberg-python/issues/192#issuecomment-1904671807 @Fokko current write support doesn't support nested structure writes?

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-22 Thread via GitHub
rdblue commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1462320114 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcUtil.java: ## @@ -18,6 +18,22 @@ */ package org.apache.iceberg.jdbc; +import static org.apache.iceberg.jdbc.Jd

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-22 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1462324465 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcUtil.java: ## @@ -18,6 +18,22 @@ */ package org.apache.iceberg.jdbc; +import static org.apache.iceberg.jdbc.

Re: [I] InMemory Catalog [iceberg-python]

2024-01-22 Thread via GitHub
kevinjqliu commented on issue #293: URL: https://github.com/apache/iceberg-python/issues/293#issuecomment-1904684492 Looks like there's also a `InMemoryCatalog` in the Java lib lgithub.com/apache/iceberg/blob/0f509d2d678db2d7322dafded58ec0ca6d7fb268/core/src/main/java/org/apache/iceberg/i

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-22 Thread via GitHub
gjacoby126 commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1462334518 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -18,16 +18,44 @@ */ package org.apache.iceberg.flink.util; +import jav

Re: [I] [BUG] CLI fails with Glue catalog because of missing URI [iceberg-python]

2024-01-22 Thread via GitHub
stefnba closed issue #255: [BUG] CLI fails with Glue catalog because of missing URI URL: https://github.com/apache/iceberg-python/issues/255

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-22 Thread via GitHub
stevenzwu commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1462361510 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -18,16 +18,44 @@ */ package org.apache.iceberg.flink.util; +import java

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-01-22 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1462366234 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/PartitionedAppendWriter.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-01-22 Thread via GitHub
javrasya commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1904732583 Is the conclusion to wait for them to provide that new utility or proceed and improve this PR according to our discussions?

Re: [I] Caused by: java.net.SocketException: Connection reset [iceberg]

2024-01-22 Thread via GitHub
javrasya commented on issue #9444: URL: https://github.com/apache/iceberg/issues/9444#issuecomment-1904737183 Thank you for jumping in @amogh-jahagirdar . The way it is not literally unusable for us so I had to write my own S3FileIO together with all the nested classes so that eventually I

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-22 Thread via GitHub
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1904744066 @pvary , thank you for that. You are right. It is all immutable so it makes sense that rewrite operation would create another snapshot and I should be using that not a prior one to

Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]

2024-01-22 Thread via GitHub
anupam-saini commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1904757960 Hello, I would like to put up a PR as per the discussion above if no one has started working already. Please let me know if this is fine. Also, @syun64 and I work together

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-22 Thread via GitHub
wolfeidau commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1462427200 ## catalog/glue.go: ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE fil

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-22 Thread via GitHub
wolfeidau commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1462459442 ## catalog/glue.go: ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE fil

Re: [PR] API, Core, Spark: Change behavior of fastForward/replace to create the from branch if it does not exist [iceberg]

2024-01-22 Thread via GitHub
amogh-jahagirdar commented on PR #9196: URL: https://github.com/apache/iceberg/pull/9196#issuecomment-1904974901 Thanks for reviewing @nastra @rdblue, merging

Re: [PR] API, Core, Spark: Change behavior of fastForward/replace to create the from branch if it does not exist [iceberg]

2024-01-22 Thread via GitHub
amogh-jahagirdar merged PR #9196: URL: https://github.com/apache/iceberg/pull/9196

[PR] Build: Bump pyarrow from 14.0.2 to 15.0.0 [iceberg-python]

2024-01-22 Thread via GitHub
dependabot[bot] opened a new pull request, #295: URL: https://github.com/apache/iceberg-python/pull/295 Bumps [pyarrow](https://github.com/apache/arrow) from 14.0.2 to 15.0.0. Commits https://github.com/apache/arrow/commit/a61f4af724cd06c3a9b4abd20491345997e532c0 a61f4af MINO

Re: [I] fast_forward does not work for the first commit in Spark [iceberg]

2024-01-22 Thread via GitHub
amogh-jahagirdar closed issue #8849: fast_forward does not work for the first commit in Spark URL: https://github.com/apache/iceberg/issues/8849

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-22 Thread via GitHub
mas-chen commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1462562799 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/assigner/SplitAssigner.java: ## @@ -115,4 +115,7 @@ default void onCompletedSplits(Collection compl

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-22 Thread via GitHub
mas-chen commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1462566583 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java: ## @@ -367,6 +382,8 @@ public void testSpecificSnapshotTimestamp()

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-22 Thread via GitHub
mas-chen commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1462568032 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java: ## @@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-22 Thread via GitHub
mas-chen commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1462571996 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/MiniClusterResource.java: ## @@ -50,4 +51,18 @@ public static MiniClusterWithClientResource createWithClas

Re: [I] PARTITION_DATA_ID_START is hard-coded [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #449: URL: https://github.com/apache/iceberg/issues/449#issuecomment-1905060326 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Adding support for time-based partitioning on long column type [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #417: URL: https://github.com/apache/iceberg/issues/417#issuecomment-1905060298 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Write a dataFrame to a table while after add an 'optional longtype' column , then get some dirty data from the new column [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #463: URL: https://github.com/apache/iceberg/issues/463#issuecomment-1905060351 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Support External name mapping in Iceberg generic single message encoder [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #500: URL: https://github.com/apache/iceberg/issues/500#issuecomment-1905060474 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Iceberg Pig reader should support all catalogs [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #541: URL: https://github.com/apache/iceberg/issues/541#issuecomment-1905060559 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] add optional stringType column after a new long type column ,when write data get an exception [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #464: URL: https://github.com/apache/iceberg/issues/464#issuecomment-1905060367 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] TimestampWriter isn't being used [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #475: URL: https://github.com/apache/iceberg/issues/475#issuecomment-1905060384 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Should issue an error/warning message when no data file to delete [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #492: URL: https://github.com/apache/iceberg/issues/492#issuecomment-1905060431 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Respect commit.manifest.min-count-to-merge while appending manifests [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #490: URL: https://github.com/apache/iceberg/issues/490#issuecomment-1905060406 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Add back ability to set custom name on transformed field [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #495: URL: https://github.com/apache/iceberg/issues/495#issuecomment-1905060457 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Remove withPartitionPath from the public API [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #507: URL: https://github.com/apache/iceberg/issues/507#issuecomment-1905060488 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Update ORC version in Iceberg [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #515: URL: https://github.com/apache/iceberg/issues/515#issuecomment-1905060504 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Vectorize read of complex/nested data types [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #521: URL: https://github.com/apache/iceberg/issues/521#issuecomment-1905060540 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Support other data formats in Iceberg Pig reader [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #542: URL: https://github.com/apache/iceberg/issues/542#issuecomment-1905060581 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Make field access by name work for Avro schema and record apis in Iceberg generics module [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #571: URL: https://github.com/apache/iceberg/issues/571#issuecomment-1905060621 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Make field access by name work for Avro schema and record apis in Spark module [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #572: URL: https://github.com/apache/iceberg/issues/572#issuecomment-1905060649 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Make field access by name work for Avro schema and record apis in Avro generics module [iceberg]

2024-01-22 Thread via GitHub
github-actions[bot] commented on issue #570: URL: https://github.com/apache/iceberg/issues/570#issuecomment-1905060599 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] refactor: Remove support of manifest list format as a list of file paths. [iceberg-rust]

2024-01-22 Thread via GitHub
liurenjie1024 commented on issue #158: URL: https://github.com/apache/iceberg-rust/issues/158#issuecomment-1905130859 > Could I work on this? Sure, thanks for contributing! @hiirrxnn -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [I] Speeding up rewrite_data_files encountered concurrent write issue. [iceberg]

2024-01-22 Thread via GitHub
manuzhang commented on issue #9521: URL: https://github.com/apache/iceberg/issues/9521#issuecomment-1905133774 You may try tuning the following configs to increase the chance of commit success for each group. ``` Property Default Description commit.retry.num-retries 4

Re: [I] refactor: Remove `async_trait` in `Catalog` trait. [iceberg-rust]

2024-01-22 Thread via GitHub
liurenjie1024 commented on issue #139: URL: https://github.com/apache/iceberg-rust/issues/139#issuecomment-1905134581 > Could I work on this ? Please tell me what precisely is to be done here? The release of rust 1.75 enables a fancy feature so that we no longer need to use `async_tra

Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-01-22 Thread via GitHub
szehon-ho commented on code in PR #8907: URL: https://github.com/apache/iceberg/pull/8907#discussion_r1462602403 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -284,7 +284,7 @@ private Map tableOverrideProperties() { } } - protected static S

Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]

2024-01-22 Thread via GitHub
HonahX commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1905147806 > I noticed that Schema class __eq__ function does not check if the schema_ids are equal. I think this is the intended behavior. We consider two schemas equal if they shar

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-22 Thread via GitHub
amogh-jahagirdar commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1462645409 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -245,13 +270,17 @@ public List listTables(Namespace namespace) { row ->

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-22 Thread via GitHub
zinking commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1462652137 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(SnapshotPro

Re: [I] REPLACE TABLE Support [iceberg-python]

2024-01-22 Thread via GitHub
syun64 commented on issue #281: URL: https://github.com/apache/iceberg-python/issues/281#issuecomment-1905187291 Hi @Fokko - sounds like you beat me to it 😄 Please let me know if you need any additional heavy lifting on #284 . Happy to help as always. The reason I was curious if there

Re: [PR] Support force option on RegisterTable procedure [iceberg]

2024-01-22 Thread via GitHub
yabola commented on PR #5327: URL: https://github.com/apache/iceberg/pull/5327#issuecomment-1905187665 @abfisher0417 Thank you. If community agrees, I can complete this PR again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub
