Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-02-03 Thread via GitHub
manuzhang commented on PR #9584: URL: https://github.com/apache/iceberg/pull/9584#issuecomment-1925616713 Rebased on #9605 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] Avoid clone in TableMetadata Serialization [iceberg-rust]

2024-02-03 Thread via GitHub
liurenjie1024 commented on issue #184: URL: https://github.com/apache/iceberg-rust/issues/184#issuecomment-1925613604 I agree that this may be a little wasteful, but the cost may be small: the heavy ones are held by `Arc`. We can do this optimization when necessary.

Re: [PR] refactor: rm async_trait and add trait_variant [iceberg-rust]

2024-02-03 Thread via GitHub
liurenjie1024 commented on code in PR #186: URL: https://github.com/apache/iceberg-rust/pull/186#discussion_r1477195324 ## crates/iceberg/src/catalog/mod.rs: ## @@ -25,16 +25,16 @@ use crate::spec::{ }; use crate::table::Table; use crate::{Error, ErrorKind, Result}; -use asyn

Re: [PR] feat: add handwritten serialize [iceberg-rust]

2024-02-03 Thread via GitHub
liurenjie1024 commented on PR #185: URL: https://github.com/apache/iceberg-rust/pull/185#issuecomment-1925611867 cc @ZENOTME @Xuanwo @Fokko PTAL

[PR] Build: Bump datamodel-code-generator from 0.25.2 to 0.25.3 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9639: URL: https://github.com/apache/iceberg/pull/9639 Bumps [datamodel-code-generator](https://github.com/koxudaxi/datamodel-code-generator) from 0.25.2 to 0.25.3. Release notes Sourced from https://github.com/koxudaxi/datamodel-code-ge

Re: [PR] Build: Bump mkdocs-material from 9.5.3 to 9.5.5 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] closed pull request #9566: Build: Bump mkdocs-material from 9.5.3 to 9.5.5 URL: https://github.com/apache/iceberg/pull/9566

[PR] Build: Bump mkdocs-material from 9.5.3 to 9.5.7 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9638: URL: https://github.com/apache/iceberg/pull/9638 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.3 to 9.5.7. Release notes Sourced from https://github.com/squidfunk/mkdocs-material/releases mkdocs-ma

Re: [PR] Build: Bump mkdocs-material from 9.5.3 to 9.5.5 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] commented on PR #9566: URL: https://github.com/apache/iceberg/pull/9566#issuecomment-1925580877 Superseded by #9638.

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-02-03 Thread via GitHub
manuzhang commented on PR #9611: URL: https://github.com/apache/iceberg/pull/9611#issuecomment-1925577015 @amogh-jahagirdar This is based on our discussion in [#9400](https://github.com/apache/iceberg/pull/9400#discussion_r1442236190), but I'd like to go one step further. Throwing exception

[PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.37.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9637: URL: https://github.com/apache/iceberg/pull/9637 Bumps [com.palantir.baseline:gradle-baseline-java](https://github.com/palantir/gradle-baseline) from 4.42.0 to 5.37.0. Release notes Sourced from https://github.com/palantir/gradle-b

Re: [PR] Build: Bump org.xerial:sqlite-jdbc from 3.44.0.0 to 3.45.0.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] commented on PR #9540: URL: https://github.com/apache/iceberg/pull/9540#issuecomment-1925575022 Superseded by #9634.

[PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.31.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9635: URL: https://github.com/apache/iceberg/pull/9635 Bumps [com.google.cloud:libraries-bom](https://github.com/googleapis/java-cloud-bom) from 26.28.0 to 26.31.0. Release notes Sourced from https://github.com/googleapis/java-cloud-bom/

Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.36.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] commented on PR #9567: URL: https://github.com/apache/iceberg/pull/9567#issuecomment-1925575127 Superseded by #9637.

[PR] Build: Bump software.amazon.awssdk:bom from 2.23.12 to 2.23.17 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9633: URL: https://github.com/apache/iceberg/pull/9633 Bumps software.amazon.awssdk:bom from 2.23.12 to 2.23.17. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=softwa

Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.36.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] closed pull request #9567: Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.36.0 URL: https://github.com/apache/iceberg/pull/9567

[PR] Build: Bump io.delta:delta-standalone_2.12 from 0.6.0 to 3.1.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9636: URL: https://github.com/apache/iceberg/pull/9636 Bumps [io.delta:delta-standalone_2.12](https://github.com/delta-io/delta) from 0.6.0 to 3.1.0. Release notes Sourced from https://github.com/delta-io/delta/releases io.delta:delta-s

Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.30.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] commented on PR #9534: URL: https://github.com/apache/iceberg/pull/9534#issuecomment-1925575046 Superseded by #9635.

Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.30.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] closed pull request #9534: Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.30.0 URL: https://github.com/apache/iceberg/pull/9534

Re: [PR] Build: Bump org.xerial:sqlite-jdbc from 3.44.0.0 to 3.45.0.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] closed pull request #9540: Build: Bump org.xerial:sqlite-jdbc from 3.44.0.0 to 3.45.0.0 URL: https://github.com/apache/iceberg/pull/9540

[PR] Build: Bump org.xerial:sqlite-jdbc from 3.44.0.0 to 3.45.1.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9634: URL: https://github.com/apache/iceberg/pull/9634 Bumps [org.xerial:sqlite-jdbc](https://github.com/xerial/sqlite-jdbc) from 3.44.0.0 to 3.45.1.0. Release notes Sourced from https://github.com/xerial/sqlite-jdbc/releases org.xerial

[PR] Build: Bump jetty from 9.4.53.v20231009 to 11.0.20 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9632: URL: https://github.com/apache/iceberg/pull/9632 Bumps `jetty` from 9.4.53.v20231009 to 11.0.20. Updates `org.eclipse.jetty:jetty-server` from 9.4.53.v20231009 to 11.0.20 Updates `org.eclipse.jetty:jetty-servlet` from 9.4.53.v2023100

[PR] Build: Bump io.delta:delta-spark_2.12 from 3.0.0 to 3.1.0 [iceberg]

2024-02-03 Thread via GitHub
dependabot[bot] opened a new pull request, #9631: URL: https://github.com/apache/iceberg/pull/9631 Bumps [io.delta:delta-spark_2.12](https://github.com/delta-io/delta) from 3.0.0 to 3.1.0. Release notes Sourced from https://github.com/delta-io/delta/releases io.delta:delta-spark_

Re: [PR] Docs: Enhance Java quickstart example [iceberg]

2024-02-03 Thread via GitHub
manuzhang commented on PR #9585: URL: https://github.com/apache/iceberg/pull/9585#issuecomment-1925574651 @rdblue Thanks for the review. I've updated the PR accordingly. I have a question about what is recommended usage and what is not. How is it conveyed? For example, I don't find much inf

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-03 Thread via GitHub
advancedxy commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1477165092 ## core/src/main/java/org/apache/iceberg/PartitionData.java: ## @@ -171,6 +169,10 @@ public PartitionData copy() { return new PartitionData(this); } + pub

Re: [I] when hdfs router restart, task failed with read data files " xx parquet is not a parquet file (length is too low :0) " or manifest "java.io.EOFException: Unexpected EOF with 4 bytes remaining

2024-02-03 Thread via GitHub
link3280 commented on issue #9071: URL: https://github.com/apache/iceberg/issues/9071#issuecomment-1925554243 We get the same problem here with Iceberg 1.3.0. The bug affects not only data files but also metadata.json and .avro files. The files created twice could be corrupted (1-2% c

Re: [PR] Parquet, Arrow: Rename BagePageReader to BasePageReader in VectorizedPageIterator [iceberg]

2024-02-03 Thread via GitHub
wgtmac commented on PR #9630: URL: https://github.com/apache/iceberg/pull/9630#issuecomment-1925552879 Thanks @amogh-jahagirdar for the quick review!

Re: [I] Vectorized reads- eagerly decode parquet dictionary encoded data for fixed width types [iceberg]

2024-02-03 Thread via GitHub
github-actions[bot] commented on issue #835: URL: https://github.com/apache/iceberg/issues/835#issuecomment-1925495813 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. T

Re: [I] Error while using bucket partitions [iceberg]

2024-02-03 Thread via GitHub
github-actions[bot] commented on issue #274: URL: https://github.com/apache/iceberg/issues/274#issuecomment-1925495705 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

Re: [I] Error while using bucket partitions [iceberg]

2024-02-03 Thread via GitHub
github-actions[bot] closed issue #274: Error while using bucket partitions URL: https://github.com/apache/iceberg/issues/274

Re: [I] Vectorized reads - explore replacing DateDayVector and TimestampMicroTZVector with IntVector and BigIntVector respectively [iceberg]

2024-02-03 Thread via GitHub
github-actions[bot] commented on issue #834: URL: https://github.com/apache/iceberg/issues/834#issuecomment-1925495805 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. T

Re: [I] File leaking in RemoveSnapshots API [iceberg]

2024-02-03 Thread via GitHub
github-actions[bot] commented on issue #822: URL: https://github.com/apache/iceberg/issues/822#issuecomment-1925495799 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. T

Re: [I] can not create iceberg table by hive catalog in emr with maridb [iceberg]

2024-02-03 Thread via GitHub
github-actions[bot] commented on issue #798: URL: https://github.com/apache/iceberg/issues/798#issuecomment-1925495790 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. T

Re: [PR] Core: Add strictness flag to prevent loss of view representation when replacing a view [iceberg]

2024-02-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #9620: URL: https://github.com/apache/iceberg/pull/9620#discussion_r1477127468 ## core/src/main/java/org/apache/iceberg/rest/RESTViewOperations.java: ## @@ -59,6 +60,8 @@ public void commit(ViewMetadata base, ViewMetadata metadata) {

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477128027 ## pyiceberg/catalog/rest.py: ## @@ -450,6 +450,10 @@ def create_table( iceberg_schema = self._convert_schema_if_needed(schema) iceberg_schema = a

[I] Catalog table-default and table-override properties [iceberg-python]

2024-02-03 Thread via GitHub
syun64 opened a new issue, #362: URL: https://github.com/apache/iceberg-python/issues/362 ### Feature Request / Improvement In the java code base, catalog configuration includes catalog table-default and table-override properties: Catalog Property Key | Description -- | --
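
The request above can be illustrated with a minimal sketch of how prefixed catalog properties might be folded into a table's properties. It assumes the `table-default.` and `table-override.` key prefixes and the precedence defaults < requested properties < overrides; the helper names `_strip_prefix` and `resolve_table_properties` are hypothetical and are not part of PyIceberg.

```python
from typing import Dict

# Assumed key prefixes, mirroring the Java catalog configuration.
TABLE_DEFAULT_PREFIX = "table-default."
TABLE_OVERRIDE_PREFIX = "table-override."


def _strip_prefix(props: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Return entries whose key starts with `prefix`, with the prefix removed."""
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}


def resolve_table_properties(
    catalog_props: Dict[str, str], requested: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical helper: merge catalog-level defaults and overrides.

    Assumed precedence: catalog defaults are applied first, then the
    properties requested for the table, then catalog overrides.
    """
    resolved = _strip_prefix(catalog_props, TABLE_DEFAULT_PREFIX)
    resolved.update(requested)
    resolved.update(_strip_prefix(catalog_props, TABLE_OVERRIDE_PREFIX))
    return resolved
```

Under these assumptions, a catalog-level default only applies when the table request does not set the same key, while an override always wins.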

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477128678 ## pyiceberg/catalog/rest.py: ## @@ -450,6 +450,10 @@ def create_table( iceberg_schema = self._convert_schema_if_needed(schema) iceberg_schema = a

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477127850 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

Re: [PR] Docs: Enhance Java quickstart example [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9585: URL: https://github.com/apache/iceberg/pull/9585#discussion_r1477126737 ## docs/java-api-quickstart.md: ## @@ -38,37 +38,42 @@ The Hive catalog connects to a Hive metastore to keep track of Iceberg tables. You can initialize a Hive catalog

Re: [PR] Docs: Enhance Java quickstart example [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9585: URL: https://github.com/apache/iceberg/pull/9585#discussion_r1477126762 ## docs/java-api-quickstart.md: ## @@ -38,37 +38,42 @@ The Hive catalog connects to a Hive metastore to keep track of Iceberg tables. You can initialize a Hive catalog

Re: [PR] Docs: Enhance Java quickstart example [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9585: URL: https://github.com/apache/iceberg/pull/9585#discussion_r1477126396 ## docs/java-api-quickstart.md: ## @@ -38,37 +38,42 @@ The Hive catalog connects to a Hive metastore to keep track of Iceberg tables. You can initialize a Hive catalog

Re: [PR] Docs: Enhance Java quickstart example [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9585: URL: https://github.com/apache/iceberg/pull/9585#discussion_r1477126293 ## docs/java-api-quickstart.md: ## @@ -38,37 +38,42 @@ The Hive catalog connects to a Hive metastore to keep track of Iceberg tables. You can initialize a Hive catalog

Re: [PR] Docs: Enhance Java quickstart example [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9585: URL: https://github.com/apache/iceberg/pull/9585#discussion_r1477126245 ## docs/java-api-quickstart.md: ## @@ -38,37 +38,42 @@ The Hive catalog connects to a Hive metastore to keep track of Iceberg tables. You can initialize a Hive catalog

Re: [PR] Core: Add strictness flag to prevent loss of view representation when replacing a view [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9620: URL: https://github.com/apache/iceberg/pull/9620#discussion_r1477126112 ## core/src/main/java/org/apache/iceberg/view/ViewProperties.java: ## @@ -26,6 +26,8 @@ public class ViewProperties { public static final String METADATA_COMPRESSION

Re: [PR] Core: Add strictness flag to prevent loss of view representation when replacing a view [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9620: URL: https://github.com/apache/iceberg/pull/9620#discussion_r1477125566 ## core/src/main/java/org/apache/iceberg/rest/RESTViewOperations.java: ## @@ -59,6 +60,8 @@ public void commit(ViewMetadata base, ViewMetadata metadata) { // this i

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9323: URL: https://github.com/apache/iceberg/pull/9323#discussion_r1477125444 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -221,34 +223,52 @@ protected boolean addsDeleteFiles() { /** Add a data file to the new s

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on PR #9629: URL: https://github.com/apache/iceberg/pull/9629#issuecomment-1925450899 Thanks for the fix, @aokolnychyi! I think it's important to get this into 1.5 so I merged this. The method name should be okay.

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on PR #9466: URL: https://github.com/apache/iceberg/pull/9466#issuecomment-1925450414 Thanks, @bryanck! This is looking great and I'm excited to get the next steps in. Also thanks to @fqaiser94 for reviewing!

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-03 Thread via GitHub
rdblue merged PR #9629: URL: https://github.com/apache/iceberg/pull/9629

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-03 Thread via GitHub
rdblue merged PR #9466: URL: https://github.com/apache/iceberg/pull/9466

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
amogh-jahagirdar commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925449043 > For the case (no compression specified) the tests currently pass locally but they shouldn't as we never set zstd as the default The default parquet compression is Z

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477123601 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-03 Thread via GitHub
danielcweeks merged PR #9628: URL: https://github.com/apache/iceberg/pull/9628

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925434691 Can you start CI @syun64?

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477115866 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477115723 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925432345 @jonashaag thank you for raising the issue and putting this PR together so quickly! We are very excited to group this fix in with the impending 0.6.0 release. I've left some comments

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477114478 ## tests/integration/test_writes.py: ## @@ -489,6 +492,50 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w assert [row.dele

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477114250 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

Re: [I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-02-03 Thread via GitHub
kevinjqliu commented on issue #326: URL: https://github.com/apache/iceberg-python/issues/326#issuecomment-1925431063 Added example to "getting started" in #361. Didn't use `tempfile` there since I think it's useful to be able to see what data and metadata files are generated by Iceber

[PR] Get Started: Add sqlcatalog and local fs warehouse [iceberg-python]

2024-02-03 Thread via GitHub
kevinjqliu opened a new pull request, #361: URL: https://github.com/apache/iceberg-python/pull/361 Related to #326 [Add example of using PyIceberg with minimal external dependencies]

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477113970 ## tests/integration/test_writes.py: ## @@ -489,6 +492,50 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w assert [row.dele

[PR] SqlCatalog, default create_engine echo to False [iceberg-python]

2024-02-03 Thread via GitHub
kevinjqliu opened a new pull request, #360: URL: https://github.com/apache/iceberg-python/pull/360 `create_engine`'s `echo` is useful for debugging purposes. Otherwise, it exposes a lot of internal SQLite information when it's not needed.

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477113742 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

[PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag opened a new pull request, #358: URL: https://github.com/apache/iceberg-python/pull/358 I had to change the `metadata_collector` code due to https://github.com/dask/dask/issues/7977. For the `` case (no compression specified) the tests currently pass locally but they should
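
The idea in the PR title can be sketched roughly as follows: read `write.parquet.compression-codec` and `write.parquet.compression-level` from a table's properties before writing. This is not the PR's actual code. The helper name `parquet_compression_options` and its `default_codec` parameter are hypothetical, and the fallback codec is left to the caller because the thread does not settle what the default should be.

```python
from typing import Dict, Optional, Tuple

# Property keys taken from the PR title.
CODEC_KEY = "write.parquet.compression-codec"
LEVEL_KEY = "write.parquet.compression-level"


def parquet_compression_options(
    properties: Dict[str, str], default_codec: str
) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: pick the codec and level from table properties.

    Falls back to `default_codec` when no codec is set, and returns
    None for the level when no level is set.
    """
    codec = properties.get(CODEC_KEY, default_codec)
    raw_level = properties.get(LEVEL_KEY)
    # Table properties are strings, so the level is parsed to an int.
    level = int(raw_level) if raw_level is not None else None
    return codec, level
```

The resulting pair could then be handed to whatever Parquet writer is in use; how the writer interprets a missing level is outside this sketch.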

Re: [I] `pyiceberg.io.pyarrow.write_file` does not take into account compression settings [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on issue #345: URL: https://github.com/apache/iceberg-python/issues/345#issuecomment-1925385537 Update: Hm, now it doesn't seem to be the case anymore. Not sure what happened there...

Re: [PR] Parquet, Arrow: Rename BagePageReader to BasePageReader in VectorizedPageIterator [iceberg]

2024-02-03 Thread via GitHub
amogh-jahagirdar merged PR #9630: URL: https://github.com/apache/iceberg/pull/9630

Re: [I] `pyiceberg.io.pyarrow.write_file` does not take into account compression settings [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on issue #345: URL: https://github.com/apache/iceberg-python/issues/345#issuecomment-1925368819 I think the REST catalog ignores the `write.parquet.compression-codec` option. No matter what options I set for the catalog, it always responds here https://github.com/apache/

[PR] refactor: rm async_trait and add trait_variant [iceberg-rust]

2024-02-03 Thread via GitHub
odysa opened a new pull request, #186: URL: https://github.com/apache/iceberg-rust/pull/186 Close #139. Use `trait_variant::make` to support async fn in pub traits. It creates two traits: `LocalCatalog` for single-threaded use and `Catalog`, with `Send`, for multithreaded runtimes.

[PR] Parquet, Arrow: Rename BagePageReader to BasePageReader in VectorizedPageIterator [iceberg]

2024-02-03 Thread via GitHub
wgtmac opened a new pull request, #9630: URL: https://github.com/apache/iceberg/pull/9630 I believe the BagePageReader class got its name due to a typo. Fortunately it is not a public class so we have the chance to fix it to BasePageReader.

Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-02-03 Thread via GitHub
manuzhang commented on PR #9584: URL: https://github.com/apache/iceberg/pull/9584#issuecomment-1925339993 @amogh-jahagirdar In my case of Flink upserting Iceberg table, I'd rolled back table state to the previous snapshot and asked the upstream user to replay from the corresponding checkpoi

Re: [PR] Spark 3.3: Add RemoveDanglingDeletes action [iceberg]

2024-02-03 Thread via GitHub
zinking commented on code in PR #6581: URL: https://github.com/apache/iceberg/pull/6581#discussion_r1477056633 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RemoveDanglingDeletesSparkAction.java: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Spark 3.3: Add RemoveDanglingDeletes action [iceberg]

2024-02-03 Thread via GitHub
zinking commented on code in PR #6581: URL: https://github.com/apache/iceberg/pull/6581#discussion_r1477056187 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RemoveDanglingDeletesSparkAction.java: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-03 Thread via GitHub
BsoBird commented on PR #9546: URL: https://github.com/apache/iceberg/pull/9546#issuecomment-1925291024 @RussellSpitzer Hello, sir. I've optimised the hadoopCatalog implementation and I now believe that its execution behaviour is basically SPEC compliant. We don't need the CommitStateUn

[I] Distributed writes in the same iceberg transaction [iceberg-python]

2024-02-03 Thread via GitHub
rahij opened a new issue, #357: URL: https://github.com/apache/iceberg-python/issues/357 ### Feature Request / Improvement I am trying to understand how the new arrow write API can work with distributed writes similar to spark. I have a use case where from different machines, I would

Re: [PR] fix postgres catalog initialization when tables do not exist [iceberg-python]

2024-02-03 Thread via GitHub
rahij commented on PR #356: URL: https://github.com/apache/iceberg-python/pull/356#issuecomment-1925250020 @Fokko would you be the right person to review this?

[PR] fix postgres catalog initialization when tables do not exist [iceberg-python]

2024-02-03 Thread via GitHub
rahij opened a new pull request, #356: URL: https://github.com/apache/iceberg-python/pull/356 When the table does not exist, sqlalchemy raises a different exception for postgres than it does for sqlite. Hence, when using postgres, it is not possible to even create the catalog, as the excep

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-03 Thread via GitHub
zeodtr commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1925228247 @odysa Oh sorry, I totally misunderstood the code. Upon seeing the `expect()` message `"current_schema_id not found in schemas"`, I thought it's the case of `'current_schema_id ==