Re: [PR] Bump pydantic from 2.7.2 to 2.7.3 [iceberg-python]

2024-06-03 Thread via GitHub
HonahX merged PR #795: URL: https://github.com/apache/iceberg-python/pull/795 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg

Re: [PR] Bump typing-extensions from 4.12.0 to 4.12.1 [iceberg-python]

2024-06-03 Thread via GitHub
HonahX merged PR #794: URL: https://github.com/apache/iceberg-python/pull/794

Re: [PR] Support merge manifests on writes (MergeAppend) [iceberg-python]

2024-06-03 Thread via GitHub
HonahX commented on code in PR #363: URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1625453929 ## pyiceberg/table/__init__.py: ## @@ -2751,10 +2824,12 @@ def _parquet_files_to_data_files(table_metadata: TableMetadata, file_paths: List class _MergingSnaps

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1625428863 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -991,7 +991,7 @@ public Builder assignUUID(String newUuid) { // it is only safe to set the fo

Re: [PR] Open-API: TableRequirements subclasses should inherit 'type' property [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #10434: URL: https://github.com/apache/iceberg/pull/10434#discussion_r1625435557 ## open-api/rest-catalog-open-api.py: ## @@ -361,23 +361,29 @@ class RemovePartitionStatisticsUpdate(BaseUpdate): class TableRequirement(BaseModel): -type: str

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1625424655 ## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ## @@ -2581,6 +2583,45 @@ public void testConcurrentReplaceTransactionSortOrderConflict() { a

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
nastra merged PR #10411: URL: https://github.com/apache/iceberg/pull/10411

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
nastra commented on PR #10411: URL: https://github.com/apache/iceberg/pull/10411#issuecomment-2146711969 thanks for taking care of this @manuzhang

Re: [PR] Build: Bump com.google.errorprone:error_prone_annotations from 2.27.0 to 2.28.0 [iceberg]

2024-06-03 Thread via GitHub
nastra merged PR #10418: URL: https://github.com/apache/iceberg/pull/10418

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.25.60 to 2.25.64 [iceberg]

2024-06-03 Thread via GitHub
nastra merged PR #10421: URL: https://github.com/apache/iceberg/pull/10421

Re: [I] Error: Field not found in schema while writing to an iceberg table [iceberg]

2024-06-03 Thread via GitHub
tanweipeng commented on issue #6974: URL: https://github.com/apache/iceberg/issues/6974#issuecomment-2146572782 Hi @shivaprasad-basavaraj, may I know how you fix this?

Re: [I] Make ManifestEntry and ManifestReader.liveEntries() as public [iceberg]

2024-06-03 Thread via GitHub
pudidic commented on issue #10425: URL: https://github.com/apache/iceberg/issues/10425#issuecomment-2146549699 I need to rewrite the files with manifest entry status with details. The details include data file numbers, sequence numbers, and snapshot number, which are not in the DataFile or

Re: [PR] Bump duckdb from 0.10.3 to 1.0.0 [iceberg-python]

2024-06-03 Thread via GitHub
HonahX merged PR #793: URL: https://github.com/apache/iceberg-python/pull/793

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-03 Thread via GitHub
zhongqishang commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2146411518 > @zhongqishang: How is your sink/table created? What are the exact records you are sending to the sink? Your issue seems very similar to: #10076 @pvary Thanks for your

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2146407735 @Fokko > We cannot remove the versionHintFile since other systems might depend on it. If we want to do this, we should go through a deprecation cycle. The versionHintFile

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2146405147 > By some means, I assumed that you were hinting at an external service like a `{REST,Hive,Glue}` catalog. Because the FileIO does not allow list operations because that doesn't sc

Re: [I] Bump `HiveCatalog` hive-metastore dependency to Hive 4 [iceberg]

2024-06-03 Thread via GitHub
ochanism commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2146392311 @Fokko Thanks for your kind explanation. I understood the current situation. And the plan for unifying catalogs with the REST catalog looks amazing. I hope that it will be availab

Re: [PR] Open-API: TableRequirements subclasses should inherit 'type' property [iceberg]

2024-06-03 Thread via GitHub
anuragmantri commented on code in PR #10434: URL: https://github.com/apache/iceberg/pull/10434#discussion_r1625207650 ## open-api/rest-catalog-open-api.yaml: ## @@ -2603,30 +2603,29 @@ components: properties: type: type: "string" + enum: +

Re: [PR] Open-API: TableRequirements subclasses need not inherit parent properties [iceberg]

2024-06-03 Thread via GitHub
flyrain commented on code in PR #10434: URL: https://github.com/apache/iceberg/pull/10434#discussion_r1625196927 ## open-api/rest-catalog-open-api.yaml: ## @@ -2603,30 +2603,29 @@ components: properties: type: type: "string" + enum: +

Re: [PR] Open-API: TableRequirements subclasses need not inherit parent properties [iceberg]

2024-06-03 Thread via GitHub
anuragmantri commented on PR #10434: URL: https://github.com/apache/iceberg/pull/10434#issuecomment-2146255190 Thanks @Fokko and @flyrain - I think we should keep the inheritance and remove the duplicates in sub types. I updated this PR to do that. @flyrain this pattern of **discrimin

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625129655 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SketchUtil.java: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-06-03 Thread via GitHub
pvary commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1625128576 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/TestMonitorSource.java: ## @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software Fo

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
dimas-b commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2146229908 It might be best to talk about Nessie releases in the project's Zulip chat (the join link is on projectnessie.org) :)

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-06-03 Thread via GitHub
pvary commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1625118113 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/FlinkSqlExtension.java: ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625116434 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SketchUtil.java: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (A

[PR] Bump pydantic from 2.7.2 to 2.7.3 [iceberg-python]

2024-06-03 Thread via GitHub
dependabot[bot] opened a new pull request, #795: URL: https://github.com/apache/iceberg-python/pull/795 Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.7.2 to 2.7.3. Release notes Sourced from [pydantic's releases](https://github.com/pydantic/pydantic/releases).

[PR] Bump typing-extensions from 4.12.0 to 4.12.1 [iceberg-python]

2024-06-03 Thread via GitHub
dependabot[bot] opened a new pull request, #794: URL: https://github.com/apache/iceberg-python/pull/794 Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.12.0 to 4.12.1. Release notes Sourced from https://github.com/python/typing_extensions/releases";>ty

[PR] Bump duckdb from 0.10.3 to 1.0.0 [iceberg-python]

2024-06-03 Thread via GitHub
dependabot[bot] opened a new pull request, #793: URL: https://github.com/apache/iceberg-python/pull/793 Bumps [duckdb](https://github.com/duckdb/duckdb) from 0.10.3 to 1.0.0. Release notes Sourced from [duckdb's releases](https://github.com/duckdb/duckdb/releases). DuckDB 1.0.

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-06-03 Thread via GitHub
pvary commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1625114795 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/CollectingSink.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-06-03 Thread via GitHub
pvary commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1625114022 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/CollectingSink.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625112317 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java: ## @@ -30,71 +42,99 @@ * {@link AggregatedStatistics} r

Re: [I] Bump `HiveCatalog` hive-metastore dependency to Hive 4 [iceberg]

2024-06-03 Thread via GitHub
pvary commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2146213149 @ochanism: If you are willing to take some risks, you might be able to create your own catalog implementation based on https://github.com/apache/hive/blob/master/iceberg/iceberg-cata

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625107060 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsUtil.java: ## @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-03 Thread via GitHub
pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2146195992 @zhongqishang: How is your sink/table created? What are the exact records you are sending to the sink? Your issue seems very similar to: https://github.com/apache/iceberg/issues/1007

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625095134 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SketchUtil.java: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625092558 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SketchUtil.java: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625084195 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -47,151 +48,190 @@ * distribution to downstream subtas

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625083271 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java: ## @@ -158,30 +185,29 @@ private void ensureStarted() {

Re: [PR] Build: Remove links checker [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on PR #10404: URL: https://github.com/apache/iceberg/pull/10404#issuecomment-2146176153 @manuzhang Good catch, that one was originally added by @bitsondatadev. Let's keep that one in, it works pretty well: ``` Downloaded: 345.3MB. Content types: 96 image, 19112 t

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625081174 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java: ## @@ -104,30 +144,135 @@ AggregatedStatistics updateAndC

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1625080773 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java: ## @@ -30,71 +42,99 @@ * {@link AggregatedStatistics} r

Re: [PR] Open-API: TableRequirements subclasses need not inherit parent properties [iceberg]

2024-06-03 Thread via GitHub
flyrain commented on PR #10434: URL: https://github.com/apache/iceberg/pull/10434#issuecomment-2146160956 We should go either with inheritance by using `AllOf`, or with `[discriminator](https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/)` in open API spec, but n
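
The discriminator pattern debated in this thread can be illustrated in plain Python: the `type` string on a payload selects which concrete requirement class to build. This is only a sketch under stated assumptions. The class names and the `parse_requirement` helper are hypothetical and are not taken from `rest-catalog-open-api.py`, and the two `type` values are used purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class AssertCreate:
    """Hypothetical requirement with no fields beyond the discriminator."""
    type: str = "assert-create"


@dataclass
class AssertTableUUID:
    """Hypothetical requirement carrying a table UUID."""
    uuid: str
    type: str = "assert-table-uuid"


# Discriminator mapping: the value of "type" picks the class to build.
_REQUIREMENTS: Dict[str, Callable[..., Any]] = {
    "assert-create": AssertCreate,
    "assert-table-uuid": AssertTableUUID,
}


def parse_requirement(payload: Dict[str, Any]) -> Any:
    """Build the requirement object selected by payload["type"]."""
    kind = payload.get("type")
    if kind not in _REQUIREMENTS:
        raise ValueError(f"unknown requirement type: {kind!r}")
    # Each class fixes its own "type" default, so it is not passed through.
    fields = {k: v for k, v in payload.items() if k != "type"}
    return _REQUIREMENTS[kind](**fields)
```

In this sketch each subclass declares its own fixed `type` value, so no parent class is needed to carry the field; whether the spec should also keep an inherited `type` on the parent is the question the thread is discussing.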

Re: [PR] AWS: Retain Glue Catalog column comment after updating Iceberg table [iceberg]

2024-06-03 Thread via GitHub
lawofcycles commented on PR #10276: URL: https://github.com/apache/iceberg/pull/10276#issuecomment-2146150455 Hi @geruh would you review the latest change?

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1624900416 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/ManifestOutputFileFactory.java: ## @@ -41,6 +42,16 @@ class ManifestOutputFileFactory { private

Re: [PR] Open-API: `AssertRefSnapshotId` type should be `branch` or `tag` [iceberg]

2024-06-03 Thread via GitHub
flyrain commented on code in PR #10423: URL: https://github.com/apache/iceberg/pull/10423#discussion_r1625037682 ## open-api/rest-catalog-open-api.yaml: ## @@ -2643,7 +2643,7 @@ components: properties: type: type: string - enum: [ "assert-ref-

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
singhpk234 commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1625032460 ## api/src/main/java/org/apache/iceberg/io/RetryableInputStream.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [PR] Open-API: TableRequirements subclasses need not inherit parent properties [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on PR #10434: URL: https://github.com/apache/iceberg/pull/10434#issuecomment-2146090622 Thanks for raising this @anuragmantri Looking at the changes in the code, it will remove the inheritance: ```python diff --git a/open-api/rest-catalog-open-api.py b/open-

Re: [PR] Open-API: TableRequirements subclasses need not inherit parent properties [iceberg]

2024-06-03 Thread via GitHub
anuragmantri commented on PR #10434: URL: https://github.com/apache/iceberg/pull/10434#issuecomment-2146087005 @flyrain could you please take a look?

[PR] Open-API: TableRequirements subclasses need not inherit parent properties [iceberg]

2024-06-03 Thread via GitHub
anuragmantri opened a new pull request, #10434: URL: https://github.com/apache/iceberg/pull/10434 Since we are using the [discriminator](https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/) field, and do not have any common properties between the parent and child

Re: [I] How do I find if there is residual in the table scan/plan files? [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on issue #785: URL: https://github.com/apache/iceberg-python/issues/785#issuecomment-2146079682 @maytasm The old evaluator might be a good starting point as it is almost a 1-to-1 copy of the Java implementation. I would double check if there are additions to the Java Residua

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
danielcweeks commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1625018098 ## api/src/main/java/org/apache/iceberg/io/RetryableInputStream.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1625017201 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3InputStream.java: ## @@ -139,7 +140,11 @@ private InputStream readRange(String range) { S3RequestUtil

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
singhpk234 commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1625011756 ## api/src/main/java/org/apache/iceberg/io/RetryableInputStream.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [I] How do I find if there is residual in the table scan/plan files? [iceberg-python]

2024-06-03 Thread via GitHub
maytasm commented on issue #785: URL: https://github.com/apache/iceberg-python/issues/785#issuecomment-2146039778 @Fokko I can look into contributing. I am not too familiar with the new pyiceberg rewrite (current state of this library) but was wondering if it would be something like porting

Re: [I] How do I find if there is residual in the table scan/plan files? [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on issue #785: URL: https://github.com/apache/iceberg-python/issues/785#issuecomment-2146014752 Hey @maytasm Thanks for raising this. We don't have the [ResidualEvaluator] today, but it would be great to add that. We can take inspiration from Java: https://github.com/apache/

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
danielcweeks commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624953864 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3InputStream.java: ## @@ -139,7 +140,11 @@ private InputStream readRange(String range) { S3RequestUtil.con

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145959788 Thanks for the additional context. This is where I got confused: > we can find the latest version of the commit by some means By some means, I assumed that you were hinti

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624930096 ## gradle/libs.versions.toml: ## @@ -37,6 +37,7 @@ delta-standalone = "3.1.0" delta-spark = "3.2.0" esotericsoftware-kryo = "4.0.3" errorprone-annotations

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624927489 ## gradle/libs.versions.toml: ## @@ -95,7 +96,7 @@ antlr-antlr4 = { module = "org.antlr:antlr4", version.ref = "antlr" } antlr-runtime = { module = "org.an

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624925629 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3InputStream.java: ## @@ -139,7 +140,11 @@ private InputStream readRange(String range) { S3RequestUtil

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624919847 ## aws/src/test/java/org/apache/iceberg/aws/s3/TestFuzzyS3InputStream.java: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1624920900 ## aws/src/test/java/org/apache/iceberg/aws/s3/TestFuzzyS3InputStream.java: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[I] Upcasting and Downcasting inconsistencies with PyArrow Schema [iceberg-python]

2024-06-03 Thread via GitHub
syun64 opened a new issue, #791: URL: https://github.com/apache/iceberg-python/issues/791 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 `schema_to_pyarrow` converts BinaryType to `pa.large_binary()` type. This creates inconsistencies wit

[PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar opened a new pull request, #10433: URL: https://github.com/apache/iceberg/pull/10433 This is an alternative approach to https://github.com/apache/iceberg/pull/4912/files and https://github.com/apache/iceberg/pull/8221/files#diff-0b632866a3b10fac55c442b08178ec0ac72b3b6008782

Re: [I] Rest Catalog: `catalog.name` should not be part of namespace [iceberg-python]

2024-06-03 Thread via GitHub
c-thiel commented on issue #742: URL: https://github.com/apache/iceberg-python/issues/742#issuecomment-2145900655 @Fokko do you maybe have some thoughts on this? I am happy to prepare a PR, but would like to get some Feedback first.

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
chayalipy commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145889373 Is there a release date?

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
dimas-b commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145848086 ATM, Nessie [has Iceberg REST API](https://github.com/projectnessie/nessie/pull/7043) on `main`, but it's not released yet.

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145832215 It looks like that [Nessie](https://www.dremio.com/press-releases/dremio-reinforces-ongoing-commitment-to-open-lakehouses-with-new-support-for-apache-iceberg-rest-catalog-specificati

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
hantangwangd commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624864787 ## core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java: ## @@ -374,8 +374,7 @@ private static TableMetadata create(TableOperations ops, UpdateTableR

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
hantangwangd commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624863729 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2655,6 +2656,68 @@ public void testCleanupCleanableExceptionsReplace() { .isI

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
hantangwangd commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624862497 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -861,6 +862,19 @@ public static Builder buildFromEmpty() { return new Builder(); } +

Re: [PR] Support `Table.to_arrow_batches` to return Iterator[Recordbatch] instead of a fully materialized Arrow Table [iceberg-python]

2024-06-03 Thread via GitHub
Fokko commented on code in PR #786: URL: https://github.com/apache/iceberg-python/pull/786#discussion_r1624861038 ## pyiceberg/io/pyarrow.py: ## @@ -1005,36 +1004,42 @@ def _task_to_table( columns=[col.name for col in file_project_schema.columns], ) -

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-06-03 Thread via GitHub
stevenzwu commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1624777278 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/CollectingSink.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-06-03 Thread via GitHub
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1624850309 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,113 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}

Re: [I] ORC file format support [iceberg-python]

2024-06-03 Thread via GitHub
MehulBatra commented on issue #20: URL: https://github.com/apache/iceberg-python/issues/20#issuecomment-2145803829 Initial Progress: https://github.com/apache/iceberg-python/pull/790

Re: [I] Support Nessie catalog [iceberg-python]

2024-06-03 Thread via GitHub
alonahmias commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2145800170 Hi, we would like to contribute to this issue, is it possible?

[PR] Add support for orc format [iceberg-python]

2024-06-03 Thread via GitHub
MehulBatra opened a new pull request, #790: URL: https://github.com/apache/iceberg-python/pull/790 (no comment)

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145692918 In our production environment, we use hadoopcatalog heavily. After fixing the above issues, it performs very well.

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145684619 However, whether or not we use a versionHint file, we can find the latest version of the commit by some means. In other words, as long as the metadata file is renamed, the commit i

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145678553 @Fokko Because, for the file system catalog, the client's behaviour is currently to write to a temp file with a random ID and then rename the file to complete the commit. This
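
The commit behaviour described in this comment (write to a temp file with a random ID, then rename it into place) can be sketched with the Python standard library. This is not the catalog's actual implementation. The `v<N>.metadata.json` file naming and both function names are assumptions made for illustration. One technique is swapped in and should be named plainly: on POSIX, `os.rename` silently replaces an existing target, so the sketch uses `os.link` followed by removing the temp file, which fails instead of overwriting when another writer has already committed that version. The `latest_version` helper shows one way to find the newest commit by listing the directory, without a version hint file.

```python
import os
import re
import uuid

# Assumed file naming for illustration only: v<N>.metadata.json
_VERSION_RE = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_version(table_dir: str) -> int:
    """Find the highest committed version by listing the directory (0 if none)."""
    matches = (_VERSION_RE.match(name) for name in os.listdir(table_dir))
    return max((int(m.group(1)) for m in matches if m), default=0)


def commit_metadata(table_dir: str, version: int, payload: bytes) -> bool:
    """Write payload to a randomly named temp file, then publish it as v<version>.

    Returns False if that version already exists (another writer won the race).
    """
    final_path = os.path.join(table_dir, f"v{version}.metadata.json")
    temp_path = os.path.join(table_dir, f"{uuid.uuid4()}.tmp")
    with open(temp_path, "wb") as f:
        f.write(payload)
    try:
        # os.link raises FileExistsError rather than overwriting final_path.
        os.link(temp_path, final_path)
        return True
    except FileExistsError:
        return False
    finally:
        # The temp name is no longer needed whether or not the commit succeeded.
        os.remove(temp_path)
```

A writer would typically call `latest_version`, attempt `commit_metadata` for the next number, and retry with a fresh base if it returns `False`. Object stores and some distributed filesystems do not offer the same link or rename guarantees, which is part of what the thread is debating.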

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
HonahX commented on PR #748: URL: https://github.com/apache/iceberg-python/pull/748#issuecomment-2145671039 Merged! Thanks @chinmay-bhat for the great work! Thanks @Fokko @syun64 @ndrluis for the review and discussions!
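
The lookup named in this PR's title, finding the snapshot at or right before a given timestamp, can be sketched with a binary search over a time-ordered snapshot log. The PR's diff is not shown in this listing, so the following is a generic illustration and not the merged code; the `snapshot_as_of` name and the `(timestamp_ms, snapshot_id)` tuple layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def snapshot_as_of(log: List[Tuple[int, int]], timestamp_ms: int) -> Optional[int]:
    """Return the id of the latest snapshot with timestamp <= timestamp_ms.

    `log` holds (timestamp_ms, snapshot_id) pairs sorted by timestamp.
    Returns None when every snapshot is newer than the requested time.
    """
    timestamps = [ts for ts, _ in log]
    # bisect_right returns how many entries have a timestamp <= timestamp_ms.
    idx = bisect_right(timestamps, timestamp_ms)
    return log[idx - 1][1] if idx > 0 else None
```

Using `bisect_right` rather than `bisect_left` is what makes the match inclusive: a snapshot whose timestamp equals the requested time is returned.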

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145670035 @Fokko We just need to drop the use of the versionHint file, and the hadoopcatalog is now atomic.

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
HonahX merged PR #748: URL: https://github.com/apache/iceberg-python/pull/748

Re: [I] Inconsistent PyArrow Schema Field Metadata on `project_table`: Parquet Field ID [iceberg-python]

2024-06-03 Thread via GitHub
Fokko closed issue #788: Inconsistent PyArrow Schema Field Metadata on `project_table`: Parquet Field ID URL: https://github.com/apache/iceberg-python/issues/788

Re: [PR] `include_field_ids` flag in `schema_to_pyarrow` [iceberg-python]

2024-06-03 Thread via GitHub
Fokko merged PR #789: URL: https://github.com/apache/iceberg-python/pull/789

Re: [PR] Support getting snapshot at or right before the given timestamp [iceberg-python]

2024-06-03 Thread via GitHub
chinmay-bhat commented on PR #748: URL: https://github.com/apache/iceberg-python/pull/748#issuecomment-2145641873 Thank you for the review @Fokko and @HonahX ! 🚀

Re: [PR] `include_field_ids` flag in `schema_to_pyarrow` [iceberg-python]

2024-06-03 Thread via GitHub
syun64 commented on PR #789: URL: https://github.com/apache/iceberg-python/pull/789#issuecomment-2145637904 Thanks for the review @HonahX ! Could I ask for your help in getting this PR merged?

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
Fokko commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145552594 @BsoBird I'm sorry I misinterpreted what you're suggesting. What kind of alternative are you suggesting for the version hint file? One of the requirements of the File System Catalog

Re: [I] Inconsistent PyArrow Schema Field Metadata on `project_table`: Parquet Field ID [iceberg-python]

2024-06-03 Thread via GitHub
syun64 commented on issue #788: URL: https://github.com/apache/iceberg-python/issues/788#issuecomment-2145537794 Thank you for the input @Fokko - sounds good 👍 I've put up https://github.com/apache/iceberg-python/pull/789 to fix this issue

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-06-03 Thread via GitHub
liurenjie1024 commented on code in PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#discussion_r1624624064 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -221,67 +413,215 @@ impl ManifestFilterVisitor<'_> { let pos = reference.accessor().p

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145515417 @Fokko HadoopCatalog is working fine after fixing the problems associated with it, so why remove it?

Re: [I] Should we remove the use of versionHintFile from the entire FileSystemCatalog? [iceberg]

2024-06-03 Thread via GitHub
BsoBird commented on issue #10427: URL: https://github.com/apache/iceberg/issues/10427#issuecomment-2145510358 @Fokko Sir, I don't quite understand, do you mean we will delete the whole hadoopCatalog? But we have a large number of customers who are using hadoopCatalog.

Re: [PR] Docs: Fix internal links in 1.5.x releases [iceberg]

2024-06-03 Thread via GitHub
manuzhang commented on PR #10411: URL: https://github.com/apache/iceberg/pull/10411#issuecomment-2145501680 @nastra As @Fokko said, #9965 wasn't merged in time for the 1.5.x branch and this PR basically back-ports the patch (resolving minor conflicts). I've built the site locally and manually checke

Re: [PR] Build: Remove links checker [iceberg]

2024-06-03 Thread via GitHub
manuzhang commented on PR #10404: URL: https://github.com/apache/iceberg/pull/10404#issuecomment-2145446188 @Fokko I'm good with leaving in the link checker, but we need to update the README which refers to another [python linkchecker](https://github.com/linkchecker/linkchecker).

Re: [PR] Core: Introduce AuthConfig [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar merged PR #10161: URL: https://github.com/apache/iceberg/pull/10161

Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]

2024-06-03 Thread via GitHub
jackye1995 commented on code in PR #9695: URL: https://github.com/apache/iceberg/pull/9695#discussion_r1623316927 ## open-api/rest-catalog-open-api.yaml: ## @@ -537,6 +537,124 @@ paths: 5XX: $ref: '#/components/responses/ServerErrorResponse' + /v1/{prefix}

Re: [PR] Parquet: Remove deprecated TestHelpers in parquet module [iceberg]

2024-06-03 Thread via GitHub
nastra merged PR #10428: URL: https://github.com/apache/iceberg/pull/10428

Re: [I] Make ManifestEntry and ManifestReader.liveEntries() as public [iceberg]

2024-06-03 Thread via GitHub
amogh-jahagirdar commented on issue #10425: URL: https://github.com/apache/iceberg/issues/10425#issuecomment-2145346311 @pudidic If you want to read the entries wouldn't using the ManifestFiles.read() API be sufficient? Then you could iterate over the data files via the `iterator` API, and

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-06-03 Thread via GitHub
nastra commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1624512760 ## core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java: ## @@ -374,8 +374,7 @@ private static TableMetadata create(TableOperations ops, UpdateTableRequest
