Re: [PR] Update table metadata throughout transaction [iceberg-python]

2024-02-27 Thread via GitHub
HonahX commented on code in PR #471: URL: https://github.com/apache/iceberg-python/pull/471#discussion_r1505504295 ## pyiceberg/table/__init__.py: ## @@ -219,68 +220,41 @@ def property_as_int(properties: Dict[str, str], property_name: str, default: Opt class Transaction:

Re: [PR] Update table metadata throughout transaction [iceberg-python]

2024-02-27 Thread via GitHub
HonahX commented on code in PR #471: URL: https://github.com/apache/iceberg-python/pull/471#discussion_r1505495519 ## pyiceberg/table/__init__.py: ## @@ -219,68 +220,41 @@ def property_as_int(properties: Dict[str, str], property_name: str, default: Opt class Transaction:

Re: [I] Why does FlinkSink writes position deletes in append-mode if identifier fields are specified? [iceberg]

2024-02-27 Thread via GitHub
tibercus commented on issue #9773: URL: https://github.com/apache/iceberg/issues/9773#issuecomment-1968339465 @pvary Hi! > What do you mean by > > > FlinkSink in append-mode I mean appending data with DataStream: https://iceberg.apache.org/docs/latest/flink-write

Re: [PR] feat: Add expression builder and display. [iceberg-rust]

2024-02-27 Thread via GitHub
liurenjie1024 commented on code in PR #169: URL: https://github.com/apache/iceberg-rust/pull/169#discussion_r1505434226 ## crates/iceberg/src/expr/term.rs: ## @@ -17,21 +17,91 @@ //! Term definition. -use crate::spec::NestedFieldRef; +use crate::expr::{BinaryExpression, Pre

Re: [PR] Flink: Supports specifying comment for iceberg fields in create table and addcolumn syntax using flinksql [iceberg]

2024-02-27 Thread via GitHub
pvary commented on code in PR #9606: URL: https://github.com/apache/iceberg/pull/9606#discussion_r1505428596 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/FlinkSchemaUtil.java: ## @@ -68,6 +75,42 @@ public static Schema convert(TableSchema schema) { return fre

Re: [PR] Flink: Supports specifying comment for iceberg fields in create table and addcolumn syntax using flinksql [iceberg]

2024-02-27 Thread via GitHub
pvary commented on code in PR #9606: URL: https://github.com/apache/iceberg/pull/9606#discussion_r1505427401 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/FlinkDynamicTableFactory.java: ## @@ -187,7 +188,7 @@ private static TableLoader createTableLoader( // Cr

Re: [PR] Flink: Adds support for 1.18 version [iceberg]

2024-02-27 Thread via GitHub
pvary commented on PR #9211: URL: https://github.com/apache/iceberg/pull/9211#issuecomment-1968313622 @andrea-zanetti, @FranMorilloAWS: Iceberg 1.5.0 will be the first version which supports Flink 1.18.x. It will be released soon, as the release process is already started -- This is an a

Re: [PR] [WIP] feat: basic implementation of dynamodb catalog [iceberg-rust]

2024-02-27 Thread via GitHub
Xuanwo commented on PR #223: URL: https://github.com/apache/iceberg-rust/pull/223#issuecomment-1968229710 > They suggest using Glue. Let me figure out whether it works for me. cc @Xuanwo Thanks! Feel free to implement the DynamoDB catalog in iceberg-rust. I just wanted to update you on t

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg]

2024-02-27 Thread via GitHub
odysa commented on PR #9783: URL: https://github.com/apache/iceberg/pull/9783#issuecomment-1968212249 Some catalogs could be community-owned. While the Iceberg team may officially deprecate some catalogs, open-source communities can still maintain them, like @jackye1995 suggested -- T

Re: [PR] feat: Add expression builder and display. [iceberg-rust]

2024-02-27 Thread via GitHub
Xuanwo commented on code in PR #169: URL: https://github.com/apache/iceberg-rust/pull/169#discussion_r1505328726 ## crates/iceberg/src/expr/mod.rs: ## @@ -18,25 +18,126 @@ //! This module contains expressions. mod term; + +use std::fmt::{Display, Formatter}; + pub use term:

[I] [bug]OversizedAllocationException when query data with Spark [iceberg]

2024-02-27 Thread via GitHub
zhangpenggh opened a new issue, #9820: URL: https://github.com/apache/iceberg/issues/9820 ### Apache Iceberg version 1.4.3 (latest release) ### Query engine Spark ### Please describe the bug 🐞 I created an Iceberg table, wrote data into it using Flink, and w

Re: [PR] [WIP] feat: basic implementation of dynamodb catalog [iceberg-rust]

2024-02-27 Thread via GitHub
odysa commented on PR #223: URL: https://github.com/apache/iceberg-rust/pull/223#issuecomment-1968164810 They suggest using Glue. Let me figure out whether it works for me. cc @Xuanwo -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] [WIP] feat: basic implementation of dynamodb catalog [iceberg-rust]

2024-02-27 Thread via GitHub
Xuanwo commented on PR #223: URL: https://github.com/apache/iceberg-rust/pull/223#issuecomment-1968154839 Based on the discussion, it appears that few people utilize this catalog. Do you have a specific use case for it? -- This is an automated message from the Apache Git Service. To respo

Re: [PR] [WIP] feat: basic implementation of dynamodb catalog [iceberg-rust]

2024-02-27 Thread via GitHub
Xuanwo commented on PR #223: URL: https://github.com/apache/iceberg-rust/pull/223#issuecomment-1968141751 There's a discussion about `Deprecate DynamodbCatalog` on dev@i.a.o. Maybe we should join in? LINK: https://lists.apache.org/thread/b92zwcvy7917bhr5t00b07f7r8qqwxqj -- This is

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-02-27 Thread via GitHub
zinking commented on PR #8797: URL: https://github.com/apache/iceberg/pull/8797#issuecomment-1968104833 let me update the PR, and @nastra can have another look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] OpenAPI: Add ContentFile types to spec for the PreplanTable and PlanTable API [iceberg]

2024-02-27 Thread via GitHub
stevenzwu commented on code in PR #9717: URL: https://github.com/apache/iceberg/pull/9717#discussion_r1505261085 ## open-api/rest-catalog-open-api.yaml: ## @@ -3324,6 +3324,274 @@ components: type: integer format: int64 +BooleanTypeValue: + type:

Re: [PR] Docs: Fix links to internal files [iceberg]

2024-02-27 Thread via GitHub
manuzhang commented on PR #9819: URL: https://github.com/apache/iceberg/pull/9819#issuecomment-1968065010 @bitsondatadev PTLK when you return from vacation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] Docs: Fix links to internal files [iceberg]

2024-02-27 Thread via GitHub
manuzhang opened a new pull request, #9819: URL: https://github.com/apache/iceberg/pull/9819 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505220977 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -234,7 +254,7 @@ public long newSnapshotId() { } @VisibleForTesting - Path ge

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505229636 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -289,64 +309,105 @@ Path versionHintFile() { return metadataPath(Util.VERSION_HIN

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505224636 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path dst, int

Re: [PR] feat: Add expression builder and display. [iceberg-rust]

2024-02-27 Thread via GitHub
liurenjie1024 commented on code in PR #169: URL: https://github.com/apache/iceberg-rust/pull/169#discussion_r1505227423 ## crates/iceberg/src/expr/mod.rs: ## @@ -64,3 +71,73 @@ impl Display for PredicateOperator { } } } + +impl PredicateOperator { +/// Check i

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505225248 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path dst, int

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505225004 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path dst, int

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505224636 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path dst, int

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505223028 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -157,18 +158,37 @@ public void commit(TableMetadata base, TableMetadata metadata) {

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505222429 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -432,7 +500,7 @@ private void deleteRemovedMetadataFiles(TableMetadata base, TableMet

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505220977 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -234,7 +254,7 @@ public long newSnapshotId() { } @VisibleForTesting - Path ge

Re: [PR] API, Core: add multi-arg transform and add zOrder as the first one [iceberg]

2024-02-27 Thread via GitHub
szehon-ho commented on code in PR #9662: URL: https://github.com/apache/iceberg/pull/9662#discussion_r1505175057 ## api/src/main/java/org/apache/iceberg/StructTransform.java: ## @@ -51,11 +53,16 @@ class StructTransform implements StructLike, Serializable { this.transforms

Re: [I] Optimize generation of CombinedScanTask for RewriteDataFilesAction [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1667: URL: https://github.com/apache/iceberg/issues/1667#issuecomment-1967947499 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Do we need to provide a more detailed document for HIVE ? [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1668: URL: https://github.com/apache/iceberg/issues/1668#issuecomment-1967947541 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Action: support spark3 and customer catalog [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1662: URL: https://github.com/apache/iceberg/issues/1662#issuecomment-1967947441 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Extend HadoopTableOperations to also work with other FS guarantees [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1655: URL: https://github.com/apache/iceberg/issues/1655#issuecomment-1967947394 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Outline approach for extending APIs to support cross table operations [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1647: URL: https://github.com/apache/iceberg/issues/1647#issuecomment-1967947348 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] There is a vulnerability in spotless-plugin-gradle 3.14.0 ,upgrade recommended [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1642: URL: https://github.com/apache/iceberg/issues/1642#issuecomment-1967947297 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Implement Initial Spark Structured Streaming Source [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1628: URL: https://github.com/apache/iceberg/issues/1628#issuecomment-1967947253 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Documentation/FAQ: How does time travel work? [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1347: URL: https://github.com/apache/iceberg/issues/1347#issuecomment-1967946926 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

Re: [I] Documentation/FAQ: How does time travel work? [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] closed issue #1347: Documentation/FAQ: How does time travel work? URL: https://github.com/apache/iceberg/issues/1347 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Spark 3: Consider providing better support for path-based tables [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] closed issue #1306: Spark 3: Consider providing better support for path-based tables URL: https://github.com/apache/iceberg/issues/1306 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [I] Spark 3: Consider providing better support for path-based tables [iceberg]

2024-02-27 Thread via GitHub
github-actions[bot] commented on issue #1306: URL: https://github.com/apache/iceberg/issues/1306#issuecomment-1967946879 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

Re: [PR] [WIP] Migrate other spark classes to JUnit 5 [iceberg]

2024-02-27 Thread via GitHub
tomtongue commented on PR #9817: URL: https://github.com/apache/iceberg/pull/9817#issuecomment-1967937157 @nastra Migrate the remaining classes to JUnit 5 and remove the old test bases. Could you review them when you have time? If there are still JUnit 4 files, please let me know. -- This

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg-python]

2024-02-27 Thread via GitHub
geruh commented on code in PR #475: URL: https://github.com/apache/iceberg-python/pull/475#discussion_r1505155942 ## pyiceberg/catalog/dynamodb.py: ## @@ -81,6 +82,10 @@ ITEM = "Item" +@deprecated( +deprecated_in="0.6.0", Review Comment: Hey @hussein-awala, yeah I a

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg-python]

2024-02-27 Thread via GitHub
geruh commented on PR #475: URL: https://github.com/apache/iceberg-python/pull/475#issuecomment-1967930755 > Ah, just saw this comment: [apache/iceberg#9783 (comment)](https://github.com/apache/iceberg/pull/9783#issuecomment-1965781361). So maybe we can wait for a while to see how the discu

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg-python]

2024-02-27 Thread via GitHub
geruh commented on code in PR #475: URL: https://github.com/apache/iceberg-python/pull/475#discussion_r1505155942 ## pyiceberg/catalog/dynamodb.py: ## @@ -81,6 +82,10 @@ ITEM = "Item" +@deprecated( +deprecated_in="0.6.0", Review Comment: Hey @hussein-awala, yeah we

Re: [PR] OpenAPI: Add ContentFile types to spec for scan API [iceberg]

2024-02-27 Thread via GitHub
jackye1995 commented on code in PR #9717: URL: https://github.com/apache/iceberg/pull/9717#discussion_r1505122910 ## open-api/rest-catalog-open-api.yaml: ## @@ -3324,6 +3324,281 @@ components: type: integer format: int64 +BooleanTypeValue: + type

Re: [I] Create table from plain Parquet files [iceberg-python]

2024-02-27 Thread via GitHub
syun64 commented on issue #445: URL: https://github.com/apache/iceberg-python/issues/445#issuecomment-1967800444 Thank you for the explanation @HonahX . Yes that's really great insight. I'm definitely in support of a CreateTableTransaction, because that's what we will need to support `CREAT

Re: [I] Parallel Table.append [iceberg-python]

2024-02-27 Thread via GitHub
kevinjqliu commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1967795309 Here's the script I used to run the `append()` function to use 8 threads to write multiple parquet files https://gist.github.com/kevinjqliu/e738641ec8f96de554c5ed39ead3f09a
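
For readers without access to the gist, the general pattern of fanning one batch of rows out to 8 writer threads can be sketched as below. This is only an illustration under stated assumptions: it is not the linked script, it does not call PyIceberg's `append()`, and the stand-in writer emits CSV files so that the sketch needs nothing beyond the standard library. The names `write_chunk` and `write_parallel` are hypothetical.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_chunk(out_dir: Path, index: int, rows: list) -> Path:
    """Write one chunk of rows to its own file (stand-in for a Parquet writer)."""
    path = out_dir / f"part-{index:05d}.csv"
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


def write_parallel(rows: list, out_dir: Path, workers: int = 8) -> list:
    """Split rows into up to `workers` chunks and write each chunk on its own thread."""
    # Round-robin split so every chunk gets a similar number of rows.
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_chunk, out_dir, i, chunk)
            for i, chunk in enumerate(chunks)
            if chunk  # skip empty chunks when there are fewer rows than workers
        ]
        # result() re-raises any exception from the worker thread.
        return [future.result() for future in futures]
```

Each thread writes a separate file, so no locking is needed; whether threads actually speed up a real Parquet writer depends on how much of the work releases the interpreter lock.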

Re: [I] Parallel Table.append [iceberg-python]

2024-02-27 Thread via GitHub
kevinjqliu commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1967788193 Thanks! I can see that it's using 8 threads with ``` SELECT * FROM duckdb_settings(); ``` I also ran ``` SET threads TO 8; ``` just i

Re: [PR] OpenAPI: Add ContentFile types to spec for scan API [iceberg]

2024-02-27 Thread via GitHub
geruh commented on code in PR #9717: URL: https://github.com/apache/iceberg/pull/9717#discussion_r1505085892 ## open-api/rest-catalog-open-api.yaml: ## @@ -3324,6 +3324,297 @@ components: type: integer format: int64 +BooleanTypeValue: + type: boo

Re: [I] Parallel Table.append [iceberg-python]

2024-02-27 Thread via GitHub
bigluck commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1967730771 @kevinjqliu nice, duckdb should use https://duckdb.org/docs/sql/configuration.html -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [I] Parallel Table.append [iceberg-python]

2024-02-27 Thread via GitHub
kevinjqliu commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1967707048 As a way to benchmark multithreaded writes to multiple parquet files, I've noticed that Duckdb's COPY command has the `per_thread_output` and `file_size_bytes` options.

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg-python]

2024-02-27 Thread via GitHub
hussein-awala commented on code in PR #475: URL: https://github.com/apache/iceberg-python/pull/475#discussion_r1505068598 ## pyiceberg/catalog/dynamodb.py: ## @@ -81,6 +82,10 @@ ITEM = "Item" +@deprecated( +deprecated_in="0.6.0", Review Comment: `0.6.0` was already

Re: [PR] OpenAPI: Add ContentFile types to spec for scan API [iceberg]

2024-02-27 Thread via GitHub
rdblue commented on PR #9717: URL: https://github.com/apache/iceberg/pull/9717#issuecomment-1967671600 Mostly looks good to me. I flagged a couple of minor things. Also, I don't think that we resolved this thread: https://github.com/apache/iceberg/pull/9717#discussion_r1501928527 --

Re: [PR] OpenAPI: Add ContentFile types to spec for scan and append api [iceberg]

2024-02-27 Thread via GitHub
rdblue commented on code in PR #9717: URL: https://github.com/apache/iceberg/pull/9717#discussion_r1505053034 ## open-api/rest-catalog-open-api.yaml: ## @@ -3324,6 +3324,297 @@ components: type: integer format: int64 +BooleanTypeValue: + type: bo

Re: [PR] OpenAPI: Add ContentFile types to spec for scan and append api [iceberg]

2024-02-27 Thread via GitHub
rdblue commented on code in PR #9717: URL: https://github.com/apache/iceberg/pull/9717#discussion_r1505050392 ## open-api/rest-catalog-open-api.yaml: ## @@ -3324,6 +3324,297 @@ components: type: integer format: int64 +BooleanTypeValue: + type: bo

Re: [PR] Make issued_token_type optional to support OAuth2 Client Credential Flow [iceberg-python]

2024-02-27 Thread via GitHub
flyrain commented on PR #466: URL: https://github.com/apache/iceberg-python/pull/466#issuecomment-1967655744 Thanks @syun64 for the review. We will need at least an approval from committers. cc @Fokko @danielcweeks -- This is an automated message from the Apache Git Service. To respond to

[PR] [WIP] feat: basic implementation of dynamodb catalog [iceberg-rust]

2024-02-27 Thread via GitHub
odysa opened a new pull request, #223: URL: https://github.com/apache/iceberg-rust/pull/223 todo * [ ] list namespaces * [ ] Integration Test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
himadripal commented on PR #9803: URL: https://github.com/apache/iceberg/pull/9803#issuecomment-1967631379 Thanks for reviewing @RussellSpitzer, fixed all the review comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505026488 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -149,26 +183,71 @@ public void commit(TableMetadata base, TableMetadata metada

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505023443 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -234,7 +254,7 @@ public long newSnapshotId() { } @VisibleForTesting -

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505022747 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -289,64 +309,105 @@ Path versionHintFile() { return metadataPath(Util.VERS

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505021238 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -289,64 +309,105 @@ Path versionHintFile() { return metadataPath(Util.VERS

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505001461 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path ds

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505016752 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path ds

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505016107 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -432,7 +500,7 @@ private void deleteRemovedMetadataFiles(TableMetadata base, T

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505014461 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -157,18 +158,37 @@ public void commit(TableMetadata base, TableMetadata metada

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505013367 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -289,64 +309,105 @@ Path versionHintFile() { return metadataPath(Util.VERS

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505011454 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path ds

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505010374 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path ds

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505005414 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path ds

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1505001461 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -368,58 +431,63 @@ private void renameToFinal(FileSystem fs, Path src, Path ds

Re: [PR] feat: glue table creation with some docs on testing [iceberg-go]

2024-02-27 Thread via GitHub
wolfeidau commented on code in PR #59: URL: https://github.com/apache/iceberg-go/pull/59#discussion_r1504998764 ## table/metadata.go: ## @@ -399,3 +400,32 @@ func (m *MetadataV2) UnmarshalJSON(b []byte) error { m.preValidate() return m.validate() } + +func NewMe

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1504998435 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -149,26 +183,71 @@ public void commit(TableMetadata base, TableMetadata metada

Re: [PR] feat: glue table creation with some docs on testing [iceberg-go]

2024-02-27 Thread via GitHub
wolfeidau commented on code in PR #59: URL: https://github.com/apache/iceberg-go/pull/59#discussion_r1504982899 ## docs/cfn/AWS_TESTING.md: ## @@ -0,0 +1,74 @@ + + +# AWS integration testing + Review Comment: @zeroshade good question, it is currently a bit ad hoc, and mostly

Re: [PR] feat: glue table creation with some docs on testing [iceberg-go]

2024-02-27 Thread via GitHub
wolfeidau commented on code in PR #59: URL: https://github.com/apache/iceberg-go/pull/59#discussion_r1504979600 ## catalog/catalog.go: ## @@ -185,3 +197,33 @@ func TableNameFromIdent(ident table.Identifier) string { func NamespaceFromIdent(ident table.Identifier) table.Identifi

Re: [PR] feat: glue table creation with some docs on testing [iceberg-go]

2024-02-27 Thread via GitHub
wolfeidau commented on code in PR #59: URL: https://github.com/apache/iceberg-go/pull/59#discussion_r1504975878 ## catalog/catalog.go: ## @@ -185,3 +197,33 @@ func TableNameFromIdent(ident table.Identifier) string { func NamespaceFromIdent(ident table.Identifier) table.Identifi

Re: [PR] OpenAPI: Add ContentFile types to spec for scan and append api [iceberg]

2024-02-27 Thread via GitHub
jackye1995 commented on code in PR #9717: URL: https://github.com/apache/iceberg/pull/9717#discussion_r1504942852 ## open-api/rest-catalog-open-api.yaml: ## @@ -3324,6 +3324,297 @@ components: type: integer format: int64 +BooleanTypeValue: + type

Re: [PR] [Bug Fix] cast None `current-snapshot-id` as -1 for Backwards Compatibility [iceberg-python]

2024-02-27 Thread via GitHub
syun64 commented on PR #473: URL: https://github.com/apache/iceberg-python/pull/473#issuecomment-1967538763 So it looks like using a custom @field_serializer isn't working in the [current IcebergBaseModel definition](https://github.com/apache/iceberg-python/blob/pyiceberg-0.6.x/pyiceberg/ty
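
The behaviour named in the PR title, writing `-1` when `current-snapshot-id` is `None`, can be sketched without pydantic as follows. This is a rough illustration, not the PyIceberg implementation: `TableMetadataSketch` and its `to_json` method are hypothetical, and a plain dataclass stands in for the `IcebergBaseModel` and `@field_serializer` machinery discussed in the comment.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableMetadataSketch:
    """Hypothetical stand-in for a table metadata model."""

    table_uuid: str
    current_snapshot_id: Optional[int] = None

    def to_json(self) -> str:
        # Serialize a missing snapshot id as -1 so that older readers,
        # which expect an integer here, can still parse the metadata.
        snapshot_id = -1 if self.current_snapshot_id is None else self.current_snapshot_id
        return json.dumps(
            {"table-uuid": self.table_uuid, "current-snapshot-id": snapshot_id}
        )
```

The in-memory value stays `None`; only the serialized form changes, which keeps the cast confined to the write path.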

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg-python]

2024-02-27 Thread via GitHub
geruh commented on code in PR #475: URL: https://github.com/apache/iceberg-python/pull/475#discussion_r1504938114 ## pyiceberg/catalog/dynamodb.py: ## @@ -81,6 +82,10 @@ ITEM = "Item" +@deprecated( +deprecated_in="0.6.0", +removed_in="1.0.0", +) Review Comment:
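
As a rough sketch of what a decorator with this call shape might do, the example below emits a `DeprecationWarning` each time the wrapped callable is used. It is written from scratch for illustration and is not the helper PyIceberg ships; the `help_message` parameter and the `load_dynamodb_catalog` function are assumptions, and only the `deprecated_in` and `removed_in` keyword names come from the diff above.

```python
import functools
import warnings


def deprecated(deprecated_in: str, removed_in: str, help_message: str = ""):
    """Hypothetical decorator: warn whenever the wrapped callable is used."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            message = (
                f"{func.__name__} is deprecated since {deprecated_in} "
                f"and will be removed in {removed_in}."
            )
            if help_message:
                message += f" {help_message}"
            # stacklevel=2 attributes the warning to the caller, not the wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(deprecated_in="0.6.0", removed_in="1.0.0")
def load_dynamodb_catalog(name: str) -> str:
    # Stand-in for a real catalog constructor.
    return f"catalog:{name}"
```

Because the version strings are plain arguments, changing the deprecation release after review only touches the decorator call, not the wrapped code.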

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
himadripal commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504898529 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkShufflingDataRewriter.java: ## @@ -86,6 +88,16 @@ protected SparkShufflingDataRewriter(SparkS

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg]

2024-02-27 Thread via GitHub
amogh-jahagirdar commented on PR #9783: URL: https://github.com/apache/iceberg/pull/9783#issuecomment-1967496545 Yeah I think it's important to emphasize that deprecating in this repo, and removing it in 2.0 does not mean DynamoDBCatalog cannot exist in some form in some other repo.

Re: [PR] Make issued_token_type optional to support OAuth2 Client Credential Flow [iceberg-python]

2024-02-27 Thread via GitHub
syun64 commented on code in PR #466: URL: https://github.com/apache/iceberg-python/pull/466#discussion_r1504877454 ## pyiceberg/catalog/rest.py: ## @@ -156,8 +156,10 @@ class RegisterTableRequest(IcebergBaseModel): class TokenResponse(IcebergBaseModel): access_token: str =

Re: [PR] Deprecate DynamoDB Catalog to Reduce Catalog Scope [iceberg]

2024-02-27 Thread via GitHub
jackye1995 commented on PR #9783: URL: https://github.com/apache/iceberg/pull/9783#issuecomment-1967478372 @SreeramGarlapati @namrathamyske we discussed specifically about Dynamo in that community sync that we want to deprecate it directly. Let us talk more on the devlist about this.

Re: [PR] Partition Evolution [iceberg-python]

2024-02-27 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1504865802 ## pyiceberg/table/__init__.py: ## @@ -871,6 +924,12 @@ def sort_orders(self) -> Dict[int, SortOrder]: """Return a dict of the sort orders of thi

Re: [PR] Make issued_token_type optional to support OAuth2 Client Credential Flow [iceberg-python]

2024-02-27 Thread via GitHub
flyrain commented on code in PR #466: URL: https://github.com/apache/iceberg-python/pull/466#discussion_r1504837924 ## pyiceberg/catalog/rest.py: ## @@ -156,8 +156,10 @@ class RegisterTableRequest(IcebergBaseModel): class TokenResponse(IcebergBaseModel): access_token: str

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
himadripal commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504835503 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkShufflingDataRewriter.java: ## @@ -86,6 +88,16 @@ protected SparkShufflingDataRewriter(SparkS

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504805631 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1463,6 +1449,176 @@ public void testSnapshotProperty()

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
himadripal commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504804663 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedFileRewriter.java: ## @@ -111,6 +111,8 @@ public abstract class SizeBasedFileRewriter, F exte private bo

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504801995 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1463,6 +1449,176 @@ public void testSnapshotProperty()

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504800434 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1463,6 +1449,176 @@ public void testSnapshotProperty()

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504798801 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1463,6 +1449,176 @@ public void testSnapshotProperty()

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
himadripal commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504797788 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1463,6 +1449,176 @@ public void testSnapshotProperty() {

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504796806 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1463,6 +1449,176 @@ public void testSnapshotProperty()

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504794573 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -1463,6 +1449,176 @@ public void testSnapshotProperty()

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504784398 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -49,22 +49,7 @@ import java.util.stream.IntStream; im

Re: [PR] Add support for providing output-spec-id during rewrite - spark 3.5 [iceberg]

2024-02-27 Thread via GitHub
RussellSpitzer commented on code in PR #9803: URL: https://github.com/apache/iceberg/pull/9803#discussion_r1504782370 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkShufflingDataRewriter.java: ## @@ -152,11 +165,12 @@ private Dataset transformPlan(Datase
