Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1925188382 > The code expect()s current_schema_id should not be None `current_schema_id` is impossible to be `None` in `TableMetadata` > But it is valid to be None in TableMetadata

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476993584 ## core/src/main/java/org/apache/iceberg/PartitionData.java: ## @@ -171,6 +169,10 @@ public PartitionData copy() { return new PartitionData(this); } + pu

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476991582 ## core/src/main/java/org/apache/iceberg/PartitionData.java: ## @@ -171,6 +169,10 @@ public PartitionData copy() { return new PartitionData(this); } + pu

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-02 Thread via GitHub
zeodtr commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1925160320 In the following code, from: https://github.com/apache/iceberg-rust/blob/9ae9e13fb48ea8af20d76644f27dcb2fc8773396/crates/iceberg/src/spec/table_metadata.rs#L679 ```rust

Re: [PR] feat: add parquet writer [iceberg-rust]

2024-02-02 Thread via GitHub
ZENOTME commented on code in PR #176: URL: https://github.com/apache/iceberg-rust/pull/176#discussion_r1476988389 ## crates/iceberg/src/writer/file_writer/mod.rs: ## @@ -18,16 +18,20 @@ //! This module contains the writer for data file format supported by iceberg: parquet, orc

[PR] feat: add handwritten serialize [iceberg-rust]

2024-02-02 Thread via GitHub
odysa opened a new pull request, #185: URL: https://github.com/apache/iceberg-rust/pull/185 Add serialize for `TableMetadata` and use `TryFrom` as discussed in #177 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] Build: Upgrade to gradle 8.5 [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on PR #8486: URL: https://github.com/apache/iceberg/pull/8486#issuecomment-1925103805 As gradle 8.6 is available, I will resume this PR upgrading to 8.6.

Re: [I] Core: complete FileScanTaskParser for other FileScanTask implementation classes (like StaticDataTask) [iceberg]

2024-02-02 Thread via GitHub
stevenzwu commented on issue #9597: URL: https://github.com/apache/iceberg/issues/9597#issuecomment-1925055682 yeah. we are on the same page now. `TaskType` enum can be defined in `FileScanTaskParser`. regarding the `StructLike` row serialization, I am thinking maybe we should implem

Re: [PR] feat: add parquet writer [iceberg-rust]

2024-02-02 Thread via GitHub
liurenjie1024 commented on code in PR #176: URL: https://github.com/apache/iceberg-rust/pull/176#discussion_r1476921216 ## crates/iceberg/src/writer/file_writer/mod.rs: ## @@ -18,16 +18,20 @@ //! This module contains the writer for data file format supported by iceberg: parque

Re: [PR] only trim slash when warehouse location is not root path [iceberg]

2024-02-02 Thread via GitHub
abmo-x commented on code in PR #9619: URL: https://github.com/apache/iceberg/pull/9619#discussion_r1476934235 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java: ## @@ -108,7 +108,10 @@ public void initialize(String name, Map properties) { "Cannot initi

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-02 Thread via GitHub
liurenjie1024 commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1925047108 I remember why we use `expect` for `TableMetadata` serialization, see [this comment](https://github.com/apache/iceberg-rust/blob/c91aeaec2aa713a1efdc513e1769220dd53cf443/crat

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1925037344 > We may need to modify the From to TryFrom. `serde` does not support the attribute `try_into`. https://github.com/serde-rs/serde/issues/1524 The `into` attribute requires `

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1925022721 > Do you want users to choose their own runtime like [sqlx](https://github.com/launchbadge/sqlx/tree/main#install)? Yes, exactly. I don't think we should bind to some

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-02 Thread via GitHub
liurenjie1024 commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1925021474 @zeodtr Sorry for the mistake; yes, you are right, `expect` may still cause a panic. Let's reopen it. cc @odysa We may need to modify the `From` to `TryFrom`.
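
A minimal sketch of the `From` to `TryFrom` change suggested in this thread, using only the standard library. The types `Metadata`, `SerializedMetadata`, and `MissingField` are hypothetical stand-ins invented for illustration; they are not the iceberg-rust definitions, and the real conversion may look different.

```rust
use std::convert::TryFrom;

// Hypothetical in-memory form: the field may be absent.
#[derive(Debug)]
struct Metadata {
    current_schema_id: Option<i32>,
}

// Hypothetical serialized form: the field is required.
#[derive(Debug, PartialEq)]
struct SerializedMetadata {
    current_schema_id: i32,
}

// Hypothetical error type naming the field that was missing.
#[derive(Debug, PartialEq)]
struct MissingField(&'static str);

impl TryFrom<Metadata> for SerializedMetadata {
    type Error = MissingField;

    fn try_from(m: Metadata) -> Result<Self, Self::Error> {
        // `ok_or` turns a missing value into an error the caller can handle;
        // `expect` at this point would panic instead.
        let current_schema_id = m
            .current_schema_id
            .ok_or(MissingField("current_schema_id"))?;
        Ok(SerializedMetadata { current_schema_id })
    }
}
```

With this shape, a caller that previously relied on an infallible `From` has to handle the `Result`, so a missing value surfaces as an error rather than a panic.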

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9563: URL: https://github.com/apache/iceberg/pull/9563#discussion_r1476914683 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPlanningUtil.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2024-02-02 Thread via GitHub
epgif commented on PR #9008: URL: https://github.com/apache/iceberg/pull/9008#issuecomment-1924975062 https://github.com/apache/iceberg/actions/runs/7717197684/job/21172136322?pr=9008 looks like a spurious failure? All the rest passed, and even `spark-3x-java-17-tests (3.5, 2.13)` pass

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9563: URL: https://github.com/apache/iceberg/pull/9563#discussion_r1476899638 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java: ## @@ -331,4 +331,24 @@ private long driverMaxResultSize() { SparkConf sparkConf

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476895581 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -188,23 +189,6 @@ public static List> planTaskGroup return taskGroups; } - priv

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476890643 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -188,23 +189,6 @@ public static List> planTaskGroup return taskGroups; } -

Re: [I] Have tests which test against the shaded runtime artifacts [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] closed issue #257: Have tests which test against the shaded runtime artifacts URL: https://github.com/apache/iceberg/issues/257

Re: [I] Allow reading non-optional unions as struct of optional fields [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] commented on issue #189: URL: https://github.com/apache/iceberg/issues/189#issuecomment-1924927624 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Allow reading non-optional unions as struct of optional fields [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] closed issue #189: Allow reading non-optional unions as struct of optional fields URL: https://github.com/apache/iceberg/issues/189

Re: [I] Large number of external java packages are not relocated in iceberg-runtime.jar and iceberg-presto-runtime.jar [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] commented on issue #168: URL: https://github.com/apache/iceberg/issues/168#issuecomment-1924927598 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Iceberg Table snapshots/manifests using relative path fails to read data [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] closed issue #128: Iceberg Table snapshots/manifests using relative path fails to read data URL: https://github.com/apache/iceberg/issues/128

Re: [I] Consider removing the use of Guava [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] commented on issue #797: URL: https://github.com/apache/iceberg/issues/797#issuecomment-1924927799 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] java.lang.RuntimeException: Metastore operation failed error reason [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] commented on issue #791: URL: https://github.com/apache/iceberg/issues/791#issuecomment-1924927780 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Have tests which test against the shaded runtime artifacts [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] commented on issue #257: URL: https://github.com/apache/iceberg/issues/257#issuecomment-1924927637 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Iceberg Table snapshots/manifests using relative path fails to read data [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] commented on issue #128: URL: https://github.com/apache/iceberg/issues/128#issuecomment-1924927573 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Large number of external java packages are not relocated in iceberg-runtime.jar and iceberg-presto-runtime.jar [iceberg]

2024-02-02 Thread via GitHub
github-actions[bot] closed issue #168: Large number of external java packages are not relocated in iceberg-runtime.jar and iceberg-presto-runtime.jar URL: https://github.com/apache/iceberg/issues/168

Re: [PR] [WIP] Migrate SparkExtensions sub-classes to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
tomtongue commented on PR #9624: URL: https://github.com/apache/iceberg/pull/9624#issuecomment-1924917685 Sure, thank you for the suggestion. The size of diff would be a bit big if I gather all the diffs in `extension` package in this PR. So, I will separately create PRs, but first add rele

Re: [PR] refactor: Replace unwrap [iceberg-rust]

2024-02-02 Thread via GitHub
zeodtr commented on PR #183: URL: https://github.com/apache/iceberg-rust/pull/183#issuecomment-1924916796 @odysa @liurenjie1024 @Xuanwo @Fokko Please see my comment: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1924916131 Thank you.

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-02 Thread via GitHub
zeodtr commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1924916131 @Fokko @liurenjie1024 I think this issue has not been resolved, since I somewhat disagree with the pull request. My concerns are as follows: * `expect()` also s

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476874138 ## core/src/main/java/org/apache/iceberg/PartitionData.java: ## @@ -171,6 +169,10 @@ public PartitionData copy() { return new PartitionData(this); } + public

Re: [PR] Spark 3.4: Read deletes in parallel and cache them on executors [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9603: URL: https://github.com/apache/iceberg/pull/9603#discussion_r1476868911 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -279,5 +284,29 @@ protected void markRowDeleted(InternalRow row) {

Re: [PR] Spark 3.4: Read deletes in parallel and cache them on executors [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi merged PR #9603: URL: https://github.com/apache/iceberg/pull/9603

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476864901 ## core/src/main/java/org/apache/iceberg/PartitionData.java: ## @@ -171,6 +169,10 @@ public PartitionData copy() { return new PartitionData(this); } + pu

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476862386 ## core/src/main/java/org/apache/iceberg/PartitionData.java: ## @@ -221,4 +223,32 @@ public static Object[] copyData(Types.StructType type, Object[] data) {

Re: [PR] Core: Fix performance issue when combining tasks by partition [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9629: URL: https://github.com/apache/iceberg/pull/9629#discussion_r1476863172 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -188,23 +189,6 @@ public static List> planTaskGroup return taskGroups; } - priv

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
danielcweeks commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476859199 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceb

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
danielcweeks commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476857183 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceb

Re: [PR] Remove nightly and add .asf.yaml [iceberg]

2024-02-02 Thread via GitHub
danielcweeks merged PR #9622: URL: https://github.com/apache/iceberg/pull/9622

Re: [PR] Remove publish directive from .asf.yaml [iceberg-docs]

2024-02-02 Thread via GitHub
danielcweeks merged PR #309: URL: https://github.com/apache/iceberg-docs/pull/309

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476851450 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceberg-Da

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476850002 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceberg-Da

[PR] Remove publish directive from .asf.yaml [iceberg-docs]

2024-02-02 Thread via GitHub
bitsondatadev opened a new pull request, #309: URL: https://github.com/apache/iceberg-docs/pull/309 (no comment)

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9563: URL: https://github.com/apache/iceberg/pull/9563#discussion_r1476795233 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPlanningUtil.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (AS

[PR] Add Daft examples and code into PyIceberg docs and Table [iceberg-python]

2024-02-02 Thread via GitHub
jaychia opened a new pull request, #355: URL: https://github.com/apache/iceberg-python/pull/355 1. Adds a new optional installation arg `daft`, so that `pip install pyiceberg[daft]` will pull Daft in as a dependency 2. Adds a new `Table.to_daft()` method to convert a table into a Daft da

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924672646 > Do you want users to choose their own runtime like [sqlx](https://github.com/launchbadge/sqlx/tree/main?rgh-link-date=2024-02-02T17%3A02%3A32Z#install)? They are building an abstr

Re: [I] add type: Timestamp with nanosecond units [iceberg]

2024-02-02 Thread via GitHub
jacobmarble commented on issue #8657: URL: https://github.com/apache/iceberg/issues/8657#issuecomment-1924559864 Maintainers: please also assign @epgif

Re: [PR] Spark 3.4: Read deletes in parallel and cache them on executors [iceberg]

2024-02-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #9603: URL: https://github.com/apache/iceberg/pull/9603#discussion_r1476567943 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -279,5 +284,29 @@ protected void markRowDeleted(InternalRow row) {

Re: [PR] Spark 3.4, 3.5: Use ProcedureInput for RewriteDataFiles [iceberg]

2024-02-02 Thread via GitHub
szehon-ho commented on PR #8583: URL: https://github.com/apache/iceberg/pull/8583#issuecomment-1924504708 Merged, thanks @dramaticlly , and thanks all for additional reviews.

Re: [I] Switch to using ProcedureInput for rewriteDataFiles [iceberg]

2024-02-02 Thread via GitHub
szehon-ho closed issue #8582: Switch to using ProcedureInput for rewriteDataFiles URL: https://github.com/apache/iceberg/issues/8582

Re: [PR] Spark 3.4, 3.5: Use ProcedureInput for RewriteDataFiles [iceberg]

2024-02-02 Thread via GitHub
szehon-ho merged PR #8583: URL: https://github.com/apache/iceberg/pull/8583

[PR] Flink: backport #9547 to 1.17 and 1.16 for Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]

2024-02-02 Thread via GitHub
rodmeneses opened a new pull request, #9627: URL: https://github.com/apache/iceberg/pull/9627 1.17 came out clean. 1.16 came out clean after adding a couple of extra lines to `TestStreamScanSql`.

Re: [PR] Flink: change defaultFlinkVersion back to 1.18 [iceberg]

2024-02-02 Thread via GitHub
pvary merged PR #9625: URL: https://github.com/apache/iceberg/pull/9625

Re: [PR] partitioned write support [iceberg-python]

2024-02-02 Thread via GitHub
jqin61 commented on code in PR #353: URL: https://github.com/apache/iceberg-python/pull/353#discussion_r1476487793 ## pyiceberg/table/__init__.py: ## @@ -2467,3 +2462,131 @@ def commit(self) -> Snapshot: ) return snapshot + + +@dataclass(frozen=True) +cla

Re: [PR] Spark 3.4: Read deletes in parallel and cache them on executors [iceberg]

2024-02-02 Thread via GitHub
szehon-ho commented on code in PR #9603: URL: https://github.com/apache/iceberg/pull/9603#discussion_r1476484200 ## spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkExecutorCache.java: ## @@ -0,0 +1,444 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1924405491 @nastra @danielcweeks @rdblue I updated the PR. You can already do a new round (I'm still checking a couple of things, but it is ready for review).

Re: [I] Add View Support to Spark [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on issue #7938: URL: https://github.com/apache/iceberg/issues/7938#issuecomment-1924377622 I think we can call this one done with all of the PRs from @nastra that we've merged lately. Thanks @jzhuge and @nastra for getting this ready!

Re: [I] Add View Support to Spark [iceberg]

2024-02-02 Thread via GitHub
rdblue closed issue #7938: Add View Support to Spark URL: https://github.com/apache/iceberg/issues/7938

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475826840 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcTableOperations.java: ## @@ -182,18 +169,13 @@ private void createTable(String newMetadataLocation) throws SQLExcept

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475821544 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -503,6 +550,84 @@ public boolean namespaceExists(Namespace namespace) { return JdbcUtil.name

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9563: URL: https://github.com/apache/iceberg/pull/9563#discussion_r1476443719 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPlanningUtil.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [WIP] Migrate SparkExtensions sub-classes to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra commented on PR #9624: URL: https://github.com/apache/iceberg/pull/9624#issuecomment-1924340442 > Will add all sub-classes in this PR depending on the size of the diff it's also fine to split this into 2-3 PRs. You could probably start within a specific package and combine subc

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476417816 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [I] Core: complete FileScanTaskParser for other FileScanTask implementation classes (like StaticDataTask) [iceberg]

2024-02-02 Thread via GitHub
nastra commented on issue #9597: URL: https://github.com/apache/iceberg/issues/9597#issuecomment-1924328488 @stevenzwu this is only because `ReportMetricsRequest` is a REST request class for a `MetricsReport`. So in the case of this issue here we'd define the enum type at the JSON level in

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924284690 > I mean we may need an extra layer for task scheduling, so that we can be adopted to any async runtime such as tokio, async-std. Do you want users to choose their own runtime

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476325368 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476307388 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] partitioned write support [iceberg-python]

2024-02-02 Thread via GitHub
syun64 commented on code in PR #353: URL: https://github.com/apache/iceberg-python/pull/353#discussion_r1476291241 ## pyiceberg/table/__init__.py: ## @@ -2467,3 +2462,131 @@ def commit(self) -> Snapshot: ) return snapshot + + +@dataclass(frozen=True) +cla

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476289111 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476285119 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476280534 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476276022 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-02-02 Thread via GitHub
kevinjqliu commented on issue #326: URL: https://github.com/apache/iceberg-python/issues/326#issuecomment-1924204635 ++, this lowers the barrier to entry by a lot. It's a lot of work to spin up docker/s3/minio integration and hive 😱!

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476270090 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-02-02 Thread via GitHub
bitsondatadev commented on issue #326: URL: https://github.com/apache/iceberg-python/issues/326#issuecomment-1924196245 Yeah I want to sprinkle this literally everywhere in the docs so please go for it. I think this will be my preferred way of teaching Iceberg.

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476264616 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on PR #6981: URL: https://github.com/apache/iceberg/pull/6981#issuecomment-1924192337 Oops, I didn't mean to close this! I want to work on getting it in next

[PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
huaxingao opened a new pull request, #6981: URL: https://github.com/apache/iceberg/pull/6981 Push down min/max/count with group by if group by is on partition columns For example: ``` CREATE TABLE test (id LONG, ts TIMESTAMP, data INT) USING iceberg PARTITIONED BY (id, ts); S

Re: [PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
rdblue closed pull request #6981: Push down group by for partition columns URL: https://github.com/apache/iceberg/pull/6981

Re: [PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on PR #6981: URL: https://github.com/apache/iceberg/pull/6981#issuecomment-1924191483 @amogh-jahagirdar, can you

Re: [PR] Spark: Fix CREATE OR REPLACE VIEW when view doesn't exist [iceberg]

2024-02-02 Thread via GitHub
rdblue merged PR #9621: URL: https://github.com/apache/iceberg/pull/9621

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476258875 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/IcebergWriterFactory.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476250156 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/IcebergWriter.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

Re: [I] What is Table Identifier? [iceberg-python]

2024-02-02 Thread via GitHub
kevinjqliu commented on issue #341: URL: https://github.com/apache/iceberg-python/issues/341#issuecomment-1924165256 So if I understand correctly, TableIdentifier consists of the namespace and the table name. Namespace can be multiple parts, for example ("com"."apache"."iceberg") or "com.ap

Re: [I] Iceberg with Glue Catalog updates glue table version on every commit, but there's a maximum of 100,000 versions [iceberg]

2024-02-02 Thread via GitHub
idrissa-mgs commented on issue #5965: URL: https://github.com/apache/iceberg/issues/5965#issuecomment-1924162113 @vshel Did you finally find a long-term solution rather than asking for an increase of the AWS soft limit on table versioning? Did the skipArchive flag help?

Re: [I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-02-02 Thread via GitHub
kevinjqliu commented on issue #326: URL: https://github.com/apache/iceberg-python/issues/326#issuecomment-1924154815 that's a great idea! I'm thinking of adding this in [Getting started with PyIceberg](https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/index.md) WDYT?

Re: [I] Core: complete FileScanTaskParser for other FileScanTask implementation classes (like StaticDataTask) [iceberg]

2024-02-02 Thread via GitHub
stevenzwu commented on issue #9597: URL: https://github.com/apache/iceberg/issues/9597#issuecomment-1924110032 @nastra `ReportMetricsRequest` has the type at API level. ``` ReportType reportType(); ```

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1476197844 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -503,6 +550,84 @@ public boolean namespaceExists(Namespace namespace) { return JdbcUtil.name

Re: [PR] Partition Evolution [iceberg-python]

2024-02-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1476195413 ## pyiceberg/table/__init__.py: ## @@ -2271,3 +2331,242 @@ def commit(self) -> Snapshot: ) return snapshot + + +class UpdateSpec:

Re: [PR] Spark: Bypass Spark's ViewCatalog API when replacing a view [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9596: URL: https://github.com/apache/iceberg/pull/9596#discussion_r1476195226 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -608,6 +608,53 @@ public View createView( "Creating a view is not supported

Re: [PR] Partition Evolution [iceberg-python]

2024-02-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1476173833 ## pyiceberg/table/__init__.py: ## @@ -2271,3 +2331,242 @@ def commit(self) -> Snapshot: ) return snapshot + + +class UpdateSpec:

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1476145516 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -503,6 +550,84 @@ public boolean namespaceExists(Namespace namespace) { return JdbcUtil.namesp

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-02-02 Thread via GitHub
wooyeong commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1476145477 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -131,12 +133,12 @@ public SparkTable(Table icebergTable, boolean refreshEager
