[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-13 Thread GitBox
gaborkaszab commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046773176 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -304,7 +306,9 @@ public Table loadTable(SessionContext context, TableIdentifier identi

[GitHub] [iceberg] nastra commented on a diff in pull request #6074: API,Core: SnapshotManager to be created through Transaction

2022-12-13 Thread GitBox
nastra commented on code in PR #6074: URL: https://github.com/apache/iceberg/pull/6074#discussion_r1046829891 ## .palantir/revapi.yml: ## @@ -273,6 +273,15 @@ acceptedBreaks: - code: "java.method.addedToInterface" new: "method java.util.List org.apache.iceberg.Table

[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6074: API,Core: SnapshotManager to be created through Transaction

2022-12-13 Thread GitBox
gaborkaszab commented on code in PR #6074: URL: https://github.com/apache/iceberg/pull/6074#discussion_r1046906474 ## .palantir/revapi.yml: ## @@ -273,6 +273,15 @@ acceptedBreaks: - code: "java.method.addedToInterface" new: "method java.util.List org.apache.iceberg.

[GitHub] [iceberg] jiron12 opened a new issue, #6418: Transactions for write operations

2022-12-13 Thread GitBox
jiron12 opened a new issue, #6418: URL: https://github.com/apache/iceberg/issues/6418 ### Feature Request / Improvement Is full transaction support anywhere on the roadmap? What I tried (SQL): ` START TRANSACTION; insert into my_trino_schema.my_table select 43, 'hel

[GitHub] [iceberg] nastra commented on issue #6418: Transactions for write operations

2022-12-13 Thread GitBox
nastra commented on issue #6418: URL: https://github.com/apache/iceberg/issues/6418#issuecomment-1348173560 @jiron12 given that the Trino Iceberg connector is developed outside of the Iceberg project, you might want to rather move this issue to https://github.com/trinodb/trino/issues

[GitHub] [iceberg] gaborkaszab commented on pull request #6074: API,Core: SnapshotManager to be created through Transaction

2022-12-13 Thread GitBox
gaborkaszab commented on PR #6074: URL: https://github.com/apache/iceberg/pull/6074#issuecomment-1348180097 > This looks good to me other than a couple of small things to fix: > > * revapi suppressions are reordered, which is going to introduce churn > * It isn't clear why this woul

[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-13 Thread GitBox
chenjunjiedada commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1047057758 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -357,13 +358,10 @@ public IcebergSource build() { if (readerF

[GitHub] [iceberg] jiamin13579 opened a new pull request, #6419: Update spark-ddl.md

2022-12-13 Thread GitBox
jiamin13579 opened a new pull request, #6419: URL: https://github.com/apache/iceberg/pull/6419 Example of correcting the document add/drop partition truncate -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

[GitHub] [iceberg] ajantha-bhat commented on pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-13 Thread GitBox
ajantha-bhat commented on PR #6090: URL: https://github.com/apache/iceberg/pull/6090#issuecomment-1348577920 @rdblue, @findepi, @amogh-jahagirdar: Handled the comments. Please take a look at it again. Also, #6267 is ready.

[GitHub] [iceberg] Fokko merged pull request #6413: Python: Remove outdated docs + some suggestions for textual improvements

2022-12-13 Thread GitBox
Fokko merged PR #6413: URL: https://github.com/apache/iceberg/pull/6413

[GitHub] [iceberg] Fokko commented on pull request #6413: Python: Remove outdated docs + some suggestions for textual improvements

2022-12-13 Thread GitBox
Fokko commented on PR #6413: URL: https://github.com/apache/iceberg/pull/6413#issuecomment-1348611301 Thanks @rubenvdg for working on this, much appreciated 🙌🏻

[GitHub] [iceberg] JanKaul opened a new issue, #6420: Iceberg Materialized View Spec

2022-12-13 Thread GitBox
JanKaul opened a new issue, #6420: URL: https://github.com/apache/iceberg/issues/6420 ### Feature Request / Improvement # Iceberg Materialized View Spec ## Background and Motivation A materialized view precomputes results of a query to be used as a logical table. When

[GitHub] [iceberg] JanKaul commented on issue #6420: Iceberg Materialized View Spec

2022-12-13 Thread GitBox
JanKaul commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1348693373 The draft has to be seen as an initial starting point. Obviously the design is open for discussion.

[GitHub] [iceberg] jiron12 commented on issue #6418: Transactions for write operations

2022-12-13 Thread GitBox
jiron12 commented on issue #6418: URL: https://github.com/apache/iceberg/issues/6418#issuecomment-1348755733 Ok, posted this at the Trino side: https://github.com/trinodb/trino/issues/15385

[GitHub] [iceberg] nastra closed issue #6418: Transactions for write operations

2022-12-13 Thread GitBox
nastra closed issue #6418: Transactions for write operations URL: https://github.com/apache/iceberg/issues/6418

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6376: Docs: Add register table Spark procedure documentation

2022-12-13 Thread GitBox
RussellSpitzer commented on code in PR #6376: URL: https://github.com/apache/iceberg/pull/6376#discussion_r1047327247 ## docs/spark-procedures.md: ## @@ -493,6 +493,38 @@ CALL spark_catalog.system.add_files( ) ``` +### `register_table` + +Creates a catalog entry for a metada

[GitHub] [iceberg] RussellSpitzer merged pull request #6376: Docs: Add register table Spark procedure documentation

2022-12-13 Thread GitBox
RussellSpitzer merged PR #6376: URL: https://github.com/apache/iceberg/pull/6376

[GitHub] [iceberg] RussellSpitzer commented on pull request #6376: Docs: Add register table Spark procedure documentation

2022-12-13 Thread GitBox
RussellSpitzer commented on PR #6376: URL: https://github.com/apache/iceberg/pull/6376#issuecomment-1348826087 Thanks @rajarshisarkar for updating the docs and for working through the syntax issues that I only remembered from old docs!

[GitHub] [iceberg] stevenzwu merged pull request #6377: Flink: add util class to generate test data with extensive coverage d…

2022-12-13 Thread GitBox
stevenzwu merged PR #6377: URL: https://github.com/apache/iceberg/pull/6377

[GitHub] [iceberg] stevenzwu commented on pull request #6377: Flink: add util class to generate test data with extensive coverage d…

2022-12-13 Thread GitBox
stevenzwu commented on PR #6377: URL: https://github.com/apache/iceberg/pull/6377#issuecomment-1348974325 thanks @hililiwei and @pvary for reviewing. will follow up on Peter's comment with a separate PR

[GitHub] [iceberg] stevenzwu commented on pull request #6377: Flink: add util class to generate test data with extensive coverage d…

2022-12-13 Thread GitBox
stevenzwu commented on PR #6377: URL: https://github.com/apache/iceberg/pull/6377#issuecomment-1348983387 @pvary I missed the FixedType because of this code snippet in Iceberg ``` private static final ImmutableMap TYPES = ImmutableMap.builder() .put(BooleanTy

[GitHub] [iceberg] asheeshgarg commented on issue #6415: Vectorized Read Issue

2022-12-13 Thread GitBox
asheeshgarg commented on issue #6415: URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1349350170 @nastra filled in the missing bits So this schema that is defined in Iceberg entity_status is UTF8 Schema

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-13 Thread GitBox
amogh-jahagirdar commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1047584271 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -83,4 +85,25 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] e-gat opened a new issue, #6421: Running rewriteDataFiles on multiple executors in Spark

2022-12-13 Thread GitBox
e-gat opened a new issue, #6421: URL: https://github.com/apache/iceberg/issues/6421 ### Query engine Spark/EMR ### Question Can Spark (3.2.1) / EMR 6.7 with Iceberg 1.1 support running rewriteDataFiles across multiple executors or only on one? If so, what is the reco

[GitHub] [iceberg] szehon-ho merged pull request #6284: Spark 3.2: Fix a separate table cache being created for each rewriteFiles

2022-12-13 Thread GitBox
szehon-ho merged PR #6284: URL: https://github.com/apache/iceberg/pull/6284

[GitHub] [iceberg] szehon-ho merged pull request #6285: Spark 3.1: Fix a separate table cache being created for each rewriteFiles

2022-12-13 Thread GitBox
szehon-ho merged PR #6285: URL: https://github.com/apache/iceberg/pull/6285

[GitHub] [iceberg] szehon-ho commented on pull request #6285: Spark 3.1: Fix a separate table cache being created for each rewriteFiles

2022-12-13 Thread GitBox
szehon-ho commented on PR #6285: URL: https://github.com/apache/iceberg/pull/6285#issuecomment-1349850082 Thanks @manuzhang, @hililiwei and @ajantha-bhat for review

[GitHub] [iceberg] szehon-ho commented on pull request #6284: Spark 3.2: Fix a separate table cache being created for each rewriteFiles

2022-12-13 Thread GitBox
szehon-ho commented on PR #6284: URL: https://github.com/apache/iceberg/pull/6284#issuecomment-1349850868 Thanks @manuzhang, @hililiwei and @ajantha-bhat for review

[GitHub] [iceberg] mtnrabi opened a new issue, #6422: How compaction works along side incremental read

2022-12-13 Thread GitBox
mtnrabi opened a new issue, #6422: URL: https://github.com/apache/iceberg/issues/6422 ### Query engine Spark ### Question In the docs, it’s mentioned that incremental read “Currently gets only the data from append operation. Cannot support replace, overwrite, delete ope

[GitHub] [iceberg] szehon-ho commented on pull request #6419: Doc:Example of correcting the document add/drop partition truncate

2022-12-13 Thread GitBox
szehon-ho commented on PR #6419: URL: https://github.com/apache/iceberg/pull/6419#issuecomment-1349942377 Not sure it's necessary, looks like for now width can be any of the arguments: https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/Spar

[GitHub] [iceberg] jackye1995 merged pull request #6408: Spark: Cleanup commented out code in SparkValueReaders

2022-12-13 Thread GitBox
jackye1995 merged PR #6408: URL: https://github.com/apache/iceberg/pull/6408

[GitHub] [iceberg] jackye1995 commented on pull request #6408: Spark: Cleanup commented out code in SparkValueReaders

2022-12-13 Thread GitBox
jackye1995 commented on PR #6408: URL: https://github.com/apache/iceberg/pull/6408#issuecomment-1349969230 Thanks for the cleanup!

[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6408: Spark: Cleanup commented out code in SparkValueReaders

2022-12-13 Thread GitBox
amogh-jahagirdar commented on PR #6408: URL: https://github.com/apache/iceberg/pull/6408#issuecomment-1349996341 thanks for all the reviews! @Fokko @singhpk234 @jackye1995 @szehon-ho

[GitHub] [iceberg] HyukjinKwon commented on issue #5153: Reduce CI Workload by Removing Some Spark Variants and Using Callable Workflows for Github Actions

2022-12-13 Thread GitBox
HyukjinKwon commented on issue #5153: URL: https://github.com/apache/iceberg/issues/5153#issuecomment-1350095844 I happened to read the related links. Thanks @singhpk234 for elaborating Spark's CI. To be more clear, https://github.com/apache/spark/pull/32092 implemented the logic you explai

[GitHub] [iceberg] HyukjinKwon commented on issue #5153: Reduce CI Workload by Removing Some Spark Variants and Using Callable Workflows for Github Actions

2022-12-13 Thread GitBox
HyukjinKwon commented on issue #5153: URL: https://github.com/apache/iceberg/issues/5153#issuecomment-1350098810 In this way, we can remove all the overhead in the current repo, and leverage the resources from the forked repositories. Spark was one of the projects that uses the GitHub res

[GitHub] [iceberg] github-actions[bot] commented on issue #5065: The check-ordering purpose

2022-12-13 Thread GitBox
github-actions[bot] commented on issue #5065: URL: https://github.com/apache/iceberg/issues/5065#issuecomment-1350134025 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

[GitHub] [iceberg] github-actions[bot] commented on issue #5038: Python: Decimal scale can't be greater than precision

2022-12-13 Thread GitBox
github-actions[bot] commented on issue #5038: URL: https://github.com/apache/iceberg/issues/5038#issuecomment-1350134067 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-13 Thread GitBox
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1047957805 ## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ## @@ -264,6 +264,14 @@ public void cleanFiles(TableMetadata beforeExpiration, TableMetadata

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-13 Thread GitBox
ajantha-bhat commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1047963446 ## core/src/main/java/org/apache/iceberg/SnapshotSummary.java: ## @@ -148,7 +148,7 @@ public void set(String property, String value) { } private void u

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-13 Thread GitBox
hililiwei commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1048054508 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkTableSource.java: ## @@ -603,7 +605,103 @@ public void testFilterPushDown2Literal() { }

[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-13 Thread GitBox
nastra commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1048106490 ## core/src/main/java/org/apache/iceberg/SnapshotSummary.java: ## @@ -148,7 +148,7 @@ public void set(String property, String value) { } private void updateP