[GitHub] [iceberg] jackye1995 merged pull request #6746: AWS: Load HttpClientBuilder dynamically to avoid runtime deps of both urlconnection and apache client

2023-02-09 Thread via GitHub
jackye1995 merged PR #6746: URL: https://github.com/apache/iceberg/pull/6746 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

[GitHub] [iceberg] aokolnychyi commented on pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
aokolnychyi commented on PR #6786: URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424577897 I'll take a look this week.

[GitHub] [iceberg] thomasaNvidia commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-09 Thread via GitHub
thomasaNvidia commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1424578537 @amogh-jahagirdar I have another job that works just fine and I took a look at the `inputFormat`, `outputFormat`, `serialization.lib` for the iceberg table that is working and

[GitHub] [iceberg] thomasaNvidia commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-09 Thread via GitHub
thomasaNvidia commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1424582427 Or if using the same catalog name `glue_catalog` in each jobs is causing an issue. This is my --conf for each job.. The top one works fine while the second one is for the

[GitHub] [iceberg] RobbertDM commented on issue #6467: Does iceberg has plan to support Json Type?

2023-02-09 Thread via GitHub
RobbertDM commented on issue #6467: URL: https://github.com/apache/iceberg/issues/6467#issuecomment-1424590282 @amogh-jahagirdar I think I can't access that thread since I don't have an `@apache.org` account. Do you mind giving us a TL;DR?

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-09 Thread via GitHub
szehon-ho commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1101840634 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java: ## @@ -123,8 +127,17 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] snazy closed pull request #6649: Nessie-build: add test dependencies

2023-02-09 Thread via GitHub
snazy closed pull request #6649: Nessie-build: add test dependencies URL: https://github.com/apache/iceberg/pull/6649

[GitHub] [iceberg] amogh-jahagirdar commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-09 Thread via GitHub
amogh-jahagirdar commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1424617561 > I have another job that works just fine and I took a look at the inputFormat, outputFormat, serialization.lib for the iceberg table that is working and they aren't set eit

[GitHub] [iceberg] aokolnychyi merged pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-09 Thread via GitHub
aokolnychyi merged PR #6776: URL: https://github.com/apache/iceberg/pull/6776

[GitHub] [iceberg] aokolnychyi commented on pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-09 Thread via GitHub
aokolnychyi commented on PR #6776: URL: https://github.com/apache/iceberg/pull/6776#issuecomment-1424623094 Thanks for reviewing, @szehon-ho @amogh-jahagirdar @singhpk234!

[GitHub] [iceberg] RussellSpitzer commented on issue #6467: Does iceberg has plan to support Json Type?

2023-02-09 Thread via GitHub
RussellSpitzer commented on issue #6467: URL: https://github.com/apache/iceberg/issues/6467#issuecomment-1424637359 https://join.slack.com/t/apache-iceberg/shared_invite/zt-1oj35f7yc-wuTEhvkiqjGLje83B7rG8A

[GitHub] [iceberg] aokolnychyi opened a new pull request, #6791: Core: Refactor validation in TableScanUtil

2023-02-09 Thread via GitHub
aokolnychyi opened a new pull request, #6791: URL: https://github.com/apache/iceberg/pull/6791 This PR refactors validation in `TableScanUtil` into a separate method as the same logic is currently used in 3 different methods. In addition, it switches to shorter messages to stay on one line

[GitHub] [iceberg] aokolnychyi opened a new pull request, #6792: Spark 3.3: Remove redundant vars in ChangelogRowReader

2023-02-09 Thread via GitHub
aokolnychyi opened a new pull request, #6792: URL: https://github.com/apache/iceberg/pull/6792 This PR removes redundant variables in `ChangelogRowReader`.

[GitHub] [iceberg] aokolnychyi opened a new pull request, #6793: Spark 3.3: Fix comment formatting

2023-02-09 Thread via GitHub
aokolnychyi opened a new pull request, #6793: URL: https://github.com/apache/iceberg/pull/6793 This PR fixes auto formatting for some existing comments that look weird after applying Spotless.

[GitHub] [iceberg] RobbertDM commented on issue #6467: Does iceberg has plan to support Json Type?

2023-02-09 Thread via GitHub
RobbertDM commented on issue #6467: URL: https://github.com/apache/iceberg/issues/6467#issuecomment-1424668247 Thanks! The above link seems to be broken though.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1101920517 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1234,6 +1243,95 @@ public void testMultipleRefsAndCleanExpiredFilesFailsForIncrementalCleanup(

[GitHub] [iceberg] rdblue commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1101922436 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1515,4 +1599,47 @@ private RemoveSnapshots removeSnapshots(Table table) { RemoveSnapshots

[GitHub] [iceberg] rdblue commented on pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2023-02-09 Thread via GitHub
rdblue commented on PR #6432: URL: https://github.com/apache/iceberg/pull/6432#issuecomment-1424698512 @rbalamohan, looks like this is causing tests to fail consistently. Can you look into the memory issue? FYI @nastra.

[GitHub] [iceberg] rdblue commented on pull request #6327: ORC: Fix error when projecting nested indentity partition column

2023-02-09 Thread via GitHub
rdblue commented on PR #6327: URL: https://github.com/apache/iceberg/pull/6327#issuecomment-1424699284 @shardulm94, yes. I'll put it in the queue. Thanks for pinging me.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101931339 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java: ## @@ -83,7 +83,11 @@ public Long endSnapshotId() { } public String branch() { -

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101932641 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java: ## @@ -304,4 +305,13 @@ public boolean caseSensitive() { .defaultValue(SQLConf.C

[GitHub] [iceberg] aokolnychyi merged pull request #6744: Spark: Backport handling ResolvingFileIO in determining locality - PR-6655

2023-02-09 Thread via GitHub
aokolnychyi merged PR #6744: URL: https://github.com/apache/iceberg/pull/6744

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101934027 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -103,6 +103,10 @@ protected Types.StructType groupingKeyType() { protected a

[GitHub] [iceberg] aokolnychyi commented on pull request #6744: Spark: Backport handling ResolvingFileIO in determining locality - PR-6655

2023-02-09 Thread via GitHub
aokolnychyi commented on PR #6744: URL: https://github.com/apache/iceberg/pull/6744#issuecomment-1424710418 Thanks, @singhpk234! Thanks for reviewing @jackye1995 @ajantha-bhat!

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101934841 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java: ## @@ -277,6 +279,7 @@ private void commitOperation(SnapshotUpdate operat

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101936845 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -313,6 +316,7 @@ public void commit(WriterCommitMessage[] messages) { }

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101937377 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -536,7 +539,7 @@ protected void commit(SnapshotUpdate snapshotUpdate, long epo

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101938075 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -570,7 +573,7 @@ protected String mode() { @Override protected void

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101939426 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkDataWrite.java: ## @@ -71,9 +72,18 @@ public class TestSparkDataWrite { @Rule public Temp

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101939155 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkDataWrite.java: ## @@ -71,9 +72,18 @@ public class TestSparkDataWrite { @Rule public Temp

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101940677 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -281,6 +281,12 @@ public boolean canDeleteWhere(Filter[] filters) { private b

[GitHub] [iceberg] jackye1995 commented on pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-09 Thread via GitHub
jackye1995 commented on PR #6777: URL: https://github.com/apache/iceberg/pull/6777#issuecomment-1424717284 Thanks for the fix @RussellSpitzer , and thanks for the review @amogh-jahagirdar , @nastra , @yyanyy ! I will rebase #6772 after this is merged.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1101941155 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -289,6 +295,10 @@ private boolean canDeleteUsingMetadata(Expression deleteExpr)

[GitHub] [iceberg] jackye1995 merged pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-09 Thread via GitHub
jackye1995 merged PR #6777: URL: https://github.com/apache/iceberg/pull/6777

[GitHub] [iceberg] rdblue commented on a diff in pull request #6717: spark 3.3 read by snapshot ref schema

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6717: URL: https://github.com/apache/iceberg/pull/6717#discussion_r1101943185 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -722,6 +741,9 @@ private Table loadFromPathIdentifier(PathIdentifier ident) {

[GitHub] [iceberg] rdblue commented on a diff in pull request #6717: spark 3.3 read by snapshot ref schema

2023-02-09 Thread via GitHub
rdblue commented on code in PR #6717: URL: https://github.com/apache/iceberg/pull/6717#discussion_r1101943508 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -368,7 +368,7 @@ private static CaseInsensitiveStringMap addSnapshotId( s

[GitHub] [iceberg] dramaticlly opened a new issue, #6794: Support Changed PartitionCount in AddFiles Procedure output

2023-02-09 Thread via GitHub
dramaticlly opened a new issue, #6794: URL: https://github.com/apache/iceberg/issues/6794 ### Feature Request / Improvement Today Spark https://iceberg.apache.org/docs/latest/spark-procedures/#add_files only return number of files added to the iceberg table but missing other statisti

[GitHub] [iceberg] jackye1995 commented on pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
jackye1995 commented on PR #6786: URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424727896 I would +1 on having this at catalog API level, and +1 on making this a catalog-specific implementation for drop table so user can just run `DROP TABLE` as usual, instead of having an

[GitHub] [iceberg] RussellSpitzer commented on issue #6794: Support Changed PartitionCount in AddFiles Procedure output

2023-02-09 Thread via GitHub
RussellSpitzer commented on issue #6794: URL: https://github.com/apache/iceberg/issues/6794#issuecomment-1424728854 I think the Procedure we can actually change the output for in a minor (as long as we are just adding fields) I forgot that this wasn't actually Action! I thought there was an

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6716: Spark 3.3: Implement Position Deletes Table

2023-02-09 Thread via GitHub
aokolnychyi commented on code in PR #6716: URL: https://github.com/apache/iceberg/pull/6716#discussion_r1101938158 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -43,15 +43,21 @@ public class PositionDeletesTable extends BaseMetadataTable { priva

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6716: Spark 3.3: Implement Position Deletes Table

2023-02-09 Thread via GitHub
aokolnychyi commented on code in PR #6716: URL: https://github.com/apache/iceberg/pull/6716#discussion_r1101959806 ## core/src/main/java/org/apache/iceberg/MetadataTable.java: ## @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6716: Spark 3.3: Implement Position Deletes Table

2023-02-09 Thread via GitHub
aokolnychyi commented on code in PR #6716: URL: https://github.com/apache/iceberg/pull/6716#discussion_r1101965485 ## core/src/main/java/org/apache/iceberg/MetadataTable.java: ## @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] agnes-xinyi-lu commented on issue #6778: Rest Catalog UpdateTableRequest IOException handling could cause data discrepancy in case of response getting lost

2023-02-09 Thread via GitHub
agnes-xinyi-lu commented on issue #6778: URL: https://github.com/apache/iceberg/issues/6778#issuecomment-1424819107 This is a different issue comparing to 5694. HttpStatusCode is supposed to reflect server side errors, but if it's a connection issue, server can't send the response back to t

[GitHub] [iceberg-docs] Fokko commented on pull request #200: Update Slack Join Invite Link

2023-02-09 Thread via GitHub
Fokko commented on PR #200: URL: https://github.com/apache/iceberg-docs/pull/200#issuecomment-1424823212 Thanks @RussellSpitzer for updating the link 🚀

[GitHub] [iceberg-docs] Fokko merged pull request #200: Update Slack Join Invite Link

2023-02-09 Thread via GitHub
Fokko merged PR #200: URL: https://github.com/apache/iceberg-docs/pull/200

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6717: spark 3.3 read by snapshot ref schema

2023-02-09 Thread via GitHub
jackye1995 commented on code in PR #6717: URL: https://github.com/apache/iceberg/pull/6717#discussion_r1102022039 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -722,6 +741,9 @@ private Table loadFromPathIdentifier(PathIdentifier ident) {

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6716: Spark 3.3: Implement Position Deletes Table

2023-02-09 Thread via GitHub
aokolnychyi commented on code in PR #6716: URL: https://github.com/apache/iceberg/pull/6716#discussion_r1102025442 ## core/src/main/java/org/apache/iceberg/MetadataTable.java: ## @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6716: Spark 3.3: Implement Position Deletes Table

2023-02-09 Thread via GitHub
aokolnychyi commented on code in PR #6716: URL: https://github.com/apache/iceberg/pull/6716#discussion_r1101949695 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/PositionDeleteRowReader.java: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [iceberg] Fokko merged pull request #6790: Nessie 0.48.2

2023-02-09 Thread via GitHub
Fokko merged PR #6790: URL: https://github.com/apache/iceberg/pull/6790

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6795: AWS: Remove unused validateTableIdentifier method in IcebergToGlueConverter

2023-02-09 Thread via GitHub
amogh-jahagirdar opened a new pull request, #6795: URL: https://github.com/apache/iceberg/pull/6795 IcebergToGlueConverter.validateTableIdentifier is unused and since IcebergToGlueConverter is package private we should be good to remove it. CC: @jackye1995 @singhpk234 @yyanyy

[GitHub] [iceberg] thomasaNvidia commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-09 Thread via GitHub
thomasaNvidia commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1424907711 @amogh-jahagirdar I must be using OOP incorrectly because the same job works without an OOP approach..

[GitHub] [iceberg] szehon-ho merged pull request #6410: Configurable metrics reporter by catalog properties

2023-02-09 Thread via GitHub
szehon-ho merged PR #6410: URL: https://github.com/apache/iceberg/pull/6410

[GitHub] [iceberg] szehon-ho commented on pull request #6410: Configurable metrics reporter by catalog properties

2023-02-09 Thread via GitHub
szehon-ho commented on PR #6410: URL: https://github.com/apache/iceberg/pull/6410#issuecomment-1424909013 Merged, thanks @kmozaid , also @nastra , @gaborkaszab , @PraveenNanda124 for reviews

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

2023-02-09 Thread via GitHub
szehon-ho commented on code in PR #6771: URL: https://github.com/apache/iceberg/pull/6771#discussion_r1100488895 ## docs/spark-queries.md: ## @@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions; Note: For unpartitioned tables, the partitions table will contain only the

[GitHub] [iceberg] thomasaNvidia closed issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-09 Thread via GitHub
thomasaNvidia closed issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table URL: https://github.com/apache/iceberg/issues/6783

[GitHub] [iceberg] romanstreamsets opened a new issue, #6796: AvtoSchemaUtils.convert() method produces Iceberg schema different from that by Hive/Spark

2023-02-09 Thread via GitHub
romanstreamsets opened a new issue, #6796: URL: https://github.com/apache/iceberg/issues/6796 ### Apache Iceberg version 1.1.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 Say, I run this in Spark/Hive: `CREATE TABLE FOO (col1 int) U

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

2023-02-09 Thread via GitHub
szehon-ho commented on code in PR #6771: URL: https://github.com/apache/iceberg/pull/6771#discussion_r1102121949 ## docs/spark-queries.md: ## @@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions; Note: For unpartitioned tables, the partitions table will contain only the

[GitHub] [iceberg] srilman closed issue #6620: Python: More Flexible Dependency Requirements, especially for Optional Deps

2023-02-09 Thread via GitHub
srilman closed issue #6620: Python: More Flexible Dependency Requirements, especially for Optional Deps URL: https://github.com/apache/iceberg/issues/6620

[GitHub] [iceberg] dramaticlly opened a new pull request, #6797: Spark: Return partition stats for AddFiles procedure

2023-02-09 Thread via GitHub
dramaticlly opened a new pull request, #6797: URL: https://github.com/apache/iceberg/pull/6797 Close #6794 This add `changed_partition_count` as results from Spark AddFilesProcedure. Before this change, spark only return `added_files_count` when adding external files to iceberg tabl

[GitHub] [iceberg] jackye1995 commented on pull request #6598: Core: View representation core implementation

2023-02-09 Thread via GitHub
jackye1995 commented on PR #6598: URL: https://github.com/apache/iceberg/pull/6598#issuecomment-1425003848 Seems like we don't have much movement on the review at this point, given the fact that all current comments are addressed and this is a part of many PRs for view catalog integration,

[GitHub] [iceberg] jackye1995 merged pull request #6598: Core: View representation core implementation

2023-02-09 Thread via GitHub
jackye1995 merged PR #6598: URL: https://github.com/apache/iceberg/pull/6598

[GitHub] [iceberg] github-actions[bot] commented on issue #5371: Hive: Concurrency Issue for CachedClientPool

2023-02-09 Thread via GitHub
github-actions[bot] commented on issue #5371: URL: https://github.com/apache/iceberg/issues/5371#issuecomment-1425005125 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] github-actions[bot] closed issue #5371: Hive: Concurrency Issue for CachedClientPool

2023-02-09 Thread via GitHub
github-actions[bot] closed issue #5371: Hive: Concurrency Issue for CachedClientPool URL: https://github.com/apache/iceberg/issues/5371

[GitHub] [iceberg] github-actions[bot] commented on issue #5273: Migrate to Spark DS V2 Filter

2023-02-09 Thread via GitHub
github-actions[bot] commented on issue #5273: URL: https://github.com/apache/iceberg/issues/5273#issuecomment-1425005151 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

[GitHub] [iceberg] jackye1995 commented on pull request #6795: AWS: Remove unused validateTableIdentifier method in IcebergToGlueConverter

2023-02-09 Thread via GitHub
jackye1995 commented on PR #6795: URL: https://github.com/apache/iceberg/pull/6795#issuecomment-1425005881 Thanks for the work @amogh-jahagirdar !

[GitHub] [iceberg] jackye1995 merged pull request #6795: AWS: Remove unused validateTableIdentifier method in IcebergToGlueConverter

2023-02-09 Thread via GitHub
jackye1995 merged PR #6795: URL: https://github.com/apache/iceberg/pull/6795

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6716: Spark 3.3: Implement Position Deletes Table

2023-02-09 Thread via GitHub
szehon-ho commented on code in PR #6716: URL: https://github.com/apache/iceberg/pull/6716#discussion_r1102154369 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/PositionDeleteRowReader.java: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] [iceberg] huaxingao commented on issue #5273: Migrate to Spark DS V2 Filter

2023-02-09 Thread via GitHub
huaxingao commented on issue #5273: URL: https://github.com/apache/iceberg/issues/5273#issuecomment-1425012296 The first PR (https://github.com/apache/iceberg/pull/5302) was merged. I have the 2nd part on my local and will submit soon.

[GitHub] [iceberg] haizhou-zhao opened a new issue, #6798: [Rest Catalog Open API] Usage of "oneof" in the definition

2023-02-09 Thread via GitHub
haizhou-zhao opened a new issue, #6798: URL: https://github.com/apache/iceberg/issues/6798 ### Query engine _No response_ ### Question ## **Summary** I attempted to use open api generator (jaxrs-spec) to generate java api&service code using the provided Iceberg Re

[GitHub] [iceberg] haizhou-zhao commented on a diff in pull request #6621: [HiveCatalog] Support Altering and Dropping Table Ownership

2023-02-09 Thread via GitHub
haizhou-zhao commented on code in PR #6621: URL: https://github.com/apache/iceberg/pull/6621#discussion_r1102184009 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -494,6 +494,17 @@ private void setHmsTableParameters( // remove any pr

[GitHub] [iceberg] lurnagao commented on issue #476: support maven build

2023-02-09 Thread via GitHub
lurnagao commented on issue #476: URL: https://github.com/apache/iceberg/issues/476#issuecomment-1425066709 Hi, is Maven build supported now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [iceberg] wypoon opened a new pull request, #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

2023-02-09 Thread via GitHub
wypoon opened a new pull request, #6799: URL: https://github.com/apache/iceberg/pull/6799 This is a continuation of https://github.com/apache/iceberg/pull/5893. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [iceberg] wypoon commented on a diff in pull request #5893: Core: Use avro compression properties from table properties while writing Manifest and Manifest list files.

2023-02-09 Thread via GitHub
wypoon commented on code in PR #5893: URL: https://github.com/apache/iceberg/pull/5893#discussion_r1102202202 ## core/src/main/java/org/apache/iceberg/util/NumberUtil.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

[GitHub] [iceberg] wypoon commented on a diff in pull request #5893: Core: Use avro compression properties from table properties while writing Manifest and Manifest list files.

2023-02-09 Thread via GitHub
wypoon commented on code in PR #5893: URL: https://github.com/apache/iceberg/pull/5893#discussion_r1102203696 ## core/src/test/java/org/apache/iceberg/TableTestBase.java: ## @@ -237,12 +253,23 @@ ManifestFile writeManifest(DataFile... files) throws IOException { } Manif

[GitHub] [iceberg] wypoon commented on a diff in pull request #5893: Core: Use avro compression properties from table properties while writing Manifest and Manifest list files.

2023-02-09 Thread via GitHub
wypoon commented on code in PR #5893: URL: https://github.com/apache/iceberg/pull/5893#discussion_r1102204451 ## core/src/test/java/org/apache/iceberg/TableTestBase.java: ## @@ -163,6 +166,19 @@ public class TableTestBase { static final FileIO FILE_IO = new TestTables.Local

[GitHub] [iceberg] wypoon commented on a diff in pull request #5893: Core: Use avro compression properties from table properties while writing Manifest and Manifest list files.

2023-02-09 Thread via GitHub
wypoon commented on code in PR #5893: URL: https://github.com/apache/iceberg/pull/5893#discussion_r1102206884 ## core/src/test/java/org/apache/iceberg/TestManifestListWriter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

[GitHub] [iceberg] wypoon commented on a diff in pull request #5893: Core: Use avro compression properties from table properties while writing Manifest and Manifest list files.

2023-02-09 Thread via GitHub
wypoon commented on code in PR #5893: URL: https://github.com/apache/iceberg/pull/5893#discussion_r1102207241 ## build.gradle: ## @@ -286,6 +286,8 @@ project(':iceberg-core') { testImplementation "org.xerial:sqlite-jdbc" testImplementation project(path: ':iceberg-api',

[GitHub] [iceberg] rbalamohan commented on pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2023-02-09 Thread via GitHub
rbalamohan commented on PR #6432: URL: https://github.com/apache/iceberg/pull/6432#issuecomment-142509 I will look into setting the default number of threads to 4 instead of "number of processors". On larger systems, users can change it via a system property. -- This is an automa

[GitHub] [iceberg] youngxinler commented on pull request #6554: Parquet: Improve Test Coverage of RowGroupFilter Code with Nans #6518

2023-02-09 Thread via GitHub
youngxinler commented on PR #6554: URL: https://github.com/apache/iceberg/pull/6554#issuecomment-1425091841 Thanks for the review and guidance in the process. @RussellSpitzer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [iceberg] JonasJ-ap commented on issue #6781: Fix migration of Delta table that has performed VACUUM

2023-02-09 Thread via GitHub
JonasJ-ap commented on issue #6781: URL: https://github.com/apache/iceberg/issues/6781#issuecomment-1425153055 Some context and my thoughts here: Reference: Delta Lake's [doc](https://docs.delta.io/latest/delta-utility.html): 1. `VACUUM` deletes only data files, not log files 2.

[GitHub] [iceberg] JonasJ-ap commented on issue #6768: Support Delta name mapping to Iceberg conversion

2023-02-09 Thread via GitHub
JonasJ-ap commented on issue #6768: URL: https://github.com/apache/iceberg/issues/6768#issuecomment-1425155498 More info regarding this feature request: The Delta Lake community is working on supporting column mapping (physical column name) in `delta-standalone` which is used by the `iceberg-d

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2023-02-09 Thread via GitHub
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1102265083 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1234,6 +1243,95 @@ public void testMultipleRefsAndCleanExpiredFilesFailsForIncrementalCl

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2023-02-09 Thread via GitHub
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1102266077 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1515,4 +1599,47 @@ private RemoveSnapshots removeSnapshots(Table table) { RemoveSnap

[GitHub] [iceberg] ajantha-bhat commented on pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2023-02-09 Thread via GitHub
ajantha-bhat commented on PR #6090: URL: https://github.com/apache/iceberg/pull/6090#issuecomment-1425164429 > @ajantha-bhat, looks like there are just two more things to fix. Thanks! Done. Thanks for the review. -- This is an automated message from the Apache Git Service. To respon

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2023-02-09 Thread via GitHub
aokolnychyi commented on code in PR #6432: URL: https://github.com/apache/iceberg/pull/6432#discussion_r1102289279 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -137,14 +140,24 @@ public static StructLikeSet toEqualitySet( public static PositionDelet

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2023-02-09 Thread via GitHub
aokolnychyi commented on code in PR #6432: URL: https://github.com/apache/iceberg/pull/6432#discussion_r1102290538 ## core/src/main/java/org/apache/iceberg/SystemProperties.java: ## @@ -41,6 +41,9 @@ private SystemProperties() {} public static final int IO_MANIFEST_CACHE_MA

[GitHub] [iceberg] aokolnychyi commented on pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2023-02-09 Thread via GitHub
aokolnychyi commented on PR #6432: URL: https://github.com/apache/iceberg/pull/6432#issuecomment-1425211289 @rbalamohan, do you have the same position files that are read over and over again for different data files in a combined scan task? Or is it mostly unique delete files per each data

[GitHub] [iceberg] jackye1995 commented on issue #6781: Fix migration of Delta table that has performed VACUUM

2023-02-09 Thread via GitHub
jackye1995 commented on issue #6781: URL: https://github.com/apache/iceberg/issues/6781#issuecomment-1425214434 Thanks for the explanation! I am not sure how Delta leverages its logs. Does each log have a unique ID? Is that usable by end users? For Iceberg, users can query and do time

[GitHub] [iceberg] xuzhiwen1255 opened a new issue, #6800: Do you need to save all historical json files after cleaning up expired snapshots?

2023-02-09 Thread via GitHub
xuzhiwen1255 opened a new issue, #6800: URL: https://github.com/apache/iceberg/issues/6800 ### Query engine _No response_ ### Question https://user-images.githubusercontent.com/105710753/217992943-217a18fc-708d-42a6-bb2b-8dd2ac355d64.png After cleaning and clean

[GitHub] [iceberg] RussellSpitzer commented on issue #6800: Do you need to save all historical json files after cleaning up expired snapshots?

2023-02-09 Thread via GitHub
RussellSpitzer commented on issue #6800: URL: https://github.com/apache/iceberg/issues/6800#issuecomment-1425230959 I think this is worth thinking about but I'm not sure it is as simple as this. Currently we don't really consider the json files to really have the same lifecycle as other met

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6614: Flink:fix flink streaming query problem [ Cannot get a client from a closed pool]

2023-02-09 Thread via GitHub
hililiwei commented on code in PR #6614: URL: https://github.com/apache/iceberg/pull/6614#discussion_r1102304787 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/TableLoader.java: ## @@ -34,12 +34,16 @@ * the cluster (for example, to get splits), not just on the cli

[GitHub] [iceberg] xuzhiwen1255 commented on issue #6800: Do you need to save all historical json files after cleaning up expired snapshots?

2023-02-09 Thread via GitHub
xuzhiwen1255 commented on issue #6800: URL: https://github.com/apache/iceberg/issues/6800#issuecomment-1425256351 > I think this is worth thinking about but I'm not sure it is as simple as this. Currently we don't really consider the json files to really have the same lifecycle as other

[GitHub] [iceberg] Fokko commented on a diff in pull request #6801: API,Core,Spark: Add rewritten bytes to rewrite data files procedure results

2023-02-10 Thread via GitHub
Fokko commented on code in PR #6801: URL: https://github.com/apache/iceberg/pull/6801#discussion_r1102438362 ## core/src/main/java/org/apache/iceberg/actions/BaseFileGroupRewriteResult.java: ## @@ -24,13 +24,26 @@ public class BaseFileGroupRewriteResult implements FileGroupRewr

[GitHub] [iceberg] nastra commented on a diff in pull request #6801: API,Core,Spark: Add rewritten bytes to rewrite data files procedure results

2023-02-10 Thread via GitHub
nastra commented on code in PR #6801: URL: https://github.com/apache/iceberg/pull/6801#discussion_r1102443701 ## core/src/main/java/org/apache/iceberg/actions/BaseFileGroupRewriteResult.java: ## @@ -24,13 +24,26 @@ public class BaseFileGroupRewriteResult implements FileGroupRew

[GitHub] [iceberg] Fokko commented on a diff in pull request #6801: API,Core,Spark: Add rewritten bytes to rewrite data files procedure results

2023-02-10 Thread via GitHub
Fokko commented on code in PR #6801: URL: https://github.com/apache/iceberg/pull/6801#discussion_r1102482709 ## core/src/main/java/org/apache/iceberg/actions/BaseFileGroupRewriteResult.java: ## @@ -24,13 +24,26 @@ public class BaseFileGroupRewriteResult implements FileGroupRewr
