[GitHub] [iceberg] Fokko opened a new pull request, #6580: Python: Bump pylint

2023-01-13 Thread GitBox
Fokko opened a new pull request, #6580: URL: https://github.com/apache/iceberg/pull/6580 And remove the temporary hack -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[GitHub] [iceberg] Fokko commented on a diff in pull request #6525: Python: Refactor loading manifests

2023-01-13 Thread GitBox
Fokko commented on code in PR #6525: URL: https://github.com/apache/iceberg/pull/6525#discussion_r1069592946 ## python/pyiceberg/avro/reader.py: ## @@ -238,41 +249,50 @@ def skip(self, decoder: BinaryDecoder) -> None: return self.option.skip(decoder) -class Stru

[GitHub] [iceberg] Fokko commented on a diff in pull request #6525: Python: Refactor loading manifests

2023-01-13 Thread GitBox
Fokko commented on code in PR #6525: URL: https://github.com/apache/iceberg/pull/6525#discussion_r1069602456 ## python/pyiceberg/avro/reader.py: ## @@ -238,41 +249,50 @@ def skip(self, decoder: BinaryDecoder) -> None: return self.option.skip(decoder) -class Stru

[GitHub] [iceberg-docs] InvisibleProgrammer commented on pull request #191: Fix sidebar

2023-01-13 Thread GitBox
InvisibleProgrammer commented on PR #191: URL: https://github.com/apache/iceberg-docs/pull/191#issuecomment-1382031000 @samredai: thanks for the approval.

[GitHub] [iceberg-docs] RussellSpitzer merged pull request #191: Fix sidebar

2023-01-13 Thread GitBox
RussellSpitzer merged PR #191: URL: https://github.com/apache/iceberg-docs/pull/191

[GitHub] [iceberg-docs] RussellSpitzer commented on pull request #191: Fix sidebar

2023-01-13 Thread GitBox
RussellSpitzer commented on PR #191: URL: https://github.com/apache/iceberg-docs/pull/191#issuecomment-1382043820 Thanks for the review @samredai and thank you @InvisibleProgrammer for the PR

[GitHub] [iceberg] szehon-ho opened a new pull request, #6581: Spark 3.3: Add RemoveDanglingDeletes action

2023-01-13 Thread GitBox
szehon-ho opened a new pull request, #6581: URL: https://github.com/apache/iceberg/pull/6581 This adds an action to clean up dangling (invalid) DeleteFiles that may otherwise keep getting carried over with the table's current snapshot. The problem and design doc are here: https://docs.g

[GitHub] [iceberg] krvikash commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

2023-01-13 Thread GitBox
krvikash commented on code in PR #6499: URL: https://github.com/apache/iceberg/pull/6499#discussion_r1069695013 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java: ## @@ -307,6 +312,61 @@ public void testAlreadyExistsException() { () -> catalog

[GitHub] [iceberg] stevenzwu merged pull request #6572: Flink: backport PR #6337, PR #6426, PR #6557 to Flink 1.14 and 1.15 f…

2023-01-13 Thread GitBox
stevenzwu merged PR #6572: URL: https://github.com/apache/iceberg/pull/6572

[GitHub] [iceberg] stevenzwu commented on pull request #6572: Flink: backport PR #6337, PR #6426, PR #6557 to Flink 1.14 and 1.15 f…

2023-01-13 Thread GitBox
stevenzwu commented on PR #6572: URL: https://github.com/apache/iceberg/pull/6572#issuecomment-1382135319 thanks @pvary for the review

[GitHub] [iceberg] stevenzwu merged pull request #6401: Flink: Change to oldestAncestorAfter

2023-01-13 Thread GitBox
stevenzwu merged PR #6401: URL: https://github.com/apache/iceberg/pull/6401

[GitHub] [iceberg] stevenzwu commented on pull request #6401: Flink: Change to oldestAncestorAfter

2023-01-13 Thread GitBox
stevenzwu commented on PR #6401: URL: https://github.com/apache/iceberg/pull/6401#issuecomment-1382149280 thanks @hililiwei for the contribution

[GitHub] [iceberg] stevenzwu merged pull request #6222: Flink: Support inspecting table

2023-01-13 Thread GitBox
stevenzwu merged PR #6222: URL: https://github.com/apache/iceberg/pull/6222

[GitHub] [iceberg] stevenzwu commented on pull request #6222: Flink: Support inspecting table

2023-01-13 Thread GitBox
stevenzwu commented on PR #6222: URL: https://github.com/apache/iceberg/pull/6222#issuecomment-1382153049 thanks @hililiwei for contributing this major feature

[GitHub] [iceberg] rdblue commented on a diff in pull request #6517: Parquet: Fixes Incorrect Skipping of RowGroups with NaNs

2023-01-13 Thread GitBox
rdblue commented on code in PR #6517: URL: https://github.com/apache/iceberg/pull/6517#discussion_r1069784838 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java: ## @@ -560,24 +560,26 @@ private T max(Statistics statistics, int id) { }

[GitHub] [iceberg] rdblue merged pull request #6517: Parquet: Fixes Incorrect Skipping of RowGroups with NaNs

2023-01-13 Thread GitBox
rdblue merged PR #6517: URL: https://github.com/apache/iceberg/pull/6517

[GitHub] [iceberg] rdblue closed issue #6516: Parquet: Metric Row Group Filter handles Undefined Min/Max incorrectly Missing Rows

2023-01-13 Thread GitBox
rdblue closed issue #6516: Parquet: Metric Row Group Filter handles Undefined Min/Max incorrectly Missing Rows URL: https://github.com/apache/iceberg/issues/6516

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6517: Parquet: Fixes Incorrect Skipping of RowGroups with NaNs

2023-01-13 Thread GitBox
RussellSpitzer commented on code in PR #6517: URL: https://github.com/apache/iceberg/pull/6517#discussion_r1069785830 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java: ## @@ -560,24 +560,26 @@ private T max(Statistics statistics, int id) {

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

2023-01-13 Thread GitBox
jackye1995 commented on code in PR #6575: URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069789840 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java: ## @@ -231,6 +233,68 @@ public void testVersionAsOf() { assertEquals("Snapshot a

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

2023-01-13 Thread GitBox
jackye1995 commented on code in PR #6575: URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069798452 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throw

[GitHub] [iceberg] rdblue commented on a diff in pull request #6525: Python: Refactor loading manifests

2023-01-13 Thread GitBox
rdblue commented on code in PR #6525: URL: https://github.com/apache/iceberg/pull/6525#discussion_r1069808528 ## python/tests/expressions/test_evaluator.py: ## @@ -52,112 +54,126 @@ ) +def _record_simple(id: int, data: Optional[str]) -> Record: # pylint: disable=redefined

[GitHub] [iceberg] jackye1995 commented on pull request #6576: AWS: Fix check for isTableRegisteredWithLF leading to CREATE table failure

2023-01-13 Thread GitBox
jackye1995 commented on PR #6576: URL: https://github.com/apache/iceberg/pull/6576#issuecomment-1382243877 @xiaoxuandev, can you take a look? I believe this case should have been covered by a unit test; I need to take a deeper look into it.

[GitHub] [iceberg] huaxingao opened a new pull request, #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao opened a new pull request, #6582: URL: https://github.com/apache/iceberg/pull/6582 Add a Spark procedure to collect NDV (number of distinct values) statistics, which will be used for CBO (cost-based optimization).

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1069901978 ## core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java: ## @@ -26,4 +26,6 @@ private StandardBlobTypes() {} * href="https://datasketches.apache.or

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1069914737 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1069918267 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2023-01-13 Thread GitBox
jackye1995 commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1069937682 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeTableMetadata.java: ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] JonasJ-ap opened a new pull request, #6179: AWS: Re-tag files when renaming tables in GlueCatalog

2023-01-13 Thread GitBox
JonasJ-ap opened a new pull request, #6179: URL: https://github.com/apache/iceberg/pull/6179 Follows PR #4402. As mentioned in https://github.com/apache/iceberg/pull/4402#issuecomment-1261096282: In `GlueCatalog`, if `s3.write.table-name-tag-enabled` and `s3.write.namespace-name-tag

[GitHub] [iceberg] jackye1995 commented on pull request #6179: AWS: Re-tag files when renaming tables in GlueCatalog

2023-01-13 Thread GitBox
jackye1995 commented on PR #6179: URL: https://github.com/apache/iceberg/pull/6179#issuecomment-1382318184 I'd like to discuss this a bit more, since we do have some actual customer use cases for this, because overall the S3 tagging-related features in Iceberg integrate very well with S3 li

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
RussellSpitzer commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070017872 ## core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java: ## @@ -26,4 +26,6 @@ private StandardBlobTypes() {} * href="https://datasketches.apac

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
RussellSpitzer commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070020289 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software F

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
RussellSpitzer commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070023914 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software F

[GitHub] [iceberg] RussellSpitzer commented on pull request #6461: Spark-3.3: Store sort-order-id in manifest_entry's data_file

2023-01-13 Thread GitBox
RussellSpitzer commented on PR #6461: URL: https://github.com/apache/iceberg/pull/6461#issuecomment-1382374860 I'm still a little worried about saving this information: what does knowing the sort order mean for a file? Are we guaranteeing that the file is locally sorted by that order? or gl

[GitHub] [iceberg] pvary opened a new pull request, #6583: Flink: Refactor sink tests to use HadoopCatalogResource

2023-01-13 Thread GitBox
pvary opened a new pull request, #6583: URL: https://github.com/apache/iceberg/pull/6583 Refactor Flink sink tests to use the HadoopCatalogResource. This is groundwork for adding encryption tests for Flink sources and sinks.

[GitHub] [iceberg] pvary commented on pull request #6583: Flink: Refactor sink tests to use HadoopCatalogResource

2023-01-13 Thread GitBox
pvary commented on PR #6583: URL: https://github.com/apache/iceberg/pull/6583#issuecomment-1382415642 CC: @hililiwei, @ggershinsky

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6581: Spark 3.3: Add RemoveDanglingDeletes action

2023-01-13 Thread GitBox
szehon-ho commented on code in PR #6581: URL: https://github.com/apache/iceberg/pull/6581#discussion_r1070090624 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RemoveDanglingDeletesSparkAction.java: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software

[GitHub] [iceberg] huaxingao commented on issue #6549: Collecting Iceberg NDV Statistics for Spark Engine

2023-01-13 Thread GitBox
huaxingao commented on issue #6549: URL: https://github.com/apache/iceberg/issues/6549#issuecomment-1382430906 Here is the [PR](https://github.com/apache/iceberg/pull/6582) for implementing a Spark stored procedure to collect NDV.

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070099281 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
amogh-jahagirdar commented on code in PR #6565: URL: https://github.com/apache/iceberg/pull/6565#discussion_r1070101248 ## core/src/main/java/org/apache/iceberg/view/ViewHistoryEntryParser.java: ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
amogh-jahagirdar commented on code in PR #6565: URL: https://github.com/apache/iceberg/pull/6565#discussion_r1070101425 ## core/src/test/java/org/apache/iceberg/view/TestViewHistoryEntryParser.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
amogh-jahagirdar commented on code in PR #6565: URL: https://github.com/apache/iceberg/pull/6565#discussion_r1070101669 ## core/src/main/java/org/apache/iceberg/view/ViewHistoryEntryParser.java: ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6559: Core: View core parser implementations

2023-01-13 Thread GitBox
amogh-jahagirdar commented on PR #6559: URL: https://github.com/apache/iceberg/pull/6559#issuecomment-1382437507 Thanks @nastra, I'll be applying these suggestions in all the split PRs I'm raising. Agreed, more tests on nullability/missing fields are needed, and now that we use Immutable depen

[GitHub] [iceberg] amogh-jahagirdar closed pull request #6559: Core: View core parser implementations

2023-01-13 Thread GitBox
amogh-jahagirdar closed pull request #6559: Core: View core parser implementations URL: https://github.com/apache/iceberg/pull/6559

[GitHub] [iceberg] rdblue commented on a diff in pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
rdblue commented on code in PR #6565: URL: https://github.com/apache/iceberg/pull/6565#discussion_r1070122674 ## core/src/main/java/org/apache/iceberg/view/BaseViewHistoryEntry.java: ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] rdblue commented on pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
rdblue commented on PR #6565: URL: https://github.com/apache/iceberg/pull/6565#issuecomment-1382456468 Looks good other than the name of the history entry interface. Thanks, @amogh-jahagirdar!

[GitHub] [iceberg] flyrain commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
flyrain commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070140194 ## core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java: ## @@ -26,4 +26,6 @@ private StandardBlobTypes() {} * href="https://datasketches.apache.org/

[GitHub] [iceberg] stevenzwu opened a new pull request, #6584: Flink: support reading as Avro GenericRecord for FLIP-27 IcebergSource

2023-01-13 Thread GitBox
stevenzwu opened a new pull request, #6584: URL: https://github.com/apache/iceberg/pull/6584 cc @hililiwei

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
amogh-jahagirdar commented on code in PR #6565: URL: https://github.com/apache/iceberg/pull/6565#discussion_r1070149483 ## core/src/main/java/org/apache/iceberg/view/BaseViewHistoryEntry.java: ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
szehon-ho commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070150005 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2023-01-13 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1070158086 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6573: Docs: Add information on how to read from branches and tags in Spark docs

2023-01-13 Thread GitBox
amogh-jahagirdar commented on code in PR #6573: URL: https://github.com/apache/iceberg/pull/6573#discussion_r1070158484 ## docs/spark-queries.md: ## @@ -126,6 +126,8 @@ To select a specific table snapshot or the snapshot at some time in the DataFram * `snapshot-id` selects a

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
jackye1995 commented on code in PR #6565: URL: https://github.com/apache/iceberg/pull/6565#discussion_r1070159167 ## core/src/main/java/org/apache/iceberg/view/BaseViewHistoryEntry.java: ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
amogh-jahagirdar commented on code in PR #6565: URL: https://github.com/apache/iceberg/pull/6565#discussion_r1070161504 ## core/src/main/java/org/apache/iceberg/view/BaseViewHistoryEntry.java: ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] dennishuo commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2023-01-13 Thread GitBox
dennishuo commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1070164244 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] dennishuo commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2023-01-13 Thread GitBox
dennishuo commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1070166470 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] williamhyun opened a new pull request, #6585: Update ORC to 1.8.2

2023-01-13 Thread GitBox
williamhyun opened a new pull request, #6585: URL: https://github.com/apache/iceberg/pull/6585 Apache ORC 1.8.2 is the latest version of ORC, which brings the following changes and bug fixes, including an SBOM. - https://github.com/apache/orc/releases/tag/v1.8.2

[GitHub] [iceberg] github-actions[bot] commented on issue #5183: Allow to configure Avro block size

2023-01-13 Thread GitBox
github-actions[bot] commented on issue #5183: URL: https://github.com/apache/iceberg/issues/5183#issuecomment-1382598801 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] github-actions[bot] commented on issue #4607: [Docs] Create an item list for re-organizing docs to the proposed layout

2023-01-13 Thread GitBox
github-actions[bot] commented on issue #4607: URL: https://github.com/apache/iceberg/issues/4607#issuecomment-1382598842 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

[GitHub] [iceberg] github-actions[bot] closed issue #5183: Allow to configure Avro block size

2023-01-13 Thread GitBox
github-actions[bot] closed issue #5183: Allow to configure Avro block size URL: https://github.com/apache/iceberg/issues/5183

[GitHub] [iceberg] dramaticlly commented on a diff in pull request #6574: Python: Raise exception on deletes

2023-01-13 Thread GitBox
dramaticlly commented on code in PR #6574: URL: https://github.com/apache/iceberg/pull/6574#discussion_r1070176238 ## python/pyiceberg/table/__init__.py: ## @@ -341,7 +346,18 @@ def plan_files(self) -> Iterator[FileScanTask]: all_files = files(io.new_input(manifest.

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070181796 ## core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java: ## @@ -26,4 +26,6 @@ private StandardBlobTypes() {} * href="https://datasketches.apache.or

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070181906 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070181959 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-13 Thread GitBox
huaxingao commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1070182092 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/DistinctCountProcedure.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] jackye1995 commented on pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
jackye1995 commented on PR #6565: URL: https://github.com/apache/iceberg/pull/6565#issuecomment-1382622527 Since the immutable concern is addressed and we have enough approvals, I will go ahead and merge the PR. Thanks for the review, @rdblue and @nastra!

[GitHub] [iceberg] jackye1995 merged pull request #6565: Core: View history entry core implementation

2023-01-13 Thread GitBox
jackye1995 merged PR #6565: URL: https://github.com/apache/iceberg/pull/6565

[GitHub] [iceberg] jackye1995 commented on pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2023-01-13 Thread GitBox
jackye1995 commented on PR #6428: URL: https://github.com/apache/iceberg/pull/6428#issuecomment-1382624709 Looks like some CI tests are failing? Could you check? It may need a rebase.

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6573: Docs: Add information on how to read from branches and tags in Spark docs

2023-01-13 Thread GitBox
jackye1995 commented on code in PR #6573: URL: https://github.com/apache/iceberg/pull/6573#discussion_r1070189868 ## docs/spark-queries.md: ## @@ -126,6 +126,8 @@ To select a specific table snapshot or the snapshot at some time in the DataFram * `snapshot-id` selects a speci

[GitHub] [iceberg] jackye1995 commented on pull request #6573: Docs: Add information on how to read from branches and tags in Spark docs

2023-01-13 Thread GitBox
jackye1995 commented on PR #6573: URL: https://github.com/apache/iceberg/pull/6573#issuecomment-1382625570 Thanks everyone for the review. As I said in the thread for the SQL-related changes, I will wait for some more time in case there are disagreements. I will merge this in first and we c

[GitHub] [iceberg] jackye1995 merged pull request #6573: Docs: Add information on how to read from branches and tags in Spark docs

2023-01-13 Thread GitBox
jackye1995 merged PR #6573: URL: https://github.com/apache/iceberg/pull/6573

[GitHub] [iceberg] renshangtao commented on pull request #6577: Fix the problem of shadow jar without modifying meta-inf/services at the same time

2023-01-13 Thread GitBox
renshangtao commented on PR #6577: URL: https://github.com/apache/iceberg/pull/6577#issuecomment-1382632267 @nastra thank you for your reply. After this modification, the code can be executed as expected. I open the runtime jar package, and the class corresponding to shadowjar in META-INF/s

[GitHub] [iceberg] jackye1995 commented on pull request #6575: Spark 3.3: support version travel by reference name

2023-01-13 Thread GitBox
jackye1995 commented on PR #6575: URL: https://github.com/apache/iceberg/pull/6575#issuecomment-1382632711 @aokolnychyi @RussellSpitzer @rdblue Any opinions about this support?

[GitHub] [iceberg] jackye1995 opened a new pull request, #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-13 Thread GitBox
jackye1995 opened a new pull request, #6586: URL: https://github.com/apache/iceberg/pull/6586 Currently, a warehouse path must be specified in every situation, but in many cases the user just wants to initialize the Glue catalog to read data and doesn't want to pass in a warehouse path. Thi

[GitHub] [iceberg] dennishuo commented on pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2023-01-13 Thread GitBox
dennishuo commented on PR #6428: URL: https://github.com/apache/iceberg/pull/6428#issuecomment-1382648326 @jackye1995 Thanks for the heads up! Looks like merging to head fixed it.

[GitHub] [iceberg] danielcweeks merged pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2023-01-13 Thread GitBox
danielcweeks merged PR #6428: URL: https://github.com/apache/iceberg/pull/6428

[GitHub] [iceberg] dmgcodevil opened a new issue, #6587: Wrong class, java.lang.Long, for object: 19367

2023-01-13 Thread GitBox
dmgcodevil opened a new issue, #6587: URL: https://github.com/apache/iceberg/issues/6587 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 I have a timestamp field of type: `timestamptz`. I'm trying to compact files using S

[GitHub] [iceberg] singhpk234 commented on pull request #6576: AWS: Fix check for isTableRegisteredWithLF leading to CREATE table failure

2023-01-13 Thread GitBox
singhpk234 commented on PR #6576: URL: https://github.com/apache/iceberg/pull/6576#issuecomment-1382673430 Did some more digging; posting what I found so far: this issue is only observed in 0.14.x and 1.0.0 (and I directly tested my fix on top of master :sweat_smile:); 1.1.0 & master are f

[GitHub] [iceberg] singhpk234 closed pull request #6576: AWS: Fix check for isTableRegisteredWithLF leading to CREATE table failure

2023-01-13 Thread GitBox
singhpk234 closed pull request #6576: AWS: Fix check for isTableRegisteredWithLF leading to CREATE table failure URL: https://github.com/apache/iceberg/pull/6576

[GitHub] [iceberg] singhpk234 commented on issue #6523: Table creation fails with Glue catalog on EMR

2023-01-13 Thread GitBox
singhpk234 commented on issue #6523: URL: https://github.com/apache/iceberg/issues/6523#issuecomment-1382673665 Please consider using the Iceberg 1.1.0 release instead; it includes the fix for this failure, added as part of https://github.com/apache/iceberg/pull/4423/

[GitHub] [iceberg] ajantha-bhat commented on pull request #6461: Spark-3.3: Store sort-order-id in manifest_entry's data_file

2023-01-13 Thread GitBox
ajantha-bhat commented on PR #6461: URL: https://github.com/apache/iceberg/pull/6461#issuecomment-1382677828 > I'm still a little worried about saving this information, what does knowing the sort order mean for a file. Are we guaranteeing that the file is locally sorted by that order? or gl

[GitHub] [iceberg] nastra commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

2023-01-14 Thread GitBox
nastra commented on code in PR #6575: URL: https://github.com/apache/iceberg/pull/6575#discussion_r1070237113 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java: ## @@ -231,6 +234,67 @@ public void testVersionAsOf() { assertEquals("Snapshot at sp

[GitHub] [iceberg] nastra commented on pull request #6577: Fix the problem of shadow jar without modifying meta-inf/services at the same time

2023-01-14 Thread GitBox
nastra commented on PR #6577: URL: https://github.com/apache/iceberg/pull/6577#issuecomment-1382694306 > @nastra thank you for your reply. After this modification, the code can be executed as expected. I open the runtime jar package, and the class corresponding to shadowjar in META-INF/serv

[GitHub] [iceberg] Fokko commented on pull request #6525: Python: Refactor loading manifests

2023-01-14 Thread GitBox
Fokko commented on PR #6525: URL: https://github.com/apache/iceberg/pull/6525#issuecomment-1382722862 Thanks for the thorough review and PR @rdblue!

[GitHub] [iceberg] Fokko merged pull request #6525: Python: Refactor loading manifests

2023-01-14 Thread GitBox
Fokko merged PR #6525: URL: https://github.com/apache/iceberg/pull/6525

[GitHub] [iceberg] Fokko merged pull request #6585: Update ORC to 1.8.2

2023-01-14 Thread GitBox
Fokko merged PR #6585: URL: https://github.com/apache/iceberg/pull/6585

[GitHub] [iceberg] Fokko commented on a diff in pull request #6574: Python: Raise exception on deletes

2023-01-14 Thread GitBox
Fokko commented on code in PR #6574: URL: https://github.com/apache/iceberg/pull/6574#discussion_r1070259471 ## python/pyiceberg/table/__init__.py: ## @@ -341,7 +346,18 @@ def plan_files(self) -> Iterator[FileScanTask]: all_files = files(io.new_input(manifest.manife

[GitHub] [iceberg] RussellSpitzer opened a new pull request, #6588: Spark 3.3: Add Default Parallelism Level for All Spark Driver Based Deletes

2023-01-14 Thread GitBox
RussellSpitzer opened a new pull request, #6588: URL: https://github.com/apache/iceberg/pull/6588 An issue we've run into frequently is that several Spark actions perform deletes on the driver with a default parallelism of 1. This is quite slow for S3 and painfully slow for very large table

[GitHub] [iceberg] RussellSpitzer commented on pull request #6588: Spark 3.3: Add Default Parallelism Level for All Spark Driver Based Deletes

2023-01-14 Thread GitBox
RussellSpitzer commented on PR #6588: URL: https://github.com/apache/iceberg/pull/6588#issuecomment-1382730394 @anuragmantri + @aokolnychyi + @rdblue - This is a fairly big change to default behavior, but the current behavior has been biting a lot of our users lately, and the change is relatively safe.

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6588: Spark 3.3: Add Default Parallelism Level for All Spark Driver Based Deletes

2023-01-14 Thread GitBox
RussellSpitzer commented on code in PR #6588: URL: https://github.com/apache/iceberg/pull/6588#discussion_r1070261827 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -47,4 +47,8 @@ private SparkSQLProperties() {} public static final S

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6588: Spark 3.3: Add Default Parallelism Level for All Spark Driver Based Deletes

2023-01-14 Thread GitBox
RussellSpitzer commented on code in PR #6588: URL: https://github.com/apache/iceberg/pull/6588#discussion_r1070261893 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java: ## @@ -231,24 +258,27 @@ protected DeleteSummary deleteFiles( Del

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6588: Spark 3.3: Add Default Parallelism Level for All Spark Driver Based Deletes

2023-01-14 Thread GitBox
RussellSpitzer commented on code in PR #6588: URL: https://github.com/apache/iceberg/pull/6588#discussion_r1070261927 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -246,12 +246,13 @@ private DeleteOrphanFiles.Result d

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

2023-01-14 Thread GitBox
RussellSpitzer commented on code in PR #6575: URL: https://github.com/apache/iceberg/pull/6575#discussion_r1070263376 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) t

[GitHub] [iceberg] RussellSpitzer commented on issue #6587: Wrong class, java.lang.Long, for object: 19367

2023-01-14 Thread GitBox
RussellSpitzer commented on issue #6587: URL: https://github.com/apache/iceberg/issues/6587#issuecomment-1382733719 Could you post the full trace from the Spark code?
