[GitHub] [iceberg] findepi commented on issue #6443: Provide Puffin reader API allowing read without decompression

2022-12-18 Thread GitBox
findepi commented on issue #6443: URL: https://github.com/apache/iceberg/issues/6443#issuecomment-1357228866 @ajantha-bhat there may be many different types of stats, and stats can be computed for a subset of columns. -- This is an automated message from the Apache Git Service. To respond t

[GitHub] [iceberg] ajantha-bhat commented on issue #6443: Provide Puffin reader API allowing read without decompression

2022-12-18 Thread GitBox
ajantha-bhat commented on issue #6443: URL: https://github.com/apache/iceberg/issues/6443#issuecomment-1357223293 > When a query engine wants to add new stats to a snapshot that already has some stats, it currently needs to merge existing stats file' blobs with new ones. Can you plea

[GitHub] [iceberg] ajantha-bhat commented on issue #6442: Extends Iceberg table stats API to allow publish data and stats atomically

2022-12-18 Thread GitBox
ajantha-bhat commented on issue #6442: URL: https://github.com/apache/iceberg/issues/6442#issuecomment-1357221410 > but some query engines (like Trino) can collect stats on the fly, when writing to a table (INSERT, CREATE TABLE AS ...). I think we have discussed this for partitions st

[GitHub] [iceberg] nastra commented on pull request #6436: Core: Add flag to control sending metric reports via REST

2022-12-18 Thread GitBox
nastra commented on PR #6436: URL: https://github.com/apache/iceberg/pull/6436#issuecomment-1357218106 @rdblue I've rebased the PR, so the old commits with the metrics-impl property are gone (since they are merged already).

[GitHub] [iceberg] nastra commented on a diff in pull request #6436: Core: Add flag to control sending metric reports via REST

2022-12-18 Thread GitBox
nastra commented on code in PR #6436: URL: https://github.com/apache/iceberg/pull/6436#discussion_r1051888053 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -316,14 +324,18 @@ private void reportMetrics( TableIdentifier tableIdentifier,

[GitHub] [iceberg] ggershinsky closed pull request #3471: Core: Envelope encryption

2022-12-18 Thread GitBox
ggershinsky closed pull request #3471: Core: Envelope encryption URL: https://github.com/apache/iceberg/pull/3471

[GitHub] [iceberg] ggershinsky commented on pull request #3471: Core: Envelope encryption

2022-12-18 Thread GitBox
ggershinsky commented on PR #3471: URL: https://github.com/apache/iceberg/pull/3471#issuecomment-1357206614 Given the significant changes in envelope key metadata format https://docs.google.com/document/d/1HPobEb2e4ML12Q9qthkbbsu47ziMQsb-sHfTFsVTIss/edit?usp=sharing, replacing this

[GitHub] [iceberg] ConeyLiu commented on pull request #6335: Core: Avoid generating a large ManifestFile when committing

2022-12-18 Thread GitBox
ConeyLiu commented on PR #6335: URL: https://github.com/apache/iceberg/pull/6335#issuecomment-1357204759 @rdblue thanks for reviewing. It happens for larger tables (several PB of data) with many columns (thousands of columns, which is very common for log data or feature data). And th

[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6335: Core: Avoid generating a large ManifestFile when committing

2022-12-18 Thread GitBox
ConeyLiu commented on code in PR #6335: URL: https://github.com/apache/iceberg/pull/6335#discussion_r1051870044 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -896,33 +903,27 @@ private Iterable prepareNewManifests() { manifest -> GenericM

[GitHub] [iceberg] nastra commented on pull request #6246: Core: Create and report metrics about Snapshots

2022-12-18 Thread GitBox
nastra commented on PR #6246: URL: https://github.com/apache/iceberg/pull/6246#issuecomment-1357175669 @rdblue I've just rebased and pushed the branch

[GitHub] [iceberg] pvary commented on a diff in pull request #6379: Docs: Update Iceberg Hive documentation - 1.0.x (#6337)

2022-12-18 Thread GitBox
pvary commented on code in PR #6379: URL: https://github.com/apache/iceberg/pull/6379#discussion_r1051837704 ## docs/hive.md: ## @@ -38,6 +38,16 @@ Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following feat DML operations work only with MapReduce executio

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: WIP, Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-18 Thread GitBox
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1051828171 ## core/src/main/java/org/apache/iceberg/DeltaLakeDataTypeVisitor.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: WIP, Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-18 Thread GitBox
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1051825473 ## core/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] JonasJ-ap closed pull request #6449: Migrate delta to iceberg

2022-12-18 Thread GitBox
JonasJ-ap closed pull request #6449: Migrate delta to iceberg URL: https://github.com/apache/iceberg/pull/6449

[GitHub] [iceberg] zhongyujiang commented on a diff in pull request #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-18 Thread GitBox
zhongyujiang commented on code in PR #6431: URL: https://github.com/apache/iceberg/pull/6431#discussion_r1051751770 ## parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java: ## @@ -360,6 +362,21 @@ public void testNotNaNs() { Assert.assertTrue("

[GitHub] [iceberg] zhongyujiang commented on a diff in pull request #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-18 Thread GitBox
zhongyujiang commented on code in PR #6431: URL: https://github.com/apache/iceberg/pull/6431#discussion_r105175 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetDictionaryRowGroupFilter.java: ## @@ -165,20 +165,41 @@ public Boolean isNaN(BoundReference ref) {

[GitHub] [iceberg] zhongyujiang commented on a diff in pull request #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-18 Thread GitBox
zhongyujiang commented on code in PR #6431: URL: https://github.com/apache/iceberg/pull/6431#discussion_r1051749856 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetDictionaryRowGroupFilter.java: ## @@ -165,20 +165,41 @@ public Boolean isNaN(BoundReference ref) {

[GitHub] [iceberg] cccs-eric commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-18 Thread GitBox
cccs-eric commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1356970169 @rdblue All clear now

[GitHub] [iceberg] huaxingao commented on pull request #6405: API: Add Aggregate expression evaluation

2022-12-18 Thread GitBox
huaxingao commented on PR #6405: URL: https://github.com/apache/iceberg/pull/6405#issuecomment-1356946191 @rdblue Thank you very much for the PR! The changes are much cleaner and more generic now. These can be wrapped cleanly in Spark. Once your PR is in, I will make Spark changes on top o

[GitHub] [iceberg] huaxingao commented on a diff in pull request #6405: API: Add Aggregate expression evaluation

2022-12-18 Thread GitBox
huaxingao commented on code in PR #6405: URL: https://github.com/apache/iceberg/pull/6405#discussion_r1051704651 ## api/src/main/java/org/apache/iceberg/expressions/BoundAggregate.java: ## @@ -44,4 +57,85 @@ public Type type() { return term().type(); } } + + publ

[GitHub] [iceberg] github-actions[bot] commented on issue #5103: Sign metadata.json files

2022-12-18 Thread GitBox
github-actions[bot] commented on issue #5103: URL: https://github.com/apache/iceberg/issues/5103#issuecomment-1356911707 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

[GitHub] [iceberg] danielcweeks commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
danielcweeks commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051691247 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] github-actions[bot] commented on issue #5098: Stack Overflow due to repeated Set Unions - RewriteDataFilesCommitManager

2022-12-18 Thread GitBox
github-actions[bot] commented on issue #5098: URL: https://github.com/apache/iceberg/issues/5098#issuecomment-1356911714 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

[GitHub] [iceberg] rdblue commented on pull request #6246: Core: Create and report metrics about Snapshots

2022-12-18 Thread GitBox
rdblue commented on PR #6246: URL: https://github.com/apache/iceberg/pull/6246#issuecomment-1356911365 @nastra looks like this is out of date. Can you rebase?

[GitHub] [iceberg] rdblue merged pull request #6239: Docs: Select the right Spark catalog

2022-12-18 Thread GitBox
rdblue merged PR #6239: URL: https://github.com/apache/iceberg/pull/6239

[GitHub] [iceberg] rdblue commented on a diff in pull request #6239: Docs: Select the right Spark catalog

2022-12-18 Thread GitBox
rdblue commented on code in PR #6239: URL: https://github.com/apache/iceberg/pull/6239#discussion_r1051691004 ## docs/aws.md: ## @@ -68,6 +68,7 @@ done # start Spark SQL client shell spark-sql --packages $DEPENDENCIES \ +--conf spark.sql.defaultCatalog=my_catalog \ Revi

[GitHub] [iceberg] rdblue merged pull request #6267: Docs: Update spec about statistics file snapshot id

2022-12-18 Thread GitBox
rdblue merged PR #6267: URL: https://github.com/apache/iceberg/pull/6267

[GitHub] [iceberg] rdblue commented on pull request #6267: Docs: Update spec about statistics file snapshot id

2022-12-18 Thread GitBox
rdblue commented on PR #6267: URL: https://github.com/apache/iceberg/pull/6267#issuecomment-1356910794 Thanks, @ajantha-bhat! Merging this.

issues@iceberg.apache.org

2022-12-18 Thread GitBox
rdblue commented on code in PR #6324: URL: https://github.com/apache/iceberg/pull/6324#discussion_r1051690764 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -246,23 +248,33 @@ public void testReplaceTxnBuilder() throws Exception { } @

issues@iceberg.apache.org

2022-12-18 Thread GitBox
rdblue commented on code in PR #6324: URL: https://github.com/apache/iceberg/pull/6324#discussion_r1051690244 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -567,8 +570,13 @@ Database convertToDatabase(Namespace namespace, Map meta) { }

issues@iceberg.apache.org

2022-12-18 Thread GitBox
rdblue commented on code in PR #6324: URL: https://github.com/apache/iceberg/pull/6324#discussion_r1051690326 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -422,11 +425,23 @@ private Table newHmsTable(TableMetadata metadata) { Preco

[GitHub] [iceberg] rdblue commented on pull request #6348: Python: Update license-checker

2022-12-18 Thread GitBox
rdblue commented on PR #6348: URL: https://github.com/apache/iceberg/pull/6348#issuecomment-1356908698 @Fokko, what do you think about using the same license check script (copied) that we do in the Java version?

[GitHub] [iceberg] rdblue commented on pull request #6234: Docs: Remove parent-version-id from the view spec example

2022-12-18 Thread GitBox
rdblue commented on PR #6234: URL: https://github.com/apache/iceberg/pull/6234#issuecomment-1356908424 Thanks, @ajantha-bhat!

[GitHub] [iceberg] rdblue merged pull request #6234: Docs: Remove parent-version-id from the view spec example

2022-12-18 Thread GitBox
rdblue merged PR #6234: URL: https://github.com/apache/iceberg/pull/6234

[GitHub] [iceberg] rdblue commented on pull request #6250: Docs: Remove redundant configuration from spark docs

2022-12-18 Thread GitBox
rdblue commented on PR #6250: URL: https://github.com/apache/iceberg/pull/6250#issuecomment-1356908254 I don't think this is a good idea, and it is pretty minor either way. I'm going to close it.

[GitHub] [iceberg] rdblue closed pull request #6250: Docs: Remove redundant configuration from spark docs

2022-12-18 Thread GitBox
rdblue closed pull request #6250: Docs: Remove redundant configuration from spark docs URL: https://github.com/apache/iceberg/pull/6250

[GitHub] [iceberg] rdblue commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

2022-12-18 Thread GitBox
rdblue commented on code in PR #6250: URL: https://github.com/apache/iceberg/pull/6250#discussion_r1051689729 ## docs/spark-getting-started.md: ## @@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w ```sh spark-sql --packages org

[GitHub] [iceberg] rdblue commented on pull request #6190: Spark[3.1 | 3.2]: Support Java 8 time API in SparkValueConverter

2022-12-18 Thread GitBox
rdblue commented on PR #6190: URL: https://github.com/apache/iceberg/pull/6190#issuecomment-1356907875 Thanks, @singhpk234!

[GitHub] [iceberg] rdblue merged pull request #6190: Spark[3.1 | 3.2]: Support Java 8 time API in SparkValueConverter

2022-12-18 Thread GitBox
rdblue merged PR #6190: URL: https://github.com/apache/iceberg/pull/6190

[GitHub] [iceberg] rdblue commented on pull request #6293: Added FileIO Support for ORC Reader and Writers

2022-12-18 Thread GitBox
rdblue commented on PR #6293: URL: https://github.com/apache/iceberg/pull/6293#issuecomment-1356907182 Thanks, @pavibhai! Great to have this fixed, even if it's a hack :sweat_smile:.

[GitHub] [iceberg] rdblue merged pull request #6293: Added FileIO Support for ORC Reader and Writers

2022-12-18 Thread GitBox
rdblue merged PR #6293: URL: https://github.com/apache/iceberg/pull/6293

[GitHub] [iceberg] rdblue commented on pull request #6223: AWS: Use provided glue catalog id in defaultWarehouseLocation

2022-12-18 Thread GitBox
rdblue commented on PR #6223: URL: https://github.com/apache/iceberg/pull/6223#issuecomment-1356906835 Running CI.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-18 Thread GitBox
rdblue commented on code in PR #6353: URL: https://github.com/apache/iceberg/pull/6353#discussion_r1051688387 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java: ## @@ -79,9 +83,11 @@ private ReadConf init() { nameMapping, reuseC

[GitHub] [iceberg] rdblue closed issue #6155: Remove API deprecations for 1.2.0

2022-12-18 Thread GitBox
rdblue closed issue #6155: Remove API deprecations for 1.2.0 URL: https://github.com/apache/iceberg/issues/6155

[GitHub] [iceberg] rdblue commented on pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-18 Thread GitBox
rdblue commented on PR #6274: URL: https://github.com/apache/iceberg/pull/6274#issuecomment-1356905240 Thanks, @nastra!

[GitHub] [iceberg] rdblue merged pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-18 Thread GitBox
rdblue merged PR #6274: URL: https://github.com/apache/iceberg/pull/6274

[GitHub] [iceberg] rdblue commented on pull request #6335: Core: Avoid generating a large ManifestFile when committing

2022-12-18 Thread GitBox
rdblue commented on PR #6335: URL: https://github.com/apache/iceberg/pull/6335#issuecomment-1356905016 This seems like a reasonable thing to add to me. I'm actually more concerned about the use case that caused this. @ConeyLiu, what was the case where you were creating huge manifests?

[GitHub] [iceberg] rdblue commented on a diff in pull request #6335: Core: Avoid generating a large ManifestFile when committing

2022-12-18 Thread GitBox
rdblue commented on code in PR #6335: URL: https://github.com/apache/iceberg/pull/6335#discussion_r1051687190 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -896,33 +903,27 @@ private Iterable prepareNewManifests() { manifest -> GenericMan

[GitHub] [iceberg] srilman commented on issue #3221: [Python] support iceberg jdbc catalog in python library

2022-12-18 Thread GitBox
srilman commented on issue #3221: URL: https://github.com/apache/iceberg/issues/3221#issuecomment-1356904359 Will this catalog be based off the JDBC interface like the Java implementation or will it use Python's DB API (https://peps.python.org/pep-0249, many DB libraries follow this API)? I

[GitHub] [iceberg] rdblue commented on pull request #6297: Python: Bump pre-commit versions

2022-12-18 Thread GitBox
rdblue commented on PR #6297: URL: https://github.com/apache/iceberg/pull/6297#issuecomment-1356904257 @Fokko can you rebase?

[GitHub] [iceberg] rdblue commented on a diff in pull request #6358: AWS: Print logs whether Glue optimistic locking is used or not

2022-12-18 Thread GitBox
rdblue commented on code in PR #6358: URL: https://github.com/apache/iceberg/pull/6358#discussion_r1051686720 ## aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java: ## @@ -315,7 +315,12 @@ void persistGlueTable(Table glueTable, Map parameters, TableMeta

[GitHub] [iceberg] rdblue commented on pull request #6419: Doc:Example of correcting the document add/drop partition truncate

2022-12-18 Thread GitBox
rdblue commented on PR #6419: URL: https://github.com/apache/iceberg/pull/6419#issuecomment-1356903901 Thanks, @jiamin13579!

[GitHub] [iceberg] rdblue merged pull request #6419: Doc:Example of correcting the document add/drop partition truncate

2022-12-18 Thread GitBox
rdblue merged PR #6419: URL: https://github.com/apache/iceberg/pull/6419

[GitHub] [iceberg] rdblue commented on pull request #6419: Doc:Example of correcting the document add/drop partition truncate

2022-12-18 Thread GitBox
rdblue commented on PR #6419: URL: https://github.com/apache/iceberg/pull/6419#issuecomment-1356903838 @szehon-ho, you're right that Iceberg accepts both. But in Spark, the correct one is to put the width first because that's how Spark validates the function call.
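
The argument order described in the comment above can be illustrated with a minimal sketch. It assumes the Iceberg Spark SQL extensions' `ALTER TABLE ... ADD PARTITION FIELD` statement; the helper name `add_truncate_partition_field` and the table and column names are hypothetical and used only for illustration.

```python
def add_truncate_partition_field(table: str, column: str, width: int) -> str:
    """Build a Spark SQL statement that adds a truncate partition field.

    The width is emitted first, then the column, which is the order the
    comment above says Spark validates. (Hypothetical helper, not an
    Iceberg or Spark API.)
    """
    if width <= 0:
        raise ValueError("truncate width must be a positive integer")
    return f"ALTER TABLE {table} ADD PARTITION FIELD truncate({width}, {column})"


# Hypothetical table and column names.
ddl = add_truncate_partition_field("prod.db.sample", "data", 4)
print(ddl)
```

The resulting string could be passed to a Spark session's `sql` method in an environment where the Iceberg extensions are enabled; that step is omitted here so the sketch stays self-contained.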

[GitHub] [iceberg] rdblue commented on pull request #6369: Increase Partition Start Id to 10000

2022-12-18 Thread GitBox
rdblue commented on PR #6369: URL: https://github.com/apache/iceberg/pull/6369#issuecomment-1356903474 Any update on this? Should we keep it open or are we pursuing other solutions?

[GitHub] [iceberg] rdblue commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-18 Thread GitBox
rdblue commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1356903295 @cccs-eric, looks like this was out of date. I tried to fix conflicts, but it is failing tests. Can you fix and then we'll merge?

[GitHub] [iceberg] rdblue commented on a diff in pull request #6379: Docs: Update Iceberg Hive documentation - 1.0.x (#6337)

2022-12-18 Thread GitBox
rdblue commented on code in PR #6379: URL: https://github.com/apache/iceberg/pull/6379#discussion_r1051685991 ## docs/hive.md: ## @@ -509,7 +534,15 @@ SELECT * FROM table_a FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57'; SELECT * FROM table_a FOR SYSTEM_VERSION AS OF 1234567; ``

[GitHub] [iceberg] rdblue commented on a diff in pull request #6379: Docs: Update Iceberg Hive documentation - 1.0.x (#6337)

2022-12-18 Thread GitBox
rdblue commented on code in PR #6379: URL: https://github.com/apache/iceberg/pull/6379#discussion_r1051685890 ## docs/hive.md: ## @@ -433,6 +449,15 @@ Tables can be dropped using the `DROP TABLE` command: DROP TABLE [IF EXISTS] table_name [PURGE]; ``` +### METADATA LOCATION

[GitHub] [iceberg] rdblue commented on a diff in pull request #6379: Docs: Update Iceberg Hive documentation - 1.0.x (#6337)

2022-12-18 Thread GitBox
rdblue commented on code in PR #6379: URL: https://github.com/apache/iceberg/pull/6379#discussion_r1051685815 ## docs/hive.md: ## @@ -244,7 +254,7 @@ The result is: | j | IDENTITY | NULL You can create Iceberg partitions using the follo

[GitHub] [iceberg] rdblue commented on a diff in pull request #6379: Docs: Update Iceberg Hive documentation - 1.0.x (#6337)

2022-12-18 Thread GitBox
rdblue commented on code in PR #6379: URL: https://github.com/apache/iceberg/pull/6379#discussion_r1051685727 ## docs/hive.md: ## @@ -38,6 +38,16 @@ Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following feat DML operations work only with MapReduce executi

[GitHub] [iceberg] rdblue commented on a diff in pull request #6401: Flink: Change to oldestAncestorAfter

2022-12-18 Thread GitBox
rdblue commented on code in PR #6401: URL: https://github.com/apache/iceberg/pull/6401#discussion_r1051685128 ## flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/source/enumerator/ContinuousSplitPlannerImpl.java: ## @@ -213,17 +213,12 @@ static Optional startSnapshot(Tab

[GitHub] [iceberg] rdblue commented on pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-18 Thread GitBox
rdblue commented on PR #6404: URL: https://github.com/apache/iceberg/pull/6404#issuecomment-1356901600 Thanks, @nastra!

[GitHub] [iceberg] rdblue merged pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-18 Thread GitBox
rdblue merged PR #6404: URL: https://github.com/apache/iceberg/pull/6404

[GitHub] [iceberg] rdblue commented on a diff in pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2022-12-18 Thread GitBox
rdblue commented on code in PR #6432: URL: https://github.com/apache/iceberg/pull/6432#discussion_r1051684443 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -144,7 +146,18 @@ public static PositionDeleteIndex toPositionIndex( deletes ->

[GitHub] [iceberg] rdblue commented on a diff in pull request #6417: Reuse existing parquet reader in ReadConf (6416)

2022-12-18 Thread GitBox
rdblue commented on code in PR #6417: URL: https://github.com/apache/iceberg/pull/6417#discussion_r1051684313 ## parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java: ## @@ -185,21 +184,16 @@ private Map generateOffsetToStartPos(Schema schema) { return null;

[GitHub] [iceberg] rdblue commented on pull request #6426: Flink: add fixed field type for DataGenerators test util

2022-12-18 Thread GitBox
rdblue commented on PR #6426: URL: https://github.com/apache/iceberg/pull/6426#issuecomment-1356899972 Looks reasonable to me, but I don't have context on the rest of the discussion. Feel free to merge when you both are happy with this, @pvary and @stevenzwu.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-18 Thread GitBox
rdblue commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1051683043 ## spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -898,7 +898,7 @@ public void testSnapshotsTable() {

[GitHub] [iceberg] rdblue commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-18 Thread GitBox
rdblue commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1051683205 ## core/src/test/java/org/apache/iceberg/TestRowDelta.java: ## @@ -896,17 +896,14 @@ public void testAddDeleteFilesMultipleSpecs() { Map summary = snapshot.summary(

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051682764 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeTableOperations.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051682638 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeTableOperations.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051682458 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051682372 ## versions.props: ## @@ -28,6 +28,8 @@ org.scala-lang.modules:scala-collection-compat_2.12 = 2.6.0 org.scala-lang.modules:scala-collection-compat_2.13 = 2.6.0 com.emc

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051682219 ## spark/v3.1/build.gradle: ## @@ -213,6 +213,9 @@ project(':iceberg-spark:iceberg-spark-runtime-3.1_2.12') { implementation(project(':iceberg-nessie')) { ex

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051681975 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeClient.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051681930 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeClient.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051681776 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051681587 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051681423 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051681274 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051681027 ## snowflake/src/main/java/org/apache/iceberg/snowflake/entities/SnowflakeIdentifier.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] rbalamohan commented on a diff in pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2022-12-18 Thread GitBox
rbalamohan commented on code in PR #6432: URL: https://github.com/apache/iceberg/pull/6432#discussion_r1051680991 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -144,7 +146,18 @@ public static PositionDeleteIndex toPositionIndex( deletes ->

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051680729 ## snowflake/src/test/java/org/apache/iceberg/snowflake/InMemoryFileIO.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051680606 ## snowflake/src/main/java/org/apache/iceberg/snowflake/entities/SnowflakeTableMetadata.java: ## @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051680116 ## snowflake/src/main/java/org/apache/iceberg/snowflake/NamespaceHelpers.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051679882 ## snowflake/src/main/java/org/apache/iceberg/snowflake/entities/SnowflakeIdentifier.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051679710 ## snowflake/src/main/java/org/apache/iceberg/snowflake/entities/SnowflakeIdentifier.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051679653 ## snowflake/src/main/java/org/apache/iceberg/snowflake/entities/SnowflakeIdentifier.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051679094 ## build.gradle: ## @@ -696,6 +696,26 @@ project(':iceberg-dell') { } } +project(':iceberg-snowflake') { + test { +useJUnitPlatform() + } + + dependencies {

[GitHub] [iceberg] rdblue commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-18 Thread GitBox
rdblue commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1051678623 ## snowflake/src/main/java/org/apache/iceberg/snowflake/SnowflakeCatalog.java: ## @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] rdblue commented on a diff in pull request #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-18 Thread GitBox
rdblue commented on code in PR #6431: URL: https://github.com/apache/iceberg/pull/6431#discussion_r1051678197 ## parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java: ## @@ -360,6 +362,21 @@ public void testNotNaNs() { Assert.assertTrue("Should

[GitHub] [iceberg] srilman commented on issue #3220: [Python] support iceberg hadoop catalog in python library

2022-12-18 Thread GitBox
srilman commented on issue #3220: URL: https://github.com/apache/iceberg/issues/3220#issuecomment-1356891779 @Fokko Didn't realize that there was a Docker image for REST catalog! We use a very similar Docker Compose setup to test for Hive catalog support, which is great for testing catalog

[GitHub] [iceberg] rdblue commented on a diff in pull request #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-18 Thread GitBox
rdblue commented on code in PR #6431: URL: https://github.com/apache/iceberg/pull/6431#discussion_r1051678076 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetDictionaryRowGroupFilter.java: ## @@ -165,20 +165,41 @@ public Boolean isNaN(BoundReference ref) { }

[GitHub] [iceberg] rdblue commented on a diff in pull request #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-18 Thread GitBox
rdblue commented on code in PR #6431: URL: https://github.com/apache/iceberg/pull/6431#discussion_r1051678034 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetDictionaryRowGroupFilter.java: ## @@ -165,20 +165,41 @@ public Boolean isNaN(BoundReference ref) { }

[GitHub] [iceberg] rdblue commented on pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2022-12-18 Thread GitBox
rdblue commented on PR #6432: URL: https://github.com/apache/iceberg/pull/6432#issuecomment-1356891084 I think this looks good. I like how small the change is. Do you think it would be easy to also add a config flag to enable/disable this? I think it technically violates Spark's threading m

[GitHub] [iceberg] rdblue commented on a diff in pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2022-12-18 Thread GitBox
rdblue commented on code in PR #6432: URL: https://github.com/apache/iceberg/pull/6432#discussion_r1051677178 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -144,7 +146,18 @@ public static PositionDeleteIndex toPositionIndex( deletes ->

[GitHub] [iceberg] rdblue merged pull request #6438: Python: Reduce the use of mock objects

2022-12-18 Thread GitBox
rdblue merged PR #6438: URL: https://github.com/apache/iceberg/pull/6438

[GitHub] [iceberg] rdblue commented on a diff in pull request #6433: Docs: README

2022-12-18 Thread GitBox
rdblue commented on code in PR #6433: URL: https://github.com/apache/iceberg/pull/6433#discussion_r1051676679 ## README.md: ## @@ -34,7 +34,7 @@ Iceberg is under active development at the Apache Software Foundation. The core Java library that tracks table snapshots and metad

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-18 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1051676372 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +457,103 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-18 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1051676210 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +459,120 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro
