[GitHub] [iceberg] Fokko opened a new pull request, #6484: Python: Fix PyArrow import

2022-12-21 Thread GitBox
Fokko opened a new pull request, #6484: URL: https://github.com/apache/iceberg/pull/6484 The import should be inline in the function to avoid pulling in PyArrow when we don't need it. Tested this in a fresh docker container: ``` root@1252c09f932c:/vo# pip3 install -e ".[s3fs
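
The inline-import idea described in this pull request can be sketched as follows. This is a minimal illustration, not the code from the PR: `parse_scheme` and `to_arrow_table` are hypothetical names, and the only assumption about PyArrow is that `pyarrow.Table.from_pylist` is available when the library is installed.

```python
from urllib.parse import urlparse


def parse_scheme(location: str) -> str:
    """Hypothetical helper that needs only the standard library."""
    return urlparse(location).scheme


def to_arrow_table(rows: list[dict]):
    """Hypothetical helper: the only code path that needs PyArrow."""
    try:
        # Imported inside the function, so importing this module does not
        # pull in PyArrow for callers that never build an Arrow table.
        import pyarrow as pa
    except ImportError as exc:
        raise ImportError("PyArrow is required for to_arrow_table") from exc
    return pa.Table.from_pylist(rows)
```

With this layout, `parse_scheme("s3://bucket/key")` works in an environment without PyArrow, and the missing dependency only surfaces if `to_arrow_table` is actually called.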

[GitHub] [iceberg] Fokko opened a new pull request, #6483: Python: Bump version to 0.2.1

2022-12-21 Thread GitBox
Fokko opened a new pull request, #6483: URL: https://github.com/apache/iceberg/pull/6483 Prepare for a 0.2.1 release -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

[GitHub] [iceberg] zhangbutao opened a new pull request, #6482: API: Fix inconsistent TimeTransform Type

2022-12-21 Thread GitBox
zhangbutao opened a new pull request, #6482: URL: https://github.com/apache/iceberg/pull/6482 After https://github.com/apache/iceberg/pull/5601 and https://github.com/apache/iceberg/pull/6220, there is inconsistent `TimeTransform Type` in some codes. This causes an exception when remove and

[GitHub] [iceberg] LionTao opened a new issue, #6481: Support for predicate pushdown on s3

2022-12-21 Thread GitBox
LionTao opened a new issue, #6481: URL: https://github.com/apache/iceberg/issues/6481 ### Query engine None, I'm using Java API ### Question Currently, when reading files, iceberg utilizes vectorized reading to read files. But when a filter is applied, we can use the

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-21 Thread GitBox
stevenzwu commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1055124977 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleRecordWrapper.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-21 Thread GitBox
stevenzwu commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1055124544 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055109197 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055105144 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055104528 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055103489 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -161,7 +161,7 @@ private static Metrics getAvroMetrics(Path path, Configuration conf) {

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055103248 ## api/src/main/java/org/apache/iceberg/actions/MigrateDeltaLakeTable.java: ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055103002 ## api/src/main/java/org/apache/iceberg/actions/MigrateDeltaLakeTable.java: ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055101395 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055101284 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055100910 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055099655 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055099031 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055098764 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055034233 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] puchengy closed pull request #4397: Add support for listing partition recursively during the table migration

2022-12-21 Thread GitBox
puchengy closed pull request #4397: Add support for listing partition recursively during the table migration URL: https://github.com/apache/iceberg/pull/4397

[GitHub] [iceberg] ajantha-bhat closed pull request #6466: Nessie: Bump Nessie to 0.46.0 and adopt to APIv2

2022-12-21 Thread GitBox
ajantha-bhat closed pull request #6466: Nessie: Bump Nessie to 0.46.0 and adopt to APIv2 URL: https://github.com/apache/iceberg/pull/6466

[GitHub] [iceberg] ajantha-bhat commented on pull request #6466: Nessie: Bump Nessie to 0.46.0 and adopt to APIv2

2022-12-21 Thread GitBox
ajantha-bhat commented on PR #6466: URL: https://github.com/apache/iceberg/pull/6466#issuecomment-1362410229 Implicit namespaces cannot be listed by getEntries in API v2. Also, API v2 is in the beta stage. Hence, will do this adoption in the next version bump or later.

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6480: Spark: Fail streaming planning when snapshot not found

2022-12-21 Thread GitBox
amogh-jahagirdar commented on code in PR #6480: URL: https://github.com/apache/iceberg/pull/6480#discussion_r1055070683 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java: ## @@ -207,7 +207,14 @@ private List planFiles(StreamingOffset s

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6480: Spark: Fail streaming planning when snapshot not found

2022-12-21 Thread GitBox
amogh-jahagirdar opened a new pull request, #6480: URL: https://github.com/apache/iceberg/pull/6480 Fixing error handling for https://github.com/apache/iceberg/issues/6388. Based on the stack trace the following sequence of events seems plausible. 1.) [The snapshot ID for curre

[GitHub] [iceberg] amogh-jahagirdar commented on issue #6388: Spark Structured Streaming - Cannot invoke "org.apache.iceberg.Snapshot.operation()" because "snapshot" is null

2022-12-21 Thread GitBox
amogh-jahagirdar commented on issue #6388: URL: https://github.com/apache/iceberg/issues/6388#issuecomment-1362366455 Hey Sjors, Is this happening while snapshot expiration is being performed on the table you're reading from? From my reading of the code this error will happen like th

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1055046230 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -301,4 +305,15 @@ protected static String fullTableName(String catalogName, TableIdentifier i

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1055046158 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -301,4 +305,15 @@ protected static String fullTableName(String catalogName, TableIdentifier i

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055037853 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055034629 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055034233 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055032941 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055032692 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055031822 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -161,7 +161,7 @@ private static Metrics getAvroMetrics(Path path, Configuration conf) {

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055031472 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -35,6 +35,11 @@ default MigrateTable migrateTable(String tableIdent) { this.getC

[GitHub] [iceberg] hililiwei commented on pull request #6440: Flink: Support Look-up Function

2022-12-21 Thread GitBox
hililiwei commented on PR #6440: URL: https://github.com/apache/iceberg/pull/6440#issuecomment-1362302792 > is it actually a good idea to use iceberg table as a LOOK UP JOIN candidate? will it be fast enough? It may work fine on small data sets. But without an index or other means to

[GitHub] [iceberg] islamismailov commented on pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-21 Thread GitBox
islamismailov commented on PR #6353: URL: https://github.com/apache/iceberg/pull/6353#issuecomment-1362265920 please see my example in the sub-thread above

[GitHub] [iceberg] rdblue commented on a diff in pull request #6371: Spark 3.3: Support storage-partitioned joins

2022-12-21 Thread GitBox
rdblue commented on code in PR #6371: URL: https://github.com/apache/iceberg/pull/6371#discussion_r1054947098 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -42,4 +42,9 @@ private SparkSQLProperties() {} // Controls whether to check

[GitHub] [iceberg] rdblue merged pull request #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
rdblue merged PR #6478: URL: https://github.com/apache/iceberg/pull/6478

[GitHub] [iceberg] rdblue closed issue #6469: Python: Issue with date partition

2022-12-21 Thread GitBox
rdblue closed issue #6469: Python: Issue with date partition URL: https://github.com/apache/iceberg/issues/6469

[GitHub] [iceberg] Fokko commented on pull request #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
Fokko commented on PR #6478: URL: https://github.com/apache/iceberg/pull/6478#issuecomment-1362225099 @rdblue Added a test when parsing the ManifestEntry. I can add an end to end test in https://github.com/apache/iceberg/pull/6398

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1054932981 ## core/src/main/java/org/apache/iceberg/DeltaLakeDataTypeVisitor.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054889320 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054902846 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java: ## @@ -157,6 +166,68 @@ public void testPartitionsTableScanWithAddPartiti

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Spark 3.3: Time range query of changelog tables

2022-12-21 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1054901416 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Functionhttps://github.com/

[GitHub] [iceberg] flyrain opened a new pull request, #6479: Java doc fix on method SnapshotUtil::oldestAncestorAfter

2022-12-21 Thread GitBox
flyrain opened a new pull request, #6479: URL: https://github.com/apache/iceberg/pull/6479 cc @rdblue @RussellSpitzer

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054900975 ## core/src/main/java/org/apache/iceberg/AbstractTableScan.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054900831 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java: ## @@ -157,6 +166,68 @@ public void testPartitionsTableScanWithAddPartiti

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054899859 ## core/src/main/java/org/apache/iceberg/AbstractTableScan.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

[GitHub] [iceberg] rdblue merged pull request #6468: Python: Parse UUID as binary in PyArrow

2022-12-21 Thread GitBox
rdblue merged PR #6468: URL: https://github.com/apache/iceberg/pull/6468

[GitHub] [iceberg] rdblue commented on pull request #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
rdblue commented on PR #6478: URL: https://github.com/apache/iceberg/pull/6478#issuecomment-1362180551 This looks correct to me. Can you add a test that hits this in job planning?

[GitHub] [iceberg] rdblue merged pull request #6471: Dynamo/Jdbc/Ecs/Nessie: Expose catalog properties

2022-12-21 Thread GitBox
rdblue merged PR #6471: URL: https://github.com/apache/iceberg/pull/6471

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054889320 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [iceberg] Fokko opened a new pull request, #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
Fokko opened a new pull request, #6478: URL: https://github.com/apache/iceberg/pull/6478 The partitions struct will read as a date if there is a date field in there. This way we'll just read the physical type. closes #6469
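
For context on "the physical type": a date is stored physically as an integer count of days since the Unix epoch (1970-01-01). The sketch below shows that mapping with the standard library only. The helper names `days_to_date` and `date_to_days` are hypothetical and are not taken from the PR.

```python
from datetime import date, timedelta

# Day zero of the physical representation.
EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Convert a physical day count into a calendar date."""
    return EPOCH + timedelta(days=days)


def date_to_days(value: date) -> int:
    """Convert a calendar date into its physical day count."""
    return (value - EPOCH).days
```

Reading the partition value as a plain int defers this conversion, so a caller can apply `days_to_date` only when a calendar date is actually needed.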

[GitHub] [iceberg] Fokko commented on issue #6475: Python: Improve PyArrow performance

2022-12-21 Thread GitBox
Fokko commented on issue #6475: URL: https://github.com/apache/iceberg/issues/6475#issuecomment-1362157332 @rdblue Thanks for letting me know, I was hoping that this was missed somewhere. Let's reserve this issue for looking why it is blocked.

[GitHub] [iceberg] krvikash opened a new pull request, #6477: Hive-metastore: Merge identical catch branch

2022-12-21 Thread GitBox
krvikash opened a new pull request, #6477: URL: https://github.com/apache/iceberg/pull/6477 Merge identical catch branch

[GitHub] [iceberg] krvikash opened a new pull request, #6476: API, Core, Flink, Parquet, Spark: Use enhanced for loop

2022-12-21 Thread GitBox
krvikash opened a new pull request, #6476: URL: https://github.com/apache/iceberg/pull/6476 API, Core, Flink, Parquet, Spark: Use enhanced for loop

[GitHub] [iceberg] islamismailov commented on a diff in pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-21 Thread GitBox
islamismailov commented on code in PR #6353: URL: https://github.com/apache/iceberg/pull/6353#discussion_r1054828310 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java: ## @@ -79,9 +83,11 @@ private ReadConf init() { nameMapping,

[GitHub] [iceberg] rdblue commented on issue #6475: Python: Improve PyArrow performance

2022-12-21 Thread GitBox
rdblue commented on issue #6475: URL: https://github.com/apache/iceberg/issues/6475#issuecomment-1362038098 @Fokko, I already tested open_input_stream and it didn't perform better than Python buffering and open_input_file. In my testing, opening the file blocked for about 500ms, which seems

[GitHub] [iceberg] islamismailov commented on a diff in pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-21 Thread GitBox
islamismailov commented on code in PR #6353: URL: https://github.com/apache/iceberg/pull/6353#discussion_r1054796838 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java: ## @@ -79,9 +83,11 @@ private ReadConf init() { nameMapping,

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1054732503 ## core/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054754294 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-21 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1054734934 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] jackye1995 commented on pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on PR #6449: URL: https://github.com/apache/iceberg/pull/6449#issuecomment-1361923592 > I love seeing this functionality but I'm not sure it should be a first class citizen in the repo. +1, what about having a `iceberg-delta-lake` module for this feature? That c

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054728635 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054723826 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054721749 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [iceberg] Fokko opened a new issue, #6475: Python: Improve PyArrow performance

2022-12-21 Thread GitBox
Fokko opened a new issue, #6475: URL: https://github.com/apache/iceberg/issues/6475 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 I noticed that s3fs is much faster than PyArrow. @rdblue also noticed this and added a bu

[GitHub] [iceberg] bryanck commented on a diff in pull request #6169: AWS,Core: Add S3 REST Signer client + REST Spec

2022-12-21 Thread GitBox
bryanck commented on code in PR #6169: URL: https://github.com/apache/iceberg/pull/6169#discussion_r1054650368 ## aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Spark 3.3: Time range query of changelog tables

2022-12-21 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1054640659 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] haydenflinner commented on pull request #1373: API: Implement SortOrder

2022-12-21 Thread GitBox
haydenflinner commented on PR #1373: URL: https://github.com/apache/iceberg/pull/1373#issuecomment-1361689734 Is this note in the documentation still accurate? >Explicit sort is necessary because Spark doesn’t allow Iceberg to request a sort before writing as of Spark 3.0. [SPARK-23889](

[GitHub] [iceberg] nastra commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
nastra commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1054629712 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -301,4 +305,15 @@ protected static String fullTableName(String catalogName, TableIdentifier id

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1054624264 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -51,7 +53,12 @@ public Table loadTable(TableIdentifier identifier) { } } els

[GitHub] [iceberg] rdblue commented on a diff in pull request #6350: Spark 3.3: Time range query of changelog tables

2022-12-21 Thread GitBox
rdblue commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1054610098 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] ajantha-bhat commented on pull request #6473: Nessie: Bump Nessie to 0.46.0

2022-12-21 Thread GitBox
ajantha-bhat commented on PR #6473: URL: https://github.com/apache/iceberg/pull/6473#issuecomment-1361632908 cc: @dimas-b -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6473: Nessie: Bump Nessie to 0.46.0

2022-12-21 Thread GitBox
ajantha-bhat opened a new pull request, #6473: URL: https://github.com/apache/iceberg/pull/6473 Release Notes: https://github.com/projectnessie/nessie/releases/tag/nessie-0.46.0 Note: Nessie also released API v2 support in this version. Iceberg side we need to add a code to a

[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

2022-12-21 Thread GitBox
pvary commented on issue #6370: URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1361628513 Created the Hive jira: https://issues.apache.org/jira/browse/HIVE-26882 And the PR: https://github.com/apache/hive/pull/3888 -- This is an automated message from the Apache Git Ser

[GitHub] [iceberg] findepi commented on pull request #5268: API/Core: Initial Table Scan Reporting support

2022-12-21 Thread GitBox
findepi commented on PR #5268: URL: https://github.com/apache/iceberg/pull/5268#issuecomment-1361614786 Just FYI, this seems to have inflated logs quite noticeably (https://github.com/trinodb/trino/issues/15492). I wonder whether INFO-level logging should be the default behavior. Would

[GitHub] [iceberg] findepi opened a new pull request, #6472: Have single 'working' constructor in BaseTable

2022-12-21 Thread GitBox
findepi opened a new pull request, #6472: URL: https://github.com/apache/iceberg/pull/6472 Delegate from one constructor to the other. This emphasizes the relation between them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [iceberg] pvary commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-21 Thread GitBox
pvary commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1054570635 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [iceberg] pvary commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-21 Thread GitBox
pvary commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1054537560 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [iceberg] arunb2w commented on issue #6453: Iceberg delete-append causing snapshot error

2022-12-21 Thread GitBox
arunb2w commented on issue #6453: URL: https://github.com/apache/iceberg/issues/6453#issuecomment-1361473026 Thanks for the input. I also ended up doing more or less the same thing by materializing my inflated_df dataframe and using it after delete. -- This is an automated message from th

[GitHub] [iceberg] nastra commented on a diff in pull request #6468: Python: Parse UUID as binary in PyArrow

2022-12-21 Thread GitBox
nastra commented on code in PR #6468: URL: https://github.com/apache/iceberg/pull/6468#discussion_r1054467944 ## python/pyiceberg/io/pyarrow.py: ## @@ -366,7 +352,7 @@ def visit_string(self, _: StringType) -> pa.DataType: return pa.string() def visit_uuid(self, _

[GitHub] [iceberg] nastra commented on pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
nastra commented on PR #6410: URL: https://github.com/apache/iceberg/pull/6410#issuecomment-1361395261 @kmozaid while reviewing this PR I've noticed that not every catalog actually exposes its properties (including the JDBC catalog which I suggested above to use for testing). I've opened h

[GitHub] [iceberg] nastra commented on a diff in pull request #6471: Dynamo/Jdbc/Ecs/Nessie: Expose catalog properties

2022-12-21 Thread GitBox
nastra commented on code in PR #6471: URL: https://github.com/apache/iceberg/pull/6471#discussion_r1054457829 ## aws/src/main/java/org/apache/iceberg/aws/dynamodb/DynamoDbCatalog.java: ## @@ -686,4 +688,9 @@ private boolean updateProperties( return false; } } + +

[GitHub] [iceberg] peay commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

2022-12-21 Thread GitBox
peay commented on PR #6470: URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1361392095 Not yet very familiar with the codebase, but this looks great to me, thanks a lot @hililiwei! -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [iceberg] kmozaid commented on issue #6453: Iceberg delete-append causing snapshot error

2022-12-21 Thread GitBox
kmozaid commented on issue #6453: URL: https://github.com/apache/iceberg/issues/6453#issuecomment-1361381099 I think, you should create `inflated_df` as - ``` spark.read().format("iceberg") .option(SparkReadOptions.AS_OF_TIMESTAMP, "2022-12-19 06:13:02") .load("glue_dev.da

[GitHub] [iceberg] nastra commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
nastra commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1054443705 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -51,7 +53,12 @@ public Table loadTable(TableIdentifier identifier) { } } else

[GitHub] [iceberg] hililiwei commented on issue #6464: Allow specifying data file format in RewriteDataFiles

2022-12-21 Thread GitBox
hililiwei commented on issue #6464: URL: https://github.com/apache/iceberg/issues/6464#issuecomment-1361355506 We happen to have an internal implementation. I have raised a PR #6470 . If you are interested in it, can you review it? Thx. -- This is an automated message from the Apache Git

[GitHub] [iceberg] hililiwei opened a new pull request, #6470: Spark: Allow specifying file format in RewriteDataFiles

2022-12-21 Thread GitBox
hililiwei opened a new pull request, #6470: URL: https://github.com/apache/iceberg/pull/6470 Close #6464 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-

[GitHub] [iceberg] Fokko opened a new issue, #6469: Python: Issue with date partition

2022-12-21 Thread GitBox
Fokko opened a new issue, #6469: URL: https://github.com/apache/iceberg/issues/6469 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 The following scan on the taxi dataset: ```python df = tbl.scan(row_filter=And(

[GitHub] [iceberg] ajantha-bhat commented on pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-21 Thread GitBox
ajantha-bhat commented on PR #6090: URL: https://github.com/apache/iceberg/pull/6090#issuecomment-1361292197 @rdblue : Thanks for the awesome review. I will keep those points in mind. I have addressed all the comments. Please take a look at it again. -- This is an automated messa

[GitHub] [iceberg] SHuixo commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
SHuixo commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361267227 > i am using sql client so i go through flink and iceberg document and i can not found something like this sorry can you explain more i am not expert in flink is this configuration in

[GitHub] [iceberg] findepi commented on issue #6442: Extends Iceberg table stats API to allow publish data and stats atomically

2022-12-21 Thread GitBox
findepi commented on issue #6442: URL: https://github.com/apache/iceberg/issues/6442#issuecomment-1361245849 Good idea! cc @rdblue @ajantha-bhat -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [iceberg] hililiwei commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
hililiwei commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361230094 > is checkpoint enabled? Your current problem is that you have not found any data in the iceberg table? As mentioned above, please confirm whether checkpoint is enable

[GitHub] [iceberg] Fokko opened a new pull request, #6468: Python: Parse UUID as binary in PyArrow

2022-12-21 Thread GitBox
Fokko opened a new pull request, #6468: URL: https://github.com/apache/iceberg/pull/6468 It causes to throw a cast exception: ``` --- ArrowNotImplementedError Traceback (most recent call last)

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-21 Thread GitBox
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1054260595 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1515,4 +1594,63 @@ private RemoveSnapshots removeSnapshots(Table table) { RemoveSnap
