[GitHub] [iceberg] dennishuo commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-21 Thread GitBox
dennishuo commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1054082154 ## versions.props: ## @@ -28,6 +28,8 @@ org.scala-lang.modules:scala-collection-compat_2.12 = 2.6.0 org.scala-lang.modules:scala-collection-compat_2.13 = 2.6.0 com.

[GitHub] [iceberg] ggershinsky commented on a diff in pull request #3231: GCM encryption stream

2022-12-21 Thread GitBox
ggershinsky commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1054083301 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] dennishuo commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-21 Thread GitBox
dennishuo commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1054089032 ## spark/v3.1/build.gradle: ## @@ -213,6 +213,9 @@ project(':iceberg-spark:iceberg-spark-runtime-3.1_2.12') { implementation(project(':iceberg-nessie')) {

[GitHub] [iceberg] dennishuo commented on pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-21 Thread GitBox
dennishuo commented on PR #6428: URL: https://github.com/apache/iceberg/pull/6428#issuecomment-1360990536 Thanks @rdblue @danielcweeks and @nastra for the continued reviews! Should be ready for re-review now; with the latest refactor I've also confirmed that no additional dependencies end u

[GitHub] [iceberg] MohamedAdelHsn commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
MohamedAdelHsn commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361037210 This is the code used: CREATE TABLE flink_table_source ( id BIGINT ,data VARCHAR ) WITH ( 'connector' = 'kafka' ,'topic' = 'flink' ,'properties.bootstrap.se

[GitHub] [iceberg] MohamedAdelHsn commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
MohamedAdelHsn commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361037555 @SHuixo @luoyuxia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [iceberg] pvary merged pull request #6452: Flink: Improve IcebergFilesCommitter logging

2022-12-21 Thread GitBox
pvary merged PR #6452: URL: https://github.com/apache/iceberg/pull/6452

[GitHub] [iceberg] pvary commented on pull request #6452: Flink: Improve IcebergFilesCommitter logging

2022-12-21 Thread GitBox
pvary commented on PR #6452: URL: https://github.com/apache/iceberg/pull/6452#issuecomment-1361055791 Merged to master. Thanks for the review @rdblue and @stevenzwu!

[GitHub] [iceberg] SHuixo commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
SHuixo commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361102904 How do you write the context configuration code for the flink execution environment? Like this: ``` final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExec

[GitHub] [iceberg] MohamedAdelHsn commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
MohamedAdelHsn commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361121107 I am using the SQL client. I went through the Flink and Iceberg documentation and could not find anything like this. Sorry, can you explain more? I am not an expert in Flink.

[GitHub] [iceberg] sristiraj closed issue #6463: Iceberg delete operation failing in spark 3.3.0 using Spark SQL

2022-12-21 Thread GitBox
sristiraj closed issue #6463: Iceberg delete operation failing in spark 3.3.0 using Spark SQL URL: https://github.com/apache/iceberg/issues/6463

[GitHub] [iceberg] sristiraj commented on issue #6463: Iceberg delete operation failing in spark 3.3.0 using Spark SQL

2022-12-21 Thread GitBox
sristiraj commented on issue #6463: URL: https://github.com/apache/iceberg/issues/6463#issuecomment-1361134274 @nastra Tried again after clearing sbt cache and this time it worked with additional dependency of "org.scala-lang.modules" %% "scala-collection-compat" % "2.1.1" and scala

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-21 Thread GitBox
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1054227750 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1515,4 +1594,63 @@ private RemoveSnapshots removeSnapshots(Table table) { RemoveSnap

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-21 Thread GitBox
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1054260595 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1515,4 +1594,63 @@ private RemoveSnapshots removeSnapshots(Table table) { RemoveSnap

[GitHub] [iceberg] Fokko opened a new pull request, #6468: Python: Parse UUID as binary in PyArrow

2022-12-21 Thread GitBox
Fokko opened a new pull request, #6468: URL: https://github.com/apache/iceberg/pull/6468 This causes a cast exception to be thrown: ``` --- ArrowNotImplementedError Traceback (most recent call last)

[GitHub] [iceberg] hililiwei commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
hililiwei commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361230094 > is checkpoint enabled? Your current problem is that you have not found any data in the iceberg table? As mentioned above, please confirm whether checkpoint is enable

[GitHub] [iceberg] findepi commented on issue #6442: Extends Iceberg table stats API to allow publish data and stats atomically

2022-12-21 Thread GitBox
findepi commented on issue #6442: URL: https://github.com/apache/iceberg/issues/6442#issuecomment-1361245849 Good idea! cc @rdblue @ajantha-bhat

[GitHub] [iceberg] SHuixo commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

2022-12-21 Thread GitBox
SHuixo commented on issue #6465: URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361267227 > I am using the SQL client. I went through the Flink and Iceberg documentation and could not find anything like this. Sorry, can you explain more? I am not an expert in Flink. Is this configuration in

[GitHub] [iceberg] ajantha-bhat commented on pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-12-21 Thread GitBox
ajantha-bhat commented on PR #6090: URL: https://github.com/apache/iceberg/pull/6090#issuecomment-1361292197 @rdblue : Thanks for the awesome review. I will keep those points in mind. I have addressed all the comments. Please take a look at it again.

[GitHub] [iceberg] Fokko opened a new issue, #6469: Python: Issue with date partition

2022-12-21 Thread GitBox
Fokko opened a new issue, #6469: URL: https://github.com/apache/iceberg/issues/6469 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 The following scan on the taxi dataset: ```python df = tbl.scan(row_filter=And(

[GitHub] [iceberg] hililiwei opened a new pull request, #6470: Spark: Allow specifying file format in RewriteDataFiles

2022-12-21 Thread GitBox
hililiwei opened a new pull request, #6470: URL: https://github.com/apache/iceberg/pull/6470 Close #6464

[GitHub] [iceberg] hililiwei commented on issue #6464: Allow specifying data file format in RewriteDataFiles

2022-12-21 Thread GitBox
hililiwei commented on issue #6464: URL: https://github.com/apache/iceberg/issues/6464#issuecomment-1361355506 We happen to have an internal implementation. I have raised PR #6470. If you are interested in it, can you review it? Thx.

[GitHub] [iceberg] nastra commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
nastra commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1054443705 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -51,7 +53,12 @@ public Table loadTable(TableIdentifier identifier) { } } else

[GitHub] [iceberg] kmozaid commented on issue #6453: Iceberg delete-append causing snapshot error

2022-12-21 Thread GitBox
kmozaid commented on issue #6453: URL: https://github.com/apache/iceberg/issues/6453#issuecomment-1361381099 I think you should create `inflated_df` as: ``` spark.read().format("iceberg") .option(SparkReadOptions.AS_OF_TIMESTAMP, "2022-12-19 06:13:02") .load("glue_dev.da

[GitHub] [iceberg] peay commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

2022-12-21 Thread GitBox
peay commented on PR #6470: URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1361392095 Not yet very familiar with the codebase, but this looks great to me, thanks a lot @hililiwei!

[GitHub] [iceberg] nastra commented on a diff in pull request #6471: Dynamo/Jdbc/Ecs/Nessie: Expose catalog properties

2022-12-21 Thread GitBox
nastra commented on code in PR #6471: URL: https://github.com/apache/iceberg/pull/6471#discussion_r1054457829 ## aws/src/main/java/org/apache/iceberg/aws/dynamodb/DynamoDbCatalog.java: ## @@ -686,4 +688,9 @@ private boolean updateProperties( return false; } } + +

[GitHub] [iceberg] nastra commented on pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
nastra commented on PR #6410: URL: https://github.com/apache/iceberg/pull/6410#issuecomment-1361395261 @kmozaid while reviewing this PR I've noticed that not every catalog actually exposes its properties (including the JDBC catalog which I suggested above to use for testing). I've opened h

[GitHub] [iceberg] nastra commented on a diff in pull request #6468: Python: Parse UUID as binary in PyArrow

2022-12-21 Thread GitBox
nastra commented on code in PR #6468: URL: https://github.com/apache/iceberg/pull/6468#discussion_r1054467944 ## python/pyiceberg/io/pyarrow.py: ## @@ -366,7 +352,7 @@ def visit_string(self, _: StringType) -> pa.DataType: return pa.string() def visit_uuid(self, _

[GitHub] [iceberg] arunb2w commented on issue #6453: Iceberg delete-append causing snapshot error

2022-12-21 Thread GitBox
arunb2w commented on issue #6453: URL: https://github.com/apache/iceberg/issues/6453#issuecomment-1361473026 Thanks for the input. I also ended up doing more or less the same thing by materializing my inflated_df dataframe and using it after delete.

[GitHub] [iceberg] pvary commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-21 Thread GitBox
pvary commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1054537560 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [iceberg] pvary commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-21 Thread GitBox
pvary commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1054570635 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [iceberg] findepi opened a new pull request, #6472: Have single 'working' constructor in BaseTable

2022-12-21 Thread GitBox
findepi opened a new pull request, #6472: URL: https://github.com/apache/iceberg/pull/6472 Delegate from one constructor to the other. This emphasizes the relation between them.

[GitHub] [iceberg] findepi commented on pull request #5268: API/Core: Initial Table Scan Reporting support

2022-12-21 Thread GitBox
findepi commented on PR #5268: URL: https://github.com/apache/iceberg/pull/5268#issuecomment-1361614786 Just FYI, this seems to have inflated logs quite noticeably (https://github.com/trinodb/trino/issues/15492). I wonder whether INFO-level logging should be the default behavior. Would

[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

2022-12-21 Thread GitBox
pvary commented on issue #6370: URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1361628513 Created the Hive jira: https://issues.apache.org/jira/browse/HIVE-26882 And the PR: https://github.com/apache/hive/pull/3888

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6473: Nessie: Bump Nessie to 0.46.0

2022-12-21 Thread GitBox
ajantha-bhat opened a new pull request, #6473: URL: https://github.com/apache/iceberg/pull/6473 Release Notes: https://github.com/projectnessie/nessie/releases/tag/nessie-0.46.0 Note: Nessie also released API v2 support in this version. Iceberg side we need to add a code to a

[GitHub] [iceberg] ajantha-bhat commented on pull request #6473: Nessie: Bump Nessie to 0.46.0

2022-12-21 Thread GitBox
ajantha-bhat commented on PR #6473: URL: https://github.com/apache/iceberg/pull/6473#issuecomment-1361632908 cc: @dimas-b

[GitHub] [iceberg] rdblue commented on a diff in pull request #6350: Spark 3.3: Time range query of changelog tables

2022-12-21 Thread GitBox
rdblue commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1054610098 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1054624264 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -51,7 +53,12 @@ public Table loadTable(TableIdentifier identifier) { } } els

[GitHub] [iceberg] nastra commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
nastra commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1054629712 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -301,4 +305,15 @@ protected static String fullTableName(String catalogName, TableIdentifier id

[GitHub] [iceberg] haydenflinner commented on pull request #1373: API: Implement SortOrder

2022-12-21 Thread GitBox
haydenflinner commented on PR #1373: URL: https://github.com/apache/iceberg/pull/1373#issuecomment-1361689734 Is this note in the documentation still accurate? >Explicit sort is necessary because Spark doesn’t allow Iceberg to request a sort before writing as of Spark 3.0. [SPARK-23889](

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Spark 3.3: Time range query of changelog tables

2022-12-21 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1054640659 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] bryanck commented on a diff in pull request #6169: AWS,Core: Add S3 REST Signer client + REST Spec

2022-12-21 Thread GitBox
bryanck commented on code in PR #6169: URL: https://github.com/apache/iceberg/pull/6169#discussion_r1054650368 ## aws/src/main/java/org/apache/iceberg/aws/s3/signer/S3V4RestSignerClient.java: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] Fokko opened a new issue, #6475: Python: Improve PyArrow performance

2022-12-21 Thread GitBox
Fokko opened a new issue, #6475: URL: https://github.com/apache/iceberg/issues/6475 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 I noticed that s3fs is much faster than PyArrow. @rdblue also noticed this and added a bu

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054721749 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054723826 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054728635 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] jackye1995 commented on pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on PR #6449: URL: https://github.com/apache/iceberg/pull/6449#issuecomment-1361923592 > I love seeing this functionality but I'm not sure it should be a first class citizen in the repo. +1, what about having a `iceberg-delta-lake` module for this feature? That c

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-21 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1054734934 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054754294 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1054732503 ## core/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [iceberg] islamismailov commented on a diff in pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-21 Thread GitBox
islamismailov commented on code in PR #6353: URL: https://github.com/apache/iceberg/pull/6353#discussion_r1054796838 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java: ## @@ -79,9 +83,11 @@ private ReadConf init() { nameMapping,

[GitHub] [iceberg] rdblue commented on issue #6475: Python: Improve PyArrow performance

2022-12-21 Thread GitBox
rdblue commented on issue #6475: URL: https://github.com/apache/iceberg/issues/6475#issuecomment-1362038098 @Fokko, I already tested open_input_stream and it didn't perform better than Python buffering and open_input_file. In my testing, opening the file blocked for about 500ms, which seems

[GitHub] [iceberg] islamismailov commented on a diff in pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-21 Thread GitBox
islamismailov commented on code in PR #6353: URL: https://github.com/apache/iceberg/pull/6353#discussion_r1054828310 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java: ## @@ -79,9 +83,11 @@ private ReadConf init() { nameMapping,

[GitHub] [iceberg] krvikash opened a new pull request, #6476: API, Core, Flink, Parquet, Spark: Use enhanced for loop

2022-12-21 Thread GitBox
krvikash opened a new pull request, #6476: URL: https://github.com/apache/iceberg/pull/6476 API, Core, Flink, Parquet, Spark: Use enhanced for loop

[GitHub] [iceberg] krvikash opened a new pull request, #6477: Hive-metastore: Merge identical catch branch

2022-12-21 Thread GitBox
krvikash opened a new pull request, #6477: URL: https://github.com/apache/iceberg/pull/6477 Merge identical catch branch

[GitHub] [iceberg] Fokko commented on issue #6475: Python: Improve PyArrow performance

2022-12-21 Thread GitBox
Fokko commented on issue #6475: URL: https://github.com/apache/iceberg/issues/6475#issuecomment-1362157332 @rdblue Thanks for letting me know, I was hoping that this was missed somewhere. Let's reserve this issue for looking into why it is blocked.

[GitHub] [iceberg] Fokko opened a new pull request, #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
Fokko opened a new pull request, #6478: URL: https://github.com/apache/iceberg/pull/6478 The partitions struct will be read as a date if there is a date field in it. This way we'll just read the physical type. Closes #6469
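
For readers unfamiliar with the "physical type" mentioned above: Iceberg stores a date as an integer count of days since 1970-01-01. The sketch below illustrates that mapping using only the Python standard library. The helper names `date_to_days` and `days_to_date` are hypothetical and are not taken from PyIceberg or from PR #6478.

```python
from datetime import date, timedelta

# Day 0 of the date encoding: the Unix epoch.
EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Return the number of days between the epoch and d (hypothetical helper)."""
    return (d - EPOCH).days


def days_to_date(days: int) -> date:
    """Invert date_to_days: turn a day count back into a calendar date."""
    return EPOCH + timedelta(days=days)


if __name__ == "__main__":
    days = date_to_days(date(2022, 12, 21))
    # The integer is the physical value; the date is only a logical view of it.
    print(days, days_to_date(days))
```

Reading the integer directly, as the PR describes, avoids converting to a date object and back when only the stored value is needed.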

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054889320 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,372 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [iceberg] rdblue merged pull request #6471: Dynamo/Jdbc/Ecs/Nessie: Expose catalog properties

2022-12-21 Thread GitBox
rdblue merged PR #6471: URL: https://github.com/apache/iceberg/pull/6471

[GitHub] [iceberg] rdblue commented on pull request #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
rdblue commented on PR #6478: URL: https://github.com/apache/iceberg/pull/6478#issuecomment-1362180551 This looks correct to me. Can you add a test that hits this in job planning?

[GitHub] [iceberg] rdblue merged pull request #6468: Python: Parse UUID as binary in PyArrow

2022-12-21 Thread GitBox
rdblue merged PR #6468: URL: https://github.com/apache/iceberg/pull/6468

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054899859 ## core/src/main/java/org/apache/iceberg/AbstractTableScan.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054900831 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java: ## @@ -157,6 +166,68 @@ public void testPartitionsTableScanWithAddPartiti

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054900975 ## core/src/main/java/org/apache/iceberg/AbstractTableScan.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

[GitHub] [iceberg] flyrain opened a new pull request, #6479: Java doc fix on method SnapshotUtil::oldestAncestorAfter

2022-12-21 Thread GitBox
flyrain opened a new pull request, #6479: URL: https://github.com/apache/iceberg/pull/6479 cc @rdblue @RussellSpitzer

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Spark 3.3: Time range query of changelog tables

2022-12-21 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1054901416 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Functionhttps://github.com/

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-21 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1054902846 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java: ## @@ -157,6 +166,68 @@ public void testPartitionsTableScanWithAddPartiti

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1054932981 ## core/src/main/java/org/apache/iceberg/DeltaLakeDataTypeVisitor.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] Fokko commented on pull request #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
Fokko commented on PR #6478: URL: https://github.com/apache/iceberg/pull/6478#issuecomment-1362225099 @rdblue Added a test when parsing the ManifestEntry. I can add an end to end test in https://github.com/apache/iceberg/pull/6398

[GitHub] [iceberg] rdblue closed issue #6469: Python: Issue with date partition

2022-12-21 Thread GitBox
rdblue closed issue #6469: Python: Issue with date partition URL: https://github.com/apache/iceberg/issues/6469

[GitHub] [iceberg] rdblue merged pull request #6478: Python: Read a date as an int

2022-12-21 Thread GitBox
rdblue merged PR #6478: URL: https://github.com/apache/iceberg/pull/6478

[GitHub] [iceberg] rdblue commented on a diff in pull request #6371: Spark 3.3: Support storage-partitioned joins

2022-12-21 Thread GitBox
rdblue commented on code in PR #6371: URL: https://github.com/apache/iceberg/pull/6371#discussion_r1054947098 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -42,4 +42,9 @@ private SparkSQLProperties() {} // Controls whether to check

[GitHub] [iceberg] islamismailov commented on pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-21 Thread GitBox
islamismailov commented on PR #6353: URL: https://github.com/apache/iceberg/pull/6353#issuecomment-1362265920 Please see my example in the sub-thread above.

[GitHub] [iceberg] hililiwei commented on pull request #6440: Flink: Support Look-up Function

2022-12-21 Thread GitBox
hililiwei commented on PR #6440: URL: https://github.com/apache/iceberg/pull/6440#issuecomment-1362302792 > is it actually a good idea to use iceberg table as a LOOK UP JOIN candidate? will it be fast enough? It may work fine on small data sets. But without an index or other means to

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055031472 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -35,6 +35,11 @@ default MigrateTable migrateTable(String tableIdent) { this.getC

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055031822 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -161,7 +161,7 @@ private static Metrics getAvroMetrics(Path path, Configuration conf) {

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055032692 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055032941 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055034233 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055034629 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055037853 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1055046158 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -301,4 +305,15 @@ protected static String fullTableName(String catalogName, TableIdentifier i

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-21 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1055046230 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -301,4 +305,15 @@ protected static String fullTableName(String catalogName, TableIdentifier i

[GitHub] [iceberg] amogh-jahagirdar commented on issue #6388: Spark Structured Streaming - Cannot invoke "org.apache.iceberg.Snapshot.operation()" because "snapshot" is null

2022-12-21 Thread GitBox
amogh-jahagirdar commented on issue #6388: URL: https://github.com/apache/iceberg/issues/6388#issuecomment-1362366455 Hey Sjors, Is this happening while snapshot expiration is being performed on the table you're reading from? From my reading of the code this error will happen like th

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6480: Spark: Fail streaming planning when snapshot not found

2022-12-21 Thread GitBox
amogh-jahagirdar opened a new pull request, #6480: URL: https://github.com/apache/iceberg/pull/6480 Fixing error handling for https://github.com/apache/iceberg/issues/6388. Based on the stack trace the following sequence of events seems plausible. 1.) [The snapshot ID for curre

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6480: Spark: Fail streaming planning when snapshot not found

2022-12-21 Thread GitBox
amogh-jahagirdar commented on code in PR #6480: URL: https://github.com/apache/iceberg/pull/6480#discussion_r1055070683 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java: ## @@ -207,7 +207,14 @@ private List planFiles(StreamingOffset s

[GitHub] [iceberg] ajantha-bhat closed pull request #6466: Nessie: Bump Nessie to 0.46.0 and adopt to APIv2

2022-12-21 Thread GitBox
ajantha-bhat closed pull request #6466: Nessie: Bump Nessie to 0.46.0 and adopt to APIv2 URL: https://github.com/apache/iceberg/pull/6466

[GitHub] [iceberg] ajantha-bhat commented on pull request #6466: Nessie: Bump Nessie to 0.46.0 and adopt to APIv2

2022-12-21 Thread GitBox
ajantha-bhat commented on PR #6466: URL: https://github.com/apache/iceberg/pull/6466#issuecomment-1362410229 Implicit namespaces cannot be listed by getEntries in API v2. Also, API v2 is in the beta stage. Hence, will do this adoption in the next version bump or later.

[GitHub] [iceberg] puchengy closed pull request #4397: Add support for listing partition recursively during the table migration

2022-12-21 Thread GitBox
puchengy closed pull request #4397: Add support for listing partition recursively during the table migration URL: https://github.com/apache/iceberg/pull/4397

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055098764 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055099031 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055099655 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055100910 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055101284 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Core, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-21 Thread GitBox
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1055101395 ## delta-lake/src/main/java/org/apache/iceberg/BaseMigrateDeltaLakeTableAction.java: ## @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un
