[GitHub] [iceberg-docs] nastra commented on a diff in pull request #187: Update the how-to-release page with findings after being a release manager

2023-01-25 Thread via GitHub
nastra commented on code in PR #187: URL: https://github.com/apache/iceberg-docs/pull/187#discussion_r1086933656 ## landing-page/content/common/how-to-release.md: ## @@ -192,11 +212,15 @@ This release includes important changes that I should have summarized here, but Please

[GitHub] [iceberg] nastra commented on a diff in pull request #6666: Update REST Spec to include warehouse param

2023-01-25 Thread via GitHub
nastra commented on code in PR #: URL: https://github.com/apache/iceberg/pull/#discussion_r1086946040 ## open-api/rest-catalog-open-api.yaml: ## @@ -69,6 +69,13 @@ paths: - Configuration API summary: List all catalog configuration settings operatio

[GitHub] [iceberg-docs] ajantha-bhat commented on a diff in pull request #187: Update the how-to-release page with findings after being a release manager

2023-01-25 Thread via GitHub
ajantha-bhat commented on code in PR #187: URL: https://github.com/apache/iceberg-docs/pull/187#discussion_r1086946962 ## landing-page/content/common/how-to-release.md: ## @@ -192,11 +212,15 @@ This release includes important changes that I should have summarized here, but P

[GitHub] [iceberg] rdblue merged pull request #6666: Update REST Spec to include warehouse param

2023-01-25 Thread via GitHub
rdblue merged PR #: URL: https://github.com/apache/iceberg/pull/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apac

[GitHub] [iceberg] rdblue commented on pull request #6666: Update REST Spec to include warehouse param

2023-01-25 Thread via GitHub
rdblue commented on PR #: URL: https://github.com/apache/iceberg/pull/#issuecomment-1404008281 Thanks, @danielcweeks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [iceberg] Fokko commented on issue #6414: Python: Support Nessie catalog

2023-01-25 Thread via GitHub
Fokko commented on issue #6414: URL: https://github.com/apache/iceberg/issues/6414#issuecomment-1404045827 @ajantha-bhat I think you're good to pick this up if you're still interested :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [iceberg] stevenzwu commented on pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
stevenzwu commented on PR #6660: URL: https://github.com/apache/iceberg/pull/6660#issuecomment-1404049007 > As I implemented this though started to think we may just want to have a direct branch method on the FlinkSink builder itself. That seems more intuitive from an API perspective and is

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
stevenzwu commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1087025072 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -471,8 +476,9 @@ private static ListStateDescriptor> buildStateDes

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-25 Thread via GitHub
szehon-ho commented on code in PR #6664: URL: https://github.com/apache/iceberg/pull/6664#discussion_r1087040113 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -47,4 +48,10 @@ public CloseableIterable planTasks() { return TableScanUtil.planTasks(

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6648: Hive: Refactor commit lock mechanism from HiveTableOperations

2023-01-25 Thread via GitHub
szehon-ho commented on code in PR #6648: URL: https://github.com/apache/iceberg/pull/6648#discussion_r1087052395 ## hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java: ## @@ -0,0 +1,531 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [iceberg] szehon-ho merged pull request #6591: Core: Avoid creating new metadata file when `registerTable` API is used

2023-01-25 Thread via GitHub
szehon-ho merged PR #6591: URL: https://github.com/apache/iceberg/pull/6591 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.a

[GitHub] [iceberg] szehon-ho commented on pull request #6591: Core: Avoid creating new metadata file when `registerTable` API is used

2023-01-25 Thread via GitHub
szehon-ho commented on PR #6591: URL: https://github.com/apache/iceberg/pull/6591#issuecomment-1404106102 Merged, thanks @krvikash for change, @ajantha-bhat for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [iceberg] holdenk commented on issue #6652: Support for global shadow writes + logs

2023-01-25 Thread via GitHub
holdenk commented on issue #6652: URL: https://github.com/apache/iceberg/issues/6652#issuecomment-1404136088 Actually I think we could do this with a small extension to Iceberg's WAP so that the logs contain the table names. -- This is an automated message from the Apache Git Service. To r

[GitHub] [iceberg] holdenk commented on issue #6652: Support for global shadow writes + logs

2023-01-25 Thread via GitHub
holdenk commented on issue #6652: URL: https://github.com/apache/iceberg/issues/6652#issuecomment-1404153325 Hmm, after more poking, I can probably do this with an Iceberg listener. Closing for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [iceberg] holdenk closed issue #6652: Support for global shadow writes + logs

2023-01-25 Thread via GitHub
holdenk closed issue #6652: Support for global shadow writes + logs URL: https://github.com/apache/iceberg/issues/6652 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

[GitHub] [iceberg] RussellSpitzer commented on issue #6652: Support for global shadow writes + logs

2023-01-25 Thread via GitHub
RussellSpitzer commented on issue #6652: URL: https://github.com/apache/iceberg/issues/6652#issuecomment-1404156574 This is also kind of similar to Nessie's branching idea, I think Iceberg is also moving towards having some first level branching support. -- This is an automated message fr

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6554: Parquet: Improve Test Coverage of RowGroupFilter Code with Nans #6518

2023-01-25 Thread via GitHub
RussellSpitzer commented on code in PR #6554: URL: https://github.com/apache/iceberg/pull/6554#discussion_r1087115907 ## data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilterTypes.java: ## @@ -212,74 +218,82 @@ public void createParquetInputFile(List records) thr

[GitHub] [iceberg] haydenflinner commented on pull request #5933: [1.0.x] Core: Increase inferred column metrics limit to 100.

2023-01-25 Thread via GitHub
haydenflinner commented on PR #5933: URL: https://github.com/apache/iceberg/pull/5933#issuecomment-1404162903 If I have a table with more than 100 columns, what are the downsides since I'm above this param value? I don't see it documented here -- https://iceberg.apache.org/docs/latest/confi

[GitHub] [iceberg] GabeChurch opened a new issue, #6667: Spark Hive Iceberg Table Locks -- Settings Unclear in Docs + Overrides Not Working

2023-01-25 Thread via GitHub
GabeChurch opened a new issue, #6667: URL: https://github.com/apache/iceberg/issues/6667 ### Query engine Spark ### Question I have a situation where I need to make high(ish)-frequency writes to a single iceberg table in multiple Spark jobs, and multiple times per job --

[GitHub] [iceberg] abmo-x opened a new pull request, #6668: add max_concurrent_adds argument to add_files procedure

2023-01-25 Thread via GitHub
abmo-x opened a new pull request, #6668: URL: https://github.com/apache/iceberg/pull/6668 max_concurrent_adds argument allows users of add_files procedure to concurrently add data files. Today the parallelism defaults to 1 and there is no way to configure. -- This is an automated message

[GitHub] [iceberg] jackye1995 commented on issue #6625: Improve nullability check in Iceberg codebase

2023-01-25 Thread via GitHub
jackye1995 commented on issue #6625: URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1404235165 This sounds like a very cool plugin to add, any thoughts about that? @rdblue -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [iceberg] Fokko commented on a diff in pull request #6566: Python: Add visitor to DNF expr into Dask format

2023-01-25 Thread via GitHub
Fokko commented on code in PR #6566: URL: https://github.com/apache/iceberg/pull/6566#discussion_r1087197108 ## python/pyiceberg/expressions/visitors.py: ## @@ -881,3 +881,82 @@ def rewrite_to_dnf(expr: BooleanExpression) -> Tuple[BooleanExpression, ...]: # (A AND NOT(B) A

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1087201304 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java: ## @@ -471,8 +476,9 @@ private static ListStateDescriptor> buildS

[GitHub] [iceberg] RussellSpitzer commented on issue #6669: RewriteDataFiles maintenance action never converges

2023-01-25 Thread via GitHub
RussellSpitzer commented on issue #6669: URL: https://github.com/apache/iceberg/issues/6669#issuecomment-1404252649 Yep, this is basically expected. You can always set parameters such that all files are marked for rewriting. The defaults are all based around percentage sizes of the target file

[GitHub] [iceberg] Fokko commented on a diff in pull request #6566: Python: Add visitor to DNF expr into Dask format

2023-01-25 Thread via GitHub
Fokko commented on code in PR #6566: URL: https://github.com/apache/iceberg/pull/6566#discussion_r1087205606 ## python/pyiceberg/expressions/visitors.py: ## @@ -881,3 +881,82 @@ def rewrite_to_dnf(expr: BooleanExpression) -> Tuple[BooleanExpression, ...]: # (A AND NOT(B) A

[GitHub] [iceberg] Fokko commented on a diff in pull request #6644: Python: Add support for static table

2023-01-25 Thread via GitHub
Fokko commented on code in PR #6644: URL: https://github.com/apache/iceberg/pull/6644#discussion_r1087211802 ## python/pyiceberg/catalog/rest.py: ## @@ -175,11 +175,7 @@ class RestCatalog(Catalog): session: Session properties: Properties -def __init__( -s

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
jackye1995 commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1087212028 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/SimpleDataUtil.java: ## @@ -284,10 +303,23 @@ public static void assertTableRecords(Table table, List ex

[GitHub] [iceberg] jackye1995 commented on pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
jackye1995 commented on PR #6660: URL: https://github.com/apache/iceberg/pull/6660#issuecomment-1404275883 > As I implemented this though started to think we may just want to have a direct branch method on the FlinkSink builder itself. That seems more intuitive from an API perspective and i

[GitHub] [iceberg] srilman commented on issue #6620: Python: More Flexible Dependency Requirements, especially for Optional Deps

2023-01-25 Thread via GitHub
srilman commented on issue #6620: URL: https://github.com/apache/iceberg/issues/6620#issuecomment-1404304169 @fokko Any thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1087244952 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/SimpleDataUtil.java: ## @@ -284,10 +303,23 @@ public static void assertTableRecords(Table table, L

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
jackye1995 commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1087256344 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/SimpleDataUtil.java: ## @@ -284,10 +303,23 @@ public static void assertTableRecords(Table table, List ex

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table

2023-01-25 Thread via GitHub
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1087285016 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,403 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [iceberg] abmo-x closed pull request #6668: add max_concurrent_adds argument to add_files procedure

2023-01-25 Thread via GitHub
abmo-x closed pull request #6668: add max_concurrent_adds argument to add_files procedure URL: https://github.com/apache/iceberg/pull/6668 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table

2023-01-25 Thread via GitHub
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1087287064 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,403 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
amogh-jahagirdar commented on PR #6660: URL: https://github.com/apache/iceberg/pull/6660#issuecomment-1404381644 Thanks for the reviews @stevenzwu @jackye1995 ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [iceberg] ajantha-bhat commented on issue #6414: Python: Support Nessie catalog

2023-01-25 Thread via GitHub
ajantha-bhat commented on issue #6414: URL: https://github.com/apache/iceberg/issues/6414#issuecomment-1404465465 > @ajantha-bhat I think you're good to pick this up if you're still interested :) Sure. But I am occupied with partition stats and some internal work. Can't promise this

[GitHub] [iceberg] szehon-ho opened a new issue, #6670: Random no-op for delete for sequence of similar numeric partition values

2023-01-25 Thread via GitHub
szehon-ho opened a new issue, #6670: URL: https://github.com/apache/iceberg/issues/6670 ### Apache Iceberg version 1.1.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 This is a serious random bug I debugged together with @dramaticlly,

[GitHub] [iceberg] szehon-ho commented on issue #6670: Random no-op for delete for sequence of similar numeric partition values

2023-01-25 Thread via GitHub
szehon-ho commented on issue #6670: URL: https://github.com/apache/iceberg/issues/6670#issuecomment-1404485884 FYI @rdblue @RussellSpitzer @aokolnychyi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [iceberg] RussellSpitzer commented on issue #6670: Random no-op for delete for sequence of similar numeric partition values

2023-01-25 Thread via GitHub
RussellSpitzer commented on issue #6670: URL: https://github.com/apache/iceberg/issues/6670#issuecomment-1404492150 I think this implies that the "equals" method for StructLikeWrapper is broken because we should only have a problem if both the hashcode and equals methods both return equalit

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

2023-01-25 Thread via GitHub
JonasJ-ap commented on code in PR #6571: URL: https://github.com/apache/iceberg/pull/6571#discussion_r1087385526 ## docs/java-api.md: ## @@ -147,6 +147,58 @@ t.newAppend().appendFile(data).commit(); t.commitTransaction(); ``` +### WriteData + +The java api can write data int

[GitHub] [iceberg] wypoon opened a new pull request, #6671: Spark 3.2: Automatically set Arrow properties for read performance

2023-01-25 Thread via GitHub
wypoon opened a new pull request, #6671: URL: https://github.com/apache/iceberg/pull/6671 Port of https://github.com/apache/iceberg/pull/6550 to Spark 3.2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [iceberg] RussellSpitzer commented on issue #6670: Random no-op for delete for sequence of similar numeric partition values

2023-01-25 Thread via GitHub
RussellSpitzer commented on issue #6670: URL: https://github.com/apache/iceberg/issues/6670#issuecomment-1404503530 I did a quick test using PartitionData ```java public void testBug() { Types.StructType STRUCT_TYPE = Types.StructType.of( T

[GitHub] [iceberg] szehon-ho commented on issue #6670: Random no-op for delete for sequence of similar numeric partition values

2023-01-25 Thread via GitHub
szehon-ho commented on issue #6670: URL: https://github.com/apache/iceberg/issues/6670#issuecomment-1404513790 @RussellSpitzer is right, I forgot that hashcode is not the only factor for get() in a map. Let us look further into it. -- This is an automated message from the Apache Git Servi
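
The point made in this exchange, that a hash collision alone does not make a map lookup match, can be illustrated with a minimal sketch. It uses only `java.util.HashMap`; the `Key` class and the `lookup` helper are hypothetical names invented for this illustration and are not Iceberg's `StructLikeWrapper` or `PartitionData`.

```java
import java.util.HashMap;
import java.util.Map;

public class HashLookupDemo {
    // Hypothetical key type (not an Iceberg class): every instance returns the
    // same hash code, so all keys land in the same bucket and always collide.
    static final class Key {
        final int value;

        Key(int value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            return 42; // deliberate collision for every key
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).value == value;
        }
    }

    // Stores one entry under Key(stored), then looks it up with Key(probe).
    static String lookup(int stored, int probe) {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key(stored), "hit");
        return map.get(new Key(probe));
    }

    public static void main(String[] args) {
        System.out.println(lookup(1, 1)); // hash codes match and equals() is true
        System.out.println(lookup(1, 2)); // hash codes match but equals() is false
    }
}
```

In this sketch the second lookup returns `null` even though both keys share a hash code, because `HashMap.get` also requires `equals` to return true. A wrong match in a real map would therefore need a faulty `equals` in addition to the collision.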

[GitHub] [iceberg] anthonysgro commented on issue #6619: Disaster Recovery Options for AWS Athena/Iceberg Integration

2023-01-25 Thread via GitHub
anthonysgro commented on issue #6619: URL: https://github.com/apache/iceberg/issues/6619#issuecomment-1404528549 Yes. So here is specifically how it happens: Creating my table: I create my table through an Athena query ``` CREATE TABLE IF NOT EXISTS db.friends ( id stri

[GitHub] [iceberg-docs] sfc-gh-standure opened a new pull request, #196: Added new blog entry published from Snowflake Medium account - Update blogs.md

2023-01-25 Thread via GitHub
sfc-gh-standure opened a new pull request, #196: URL: https://github.com/apache/iceberg-docs/pull/196 New Blog on How Apache Iceberg enables ACID compliance for data lakes. Authored by Sumeet Tandure, published on Snowflake Medium account. -- This is an automated message from the Apache Gi

[GitHub] [iceberg] arminnajafi commented on pull request #6646: Implement Support for DynamoDB Catalog

2023-01-25 Thread via GitHub
arminnajafi commented on PR #6646: URL: https://github.com/apache/iceberg/pull/6646#issuecomment-1404588075 @Fokko @jackye1995 Ready for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [iceberg] pvary commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
pvary commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1087469498 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java: ## @@ -316,6 +316,11 @@ public Builder setSnapshotProperty(String property, String value

[GitHub] [iceberg] pvary commented on pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-25 Thread via GitHub
pvary commented on PR #6660: URL: https://github.com/apache/iceberg/pull/6660#issuecomment-1404604496 Thanks for the PR @amogh-jahagirdar! Left one small question. Also, do we have this feature in FlinkSource? -- This is an automated message from the Apache Git Service. To respon

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6622: push down min/max/count to iceberg

2023-01-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #6622: URL: https://github.com/apache/iceberg/pull/6622#discussion_r1087455589 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -158,6 +182,141 @@ public Filter[] pushedFilters() { return

[GitHub] [iceberg] pvary commented on issue #6667: Spark Hive Iceberg Table Locks -- Settings Unclear in Docs + Overrides Not Working

2023-01-25 Thread via GitHub
pvary commented on issue #6667: URL: https://github.com/apache/iceberg/issues/6667#issuecomment-1404609798 Hi @GabeChurch, Sadly I am not too familiar with the Spark configurations, but when #6570 gets in, it might help you with the high frequency concurrent writes. -- This is an autom

[GitHub] [iceberg] nastra commented on a diff in pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-25 Thread via GitHub
nastra commented on code in PR #6664: URL: https://github.com/apache/iceberg/pull/6664#discussion_r1087507535 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -47,4 +48,10 @@ public CloseableIterable planTasks() { return TableScanUtil.planTasks( s

[GitHub] [iceberg] nastra commented on pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-25 Thread via GitHub
nastra commented on PR #6664: URL: https://github.com/apache/iceberg/pull/6664#issuecomment-1404648315 @szehon-ho thanks for the review. I double-checked and we actually don't need to deprecate anything. Making the method in the super class `protected` again fixes the issue, so we should b

[GitHub] [iceberg] nastra closed pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-26 Thread via GitHub
nastra closed pull request #6664: Core: Fix API breakages around scanMetrics() URL: https://github.com/apache/iceberg/pull/6664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [iceberg] pvary commented on a diff in pull request #6648: Hive: Refactor commit lock mechanism from HiveTableOperations

2023-01-26 Thread via GitHub
pvary commented on code in PR #6648: URL: https://github.com/apache/iceberg/pull/6648#discussion_r1087526419 ## hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java: ## @@ -0,0 +1,531 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] pvary commented on a diff in pull request #6648: Hive: Refactor commit lock mechanism from HiveTableOperations

2023-01-26 Thread via GitHub
pvary commented on code in PR #6648: URL: https://github.com/apache/iceberg/pull/6648#discussion_r1087529984 ## hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java: ## @@ -0,0 +1,531 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] pvary commented on a diff in pull request #6648: Hive: Refactor commit lock mechanism from HiveTableOperations

2023-01-26 Thread via GitHub
pvary commented on code in PR #6648: URL: https://github.com/apache/iceberg/pull/6648#discussion_r1087530236 ## hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java: ## @@ -0,0 +1,531 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] Fokko commented on issue #6475: Python: Improve PyArrow performance

2023-01-26 Thread via GitHub
Fokko commented on issue #6475: URL: https://github.com/apache/iceberg/issues/6475#issuecomment-1404692576 Looking at the recent improvements by @rdblue Before 887 requests, after the change 203 requests, which is a great improvement, but still more to be done! After: ```

[GitHub] [iceberg] Fokko merged pull request #6671: Spark 3.2: Automatically set Arrow properties for read performance

2023-01-26 Thread via GitHub
Fokko merged PR #6671: URL: https://github.com/apache/iceberg/pull/6671 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-26 Thread via GitHub
ajantha-bhat commented on code in PR #6664: URL: https://github.com/apache/iceberg/pull/6664#discussion_r1087650560 ## .palantir/revapi.yml: ## @@ -320,38 +320,6 @@ acceptedBreaks: \ org.apache.iceberg.types.Types.StructType, java.util.function.BiFunction)" just

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

2023-01-26 Thread via GitHub
youngxinler commented on code in PR #6571: URL: https://github.com/apache/iceberg/pull/6571#discussion_r1087654680 ## docs/java-api.md: ## @@ -147,6 +147,58 @@ t.newAppend().appendFile(data).commit(); t.commitTransaction(); ``` +### WriteData + +The java api can write data i

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-26 Thread via GitHub
ajantha-bhat commented on code in PR #6664: URL: https://github.com/apache/iceberg/pull/6664#discussion_r1087657361 ## .palantir/revapi.yml: ## @@ -320,38 +320,6 @@ acceptedBreaks: \ org.apache.iceberg.types.Types.StructType, java.util.function.BiFunction)" just

[GitHub] [iceberg] nastra commented on a diff in pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-26 Thread via GitHub
nastra commented on code in PR #6664: URL: https://github.com/apache/iceberg/pull/6664#discussion_r1087665522 ## .palantir/revapi.yml: ## @@ -320,38 +320,6 @@ acceptedBreaks: \ org.apache.iceberg.types.Types.StructType, java.util.function.BiFunction)" justificat

[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624: 🎨 Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-26 Thread via GitHub
kingeasternsun commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1087744708 ## api/src/main/java/org/apache/iceberg/actions/MigrateTable.java: ## @@ -50,6 +50,15 @@ default MigrateTable dropBackup() { throw new UnsupportedOperationE

[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624: 🎨 Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-26 Thread via GitHub
kingeasternsun commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1087750558 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -405,14 +405,16 @@ private static Iterator buildManifest( * @param part

[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624: 🎨 Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-26 Thread via GitHub
kingeasternsun commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1087772776 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -442,14 +444,51 @@ public static void importSparkTable( "Canno

[GitHub] [iceberg] 0xNacho commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

2023-01-26 Thread via GitHub
0xNacho commented on PR #6470: URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1405009473 +1 @peay . I have the same problem! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [iceberg] RussellSpitzer commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

2023-01-26 Thread via GitHub
RussellSpitzer commented on PR #6470: URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1405061598 Does the streaming write have the ability to set the file format? Or does that only let you use the table default as well? -- This is an automated message from the Apache Git Ser

[GitHub] [iceberg] Fokko commented on pull request #6672: Add REST catalog spec to iceberg-core.jar

2023-01-26 Thread via GitHub
Fokko commented on PR #6672: URL: https://github.com/apache/iceberg/pull/6672#issuecomment-1405075260 @snazy Out of curiosity, do you use the open api spec to directly generate code? In Python, we tried to do it as well, but the structure was too complex in the end, and we used the generate

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6624: 🎨 Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-26 Thread via GitHub
RussellSpitzer commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1087917320 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -442,14 +444,51 @@ public static void importSparkTable( "Canno

[GitHub] [iceberg] dimas-b commented on a diff in pull request #6656: Nessie: Avoid usage of deprecated APIs in test

2023-01-26 Thread via GitHub
dimas-b commented on code in PR #6656: URL: https://github.com/apache/iceberg/pull/6656#discussion_r1087949854 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieCatalog.java: ## @@ -70,12 +72,14 @@ public class TestNessieCatalog extends CatalogTests { private Strin

[GitHub] [iceberg-docs] RussellSpitzer merged pull request #196: Add new blog entry How Apache Iceberg enables ACID compliance for data lakes

2023-01-26 Thread via GitHub
RussellSpitzer merged PR #196: URL: https://github.com/apache/iceberg-docs/pull/196 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@i

[GitHub] [iceberg-docs] RussellSpitzer commented on pull request #196: Add new blog entry How Apache Iceberg enables ACID compliance for data lakes

2023-01-26 Thread via GitHub
RussellSpitzer commented on PR #196: URL: https://github.com/apache/iceberg-docs/pull/196#issuecomment-1405133032 Thanks @sfc-gh-standure ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6624: 🎨 Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-26 Thread via GitHub
RussellSpitzer commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1087967571 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/AddFilesProcedure.java: ## @@ -119,8 +120,15 @@ public InternalRow[] call(InternalRow args)

[GitHub] [iceberg-docs] RussellSpitzer commented on a diff in pull request #187: Update the how-to-release page with findings after being a release manager

2023-01-26 Thread via GitHub
RussellSpitzer commented on code in PR #187: URL: https://github.com/apache/iceberg-docs/pull/187#discussion_r1087992589 ## landing-page/content/common/how-to-release.md: ## @@ -21,6 +21,18 @@ disableSidebar: true - limitations under the License. --> +## Introduction + +Th

[GitHub] [iceberg-docs] RussellSpitzer commented on a diff in pull request #187: Update the how-to-release page with findings after being a release manager

2023-01-26 Thread via GitHub
RussellSpitzer commented on code in PR #187: URL: https://github.com/apache/iceberg-docs/pull/187#discussion_r1088003719 ## landing-page/content/common/how-to-release.md: ## @@ -21,6 +21,18 @@ disableSidebar: true - limitations under the License. --> +## Introduction + +Th

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-26 Thread via GitHub
RussellSpitzer commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1088030207 ## core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java: ## @@ -26,4 +26,6 @@ private StandardBlobTypes() {} * href="https://datasketches.apac

[GitHub] [iceberg] Fokko opened a new pull request, #6673: Python: Optimize PyArrow reads 🚀🚀🚀

2023-01-26 Thread via GitHub
Fokko opened a new pull request, #6673: URL: https://github.com/apache/iceberg/pull/6673 PyArrow is still sluggish when it comes into opening files, and we still see many requests being made to S3. This PR removes the Dataset, and uses the lower read_table API. Since the read_table A

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6656: Nessie: Avoid usage of deprecated APIs in test

2023-01-26 Thread via GitHub
ajantha-bhat commented on code in PR #6656: URL: https://github.com/apache/iceberg/pull/6656#discussion_r1088078440 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieCatalog.java: ## @@ -70,12 +72,14 @@ public class TestNessieCatalog extends CatalogTests { private

[GitHub] [iceberg] ajantha-bhat commented on pull request #6661: Core: Support delete file stats in partitions metadata table

2023-01-26 Thread via GitHub
ajantha-bhat commented on PR #6661: URL: https://github.com/apache/iceberg/pull/6661#issuecomment-1405302607 cc: @szehon-ho, @RussellSpitzer, @rdblue, @flyrain -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [iceberg] ajantha-bhat commented on pull request #6656: Nessie: Avoid usage of deprecated APIs in test

2023-01-26 Thread via GitHub
ajantha-bhat commented on PR #6656: URL: https://github.com/apache/iceberg/pull/6656#issuecomment-1405308124 @dimas-b: Thanks for the review. @Fokko: Can you please help in reviewing and merging the PR? -- This is an automated message from the Apache Git Service. To respond to the mess

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6646: Implement Support for DynamoDB Catalog

2023-01-26 Thread via GitHub
jackye1995 commented on code in PR #6646: URL: https://github.com/apache/iceberg/pull/6646#discussion_r1088108564 ## python/mkdocs/docs/configuration.md: ## @@ -85,3 +85,15 @@ catalog: default: type: glue ``` + +## DynamoDB Catalog + +If you want to use AWS DynamoDB as

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6646: Implement Support for DynamoDB Catalog

2023-01-26 Thread via GitHub
jackye1995 commented on code in PR #6646: URL: https://github.com/apache/iceberg/pull/6646#discussion_r1088117070 ## python/pyiceberg/catalog/base_aws_catalog.py: ## @@ -0,0 +1,163 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6646: Implement Support for DynamoDB Catalog

2023-01-26 Thread via GitHub
jackye1995 commented on code in PR #6646: URL: https://github.com/apache/iceberg/pull/6646#discussion_r1088118591 ## python/pyiceberg/catalog/hive.py: ## @@ -548,10 +511,9 @@ def update_namespace_properties( for key, value in updates.items():

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6661: Core: Support delete file stats in partitions metadata table

2023-01-26 Thread via GitHub
ajantha-bhat commented on code in PR #6661: URL: https://github.com/apache/iceberg/pull/6661#discussion_r1087862183 ## core/src/main/java/org/apache/iceberg/PartitionsTable.java: ## @@ -47,7 +48,11 @@ public class PartitionsTable extends BaseMetadataTable { Types.Ne

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6661: Core: Support delete file stats in partitions metadata table

2023-01-26 Thread via GitHub
RussellSpitzer commented on code in PR #6661: URL: https://github.com/apache/iceberg/pull/6661#discussion_r1088136888 ## core/src/main/java/org/apache/iceberg/PartitionsTable.java: ## @@ -47,7 +48,11 @@ public class PartitionsTable extends BaseMetadataTable { Types.

[GitHub] [iceberg] szehon-ho merged pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-26 Thread via GitHub
szehon-ho merged PR #6664: URL: https://github.com/apache/iceberg/pull/6664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.a

[GitHub] [iceberg] szehon-ho commented on pull request #6664: Core: Fix API breakages around scanMetrics()

2023-01-26 Thread via GitHub
szehon-ho commented on PR #6664: URL: https://github.com/apache/iceberg/pull/6664#issuecomment-1405377742 Merged, thanks @nastra and @ajantha-bhat for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [iceberg] wypoon commented on pull request #6671: Spark 3.2: Automatically set Arrow properties for read performance

2023-01-26 Thread via GitHub
wypoon commented on PR #6671: URL: https://github.com/apache/iceberg/pull/6671#issuecomment-1405433121 Thanks @nastra and @Fokko. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [iceberg] manisin opened a new pull request, #6674: Add support for special characters in snowflake identifiers for Snowflake Catalog

2023-01-26 Thread via GitHub
manisin opened a new pull request, #6674: URL: https://github.com/apache/iceberg/pull/6674 Currently the catalog is unable to handle databases or schema or table names (snowflake identifiers) with special characters. This limitation is due to sanitizing of the parameter for the like clause

[GitHub] [iceberg] GabeChurch commented on issue #6667: Spark Hive Iceberg Table Locks -- Settings Unclear in Docs + Overrides Not Working

2023-01-26 Thread via GitHub
GabeChurch commented on issue #6667: URL: https://github.com/apache/iceberg/issues/6667#issuecomment-1405560496 Thanks for the quick response @pvary just saw that ticket last night as well! I actually created a benchmark in airflow, that runs 30 concurrent Spark jobs with 50 sequent

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1088290116 ## core/src/main/java/org/apache/iceberg/view/SQLViewRepresentationParser.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1088290778 ## core/src/main/java/org/apache/iceberg/view/ViewRepresentationParser.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6655: Spark: Handle ResolvingFileIO while determining LocalityPreference

2023-01-26 Thread via GitHub
jackye1995 commented on code in PR #6655: URL: https://github.com/apache/iceberg/pull/6655#discussion_r1088296400 ## core/src/main/java/org/apache/iceberg/io/ResolvingFileIO.java: ## @@ -164,7 +164,7 @@ private FileIO io(String location) { return io; } - private stati

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1088320961 ## core/src/main/java/org/apache/iceberg/view/UnknownViewRepresentation.java: ## @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-01-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1088335461 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java: ## @@ -316,6 +316,11 @@ public Builder setSnapshotProperty(String property, St
