[GitHub] [iceberg] nastra commented on a diff in pull request #4826: Nessie: Use unique path for different table with same name

2022-11-03 Thread GitBox
nastra commented on code in PR #4826: URL: https://github.com/apache/iceberg/pull/4826#discussion_r1013139543 ## nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java: ## @@ -80,7 +80,7 @@ public abstract class BaseTestIceberg { private static final Logger LOG

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5392: Spark: Fix a separate table cache being created for each rewriteFiles

2022-11-03 Thread GitBox
RussellSpitzer commented on code in PR #5392: URL: https://github.com/apache/iceberg/pull/5392#discussion_r1013141956 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/SparkSortStrategy.java: ## @@ -119,26 +119,22 @@ public Set rewriteFiles(List filesToRewrite)

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-03 Thread GitBox
szehon-ho commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1013145015 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java: ## @@ -978,6 +1091,32 @@ private Set expectedManifestListPaths(Iterable snapshots, Long

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5392: Spark: Fix a separate table cache being created for each rewriteFiles

2022-11-03 Thread GitBox
RussellSpitzer commented on code in PR #5392: URL: https://github.com/apache/iceberg/pull/5392#discussion_r1013144623 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -94,10 +95,14 @@ private boolean useStartingSequenc

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5392: Spark: Fix a separate table cache being created for each rewriteFiles

2022-11-03 Thread GitBox
RussellSpitzer commented on code in PR #5392: URL: https://github.com/apache/iceberg/pull/5392#discussion_r1013157205 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -94,10 +95,14 @@ private boolean useStartingSequenc

[GitHub] [iceberg] RussellSpitzer commented on pull request #5392: Spark: Fix a separate table cache being created for each rewriteFiles

2022-11-03 Thread GitBox
RussellSpitzer commented on PR #5392: URL: https://github.com/apache/iceberg/pull/5392#issuecomment-1302378216 > @manuzhang, it seems reasonable to create a session for the entire rewrite, not just each Spark submission. Is that what was happening before? Yes basically the old behavio

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-03 Thread GitBox
RussellSpitzer commented on code in PR #6093: URL: https://github.com/apache/iceberg/pull/6093#discussion_r1013163719 ## README.md: ## @@ -72,8 +72,7 @@ Iceberg table support is organized in library modules: Iceberg also has modules for adding Iceberg support to processing en

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-03 Thread GitBox
RussellSpitzer commented on code in PR #6093: URL: https://github.com/apache/iceberg/pull/6093#discussion_r1013167274 ## spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWrites.java: ## @@ -283,7 +283,7 @@ private Dataset createDataset(Iterable record

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-03 Thread GitBox
RussellSpitzer commented on code in PR #6093: URL: https://github.com/apache/iceberg/pull/6093#discussion_r1013168926 ## .github/labeler.yml: ## @@ -61,7 +61,6 @@ DATA: - data/**/* SPARK: - spark-runtime/**/* Review Comment: Technically can't we get rid of most of the

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-03 Thread GitBox
ajantha-bhat commented on code in PR #6093: URL: https://github.com/apache/iceberg/pull/6093#discussion_r1013172577 ## .github/labeler.yml: ## @@ -61,7 +61,6 @@ DATA: - data/**/* SPARK: - spark-runtime/**/* Review Comment: I thought of it. This file is based on a very

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-03 Thread GitBox
RussellSpitzer commented on code in PR #6093: URL: https://github.com/apache/iceberg/pull/6093#discussion_r1013174550 ## spark/v2.4/spark/src/test/java/org/apache/iceberg/examples/README.md: ## @@ -164,7 +164,7 @@ Code examples can be found [here](SnapshotFunctionalityTest.java

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #4826: Nessie: Use unique path for different table with same name

2022-11-03 Thread GitBox
ajantha-bhat commented on code in PR #4826: URL: https://github.com/apache/iceberg/pull/4826#discussion_r1013179128 ## nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java: ## @@ -80,7 +80,7 @@ public abstract class BaseTestIceberg { private static final Logg

[GitHub] [iceberg] ajantha-bhat commented on pull request #4826: Nessie: Use unique path for different table with same name

2022-11-03 Thread GitBox
ajantha-bhat commented on PR #4826: URL: https://github.com/apache/iceberg/pull/4826#issuecomment-1302406603 @RussellSpitzer , @nastra : I have reverted the base test class changes and handled the nit. -- This is an automated message from the Apache Git Service. To respond to the message

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #4826: Nessie: Use unique path for different table with same name

2022-11-03 Thread GitBox
ajantha-bhat commented on code in PR #4826: URL: https://github.com/apache/iceberg/pull/4826#discussion_r1013186434 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -199,10 +200,16 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifi

[GitHub] [iceberg] ajantha-bhat commented on pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-03 Thread GitBox
ajantha-bhat commented on PR #6093: URL: https://github.com/apache/iceberg/pull/6093#issuecomment-1302421124 > This looks good to me, but I'm not quite up to date with our timeline for Spark 3 removal. Is this something we are doing right now? I don't have any feelings against this just wan

[GitHub] [iceberg] dimas-b commented on a diff in pull request #4826: Nessie: Use unique path for different table with same name

2022-11-03 Thread GitBox
dimas-b commented on code in PR #4826: URL: https://github.com/apache/iceberg/pull/4826#discussion_r1013194544 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -199,10 +200,16 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {

[GitHub] [iceberg] rdblue merged pull request #6114: Python: Use Types from Typing

2022-11-03 Thread GitBox
rdblue merged PR #6114: URL: https://github.com/apache/iceberg/pull/6114

[GitHub] [iceberg] rdblue commented on pull request #6114: Python: Use Types from Typing

2022-11-03 Thread GitBox
rdblue commented on PR #6114: URL: https://github.com/apache/iceberg/pull/6114#issuecomment-1302456665 Thanks, @Fokko!

[GitHub] [iceberg] rdblue commented on a diff in pull request #6110: API: Hash floats -0.0 and 0.0 to the same bucket

2022-11-03 Thread GitBox
rdblue commented on code in PR #6110: URL: https://github.com/apache/iceberg/pull/6110#discussion_r1013224969 ## api/src/test/java/org/apache/iceberg/transforms/TestBucketing.java: ## @@ -65,10 +65,17 @@ public void testSpecValues() { Assert.assertEquals("Spec example: hash
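
The PR title above asks for -0.0 and 0.0 to hash to the same bucket. The following is a minimal sketch of why a normalization step is needed, not Iceberg's implementation: `normalize_zero` and `bucket` are hypothetical helpers, and `zlib.crc32` is only a stand-in for the real hash function.

```python
import struct
import zlib


def normalize_zero(value: float) -> float:
    """Map -0.0 to 0.0 and leave every other value (including NaN) unchanged."""
    # -0.0 == 0.0 is True under IEEE 754, so this branch catches both zeros.
    return 0.0 if value == 0.0 else value


def bucket(value: float, num_buckets: int) -> int:
    """Hypothetical bucket function: hash the 8-byte encoding, then take the modulus."""
    encoded = struct.pack("<d", normalize_zero(value))
    # zlib.crc32 is a stand-in hash used purely for illustration.
    return zlib.crc32(encoded) % num_buckets


# The raw byte encodings of the two zeros differ in the sign bit...
assert struct.pack("<d", 0.0) != struct.pack("<d", -0.0)
# ...but after normalization both zeros land in the same bucket.
assert bucket(0.0, 16) == bucket(-0.0, 16)
```

Without the normalization step, any hash computed over the raw bits would treat the two zeros as different values even though they compare equal.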

[GitHub] [iceberg] rdblue merged pull request #6086: Python: Add the REST token to the properties

2022-11-03 Thread GitBox
rdblue merged PR #6086: URL: https://github.com/apache/iceberg/pull/6086

[GitHub] [iceberg] rdblue commented on pull request #6108: SparkBatchQueryScan logs too much - #6106

2022-11-03 Thread GitBox
rdblue commented on PR #6108: URL: https://github.com/apache/iceberg/pull/6108#issuecomment-1302499406 Running CI. The changes look good to me.

[GitHub] [iceberg] rdblue merged pull request #6078: Python: Pin versions explicitly

2022-11-03 Thread GitBox
rdblue merged PR #6078: URL: https://github.com/apache/iceberg/pull/6078

[GitHub] [iceberg] nicor88 commented on issue #4582: Remove extraneous trailing slash in table location

2022-11-03 Thread GitBox
nicor88 commented on issue #4582: URL: https://github.com/apache/iceberg/issues/4582#issuecomment-1302599201 @rdblue When running on Iceberg `0.14.0` the issue seems fixed for the data folder, but somehow if I have a trailing slash in my location, the trailing slash is kept for the metadata

[GitHub] [iceberg] jlowe opened a new issue, #6116: Constant columns created using Spark type incompatible with constant type

2022-11-03 Thread GitBox
jlowe opened a new issue, #6116: URL: https://github.com/apache/iceberg/issues/6116 ### Apache Iceberg version 1.0.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 ColumnVectorWithFilter.forHolder and IcebergArrowColumnVector.forHolder

[GitHub] [iceberg] ddrinka opened a new pull request, #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-03 Thread GitBox
ddrinka opened a new pull request, #6117: URL: https://github.com/apache/iceberg/pull/6117 @Fokko here's another runtime bug I'm seeing while playing with this new Manifest evaluator (#5845)

[GitHub] [iceberg] sunchao commented on pull request #2276: Core: Add option to combine tasks by partition

2022-11-03 Thread GitBox
sunchao commented on PR #2276: URL: https://github.com/apache/iceberg/pull/2276#issuecomment-1302756737 OK, after discussing with @aokolnychyi offline, I made this PR to only focus on the task combining part. We can make the Spark related changes in follow-ups.

[GitHub] [iceberg] sunchao commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-03 Thread GitBox
sunchao commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1013464145 ## api/src/main/java/org/apache/iceberg/util/StructProjection.java: ## @@ -90,6 +103,13 @@ public static StructProjection createAllowMissing( private final StructPro

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-03 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1013485645 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -142,9 +142,21 @@ public static Schema selectNot(Schema schema, Set fieldIds) { } public s

[GitHub] [iceberg] rdblue merged pull request #6038: Python: Fix Github pages

2022-11-03 Thread GitBox
rdblue merged PR #6038: URL: https://github.com/apache/iceberg/pull/6038

[GitHub] [iceberg] rdblue commented on a diff in pull request #6010: Python: Fix PyArrowFileIO caching

2022-11-03 Thread GitBox
rdblue commented on code in PR #6010: URL: https://github.com/apache/iceberg/pull/6010#discussion_r1013488063 ## python/pyiceberg/io/pyarrow.py: ## @@ -223,8 +236,8 @@ def new_input(self, location: str) -> PyArrowFile: Returns: PyArrowFile: A PyArrowFile in

[GitHub] [iceberg] rdblue commented on a diff in pull request #6010: Python: Fix PyArrowFileIO caching

2022-11-03 Thread GitBox
rdblue commented on code in PR #6010: URL: https://github.com/apache/iceberg/pull/6010#discussion_r1013488618 ## python/pyiceberg/io/pyarrow.py: ## @@ -93,11 +88,23 @@ class PyArrowFile(InputFile, OutputFile): >>> # output_file.create().write(b'foobytes') """ -

[GitHub] [iceberg] joao-parana commented on issue #6097: Partitioning based on the "identity" transform doesn't work in 1.0.0 Java API.

2022-11-03 Thread GitBox
joao-parana commented on issue #6097: URL: https://github.com/apache/iceberg/issues/6097#issuecomment-1302802386 The error has been fixed. It was not a bug but an error in my program. @rdblue explained to me how to fix it (https://apache-iceberg.slack.com/archives/C03LG1D563F/p1667408382726

[GitHub] [iceberg] joao-parana closed issue #6097: Partitioning based on the "identity" transform doesn't work in 1.0.0 Java API.

2022-11-03 Thread GitBox
joao-parana closed issue #6097: Partitioning based on the "identity" transform doesn't work in 1.0.0 Java API. URL: https://github.com/apache/iceberg/issues/6097

[GitHub] [iceberg] SinghAsDev commented on pull request #6064: Support 2-level list and maps type in RemoveIds.

2022-11-03 Thread GitBox
SinghAsDev commented on PR #6064: URL: https://github.com/apache/iceberg/pull/6064#issuecomment-1302806360 > This makes sense to me, should we add a test for this to go over both of the branches of the `if`? Thanks for the review @Fokko ! Since this is exactly what we do in `ApplyNam

[GitHub] [iceberg] github-actions[bot] commented on issue #4616: Add Checkstyle Rule to prevent Map and Set

2022-11-03 Thread GitBox
github-actions[bot] commented on issue #4616: URL: https://github.com/apache/iceberg/issues/4616#issuecomment-1302813823 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] github-actions[bot] closed issue #4616: Add Checkstyle Rule to prevent Map and Set

2022-11-03 Thread GitBox
github-actions[bot] closed issue #4616: Add Checkstyle Rule to prevent Map and Set URL: https://github.com/apache/iceberg/issues/4616

[GitHub] [iceberg] github-actions[bot] commented on issue #4176: Change column set not null failed

2022-11-03 Thread GitBox
github-actions[bot] commented on issue #4176: URL: https://github.com/apache/iceberg/issues/4176#issuecomment-1302813844 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] github-actions[bot] closed issue #4176: Change column set not null failed

2022-11-03 Thread GitBox
github-actions[bot] closed issue #4176: Change column set not null failed URL: https://github.com/apache/iceberg/issues/4176

[GitHub] [iceberg] szehon-ho merged pull request #5632: Core: Avoid reading ManifestFile when create ManifestReader

2022-11-03 Thread GitBox
szehon-ho merged PR #5632: URL: https://github.com/apache/iceberg/pull/5632

[GitHub] [iceberg] szehon-ho commented on pull request #5632: Core: Avoid reading ManifestFile when create ManifestReader

2022-11-03 Thread GitBox
szehon-ho commented on PR #5632: URL: https://github.com/apache/iceberg/pull/5632#issuecomment-1302833958 Merged, thanks @ConeyLiu and @rdblue for review.

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5812: Use Java collections in AwsProperties to fix Kryo serialization.

2022-11-03 Thread GitBox
szehon-ho commented on code in PR #5812: URL: https://github.com/apache/iceberg/pull/5812#discussion_r1013522691 ## aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java: ## @@ -493,53 +492,7 @@ public class AwsProperties implements Serializable { private String dynamoD

[GitHub] [iceberg] szehon-ho commented on pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-03 Thread GitBox
szehon-ho commented on PR #5376: URL: https://github.com/apache/iceberg/pull/5376#issuecomment-1302866207 updated and rebased, @RussellSpitzer if you have time to take a look as well

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-03 Thread GitBox
stevenzwu commented on code in PR #6111: URL: https://github.com/apache/iceberg/pull/6111#discussion_r1013582106 ## flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkCatalogTablePartitions.java: ## @@ -18,14 +18,13 @@ */ package org.apache.iceberg.flink; -im

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-03 Thread GitBox
stevenzwu commented on code in PR #6111: URL: https://github.com/apache/iceberg/pull/6111#discussion_r1013583303 ## flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java: ## @@ -145,8 +145,14 @@ protected Catalog createCatalog( baseNamespace =

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-03 Thread GitBox
stevenzwu commented on code in PR #6111: URL: https://github.com/apache/iceberg/pull/6111#discussion_r1013583775 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java: ## @@ -102,14 +102,18 @@ public FlinkCatalog( String defaultDatabase, Nam

[GitHub] [iceberg] stevenzwu commented on pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-03 Thread GitBox
stevenzwu commented on PR #6111: URL: https://github.com/apache/iceberg/pull/6111#issuecomment-1302927593 @hililiwei 's contribution of Flink 1.16 was just merged. @lvyanquan can you also port the change to 1.16 module?

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-03 Thread GitBox
lvyanquan commented on code in PR #6111: URL: https://github.com/apache/iceberg/pull/6111#discussion_r1013585878 ## flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkCatalogTablePartitions.java: ## @@ -18,14 +18,13 @@ */ package org.apache.iceberg.flink; -im

[GitHub] [iceberg] zhongyujiang opened a new pull request, #6118: Parquet, Core: Fix collection of Parquet metrics when column names co…

2022-11-03 Thread GitBox
zhongyujiang opened a new pull request, #6118: URL: https://github.com/apache/iceberg/pull/6118 …ntain special characters. Iceberg escapes special characters in field names when converting Schema to Parquet MessageType or Avro Schema: https://github.com/apache/iceberg/blob/7a247fd

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-03 Thread GitBox
stevenzwu commented on code in PR #6111: URL: https://github.com/apache/iceberg/pull/6111#discussion_r1013586945 ## flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkCatalogTablePartitions.java: ## @@ -18,14 +18,13 @@ */ package org.apache.iceberg.flink; -im

[GitHub] [iceberg] luoyuxia commented on issue #3201: FLINK:use ROW_NUMBER() over() get error

2022-11-03 Thread GitBox
luoyuxia commented on issue #3201: URL: https://github.com/apache/iceberg/issues/3201#issuecomment-1302959738 It's not an Iceberg problem; it's a Flink problem. In Flink, `groupBy` produces update data, but `OverAggregate` can't consume update data.

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-03 Thread GitBox
ajantha-bhat commented on code in PR #6117: URL: https://github.com/apache/iceberg/pull/6117#discussion_r1013615833 ## python/pyiceberg/expressions/visitors.py: ## @@ -526,7 +526,7 @@ def visit_equal(self, term: BoundTerm, literal: Literal[Any]) -> bool: if lower > lit

[GitHub] [iceberg] fb913bf0de288ba84fe98f7a23d35edfdb22381 commented on a diff in pull request #6110: API: Hash floats -0.0 and 0.0 to the same bucket

2022-11-03 Thread GitBox
fb913bf0de288ba84fe98f7a23d35edfdb22381 commented on code in PR #6110: URL: https://github.com/apache/iceberg/pull/6110#discussion_r1013619657 ## api/src/test/java/org/apache/iceberg/transforms/TestBucketing.java: ## @@ -65,10 +65,17 @@ public void testSpecValues() { Assert

[GitHub] [iceberg] Fokko commented on a diff in pull request #6010: Python: Fix PyArrowFileIO caching

2022-11-03 Thread GitBox
Fokko commented on code in PR #6010: URL: https://github.com/apache/iceberg/pull/6010#discussion_r1013633552 ## python/pyiceberg/io/pyarrow.py: ## @@ -223,8 +236,8 @@ def new_input(self, location: str) -> PyArrowFile: Returns: PyArrowFile: A PyArrowFile ins

[GitHub] [iceberg] Fokko commented on a diff in pull request #6010: Python: Fix PyArrowFileIO caching

2022-11-03 Thread GitBox
Fokko commented on code in PR #6010: URL: https://github.com/apache/iceberg/pull/6010#discussion_r1013641436 ## python/pyiceberg/io/pyarrow.py: ## @@ -93,11 +88,23 @@ class PyArrowFile(InputFile, OutputFile): >>> # output_file.create().write(b'foobytes') """ -

[GitHub] [iceberg] Fokko closed pull request #6010: Python: Fix PyArrowFileIO caching

2022-11-03 Thread GitBox
Fokko closed pull request #6010: Python: Fix PyArrowFileIO caching URL: https://github.com/apache/iceberg/pull/6010

[GitHub] [iceberg] Fokko merged pull request #6115: Struct fields should be provided to Schema constructor

2022-11-03 Thread GitBox
Fokko merged PR #6115: URL: https://github.com/apache/iceberg/pull/6115

[GitHub] [iceberg] Fokko opened a new pull request, #6119: Remove Fokko from the list of collaborators

2022-11-03 Thread GitBox
Fokko opened a new pull request, #6119: URL: https://github.com/apache/iceberg/pull/6119 I got this email: ![image](https://user-images.githubusercontent.com/1134248/199901339-17be4938-4bf5-4510-8ad4-440bf7a44708.png) However, this looks to be inconsistent with the docs: ![imag

[GitHub] [iceberg] ddrinka opened a new issue, #6120: [Python] The structure of a partition definition and partition instance should be consistent

2022-11-03 Thread GitBox
ddrinka opened a new issue, #6120: URL: https://github.com/apache/iceberg/issues/6120 ### Feature Request / Improvement Consider the `PartitionSummary` in a `ManifestFile`. The lower and upper bounds can be resolved with `conversions.from_bytes`, resulting in types according to the

[GitHub] [iceberg] singhpk234 opened a new pull request, #6121: [Core | Spark] Strip trailing slash from custom metadatalocation

2022-11-03 Thread GitBox
singhpk234 opened a new pull request, #6121: URL: https://github.com/apache/iceberg/pull/6121 ### About the change Follow-up for https://github.com/apache/iceberg/issues/4582: strip the trailing slash from a custom metadata location. Related issue: https://github.com/
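
The trailing-slash handling described in this PR can be illustrated with a small sketch. This is an assumption about the general idea, not the code in the PR: `strip_trailing_slash` is a hypothetical helper, and the choice to leave a bare scheme such as `s3://` untouched is an illustrative design decision.

```python
def strip_trailing_slash(location: str) -> str:
    """Remove trailing slashes from a location string.

    A bare scheme such as 's3://' and a single '/' are left untouched,
    so the result is never an empty or malformed location.
    """
    result = location
    while len(result) > 1 and result.endswith("/") and not result.endswith("://"):
        result = result[:-1]
    return result


# Hypothetical table locations, used only for illustration.
assert strip_trailing_slash("s3://bucket/db/table/metadata/") == "s3://bucket/db/table/metadata"
assert strip_trailing_slash("s3://bucket/db/table/metadata") == "s3://bucket/db/table/metadata"
assert strip_trailing_slash("s3://") == "s3://"
```

Normalizing the location once, before any child path is appended, avoids paths that contain a double slash.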

[GitHub] [iceberg] Fokko commented on pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-04 Thread GitBox
Fokko commented on PR #6117: URL: https://github.com/apache/iceberg/pull/6117#issuecomment-1303050019 Thanks for spotting this one @ddrinka It looks like we also need a not-`None` check. I also noticed that this is being fixed in https://github.com/apache/iceberg/pull/6069. So I'll close th

[GitHub] [iceberg] Fokko closed pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-04 Thread GitBox
Fokko closed pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal` URL: https://github.com/apache/iceberg/pull/6117

[GitHub] [iceberg] Fokko commented on pull request #6119: Remove Fokko from the list of collaborators

2022-11-04 Thread GitBox
Fokko commented on PR #6119: URL: https://github.com/apache/iceberg/pull/6119#issuecomment-1303053285 @singhpk234 Good call, just updated the PR

[GitHub] [iceberg] Fokko commented on issue #6120: [Python] The structure of a partition definition and partition instance should be consistent

2022-11-04 Thread GitBox
Fokko commented on issue #6120: URL: https://github.com/apache/iceberg/issues/6120#issuecomment-1303064851 Hey @ddrinka Thanks for opening the PR. The date is a so-called logical type that most storage formats store internally as the days since 1970-01-01. After reading, this should be conv
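
The comment above describes the date as a logical type stored as the number of days since 1970-01-01. A minimal sketch of that conversion in both directions, using only the Python standard library; the helper names `days_to_date` and `date_to_days` are hypothetical and are not part of any Iceberg API.

```python
from datetime import date, timedelta

# The Unix epoch, the reference day for the stored integer.
EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Convert a stored day count into a calendar date."""
    return EPOCH + timedelta(days=days)


def date_to_days(value: date) -> int:
    """Convert a calendar date back into days since the epoch."""
    return (value - EPOCH).days


assert days_to_date(0) == date(1970, 1, 1)
assert date_to_days(days_to_date(365)) == 365
```

Negative day counts map to dates before the epoch, so the same two helpers cover dates on either side of 1970-01-01.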

[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
ConeyLiu commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1013695871 ## core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java: ## @@ -107,13 +107,15 @@ public Schema record(Schema record, List names, Iterable s

[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
ConeyLiu commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1013696351 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java: ## @@ -978,6 +1091,32 @@ private Set expectedManifestListPaths(Iterable snapshots, Long

[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
ConeyLiu commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1013697964 ## core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java: ## @@ -107,13 +107,15 @@ public Schema record(Schema record, List names, Iterable s

[GitHub] [iceberg] lvyanquan commented on pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-04 Thread GitBox
lvyanquan commented on PR #6111: URL: https://github.com/apache/iceberg/pull/6111#issuecomment-1303144296 Resubmitted to port the change to 1.16 module. "cache-enabled" is reserved now, since users who set "cache-enabled" to "false" before would need to add property "cache.expiration-int

[GitHub] [iceberg] Fokko merged pull request #6119: Remove Fokko from the list of collaborators

2022-11-04 Thread GitBox
Fokko merged PR #6119: URL: https://github.com/apache/iceberg/pull/6119

[GitHub] [iceberg] kamaljit-1991 opened a new issue, #6122: IcebergGenerics.read(table) doesn't work as expected

2022-11-04 Thread GitBox
kamaljit-1991 opened a new issue, #6122: URL: https://github.com/apache/iceberg/issues/6122 ### Apache Iceberg version 0.13.1 ### Query engine _No response_ ### Please describe the bug 🐞 It is somewhat related to https://github.com/apache/iceberg/issues/45

[GitHub] [iceberg] RussellSpitzer commented on issue #6122: IcebergGenerics.read(table) doesn't work as expected

2022-11-04 Thread GitBox
RussellSpitzer commented on issue #6122: URL: https://github.com/apache/iceberg/issues/6122#issuecomment-1303235332 Are you adding a default name mapping for your table?

[GitHub] [iceberg] findepi commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-04 Thread GitBox
findepi commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1013923155 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -79,4 +80,15 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] findepi commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-04 Thread GitBox
findepi commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1013924386 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -79,4 +80,15 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] harini-venkataraman closed issue #6089: Issue with Creation of Database using Spark

2022-11-04 Thread GitBox
harini-venkataraman closed issue #6089: Issue with Creation of Database using Spark URL: https://github.com/apache/iceberg/issues/6089

[GitHub] [iceberg] harini-venkataraman commented on issue #6089: Issue with Creation of Database using Spark

2022-11-04 Thread GitBox
harini-venkataraman commented on issue #6089: URL: https://github.com/apache/iceberg/issues/6089#issuecomment-1303312171 **RCA :** Spark had introduced a new configuration - `spark.sql.warehouse.dir` https://issues.apache.org/jira/browse/SPARK-15034 Tried changing this in the config

[GitHub] [iceberg] findepi commented on a diff in pull request #6091: Spark-3.3: Handle statistics file clean up from expireSnapshots action/procedure

2022-11-04 Thread GitBox
findepi commented on code in PR #6091: URL: https://github.com/apache/iceberg/pull/6091#discussion_r1013950242 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1234,6 +1245,40 @@ public void testMultipleRefsAndCleanExpiredFilesFailsForIncrementalCleanup

[GitHub] [iceberg] findepi commented on pull request #5129: Add source snapshot info to Puffin Blob metadata

2022-11-04 Thread GitBox
findepi commented on PR #5129: URL: https://github.com/apache/iceberg/pull/5129#issuecomment-1303363469 @rdblue Now that we have this on the blob metadata level, do we still need to have `org.apache.iceberg.StatisticsFile#snapshotId` field? cc @ajantha-bhat -- This is an automate

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-04 Thread GitBox
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1013964398 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -79,4 +80,15 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1013976800 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1013983522 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -135,11 +143,14 @@ public CloseableIterable planFiles() { planningDuration.stop();

[GitHub] [iceberg] nastra commented on pull request #6108: SparkBatchQueryScan logs too much - #6106

2022-11-04 Thread GitBox
nastra commented on PR #6108: URL: https://github.com/apache/iceberg/pull/6108#issuecomment-1303478725 @Omega359 could you please fix the missing import so that the code compiles? Would be great to get this merged -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [iceberg] ConeyLiu commented on pull request #5632: Core: Avoid reading ManifestFile when create ManifestReader

2022-11-04 Thread GitBox
ConeyLiu commented on PR #5632: URL: https://github.com/apache/iceberg/pull/5632#issuecomment-1303486785 Thanks @szehon-ho @rdblue @nastra @zinking -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014038296 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -99,10 +99,24 @@ public interface DataFile extends ContentFile { optional(140, "sort_order_id

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014048088 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -142,9 +142,21 @@ public static Schema selectNot(Schema schema, Set fieldIds) { } pub

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014059674 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -532,6 +537,23 @@ public final void initialize(String name, CaseInsensitiveStringMap

[GitHub] [iceberg] nastra commented on a diff in pull request #6113: Core: Reduce code duplication around writing JSON collections

2022-11-04 Thread GitBox
nastra commented on code in PR #6113: URL: https://github.com/apache/iceberg/pull/6113#discussion_r1014062148 ## core/src/main/java/org/apache/iceberg/util/JsonUtil.java: ## @@ -251,6 +252,11 @@ public static Set getIntegerSet(String property, JsonNode node) { .build()

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014095953 ## core/src/main/java/org/apache/iceberg/BaseFilesTable.java: ## @@ -140,42 +142,72 @@ protected CloseableIterable doPlanFiles() { } static class Manifes

[GitHub] [iceberg] code-magician323 commented on issue #4977: Support Kafka Connect within Iceberg

2022-11-04 Thread GitBox
code-magician323 commented on issue #4977: URL: https://github.com/apache/iceberg/issues/4977#issuecomment-1303656262 @kbendick Do you think there will be progress in this area soon? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014106936 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014108559 ## core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java: ## @@ -107,14 +117,20 @@ public static ScanReport fromJson(JsonNode json) { List projectedFi

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014109381 ## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ## @@ -140,6 +140,8 @@ private CatalogProperties() {} public static final String APP_ID = "app-id";

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014113140 ## core/src/main/java/org/apache/iceberg/EnvironmentContext.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014114843 ## core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java: ## @@ -107,14 +117,20 @@ public static ScanReport fromJson(JsonNode json) { List projectedFi

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014117067 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014117969 ## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ## @@ -140,6 +140,8 @@ private CatalogProperties() {} public static final String APP_ID = "app-id";

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014136769 ## core/src/main/java/org/apache/iceberg/EnvironmentContext.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] martindurant commented on issue #5800: Integrate pyiceberg with Dask

2022-11-04 Thread GitBox
martindurant commented on issue #5800: URL: https://github.com/apache/iceberg/issues/5800#issuecomment-1303714276 cc https://github.com/martindurant/daskberg/issues/1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [iceberg] nastra commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
nastra commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1014180932 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java: ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF)
