[GitHub] [iceberg] hililiwei commented on a diff in pull request #5984: Core, API: Support incremental scanning with branch

2022-11-07 Thread GitBox
hililiwei commented on code in PR #5984: URL: https://github.com/apache/iceberg/pull/5984#discussion_r1015103751 ## api/src/main/java/org/apache/iceberg/IncrementalScan.java: ## @@ -21,6 +21,23 @@ /** API for configuring an incremental scan. */ public interface IncrementalScan

[GitHub] [iceberg] gaborkaszab commented on pull request #6133: [1.0] Dell: Fix client serialization bug.

2022-11-07 Thread GitBox
gaborkaszab commented on PR #6133: URL: https://github.com/apache/iceberg/pull/6133#issuecomment-1305223386 @ajantha-bhat thanks for raising attention! 1.1.0 is branched from master so I guess if we keep this approach we won't need a branch like 1.0.x anymore. Would it make sense to drop

[GitHub] [iceberg] XBaith opened a new issue, #6137: Only one Iceberg hive catalog in different cluster is available in a job

2022-11-07 Thread GitBox
XBaith opened a new issue, #6137: URL: https://github.com/apache/iceberg/issues/6137 ### Apache Iceberg version 0.14.1 ### Query engine Spark ### Please describe the bug 🐞 ### Background Migrate Iceberg table to different clusters. These two HDFS cluste

[GitHub] [iceberg] wmoustafa commented on a diff in pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

2022-11-07 Thread GitBox
wmoustafa commented on code in PR #6134: URL: https://github.com/apache/iceberg/pull/6134#discussion_r1015129057 ## format/view-spec.md: ## @@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S | Optional | schema-id | ID of the

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-07 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1015142325 ## core/src/main/java/org/apache/iceberg/EnvironmentContext.java: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] lvyanquan opened a new issue, #6138: Keep the writing format of the table the same as before migration

2022-11-07 Thread GitBox
lvyanquan opened a new issue, #6138: URL: https://github.com/apache/iceberg/issues/6138 ### Feature Request / Improvement With the procedure of [migrate](https://iceberg.apache.org/docs/latest/spark-procedures/#migrate), we can migrate a Hive table to Iceberg. However, Iceberg us

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-07 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1015153734 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] wmoustafa commented on pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
wmoustafa commented on PR #4925: URL: https://github.com/apache/iceberg/pull/4925#issuecomment-1305287086 > * No need for something like `View.updateRepresentions()`, as `buildView() + replace()` should be enough Can one replace the current version or an old version by adding a new d

[GitHub] [iceberg] nastra commented on issue #6136: if i lost a metadata file, how to recover

2022-11-07 Thread GitBox
nastra commented on issue #6136: URL: https://github.com/apache/iceberg/issues/6136#issuecomment-1305291763 @chenwyi2 is the problem that you can't delete a table anymore because you accidentally deleted the old manifest file and deleting a table complains about the manifest file missing? T

[GitHub] [iceberg] wmoustafa commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
wmoustafa commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015154082 ## api/src/main/java/org/apache/iceberg/view/ViewVersion.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] Fokko merged pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-07 Thread GitBox
Fokko merged PR #6117: URL: https://github.com/apache/iceberg/pull/6117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

[GitHub] [iceberg] Fokko commented on pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-07 Thread GitBox
Fokko commented on PR #6117: URL: https://github.com/apache/iceberg/pull/6117#issuecomment-1305312653 Fixed the linting issue, thanks @ddrinka!

[GitHub] [iceberg] hililiwei commented on pull request #5281: Spark: Spark SQL Extensions for tag

2022-11-07 Thread GitBox
hililiwei commented on PR #5281: URL: https://github.com/apache/iceberg/pull/5281#issuecomment-1305316234 > That's a good question. @amogh-jahagirdar @jackye1995 @rdblue What do you think? Should we continue to create syntax, or switch to something like CALL?

[GitHub] [iceberg] majin1102 commented on issue #5606: Support to write a custom partition transforms in iceberg

2022-11-07 Thread GitBox
majin1102 commented on issue #5606: URL: https://github.com/apache/iceberg/issues/5606#issuecomment-1305332847 Hi vamen, maybe we could use a UDF to transform the custom partition field to meet your needs, because from my view, a transform extension in SQL (Flink/Spark/Trino, etc.) could

[GitHub] [iceberg] majin1102 commented on issue #5883: How to combine vertically split tables into one table in iceberg

2022-11-07 Thread GitBox
majin1102 commented on issue #5883: URL: https://github.com/apache/iceberg/issues/5883#issuecomment-1305344678 It could be done offline using a MERGE INTO SQL statement.

[GitHub] [iceberg] luoyuxia commented on issue #6104: Rewrite iceberg small files with flink succeeds but no snapshot is generated (V2 - upsert model)

2022-11-07 Thread GitBox
luoyuxia commented on issue #6104: URL: https://github.com/apache/iceberg/issues/6104#issuecomment-1305422261 It seems the files have been compressed, but it fails to generate a snapshot. Maybe some exception happened during commit. Is there any error or warning in the log?

[GitHub] [iceberg] hililiwei commented on pull request #5967: Flink: Support read options in flink source

2022-11-07 Thread GitBox
hililiwei commented on PR #5967: URL: https://github.com/apache/iceberg/pull/5967#issuecomment-1305549142 cc @stevenzwu @chenjunjiedada @rdblue

[GitHub] [iceberg] hililiwei commented on pull request #5029: Flink: Use Tag or Branch to scan data.

2022-11-07 Thread GitBox
hililiwei commented on PR #5029: URL: https://github.com/apache/iceberg/pull/5029#issuecomment-1305571330 cc @amogh-jahagirdar @stevenzwu @rdblue, could you please take a look when you are available?

[GitHub] [iceberg] pvary commented on issue #6067: exec insert into (hive on spark),no erro log,but table no data

2022-11-07 Thread GitBox
pvary commented on issue #6067: URL: https://github.com/apache/iceberg/issues/6067#issuecomment-1305645778 Hive on Spark is not supported/tested. Hive 2.1.1 is also not supported/tested. Could you use a newer CDH/CDP version?

[GitHub] [iceberg] findepi commented on pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-07 Thread GitBox
findepi commented on PR #6090: URL: https://github.com/apache/iceberg/pull/6090#issuecomment-1305652072 I think we should change the label of the `snapshot-id` entry in https://iceberg.apache.org/spec/#table-statistics (to level, not blob level)

[GitHub] [iceberg] Fokko merged pull request #6135: Python: Use pythonic `len()` built-in instead of `length` property

2022-11-07 Thread GitBox
Fokko merged PR #6135: URL: https://github.com/apache/iceberg/pull/6135

[GitHub] [iceberg] pvary commented on issue #6071: Should ClientPool consider UGI when reusing a connection?

2022-11-07 Thread GitBox
pvary commented on issue #6071: URL: https://github.com/apache/iceberg/issues/6071#issuecomment-1305705704 Sorry for the delay - I was OOO. Since we are reusing the HMSClient objects, we can face authorization issues if we are using Catalogs with multi-tenant scenarios. We were briefl

[GitHub] [iceberg] Fokko commented on a diff in pull request #6127: Python: Add expression evaluator

2022-11-07 Thread GitBox
Fokko commented on code in PR #6127: URL: https://github.com/apache/iceberg/pull/6127#discussion_r1015073328 ## python/pyiceberg/expressions/visitors.py: ## @@ -417,6 +422,75 @@ def visit_bound_predicate(self, predicate) -> BooleanExpression: return predicate +def

[GitHub] [iceberg] Fokko commented on a diff in pull request #6127: Python: Add expression evaluator

2022-11-07 Thread GitBox
Fokko commented on code in PR #6127: URL: https://github.com/apache/iceberg/pull/6127#discussion_r1015499369 ## python/pyiceberg/expressions/__init__.py: ## @@ -281,6 +289,30 @@ def __invert__(self) -> BoundIsNull: return BoundIsNull(self.term) +def coerce_unary_arg

[GitHub] [iceberg] Fokko commented on a diff in pull request #6139: Python: Remove dataclass

2022-11-07 Thread GitBox
Fokko commented on code in PR #6139: URL: https://github.com/apache/iceberg/pull/6139#discussion_r1015502404 ## python/tests/expressions/test_visitors.py: ## @@ -1647,73 +1641,6 @@ def test_manifest_evaluator_or(): assert _create_manifest_evaluator(expr).eval(manifest)

[GitHub] [iceberg] Fokko commented on a diff in pull request #6139: Python: Remove dataclass

2022-11-07 Thread GitBox
Fokko commented on code in PR #6139: URL: https://github.com/apache/iceberg/pull/6139#discussion_r1015503114 ## python/tests/expressions/test_expressions.py: ## @@ -73,27 +65,10 @@ NestedField, StringType, ) +from tests.conftest import FooStruct from tests.expression

[GitHub] [iceberg] Fokko commented on a diff in pull request #6139: Python: Remove dataclass

2022-11-07 Thread GitBox
Fokko commented on code in PR #6139: URL: https://github.com/apache/iceberg/pull/6139#discussion_r1015503864 ## python/pyiceberg/expressions/literals.py: ## @@ -439,6 +439,9 @@ def _(self, type_var: DecimalType) -> Optional[Literal[Decimal]]: else: return

[GitHub] [iceberg] dimas-b commented on a diff in pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

2022-11-07 Thread GitBox
dimas-b commented on code in PR #6134: URL: https://github.com/apache/iceberg/pull/6134#discussion_r1015497935 ## format/view-spec.md: ## @@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S | Optional | schema-id | ID of the v

[GitHub] [iceberg] haizhou-zhao commented on a diff in pull request #6045: [iceberg-hive-metastore] Support setting individual and group ownership for Namespace

2022-11-07 Thread GitBox
haizhou-zhao commented on code in PR #6045: URL: https://github.com/apache/iceberg/pull/6045#discussion_r1015549849 ## core/src/main/java/org/apache/iceberg/TableProperties.java: ## @@ -360,5 +360,7 @@ private TableProperties() {} public static final String UPSERT_ENABLED = "

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-07 Thread GitBox
aokolnychyi commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1015562517 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.co

[GitHub] [iceberg] dhruv-pratap commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-07 Thread GitBox
dhruv-pratap commented on code in PR #6069: URL: https://github.com/apache/iceberg/pull/6069#discussion_r1015576539 ## python/pyiceberg/table/scan.py: ## @@ -0,0 +1,103 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S

[GitHub] [iceberg] dhruv-pratap commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-07 Thread GitBox
dhruv-pratap commented on code in PR #6069: URL: https://github.com/apache/iceberg/pull/6069#discussion_r1015579816 ## python/tests/table/test_scan.py: ## @@ -0,0 +1,177 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-07 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1015582931 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -99,10 +99,24 @@ public interface DataFile extends ContentFile { optional(140, "sort_order_id

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-07 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1015583682 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -99,10 +99,24 @@ public interface DataFile extends ContentFile { optional(140, "sort_order_id

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-07 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1015598536 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -99,10 +99,24 @@ public interface DataFile extends ContentFile { optional(140, "sort_order_id", In

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-07 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1015603424 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -99,10 +99,24 @@ public interface DataFile extends ContentFile { optional(140, "sort_order_id", In

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-07 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1015605206 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java: ## @@ -0,0 +1,498 @@ +/* + * Licensed to the Apache Software

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-07 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1015605561 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java: ## @@ -0,0 +1,498 @@ +/* + * Licensed to the Apache Software

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-07 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1015604784 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java: ## @@ -0,0 +1,498 @@ +/* + * Licensed to the Apache Software

[GitHub] [iceberg] haizhou-zhao commented on a diff in pull request #6045: [iceberg-hive-metastore] Support setting individual and group ownership for Namespace

2022-11-07 Thread GitBox
haizhou-zhao commented on code in PR #6045: URL: https://github.com/apache/iceberg/pull/6045#discussion_r1013067243 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -448,6 +540,36 @@ public void testRemoveNamespaceProperties() throws TExceptio

[GitHub] [iceberg] ahshahid commented on issue #6039: Spark : Perf enhancement by leveraging Dynamic Partition Pruning rule of spark for non partition columns used as join condition

2022-11-07 Thread GitBox
ahshahid commented on issue #6039: URL: https://github.com/apache/iceberg/issues/6039#issuecomment-1305869163 @rdblue @aokolnychyi is there any particular reason why colStats in TableContext is false by default? With this flag false, for non-partition cols the bounds are not being written

[GitHub] [iceberg] ahshahid commented on issue #6039: Spark : Perf enhancement by leveraging Dynamic Partition Pruning rule of spark for non partition columns used as join condition

2022-11-07 Thread GitBox
ahshahid commented on issue #6039: URL: https://github.com/apache/iceberg/issues/6039#issuecomment-1305877706 > Some update: For tpcds query with limited data and enabling stats at manifest level for non partition cols, still does not improve perf.. the cost of dpp query is pretty high, esp

[GitHub] [iceberg] rdblue commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-07 Thread GitBox
rdblue commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1015644370 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.combine

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015647765 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015648978 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015649887 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] dhruv-pratap commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-07 Thread GitBox
dhruv-pratap commented on code in PR #6069: URL: https://github.com/apache/iceberg/pull/6069#discussion_r1015651935 ## python/pyiceberg/table/scan.py: ## @@ -0,0 +1,103 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015652744 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015653216 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] nastra commented on pull request #6053: Build: Let revapi compare API compatibility against apache-iceberg-1.0.0

2022-11-07 Thread GitBox
nastra commented on PR #6053: URL: https://github.com/apache/iceberg/pull/6053#issuecomment-1305895789 @ajantha-bhat this should work:
```
- org.apache.iceberg:iceberg-api:apache-iceberg-0.14.0: "0.14.0"
+ org.apache.iceberg:iceberg-api:1.0.0: "1.0.0"
```

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015658211 ## api/src/main/java/org/apache/iceberg/view/ViewVersion.java: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015658509 ## api/src/main/java/org/apache/iceberg/view/ViewVersion.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015658926 ## api/src/main/java/org/apache/iceberg/view/ViewVersion.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015661549 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] wmoustafa commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
wmoustafa commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015680437 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] wmoustafa commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
wmoustafa commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015685367 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] ddrinka commented on pull request #6131: Python: Add initial TableScan implementation

2022-11-07 Thread GitBox
ddrinka commented on PR #6131: URL: https://github.com/apache/iceberg/pull/6131#issuecomment-1305936075 I'm just an outside observer here, but isn't there already a Python implementation that followed the Java API, but folks thought it would be good to do all this work to rewrite it to be m

[GitHub] [iceberg] dhruv-pratap commented on pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-07 Thread GitBox
dhruv-pratap commented on PR #6069: URL: https://github.com/apache/iceberg/pull/6069#issuecomment-1305942424 In retrospect, I think this is becoming too large of a PR and would benefit from breaking down into smaller tasks. I'm going to go ahead and close this PR and if you guys are onboard

[GitHub] [iceberg] sunchao commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-07 Thread GitBox
sunchao commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1015704077 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.combin

[GitHub] [iceberg] sunchao commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-07 Thread GitBox
sunchao commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1015704882 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.combin

[GitHub] [iceberg] sunchao commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-07 Thread GitBox
sunchao commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1015705315 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.combin

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015709176 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] Fokko commented on a diff in pull request #6131: Python: Add initial TableScan implementation

2022-11-07 Thread GitBox
Fokko commented on code in PR #6131: URL: https://github.com/apache/iceberg/pull/6131#discussion_r1015731779 ## python/pyiceberg/table/__init__.py: ## @@ -14,30 +14,43 @@ # KIND, either express or implied. See the License for the # specific language governing permissions and

[GitHub] [iceberg] Fokko commented on pull request #6131: Python: Add initial TableScan implementation

2022-11-07 Thread GitBox
Fokko commented on PR #6131: URL: https://github.com/apache/iceberg/pull/6131#issuecomment-1305996028 I would also like:
```python
scan = table.scan(
    row_filter=col("id") in [5, 6, 7],
    selected_fields=("id", "data"),
    snapshot_id=1234567890
)
```

[GitHub] [iceberg] Fokko opened a new pull request, #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
Fokko opened a new pull request, #6140: URL: https://github.com/apache/iceberg/pull/6140 Instead of just supplying the unbound expression to the evaluator directly, we created a bogus one and replaced the bound expression with the one we wanted to test. But introduced a bug in the test beca

[GitHub] [iceberg] wmoustafa commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
wmoustafa commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015762154 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [iceberg] ajantha-bhat commented on pull request #4826: Nessie: Use unique path for different table with same name

2022-11-07 Thread GitBox
ajantha-bhat commented on PR #4826: URL: https://github.com/apache/iceberg/pull/4826#issuecomment-1306018512 @RussellSpitzer, @rdblue: Can we please merge this PR if it is ok?

[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
rdblue commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015778820 ## python/tests/expressions/test_visitors.py: ## @@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit return eval

[GitHub] [iceberg] ajantha-bhat commented on pull request #6053: Build: Let revapi compare API compatibility against apache-iceberg-1.0.0

2022-11-07 Thread GitBox
ajantha-bhat commented on PR #6053: URL: https://github.com/apache/iceberg/pull/6053#issuecomment-1306037207 > @ajantha-bhat this should work: - org.apache.iceberg:iceberg-api:apache-iceberg-0.14.0: "0.14.0" + org.apache.iceberg:iceberg-api:1.0.0: "1.0.0" a. Could you please ex

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4657: [WIP] API/Core: View support

2022-11-07 Thread GitBox
jzhuge commented on code in PR #4657: URL: https://github.com/apache/iceberg/pull/4657#discussion_r1015780470 ## core/src/main/java/org/apache/iceberg/view/BaseViewDefinition.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
rdblue commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015781049 ## python/tests/expressions/test_visitors.py: ## @@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap(): upper_bound=_to_byte_buffer(Strin

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015780924 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015781565 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] ajantha-bhat commented on pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-07 Thread GitBox
ajantha-bhat commented on PR #6090: URL: https://github.com/apache/iceberg/pull/6090#issuecomment-1306046322 > I think we should change the label of the snapshot-id entry in https://iceberg.apache.org/spec/#table-statistics (to level, not blob level) Sorry, I still didn't get how the

[GitHub] [iceberg] flyrain commented on pull request #6056: Parquet: Remove the row position since parquet row group has it natively

2022-11-07 Thread GitBox
flyrain commented on PR #6056: URL: https://github.com/apache/iceberg/pull/6056#issuecomment-1306047299 > Do we trust this value from Parquet? The approach Parquet uses is similar to what @chenjunjiedada implemented in the Iceberg repo. As long as it is reliable (no bugs), I don't see a rea

[GitHub] [iceberg] Fokko opened a new pull request, #6141: Python: Make invalid Literal conversions explicit

2022-11-07 Thread GitBox
Fokko opened a new pull request, #6141: URL: https://github.com/apache/iceberg/pull/6141 Currently, we silently turn Literals into None if we can't convert them; I would prefer to raise an exception instead. The current behavior can cause silent bugs like: `EqualTo(Reference("id"), StringLiteral("123a"))` will t
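
The difference between the two behaviors described in this PR can be sketched roughly as follows. This is a minimal, self-contained illustration and not the pyiceberg implementation: `to_int_lenient` and `to_int_strict` are hypothetical helper names, and plain `int()` stands in for whatever conversion the library actually performs.

```python
from typing import Optional


def to_int_lenient(value: str) -> Optional[int]:
    """Hypothetical lenient conversion: a failed conversion collapses to None."""
    try:
        return int(value)
    except ValueError:
        # The caller cannot tell "bad input" apart from "no value".
        return None


def to_int_strict(value: str) -> int:
    """Hypothetical strict conversion: a failed conversion raises with context."""
    try:
        return int(value)
    except ValueError as exc:
        raise ValueError(f"Cannot convert string literal {value!r} to int") from exc
```

With the lenient variant, an input such as `"123a"` quietly becomes `None` and any expression built on it keeps going; with the strict variant the same input fails at the point of conversion, which is presumably the kind of explicit failure the PR is after.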

[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
Fokko commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015798033 ## python/tests/expressions/test_visitors.py: ## @@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit return evalu

[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
Fokko commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015799107 ## python/tests/expressions/test_visitors.py: ## @@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap(): upper_bound=_to_byte_buffer(String

[GitHub] [iceberg] aokolnychyi commented on pull request #2276: Core: Add option to combine tasks by partition

2022-11-07 Thread GitBox
aokolnychyi commented on PR #2276: URL: https://github.com/apache/iceberg/pull/2276#issuecomment-1306079422 Let me take another look in a bit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015834172 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] rdblue commented on a diff in pull request #5432: AES GCM Stream Spec

2022-11-07 Thread GitBox
rdblue commented on code in PR #5432: URL: https://github.com/apache/iceberg/pull/5432#discussion_r950908532 ## format/gcm-stream-spec.md: ## @@ -0,0 +1,87 @@ +--- +title: "AES GCM Stream Spec" +url: gcm-stream-spec +toc: true +disableSidebar: true +--- + + +# AES GCM Stream (AG

[GitHub] [iceberg] rdblue commented on a diff in pull request #5432: AES GCM Stream Spec

2022-11-07 Thread GitBox
rdblue commented on code in PR #5432: URL: https://github.com/apache/iceberg/pull/5432#discussion_r1015856883 ## format/gcm-stream-spec.md: ## @@ -0,0 +1,87 @@ +--- +title: "AES GCM Stream Spec" +url: gcm-stream-spec +toc: true +disableSidebar: true +--- + + +# AES GCM Stream (A

[GitHub] [iceberg] rdblue commented on a diff in pull request #5432: AES GCM Stream Spec

2022-11-07 Thread GitBox
rdblue commented on code in PR #5432: URL: https://github.com/apache/iceberg/pull/5432#discussion_r1015856676 ## format/gcm-stream-spec.md: ## @@ -0,0 +1,87 @@ +--- +title: "AES GCM Stream Spec" +url: gcm-stream-spec +toc: true +disableSidebar: true +--- + + +# AES GCM Stream (A

[GitHub] [iceberg] rdblue commented on a diff in pull request #5432: AES GCM Stream Spec

2022-11-07 Thread GitBox
rdblue commented on code in PR #5432: URL: https://github.com/apache/iceberg/pull/5432#discussion_r1015857790 ## format/gcm-stream-spec.md: ## @@ -0,0 +1,87 @@ +--- +title: "AES GCM Stream Spec" +url: gcm-stream-spec +toc: true +disableSidebar: true +--- + + +# AES GCM Stream (A

[GitHub] [iceberg] rdblue commented on a diff in pull request #5432: AES GCM Stream Spec

2022-11-07 Thread GitBox
rdblue commented on code in PR #5432: URL: https://github.com/apache/iceberg/pull/5432#discussion_r1015859348 ## format/gcm-stream-spec.md: ## @@ -0,0 +1,87 @@ +--- +title: "AES GCM Stream Spec" +url: gcm-stream-spec +toc: true +disableSidebar: true +--- + + +# AES GCM Stream (A

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015864672 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

[GitHub] [iceberg] rdblue commented on a diff in pull request #4925: API: Add view interfaces

2022-11-07 Thread GitBox
rdblue commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1015865455 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] rdblue commented on a diff in pull request #6139: Python: Remove dataclasses

2022-11-07 Thread GitBox
rdblue commented on code in PR #6139: URL: https://github.com/apache/iceberg/pull/6139#discussion_r1015867397 ## python/pyiceberg/expressions/__init__.py: ## @@ -32,6 +41,24 @@ B = TypeVar("B") +def _to_literal(lit: Optional[Union[T, Literal[T]]]) -> Optional[Literal[T]]: +

[GitHub] [iceberg] Fokko commented on pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-07 Thread GitBox
Fokko commented on PR #6069: URL: https://github.com/apache/iceberg/pull/6069#issuecomment-1306148401 Hey @dhruv-pratap that makes a lot of sense. Maybe we should create issues on the list you mentioned above, to make sure that we're aligned on who's working on what. Smaller PRs make it muc

[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
rdblue commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015875223 ## python/tests/expressions/test_visitors.py: ## @@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap(): upper_bound=_to_byte_buffer(Strin

[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
rdblue commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015876595 ## python/tests/expressions/test_visitors.py: ## @@ -874,36 +881,19 @@ def test_manifest_evaluator_less_than_overlap(): upper_bound=_to_byte_buffer(StringTy

[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
rdblue commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015879735 ## python/tests/expressions/test_visitors.py: ## @@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit return eval

[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
rdblue commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015880802 ## python/tests/expressions/test_visitors.py: ## @@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap(): upper_bound=_to_byte_buffer(Strin

[GitHub] [iceberg] wmoustafa commented on a diff in pull request #6134: Spec: Add query-column-names to SQL view representation in view spec

2022-11-07 Thread GitBox
wmoustafa commented on code in PR #6134: URL: https://github.com/apache/iceberg/pull/6134#discussion_r101588 ## format/view-spec.md: ## @@ -116,11 +116,19 @@ This type of representation stores the original view definition in SQL and its S | Optional | schema-id | ID of the

[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

2022-11-07 Thread GitBox
rdblue commented on code in PR #6140: URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015882053 ## python/tests/expressions/test_visitors.py: ## @@ -874,36 +881,19 @@ def test_manifest_evaluator_less_than_overlap(): upper_bound=_to_byte_buffer(StringTy
