[GitHub] [iceberg] Shane-Yu opened a new issue, #5671: The upsert mode can query the historical version of the data under certain conditions

2022-08-30 Thread GitBox
Shane-Yu opened a new issue, #5671: URL: https://github.com/apache/iceberg/issues/5671 ### Apache Iceberg version 0.13.1 ### Query engine Hive ### Please describe the bug 🐞 In Iceberg upsert mode, create v2 table like this: > create table upsert_up

[GitHub] [iceberg] Shane-Yu commented on issue #5671: The upsert mode can query the historical version of the data under certain conditions

2022-08-30 Thread GitBox
Shane-Yu commented on issue #5671: URL: https://github.com/apache/iceberg/issues/5671#issuecomment-1231565348 @rdblue @openinx @stevenzwu @kbendick Can you guys take some time to look at this? -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [iceberg] Fokko opened a new pull request, #5672: Python: Update docs and fine-tune the API

2022-08-30 Thread GitBox
Fokko opened a new pull request, #5672: URL: https://github.com/apache/iceberg/pull/5672 The API wasn't consistent everywhere. Now the ids will just initialize at 1, so the user doesn't have to do this.

[GitHub] [iceberg] Shane-Yu closed issue #5671: The upsert mode can query the historical version of the data under certain conditions

2022-08-30 Thread GitBox
Shane-Yu closed issue #5671: The upsert mode can query the historical version of the data under certain conditions URL: https://github.com/apache/iceberg/issues/5671

[GitHub] [iceberg] rajan-v opened a new issue, #5673: Support for new offset based deletion interface deleteOffsetFromDataFile in DeleteFiles

2022-08-30 Thread GitBox
rajan-v opened a new issue, #5673: URL: https://github.com/apache/iceberg/issues/5673 ### Feature Request / Improvement Support for another interface in DeleteFiles _DeleteFiles deleteOffsetFromDataFile(Map dataFileAndOffsetFileMap)_ https://iceberg.apache.org/javadoc/master/o
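The request describes deletes addressed by row offset within a data file. Iceberg v2 position deletes already record a deleted row as a (data file path, row position) pair, so the idea can be sketched as a map from data file path to the offsets to drop. This is a hypothetical illustration, not Iceberg's `DeleteFiles` API: `OffsetDeletes` and its methods are invented here, and because the message does not show the type parameters of the proposed `Map`, using `String` paths and `long` offsets is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of offset-based deletes (not an Iceberg class).
// Each data file path maps to the zero-based row offsets that should be removed.
final class OffsetDeletes {
  private final Map<String, TreeSet<Long>> offsetsByFile = new TreeMap<>();

  // Mark one row offset in one data file as deleted; returns this for chaining.
  OffsetDeletes delete(String dataFilePath, long offset) {
    if (offset < 0) {
      throw new IllegalArgumentException("offset must be >= 0");
    }
    offsetsByFile.computeIfAbsent(dataFilePath, k -> new TreeSet<>()).add(offset);
    return this;
  }

  // Offsets marked for a file, or an empty set if the file has no deletes.
  Set<Long> offsetsFor(String dataFilePath) {
    return offsetsByFile.getOrDefault(dataFilePath, new TreeSet<>());
  }

  // Read-side view: keep every row whose offset was not marked as deleted.
  <T> List<T> applyTo(String dataFilePath, List<T> rows) {
    Set<Long> deleted = offsetsFor(dataFilePath);
    List<T> kept = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      if (!deleted.contains((long) i)) {
        kept.add(rows.get(i));
      }
    }
    return kept;
  }
}
```

Keying by file path keeps each file's deletes independent, so files without entries are read unchanged.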

[GitHub] [iceberg] samredai commented on pull request #4801: Add Configuration page

2022-08-30 Thread GitBox
samredai commented on PR #4801: URL: https://github.com/apache/iceberg/pull/4801#issuecomment-1231689951 @rdblue when you have a chance can you take another look at this?

[GitHub] [iceberg] huaxingao commented on a diff in pull request #5638: Bind overwrite filters

2022-08-30 Thread GitBox
huaxingao commented on code in PR #5638: URL: https://github.com/apache/iceberg/pull/5638#discussion_r958586790 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkTable.java: ## @@ -47,14 +55,39 @@ public void removeTable() { @Test public void tes

[GitHub] [iceberg] msb1 commented on issue #5630: Failed to start the Flink task (write iceberg)

2022-08-30 Thread GitBox
msb1 commented on issue #5630: URL: https://github.com/apache/iceberg/issues/5630#issuecomment-1231783307 If you are using Gradle and creating a shadow jar, do not use minimize()... I was doing the build with shadowJar { minimize() zip64 true } in buil

[GitHub] [iceberg] pvary commented on a diff in pull request #4518: core: Provide mechanism to cache manifest file content

2022-08-30 Thread GitBox
pvary commented on code in PR #4518: URL: https://github.com/apache/iceberg/pull/4518#discussion_r958600742 ## core/src/main/java/org/apache/iceberg/ManifestFiles.java: ## @@ -300,4 +328,14 @@ private static ManifestFile copyManifestInternal( return writer.toManifestFile(

[GitHub] [iceberg] Fokko opened a new pull request, #5674: Python: Install PyYaml

2022-08-30 Thread GitBox
Fokko opened a new pull request, #5674: URL: https://github.com/apache/iceberg/pull/5674 Currently it is missing: ``` root@88de3a02961f:/# pip install "git+https://github.com/apache/iceberg.git#subdirectory=python[pyarrow]"^C root@88de3a02961f:/# pyiceberg Traceback (most rec

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5669: Core: Expire Snapshots reachability analysis

2022-08-30 Thread GitBox
amogh-jahagirdar commented on code in PR #5669: URL: https://github.com/apache/iceberg/pull/5669#discussion_r958663366 ## core/src/main/java/org/apache/iceberg/RemoveSnapshots.java: ## @@ -623,22 +667,82 @@ private Set findFilesToDelete( return filesToDelete; } + // H

[GitHub] [iceberg] rdblue merged pull request #5021: Add API changes for statistics information in table metadata

2022-08-30 Thread GitBox
rdblue merged PR #5021: URL: https://github.com/apache/iceberg/pull/5021

[GitHub] [iceberg] rizaon commented on a diff in pull request #4518: core: Provide mechanism to cache manifest file content

2022-08-30 Thread GitBox
rizaon commented on code in PR #4518: URL: https://github.com/apache/iceberg/pull/4518#discussion_r958669856 ## core/src/main/java/org/apache/iceberg/ManifestFiles.java: ## @@ -300,4 +328,14 @@ private static ManifestFile copyManifestInternal( return writer.toManifestFile

[GitHub] [iceberg] Fokko commented on a diff in pull request #5665: Core: Exclude old fields from the partition spec

2022-08-30 Thread GitBox
Fokko commented on code in PR #5665: URL: https://github.com/apache/iceberg/pull/5665#discussion_r958677816 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -212,7 +211,7 @@ public Set capabilities() { @Override public MetadataColu

[GitHub] [iceberg-docs] bitsondatadev commented on pull request #131: Adding vendors page

2022-08-30 Thread GitBox
bitsondatadev commented on PR #131: URL: https://github.com/apache/iceberg-docs/pull/131#issuecomment-1231925458 Checking in here @rdblue and @samredai. Is anything holding this up?

[GitHub] [iceberg] Fokko merged pull request #5674: Python: Install PyYaml

2022-08-30 Thread GitBox
Fokko merged PR #5674: URL: https://github.com/apache/iceberg/pull/5674

[GitHub] [iceberg] findepi commented on pull request #4741: Add implementation for statistics information in table snapshot

2022-08-30 Thread GitBox
findepi commented on PR #4741: URL: https://github.com/apache/iceberg/pull/4741#issuecomment-1231967878 Rebased after #5021 was merged to make the conflicts disappear.

[GitHub] [iceberg] findepi commented on pull request #5021: Add API changes for statistics information in table metadata

2022-08-30 Thread GitBox
findepi commented on PR #5021: URL: https://github.com/apache/iceberg/pull/5021#issuecomment-1231966833 Thank you for the merge!

[GitHub] [iceberg] Fokko commented on a diff in pull request #5627: Python: Reassign schema/partition-spec/sort-order ids

2022-08-30 Thread GitBox
Fokko commented on code in PR #5627: URL: https://github.com/apache/iceberg/pull/5627#discussion_r958814479 ## python/pyiceberg/schema.py: ## @@ -276,6 +279,32 @@ def primitive(self, primitive: PrimitiveType) -> T: """Visit a PrimitiveType""" +class PreOrderSchemaVi

[GitHub] [iceberg] Fokko commented on a diff in pull request #5627: Python: Reassign schema/partition-spec/sort-order ids

2022-08-30 Thread GitBox
Fokko commented on code in PR #5627: URL: https://github.com/apache/iceberg/pull/5627#discussion_r958815658 ## python/pyiceberg/schema.py: ## @@ -638,3 +724,61 @@ def map(self, map_type: MapType, key_result: int, value_result: int) -> int: def primitive(self, primitive:

[GitHub] [iceberg] Fokko commented on a diff in pull request #5627: Python: Reassign schema/partition-spec/sort-order ids

2022-08-30 Thread GitBox
Fokko commented on code in PR #5627: URL: https://github.com/apache/iceberg/pull/5627#discussion_r958853064 ## python/pyiceberg/schema.py: ## @@ -638,3 +724,61 @@ def map(self, map_type: MapType, key_result: int, value_result: int) -> int: def primitive(self, primitive:

[GitHub] [iceberg] Fokko commented on a diff in pull request #5627: Python: Reassign schema/partition-spec/sort-order ids

2022-08-30 Thread GitBox
Fokko commented on code in PR #5627: URL: https://github.com/apache/iceberg/pull/5627#discussion_r958853761 ## python/pyiceberg/table/metadata.py: ## @@ -327,24 +334,43 @@ def check_sort_orders(cls, values: Dict[str, Any]): based on the spec. Implementations must throw an e

[GitHub] [iceberg] Fokko commented on a diff in pull request #5627: Python: Reassign schema/partition-spec/sort-order ids

2022-08-30 Thread GitBox
Fokko commented on code in PR #5627: URL: https://github.com/apache/iceberg/pull/5627#discussion_r958866429 ## python/pyiceberg/table/partitioning.py: ## @@ -157,3 +159,20 @@ def compatible_with(self, other: "PartitionSpec") -> bool: UNPARTITIONED_PARTITION_SPEC = Partition

[GitHub] [iceberg] Fokko commented on pull request #5672: Python: Update docs and fine-tune the API

2022-08-30 Thread GitBox
Fokko commented on PR #5672: URL: https://github.com/apache/iceberg/pull/5672#issuecomment-1232104623 Waiting for https://github.com/apache/iceberg/pull/5627

[GitHub] [iceberg] jzhuge commented on pull request #4925: API: Add view interfaces

2022-08-30 Thread GitBox
jzhuge commented on PR #4925: URL: https://github.com/apache/iceberg/pull/4925#issuecomment-1232176520 Merged Amogh's PR, rebased, and applied spotless.

[GitHub] [iceberg] sumeetgajjar commented on pull request #5645: [Docs] Update drop table behavior in spark-ddl docs

2022-08-30 Thread GitBox
sumeetgajjar commented on PR #5645: URL: https://github.com/apache/iceberg/pull/5645#issuecomment-1232267477 A gentle ping @Fokko @samredai

[GitHub] [iceberg] sumeetgajjar commented on pull request #5647: [0.14][Docs] Update drop table behavior in spark-ddl docs

2022-08-30 Thread GitBox
sumeetgajjar commented on PR #5647: URL: https://github.com/apache/iceberg/pull/5647#issuecomment-1232267521 A gentle ping @Fokko @samredai

[GitHub] [iceberg] github-actions[bot] commented on issue #4257: Implement FileIO for Azure

2022-08-30 Thread GitBox
github-actions[bot] commented on issue #4257: URL: https://github.com/apache/iceberg/issues/4257#issuecomment-1232301235 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-08-30 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r959050316 ## api/src/main/java/org/apache/iceberg/catalog/ViewCatalog.java: ## @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-08-30 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r959052594 ## api/src/main/java/org/apache/iceberg/view/ViewVersion.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contribu

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-08-30 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r959053085 ## api/src/main/java/org/apache/iceberg/view/ViewVersion.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contribu

[GitHub] [iceberg] hililiwei commented on a diff in pull request #4904: Flink: new sink base on the unified sink API

2022-08-30 Thread GitBox
hililiwei commented on code in PR #4904: URL: https://github.com/apache/iceberg/pull/4904#discussion_r959093861 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/writer/StreamWriter.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

2022-08-30 Thread GitBox
lvyanquan commented on code in PR #5662: URL: https://github.com/apache/iceberg/pull/5662#discussion_r959097698 ## docs/spark-queries.md: ## @@ -318,12 +318,20 @@ To show a table's current partitions: SELECT * FROM prod.db.table.partitions ``` -| partition | record_count | f

[GitHub] [iceberg] badbye commented on issue #2586: Add geometry type to iceberg

2022-08-30 Thread GitBox
badbye commented on issue #2586: URL: https://github.com/apache/iceberg/issues/2586#issuecomment-1232392801 To fully support geometry, there are lots of things to do. 1. Add geometry type. 2. Partitioning. 3. Filtering. 4. Writing and reading. Firstly, we must figure out

[GitHub] [iceberg] dmgcodevil opened a new issue, #5675: Limit the number of files for rewrite/compaction action

2022-08-30 Thread GitBox
dmgcodevil opened a new issue, #5675: URL: https://github.com/apache/iceberg/issues/5675 ### Query engine Flink ### Question We have a streaming Flink job that continuously consumes records from Kafka and stores them into Iceberg. The [RewriteDataFilesAction](https:/
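One way to bound a single compaction run, sketched under assumptions: consider only files below a small-file threshold, take the smallest first, and cap how many go into one rewrite. `DataFileInfo`, `BoundedCompactionPlanner`, and both thresholds are hypothetical names for illustration; this is not the `RewriteDataFilesAction` API, and the question is cut off before describing what the action currently does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value type: a data file's path and size (not an Iceberg class).
final class DataFileInfo {
  final String path;
  final long sizeBytes;

  DataFileInfo(String path, long sizeBytes) {
    this.path = path;
    this.sizeBytes = sizeBytes;
  }
}

// Hypothetical planner that caps how many small files one rewrite may touch.
final class BoundedCompactionPlanner {
  private final long smallFileThresholdBytes;
  private final int maxFilesPerRewrite;

  BoundedCompactionPlanner(long smallFileThresholdBytes, int maxFilesPerRewrite) {
    if (smallFileThresholdBytes <= 0 || maxFilesPerRewrite < 2) {
      throw new IllegalArgumentException("need a positive threshold and at least 2 files per rewrite");
    }
    this.smallFileThresholdBytes = smallFileThresholdBytes;
    this.maxFilesPerRewrite = maxFilesPerRewrite;
  }

  // Returns the files to merge in this run: smallest first, at most maxFilesPerRewrite.
  List<DataFileInfo> plan(List<DataFileInfo> files) {
    List<DataFileInfo> candidates = new ArrayList<>();
    for (DataFileInfo f : files) {
      if (f.sizeBytes < smallFileThresholdBytes) {
        candidates.add(f);
      }
    }
    // Rewriting a single file on its own gains nothing, so skip the run.
    if (candidates.size() < 2) {
      return List.of();
    }
    candidates.sort(Comparator.comparingLong((DataFileInfo f) -> f.sizeBytes));
    return new ArrayList<>(candidates.subList(0, Math.min(maxFilesPerRewrite, candidates.size())));
  }
}
```

Files left out of one run remain candidates for the next, so a streaming job could compact a bounded amount per trigger instead of everything at once.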

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
stevenzwu commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959153410 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { // rem

[GitHub] [iceberg-docs] pvary commented on pull request #131: Adding vendors page

2022-08-30 Thread GitBox
pvary commented on PR #131: URL: https://github.com/apache/iceberg-docs/pull/131#issuecomment-1232472565 @samredai: I see that all of the comments were fixed. If you also think that the page is ready to be pushed, then I would be happy to merge. Thanks, Peter

[GitHub] [iceberg] kbendick commented on pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
kbendick commented on PR #5642: URL: https://github.com/apache/iceberg/pull/5642#issuecomment-1232500424 Hey @xuzhiwen1255, thanks for the patch! I’ve been out very sick, but this seems important. I’ll do my best to take a look as soon as possible. Thanks Steven for reviewing.

[GitHub] [iceberg] dotjdk opened a new issue, #5676: core: Dropping an old partition column causes NPE (and corrupt metadata on v2 tables)

2022-08-30 Thread GitBox
dotjdk opened a new issue, #5676: URL: https://github.com/apache/iceberg/issues/5676 ### Apache Iceberg version 0.14.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 On a format version 2 table, dropping an old partition column on an i

[GitHub] [iceberg] kbendick commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
kbendick commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959194058 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { // rema

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

2022-08-30 Thread GitBox
szehon-ho commented on code in PR #5662: URL: https://github.com/apache/iceberg/pull/5662#discussion_r959194801 ## docs/spark-queries.md: ## @@ -318,12 +318,15 @@ To show a table's current partitions: SELECT * FROM prod.db.table.partitions ``` -| partition | record_count | f

[GitHub] [iceberg] xuzhiwen1255 commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
xuzhiwen1255 commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959197838 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { //

[GitHub] [iceberg] xuzhiwen1255 commented on pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
xuzhiwen1255 commented on PR #5642: URL: https://github.com/apache/iceberg/pull/5642#issuecomment-1232511957 > Hey @xuzhiwen1255, thanks for the patch! > > I’ve been out very sick, but this seems important. I’ll do my best to take a look as soon as possible. Thanks Steven for reviewin

[GitHub] [iceberg] kbendick commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
kbendick commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959213199 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { // rema

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

2022-08-30 Thread GitBox
lvyanquan commented on code in PR #5662: URL: https://github.com/apache/iceberg/pull/5662#discussion_r959213698 ## docs/spark-queries.md: ## @@ -318,12 +318,15 @@ To show a table's current partitions: SELECT * FROM prod.db.table.partitions ``` -| partition | record_count | f

[GitHub] [iceberg] xuzhiwen1255 commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
xuzhiwen1255 commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959214446 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { //

[GitHub] [iceberg] tongwei opened a new issue, #5677: Why locality is disable default when FileSystem scheme is not hdfs

2022-08-30 Thread GitBox
tongwei opened a new issue, #5677: URL: https://github.com/apache/iceberg/issues/5677 ### Query engine spark ### Question When I test Iceberg with Alluxio and Spark, I notice that locality is disabled by default when the FileSystem scheme is not hdfs. To enable this I can on
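The default described in the question can be modeled as a small rule: enable locality only when the table location uses the `hdfs` scheme, and let an explicit setting override that default. `LocalityPolicy` and its methods are hypothetical, and the message is cut off before naming the actual configuration switch, so this sketches the rule rather than reproducing Iceberg's code.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;

// Hypothetical model of a scheme-based locality default (not Iceberg code).
final class LocalityPolicy {
  private LocalityPolicy() {}

  // Default: locality only for hdfs:// locations; other or missing schemes default to off.
  static boolean defaultLocality(String tableLocation) {
    String scheme = URI.create(tableLocation).getScheme();
    return scheme != null && scheme.toLowerCase(Locale.ROOT).equals("hdfs");
  }

  // An explicit user setting, when present, wins over the scheme-based default.
  static boolean localityEnabled(String tableLocation, Optional<Boolean> explicitSetting) {
    return explicitSetting.orElseGet(() -> defaultLocality(tableLocation));
  }
}
```

Under this rule an `alluxio://` location starts with locality off, which matches the behavior reported in the question, and only an explicit setting turns it on.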

[GitHub] [iceberg] kbendick commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
kbendick commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959214832 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { // rema

[GitHub] [iceberg] tongwei commented on issue #5677: Why locality is disable default when FileSystem scheme is not hdfs

2022-08-30 Thread GitBox
tongwei commented on issue #5677: URL: https://github.com/apache/iceberg/issues/5677#issuecomment-1232531922 ![UvqGOmq3fK](https://user-images.githubusercontent.com/32157039/187612031-89a21480-9dda-4981-9d2b-4d0a9b2dda6d.jpg)

[GitHub] [iceberg] kbendick commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
kbendick commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959217458 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { // rema

[GitHub] [iceberg] kbendick commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
kbendick commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959219452 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { // rema

[GitHub] [iceberg] xuzhiwen1255 commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
xuzhiwen1255 commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959222044 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { //

[GitHub] [iceberg] xuzhiwen1255 commented on a diff in pull request #5642: Flink: Fixed an issue where Flink batch entry was not accurate

2022-08-30 Thread GitBox
xuzhiwen1255 commented on code in PR #5642: URL: https://github.com/apache/iceberg/pull/5642#discussion_r959223612 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java: ## @@ -87,6 +89,7 @@ public void endInput() throws IOException { //