[GitHub] [iceberg] Fokko merged pull request #6033: Build: Bump mkdocs from 1.3.1 to 1.4.1 in /python

2022-10-24 Thread GitBox
Fokko merged PR #6033: URL: https://github.com/apache/iceberg/pull/6033 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

[GitHub] [iceberg] Fokko commented on a diff in pull request #6010: Python: Fix caching of the PyArrowFileIO

2022-10-24 Thread GitBox
Fokko commented on code in PR #6010: URL: https://github.com/apache/iceberg/pull/6010#discussion_r1003109634 ## python/pyiceberg/io/pyarrow.py: ## @@ -66,10 +74,14 @@ class PyArrowFile(InputFile, OutputFile): >>> # output_file.create().write(b'foobytes') """ -

[GitHub] [iceberg] gaborkaszab opened a new pull request, #6035: Core: Increase inferred column metrics limit to 100

2022-10-24 Thread GitBox
gaborkaszab opened a new pull request, #6035: URL: https://github.com/apache/iceberg/pull/6035 This patch seems to be present in 1.0.0 but missing in master.

[GitHub] [iceberg] ajantha-bhat commented on pull request #6035: Core: Increase inferred column metrics limit to 100

2022-10-24 Thread GitBox
ajantha-bhat commented on PR #6035: URL: https://github.com/apache/iceberg/pull/6035#issuecomment-1288971855 Already merged yesterday? https://github.com/apache/iceberg/pull/5916

[GitHub] [iceberg] gaborkaszab commented on pull request #6035: Core: Increase inferred column metrics limit to 100

2022-10-24 Thread GitBox
gaborkaszab commented on PR #6035: URL: https://github.com/apache/iceberg/pull/6035#issuecomment-1288977162 Thanks for letting me know! I need a rebase then :) Closing this

[GitHub] [iceberg] gaborkaszab closed pull request #6035: Core: Increase inferred column metrics limit to 100

2022-10-24 Thread GitBox
gaborkaszab closed pull request #6035: Core: Increase inferred column metrics limit to 100 URL: https://github.com/apache/iceberg/pull/6035

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6034: Python: GlueCatalog Full Implementation

2022-10-24 Thread GitBox
jackye1995 commented on code in PR #6034: URL: https://github.com/apache/iceberg/pull/6034#discussion_r1003415094 ## python/tests/catalog/test_glue.py: ## @@ -0,0 +1,252 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

[GitHub] [iceberg] huaxingao commented on pull request #5961: add Aggregate Expressions

2022-10-24 Thread GitBox
huaxingao commented on PR #5961: URL: https://github.com/apache/iceberg/pull/5961#issuecomment-1289199629 @rdblue Thank you very much for your review! I have addressed the comments. Could you please take one more look when you have time? Thanks!

[GitHub] [iceberg] gaborkaszab commented on pull request #6036: Build: Add gaborkaszab as a collaborator

2022-10-24 Thread GitBox
gaborkaszab commented on PR #6036: URL: https://github.com/apache/iceberg/pull/6036#issuecomment-1289206280 @samredai @nastra

[GitHub] [iceberg] ismailsimsek commented on issue #5997: Iceberg table maintenance/compaction within AWS

2022-10-24 Thread GitBox
ismailsimsek commented on issue #5997: URL: https://github.com/apache/iceberg/issues/5997#issuecomment-1289283783 @vshel any reason you are not [using Athena to do compaction](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-data-optimization.html)?

[GitHub] [iceberg] nastra opened a new pull request, #6037: API,Core: Move ScanReport to core module / extract TimerResult/CounterResult/ScanMetricsResult into own classes

2022-10-24 Thread GitBox
nastra opened a new pull request, #6037: URL: https://github.com/apache/iceberg/pull/6037 The motivation for moving `ScanReport` to `iceberg-core` is that we don't actually need it in `iceberg-api`, since `MetricsReporter` only requires `MetricsReport` to be in `iceberg-api`

[GitHub] [iceberg] nastra commented on a diff in pull request #5968: Core: Use explicit JSON Parser for namespace creation request

2022-10-24 Thread GitBox
nastra commented on code in PR #5968: URL: https://github.com/apache/iceberg/pull/5968#discussion_r1003528420 ## core/src/main/java/org/apache/iceberg/rest/requests/CreateNamespaceRequest.java: ## @@ -19,80 +19,24 @@ package org.apache.iceberg.rest.requests; import java.util

[GitHub] [iceberg] rdblue commented on pull request #6037: API,Core: Move ScanReport to core module / extract TimerResult/CounterResult/ScanMetricsResult into own classes

2022-10-24 Thread GitBox
rdblue commented on PR #6037: URL: https://github.com/apache/iceberg/pull/6037#issuecomment-1289304312 Looks good to me. I like not having so many nested interfaces!

[GitHub] [iceberg] wypoon commented on pull request #6026: Spark: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon commented on PR #6026: URL: https://github.com/apache/iceberg/pull/6026#issuecomment-1289315807 @flyrain for the test part, I followed your suggestion and added a test in `TestSparkReaderDeletes` instead (removing the earlier one).

[GitHub] [iceberg] wypoon commented on a diff in pull request #6026: Spark: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon commented on code in PR #6026: URL: https://github.com/apache/iceberg/pull/6026#discussion_r1003542296 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -508,6 +530,75 @@ public void testIsDeletedColumnWithoutDeleteFile()

[GitHub] [iceberg] wypoon commented on a diff in pull request #6026: Spark: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon commented on code in PR #6026: URL: https://github.com/apache/iceberg/pull/6026#discussion_r1003550199 ## spark/v3.3/build.gradle: ## @@ -140,6 +140,9 @@ project(":iceberg-spark:iceberg-spark-extensions-${sparkMajorVersion}_${scalaVer exclude group: 'org.roaringbi

[GitHub] [iceberg] wypoon commented on a diff in pull request #6026: Spark: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon commented on code in PR #6026: URL: https://github.com/apache/iceberg/pull/6026#discussion_r1003550728 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestParquetMergeOnRead.java: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Softwar

[GitHub] [iceberg] Fokko opened a new pull request, #6038: Python: Fix Github pages

2022-10-24 Thread GitBox
Fokko opened a new pull request, #6038: URL: https://github.com/apache/iceberg/pull/6038 GitHub Pages breaks every time we do a force push because the push removes the `CNAME` file. This will fix it.

[GitHub] [iceberg] ahshahid opened a new issue, #6039: Perf enhancement by leveraging Dynamic Partition Pruning rule of spark for non partition columns used as join condition

2022-10-24 Thread GitBox
ahshahid opened a new issue, #6039: URL: https://github.com/apache/iceberg/issues/6039 Spark has a Partition Pruning rule which, under the right conditions, can fetch all the join keys from one side of the join and pass them as an IN-clause filter to the other table. For example, if the query is select

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-10-24 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1003609858 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-10-24 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1003616023 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-10-24 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1003621379 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] jzhuge commented on pull request #4925: API: Add view interfaces

2022-10-24 Thread GitBox
jzhuge commented on PR #4925: URL: https://github.com/apache/iceberg/pull/4925#issuecomment-1289434209 @wmoustafa Thanks for the valuable questions and feedback. I echo your point that it is confusing to store representations in different view versions. I'd suggest this API contract:

[GitHub] [iceberg] JiJiTang commented on pull request #5539: [Core]Add EncryptionManagerFactory to configure encryption via catalog properties and table metadata.

2022-10-24 Thread GitBox
JiJiTang commented on PR #5539: URL: https://github.com/apache/iceberg/pull/5539#issuecomment-1289454522 cc @flyrain for review

[GitHub] [iceberg] rdblue merged pull request #6037: API,Core: Move ScanReport to core module / extract TimerResult/CounterResult/ScanMetricsResult into own classes

2022-10-24 Thread GitBox
rdblue merged PR #6037: URL: https://github.com/apache/iceberg/pull/6037

[GitHub] [iceberg] JonasJ-ap opened a new pull request, #6040: AWS: Add AwsKmsClient implementation

2022-10-24 Thread GitBox
JonasJ-ap opened a new pull request, #6040: URL: https://github.com/apache/iceberg/pull/6040 - Add `AwsKmsClient`, which implements the `KmsClient` interface. - Add unit tests - Add integration tests

[GitHub] [iceberg] wypoon opened a new pull request, #6041: Spark 3.2: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon opened a new pull request, #6041: URL: https://github.com/apache/iceberg/pull/6041 This is a port of https://github.com/apache/iceberg/pull/6026 to spark/v3.2.

[GitHub] [iceberg] wypoon commented on pull request #6041: Spark 3.2: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon commented on PR #6041: URL: https://github.com/apache/iceberg/pull/6041#issuecomment-1289706498 @flyrain @chenjunjiedada this is a direct port of https://github.com/apache/iceberg/pull/6026 to spark/v3.2.

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6025: [Docs] Update migrate behaviour with respect to drop_table in spark-procedures docs.

2022-10-24 Thread GitBox
szehon-ho commented on code in PR #6025: URL: https://github.com/apache/iceberg/pull/6025#discussion_r1003844542 ## docs/spark-procedures.md: ## @@ -421,12 +421,18 @@ Existing data files are added to the Iceberg table's metadata and can be read us To leave the original table

[GitHub] [iceberg] szehon-ho opened a new issue, #6042: Harmonize partition field values for delete files

2022-10-24 Thread GitBox
szehon-ho opened a new issue, #6042: URL: https://github.com/apache/iceberg/issues/6042 ### Feature Request / Improvement @ajantha-bhat brought up that Partitions table fields 'file_count' and 'record_count' are not reflecting delete files, and was interested to fix it. One pos

[GitHub] [iceberg] flyrain merged pull request #6026: Spark 3.3: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
flyrain merged PR #6026: URL: https://github.com/apache/iceberg/pull/6026

[GitHub] [iceberg] flyrain commented on pull request #6026: Spark 3.3: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
flyrain commented on PR #6026: URL: https://github.com/apache/iceberg/pull/6026#issuecomment-1289765350 Merged. Thanks @wypoon. Thanks @chenjunjiedada for the review.

[GitHub] [iceberg] rzhang10 commented on a diff in pull request #4110: Spark: Exclude netty from spark runtime modules

2022-10-24 Thread GitBox
rzhang10 commented on code in PR #4110: URL: https://github.com/apache/iceberg/pull/4110#discussion_r1003864020 ## spark/v2.4/build.gradle: ## @@ -121,6 +121,7 @@ project(':iceberg-spark:iceberg-spark-runtime') { exclude group: 'org.xerial.snappy' exclude group: 'j

[GitHub] [iceberg] github-actions[bot] commented on issue #4628: missing SetWriteDistributionAndOrdering class for spark sql plan

2022-10-24 Thread GitBox
github-actions[bot] commented on issue #4628: URL: https://github.com/apache/iceberg/issues/4628#issuecomment-1289809920 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

[GitHub] [iceberg] github-actions[bot] commented on issue #4549: HIVE_METASTORE_ERROR: Table storage descriptor is missing SerDe info - when query a view using an Iceberg table on Athena

2022-10-24 Thread GitBox
github-actions[bot] commented on issue #4549: URL: https://github.com/apache/iceberg/issues/4549#issuecomment-1289809974 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] github-actions[bot] closed issue #4549: HIVE_METASTORE_ERROR: Table storage descriptor is missing SerDe info - when query a view using an Iceberg table on Athena

2022-10-24 Thread GitBox
github-actions[bot] closed issue #4549: HIVE_METASTORE_ERROR: Table storage descriptor is missing SerDe info - when query a view using an Iceberg table on Athena URL: https://github.com/apache/iceberg/issues/4549

[GitHub] [iceberg] github-actions[bot] commented on issue #4542: Schema Evolution exception: too many data columns

2022-10-24 Thread GitBox
github-actions[bot] commented on issue #4542: URL: https://github.com/apache/iceberg/issues/4542#issuecomment-1289809998 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] github-actions[bot] closed issue #4542: Schema Evolution exception: too many data columns

2022-10-24 Thread GitBox
github-actions[bot] closed issue #4542: Schema Evolution exception: too many data columns URL: https://github.com/apache/iceberg/issues/4542

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #5994: Doc: add assume role session name doc and remove redundant spark shell examples

2022-10-24 Thread GitBox
JonasJ-ap commented on code in PR #5994: URL: https://github.com/apache/iceberg/pull/5994#discussion_r1003915535 ## docs/aws.md: ## @@ -435,48 +437,23 @@ This is turned off by default. ### S3 Tags Custom [tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-ta

[GitHub] [iceberg] hililiwei opened a new pull request, #6043: Core: Partial Update

2022-10-24 Thread GitBox
hililiwei opened a new pull request, #6043: URL: https://github.com/apache/iceberg/pull/6043 # Proposal: Partial Updates ## Motivation Take feature engineering as an example: there are thousands or even tens of thousands of columns in the table, but the task will update

[GitHub] [iceberg] ajantha-bhat commented on issue #6042: Add delete file information to partitions table

2022-10-24 Thread GitBox
ajantha-bhat commented on issue #6042: URL: https://github.com/apache/iceberg/issues/6042#issuecomment-1289876866 @szehon-ho : Say for `partition-a` I have `record_count`=6 and `file_count`=2. [3 records in each file] Now, I do position delete which marks 3 records in file1 as deleted

[GitHub] [iceberg] szehon-ho commented on issue #6042: Add delete file information to partitions table

2022-10-24 Thread GitBox
szehon-ho commented on issue #6042: URL: https://github.com/apache/iceberg/issues/6042#issuecomment-1289920910 Yeah, I think we can't do any arithmetic; otherwise it becomes a matter of applying the delete file, which shouldn't be done in a metadata table. This should be coming just from file

[GitHub] [iceberg] flyrain merged pull request #6041: Spark 3.2: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
flyrain merged PR #6041: URL: https://github.com/apache/iceberg/pull/6041

[GitHub] [iceberg] flyrain commented on pull request #6041: Spark 3.2: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
flyrain commented on PR #6041: URL: https://github.com/apache/iceberg/pull/6041#issuecomment-1289925072 Thanks @wypoon for the PR. Thanks @chenjunjiedada for the review.

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5984: Core, API: Support incremental scanning with branch

2022-10-24 Thread GitBox
hililiwei commented on code in PR #5984: URL: https://github.com/apache/iceberg/pull/5984#discussion_r1003963515 ## api/src/main/java/org/apache/iceberg/IncrementalScan.java: ## @@ -21,6 +21,23 @@ /** API for configuring an incremental scan. */ public interface IncrementalScan

[GitHub] [iceberg] lvyanquan commented on pull request #5561: [Flink] Avoid submitting too many empty snapshots

2022-10-24 Thread GitBox
lvyanquan commented on PR #5561: URL: https://github.com/apache/iceberg/pull/5561#issuecomment-1289931212 We have the same need as this PR, and we may need to add notes explaining the parameter 'flink.max-continuous-empty-commits'.

[GitHub] [iceberg] rbalamohan opened a new issue, #6044: Column pruning/projection is not happening in correlated queries (e.g Q94, Q16)

2022-10-24 Thread GitBox
rbalamohan opened a new issue, #6044: URL: https://github.com/apache/iceberg/issues/6044 ### Apache Iceberg version 0.14.0 ### Query engine Spark ### Please describe the bug 🐞 Column projection/pruning is not happening in iceberg tables for inner queries.

[GitHub] [iceberg] chenwyi2 commented on issue #4137: Flink Job failed to restore because of downstream table changed

2022-10-24 Thread GitBox
chenwyi2 commented on issue #4137: URL: https://github.com/apache/iceberg/issues/4137#issuecomment-1289942511 Has this problem been solved? I also ran into it: when the Iceberg commit succeeded but Flink failed to flush the snapshot state to the state backend, and I then restart the task, it can

[GitHub] [iceberg] ajantha-bhat commented on issue #6042: Add delete file information to partitions table

2022-10-24 Thread GitBox
ajantha-bhat commented on issue #6042: URL: https://github.com/apache/iceberg/issues/6042#issuecomment-1289959557 > Yeah, I think we can't do any resolution of deletes; otherwise it becomes a matter of applying the delete file, which shouldn't be done in a metadata table. This should be coming

[GitHub] [iceberg] haizhou-zhao opened a new pull request, #6045: [iceberg-hive-metastore] Add support for group ownership

2022-10-24 Thread GitBox
haizhou-zhao opened a new pull request, #6045: URL: https://github.com/apache/iceberg/pull/6045 Builds on top of this PR: https://github.com/apache/iceberg/pull/5763 This is to add group ownership support to iceberg-hive-metastore

[GitHub] [iceberg] wypoon opened a new pull request, #6046: Spark 3.1: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon opened a new pull request, #6046: URL: https://github.com/apache/iceberg/pull/6046 This is a port of https://github.com/apache/iceberg/pull/6026 to spark/v3.1. This is not a direct port, as the 3.1 code base lags the 3.3 and 3.2 code base, but it is fairly straightforward.

[GitHub] [iceberg] pvary commented on pull request #6043: Core: Partial Update

2022-10-24 Thread GitBox
pvary commented on PR #6043: URL: https://github.com/apache/iceberg/pull/6043#issuecomment-1290016237 When we were developing updates for Hive tables, the first version of the ACID implementation was to store only the updated data, which is very similar to the partial updates suggested here

[GitHub] [iceberg] wypoon commented on a diff in pull request #6046: Spark 3.1: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly

2022-10-24 Thread GitBox
wypoon commented on code in PR #6046: URL: https://github.com/apache/iceberg/pull/6046#discussion_r1004028589 ## spark/v3.1/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReaderDeletes.java: ## @@ -117,16 +131,26 @@ public static void stopMetastoreAndSpark() throws

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5967: Flink: Support read options in flink source

2022-10-24 Thread GitBox
hililiwei commented on code in PR #5967: URL: https://github.com/apache/iceberg/pull/5967#discussion_r1004060008 ## docs/flink-getting-started.md: ## @@ -683,7 +683,47 @@ env.execute("Test Iceberg DataStream"); OVERWRITE and UPSERT can't be set together. In UPSERT mode, if the

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5967: Flink: Support read options in flink source

2022-10-24 Thread GitBox
hililiwei commented on code in PR #5967: URL: https://github.com/apache/iceberg/pull/5967#discussion_r1004060861 ## docs/flink-getting-started.md: ## @@ -683,7 +683,47 @@ env.execute("Test Iceberg DataStream"); OVERWRITE and UPSERT can't be set together. In UPSERT mode, if the