Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1792208374 ## format/spec.md: ## @@ -298,16 +298,102 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns i

Re: [PR] Add clarifying docs to transform result types [iceberg-python]

2024-10-08 Thread via GitHub
kevinzwang commented on PR #1211: URL: https://github.com/apache/iceberg-python/pull/1211#issuecomment-2400653349 @kevinjqliu oh I merged instead of rebase. Either way, I think it makes sense to squash and merge this PR into main anyway -- This is an automated message from the Apache Git

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2400362505 I think this is an optimization for the engine side. I want to balance "pyiceberg, the python library for iceberg" and "pyiceberg, the engines to run queries on iceberg

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1792459912 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1792461392 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile

Re: [I] Make iceberg an idempotent sink for Spark like delta lake [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8809: URL: https://github.com/apache/iceberg/issues/8809#issuecomment-2401026895 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Enable Partition Transforms and/or Spark SQL In Spark `rewrite_data_files` Procedure [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8846: Enable Partition Transforms and/or Spark SQL In Spark `rewrite_data_files` Procedure URL: https://github.com/apache/iceberg/issues/8846

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1792709044 ## format/spec.md: ## @@ -554,6 +648,15 @@ Manifests for a snapshot are tracked by a manifest list. Valid snapshots are stored as a list in table metadata.

Re: [I] Cannot create a V1 table with `CREATE OR REPLACE TABLE` [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8756: URL: https://github.com/apache/iceberg/issues/8756#issuecomment-2401026654 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Docs: Update site-docs/spark-quickstart.md [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on PR #8991: URL: https://github.com/apache/iceberg/pull/8991#issuecomment-2401027166 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [I] Enable Partition Transforms and/or Spark SQL In Spark `rewrite_data_files` Procedure [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8846: URL: https://github.com/apache/iceberg/issues/8846#issuecomment-2401026930 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Spark: Merge new position deletes with old deletes during writing [iceberg]

2024-10-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #11273: URL: https://github.com/apache/iceberg/pull/11273#discussion_r1792697477 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java: ## @@ -169,7 +174,13 @@ public DeltaWriterFactory createBatc

Re: [I] Distributed execution of DeleteReachableFilesSparkAction [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8862: Distributed execution of DeleteReachableFilesSparkAction URL: https://github.com/apache/iceberg/issues/8862

Re: [I] manifest lost [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8806: URL: https://github.com/apache/iceberg/issues/8806#issuecomment-2401026843 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Query fails when executed without filter i.e. aggregate pushdown [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8859: URL: https://github.com/apache/iceberg/issues/8859#issuecomment-2401026963 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Spec does not define which header fields to be present in ManifestLists [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8746: Spec does not define which header fields to be present in ManifestLists URL: https://github.com/apache/iceberg/issues/8746

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-08 Thread via GitHub
jinyangli34 commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792853328 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ## @@ -211,6 +228,8 @@ private void flushRowGroup(boolean finished) { writer.star

Re: [PR] Add clarifying docs to transform result types [iceberg-python]

2024-10-08 Thread via GitHub
kevinzwang commented on PR #1211: URL: https://github.com/apache/iceberg-python/pull/1211#issuecomment-2400911951 Also I don't think I have perms to merge myself so feel free to push the button whenever

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-08 Thread via GitHub
huaxingao commented on PR #11257: URL: https://github.com/apache/iceberg/pull/11257#issuecomment-2400917819 CI for preview1 passed. CI for preview2 failed. Trying SNAPSHOT to see if some of the Spark issues in preview2 have been fixed in SNAPSHOT.

Re: [I] Flaky test/env TestFlinkParquetReader, TestFlinkParquetWriter, TestIcebergSourceBoundedSql [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8761: Flaky test/env TestFlinkParquetReader, TestFlinkParquetWriter, TestIcebergSourceBoundedSql URL: https://github.com/apache/iceberg/issues/8761

Re: [I] Query fails when executed without filter i.e. aggregate pushdown [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8859: Query fails when executed without filter i.e. aggregate pushdown URL: https://github.com/apache/iceberg/issues/8859

Re: [I] Make iceberg an idempotent sink for Spark like delta lake [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8809: Make iceberg an idempotent sink for Spark like delta lake URL: https://github.com/apache/iceberg/issues/8809

Re: [I] Pyiceberg support the query without provided snapshot_id [iceberg-python]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #553: URL: https://github.com/apache/iceberg-python/issues/553#issuecomment-2401029561 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Implementation does not write `schema-id` into Manifest Avro headers [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8745: Implementation does not write `schema-id` into Manifest Avro headers URL: https://github.com/apache/iceberg/issues/8745

Re: [I] java.lang.IllegalArgumentException: requirement failed while read migrated parquet table [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8863: java.lang.IllegalArgumentException: requirement failed while read migrated parquet table URL: https://github.com/apache/iceberg/issues/8863

Re: [PR] Docs: Update site-docs/spark-quickstart.md [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed pull request #8991: Docs: Update site-docs/spark-quickstart.md URL: https://github.com/apache/iceberg/pull/8991

Re: [PR] Spark: Merge new position deletes with old deletes during writing [iceberg]

2024-10-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #11273: URL: https://github.com/apache/iceberg/pull/11273#discussion_r1792703916 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java: ## @@ -449,8 +493,33 @@ protected PartitioningWriter, Delete

Re: [I] How is iceberg compatible with hive's tez engine [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8757: How is iceberg compatible with hive's tez engine URL: https://github.com/apache/iceberg/issues/8757

Re: [PR] Support tencent COS fileIO [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on PR #9048: URL: https://github.com/apache/iceberg/pull/9048#issuecomment-2401027269 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792639080 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -215,7 +213,7 @@ private List writeNewManifests() throws IOException { } if (newManif

Re: [PR] Core: Make metrics reporter serializable (alternative impl) [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on PR #8032: URL: https://github.com/apache/iceberg/pull/8032#issuecomment-2401026539 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] java.lang.IllegalArgumentException: requirement failed while read migrated parquet table [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8863: URL: https://github.com/apache/iceberg/issues/8863#issuecomment-2401027009 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Support tencent COS fileIO [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed pull request #9048: Support tencent COS fileIO URL: https://github.com/apache/iceberg/pull/9048

Re: [I] Parquet.write to S3 with GlueCatalog requires commit [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8767: URL: https://github.com/apache/iceberg/issues/8767#issuecomment-2401026784 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Arrow: add support for null vectors [iceberg]

2024-10-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1792822991 ## arrow/src/test/java/org/apache/iceberg/arrow/vectorized/ArrowReaderTest.java: ## @@ -262,6 +265,142 @@ public void testReadColumnFilter2() throws Exceptio

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792834646 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -71,6 +72,7 @@ public String partition() { private final PartitionSet deleteFilePart

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792834057 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -71,6 +72,7 @@ public String partition() { private final PartitionSet deleteFilePart

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792836843 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -372,8 +367,14 @@ private boolean manifestHasDeletedFiles( for (ManifestEntry en

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792838808 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -230,27 +232,20 @@ SnapshotSummary.Builder buildSummary(Iterable manifests) { *

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-08 Thread via GitHub
ajantha-bhat commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1792839130 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor lice

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792837752 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -421,7 +421,7 @@ private ManifestFile filterManifestWithDeletedFiles(

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-08 Thread via GitHub
rdblue commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1791001221 ## format/spec.md: ## @@ -298,16 +298,102 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns is: -|

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-08 Thread via GitHub
jinyangli34 commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792875688 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ## @@ -66,6 +66,9 @@ class ParquetWriter implements FileAppender, Closeable { private b

Re: [PR] Spec: Add v3 types and type promotion [iceberg]

2024-10-08 Thread via GitHub
emkornfield commented on code in PR #10955: URL: https://github.com/apache/iceberg/pull/10955#discussion_r1792890848 ## format/spec.md: ## @@ -230,11 +233,31 @@ Schemas may be evolved by type promotion or adding, deleting, renaming, or reord Evolution applies changes to the

Re: [PR] Spec: Add v3 types and type promotion [iceberg]

2024-10-08 Thread via GitHub
emkornfield commented on code in PR #10955: URL: https://github.com/apache/iceberg/pull/10955#discussion_r1792888066 ## format/spec.md: ## @@ -1089,6 +1118,7 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | Primitive type | Hash speci

Re: [PR] Config for deciding whether to use Iceberg Time type [iceberg]

2024-10-08 Thread via GitHub
bryanck commented on PR #11174: URL: https://github.com/apache/iceberg/pull/11174#issuecomment-2400139107 Have you considered using an SMT for this? I'm reluctant to add configs for each type conversion scenario.

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-08 Thread via GitHub
kevinjqliu commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1792170277 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licens

Re: [PR] Drop support for Python 3.8 [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu commented on PR #1221: URL: https://github.com/apache/iceberg-python/pull/1221#issuecomment-2400318556 Merging, since it was already voted to drop python 3.8 in the pyiceberg 0.8.0 release (next release)

Re: [I] Remove python 3.8 support [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu closed issue #1121: Remove python 3.8 support URL: https://github.com/apache/iceberg-python/issues/1121

Re: [PR] Drop support for Python 3.8 [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu merged PR #1221: URL: https://github.com/apache/iceberg-python/pull/1221

Re: [PR] Drop support for Python 3.8 [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu commented on PR #1221: URL: https://github.com/apache/iceberg-python/pull/1221#issuecomment-2400319218 Thank you @raulcd for the contribution!

Re: [PR] Arrow: Remove dead code [iceberg]

2024-10-08 Thread via GitHub
wypoon commented on PR #11276: URL: https://github.com/apache/iceberg/pull/11276#issuecomment-2400327463 Thanks @nastra. FYI, with this change, `DecimalVectorUtil` is no longer used (except in tests for it). It is a public class with a public method.

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-08 Thread via GitHub
ajantha-bhat commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2400259998 @kevinjqliu: I wasn't sure because of https://github.com/apache/iceberg/pull/9871. I can add it. If we agree that it is useful.

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-08 Thread via GitHub
kevinjqliu commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2400288200 #9871 is superseded by #10908 See https://github.com/apache/iceberg/pull/10908/files#diff-d11c4195b39c7a98f6f1ed9ab99d1845ca19d297574e92b836c95ff6aa6e1701L25-L29

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-08 Thread via GitHub
kevinjqliu commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2400304905 I initially thought this was creating the jar for the TCK, but it looks like its for the `RESTCatalogServer` class instead. I wonder if it would be easier to work with if the class i

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-10-08 Thread via GitHub
kevinjqliu commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2400356510 This is a great idea! We should leverage Iceberg's robust metadata whenever possible. As mentioned, this would be a specific optimization for querying Iceberg tabl
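The metadata-only count discussed in this thread can be illustrated with a small, engine-agnostic sketch. It is not PyIceberg's API: `FileTask` and `metadata_only_count` are hypothetical names, and the sketch assumes each planned file exposes the `record_count` stored in its manifest entry together with any delete files that apply to it. The count is only exact when no row filter is left to evaluate and no deletes apply; otherwise the engine must fall back to reading data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileTask:
    """Hypothetical stand-in for a planned scan task: one data file's
    record count from manifest metadata, plus any delete files for it."""
    record_count: int
    delete_files: List[str] = field(default_factory=list)


def metadata_only_count(tasks: List[FileTask], has_residual_filter: bool) -> Optional[int]:
    """Return the row count using manifest record counts only, or None when
    metadata is not sufficient and a real scan is required."""
    if has_residual_filter:
        # A filter not fully answered by partition pruning may exclude some
        # rows inside a file, so summing record_count would over-count.
        return None
    total = 0
    for task in tasks:
        if task.delete_files:
            # Position or equality deletes remove rows that record_count
            # still includes; fall back to reading the data.
            return None
        total += task.record_count
    return total


tasks = [FileTask(100), FileTask(250)]
print(metadata_only_count(tasks, has_residual_filter=False))  # 350
print(metadata_only_count(tasks + [FileTask(10, ["del-1.parquet"])], False))  # None
```

Returning `None` rather than an approximate number keeps the shortcut safe: callers only skip the scan when the metadata answer is exact.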

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-08 Thread via GitHub
ajantha-bhat commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2400360639 > I initially thought this was creating the jar for the TCK, but it looks like its for the RESTCatalogServer class instead. I wonder if it would be easier to work with if the clas

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1792211491 ## format/spec.md: ## @@ -684,34 +797,38 @@ The atomic operation used to commit metadata depends on how tables are tracked a Table metadata consists of the

Re: [PR] WIP: Initial Support for Spark 4.0 [iceberg]

2024-10-08 Thread via GitHub
huaxingao commented on PR #10622: URL: https://github.com/apache/iceberg/pull/10622#issuecomment-2400850614 cc @aihuaxu @RussellSpitzer Spark 4.0 preview1 works OK now with the new [PR](https://github.com/apache/iceberg/pull/11257), but there are still a few issues with Preview 2. I am

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792496212 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ## @@ -132,7 +135,9 @@ private void ensureWriterInitialized() { @Override public

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792493504 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ## @@ -66,6 +66,9 @@ class ParquetWriter implements FileAppender, Closeable { privat

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792492250 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ## @@ -211,6 +228,8 @@ private void flushRowGroup(boolean finished) { writer.s

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792476033 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ## @@ -185,9 +190,21 @@ public List splitOffsets() { return null; } + /* +

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-08 Thread via GitHub
huaxingao commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1792516517 ## .github/workflows/java-ci.yml: ## @@ -95,7 +95,7 @@ jobs: runs-on: ubuntu-22.04 strategy: matrix: -jvm: [11, 17, 21] +jvm: [17,

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-08 Thread via GitHub
huaxingao commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1792517220 ## .github/workflows/java-ci.yml: ## @@ -108,7 +108,7 @@ jobs: runs-on: ubuntu-22.04 strategy: matrix: -jvm: [11, 17, 21] +jvm: [17

Re: [PR] DO NOT MERGE WILL BREAK - Change BaseCatalog to Interface [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer closed pull request #11210: DO NOT MERGE WILL BREAK - Change BaseCatalog to Interface URL: https://github.com/apache/iceberg/pull/11210

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-08 Thread via GitHub
RussellSpitzer commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1792514818 ## format/spec.md: ## @@ -598,6 +702,14 @@ Notes: 1. Lower and upper bounds are serialized to bytes using the single-object serialization in Appendix D. The

Re: [PR] WIP: Initial Support for Spark 4.0 [iceberg]

2024-10-08 Thread via GitHub
huaxingao closed pull request #10622: WIP: Initial Support for Spark 4.0 URL: https://github.com/apache/iceberg/pull/10622

Re: [PR] WIP: Initial Support for Spark 4.0 [iceberg]

2024-10-08 Thread via GitHub
huaxingao commented on PR #10622: URL: https://github.com/apache/iceberg/pull/10622#issuecomment-2400841583 It's easier to have a new PR than update this one. I am closing this one and open a new [PR](https://github.com/apache/iceberg/pull/11257)

Re: [PR] Spark: Merge new position deletes with old deletes during writing [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11273: URL: https://github.com/apache/iceberg/pull/11273#discussion_r1792506398 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java: ## @@ -158,6 +163,26 @@ public void filter(Predicate[] predicates) {

Re: [PR] Spark: Merge new position deletes with old deletes during writing [iceberg]

2024-10-08 Thread via GitHub
aokolnychyi commented on code in PR #11273: URL: https://github.com/apache/iceberg/pull/11273#discussion_r1792479823 ## core/src/main/java/org/apache/iceberg/TableProperties.java: ## @@ -383,4 +383,8 @@ private TableProperties() {} public static final int ENCRYPTION_DEK_LENGT

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-08 Thread via GitHub
mrcnc commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1792289958 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agr

Re: [PR] Drop support for Python 3.8 [iceberg-python]

2024-10-08 Thread via GitHub
raulcd commented on PR #1221: URL: https://github.com/apache/iceberg-python/pull/1221#issuecomment-2400383106 > Thank you @raulcd for the contribution! Thanks @kevinjqliu @sungwy @HonahX for approving the PR! I plan to keep contributing :)

Re: [PR] Arrow: Fix indexing in Parquet dictionary encoded values readers [iceberg]

2024-10-08 Thread via GitHub
wypoon commented on code in PR #11247: URL: https://github.com/apache/iceberg/pull/11247#discussion_r1792370324 ## spark/v3.5/spark/src/test/resources/decimal_dict_and_plain_encoding.parquet: ## Review Comment: @nastra the code in `iceberg-parquet` still has to use the `par

[I] [Docs] Update Examples to Replace Hadoop Catalog with JDBC Catalog [iceberg]

2024-10-08 Thread via GitHub
kevinjqliu opened a new issue, #11284: URL: https://github.com/apache/iceberg/issues/11284 ### Feature Request / Improvement The current documentation includes examples of using the Hadoop catalog with Iceberg, such as: * https://iceberg.apache.org/spark-quickstart/#adding-a-catalo

[PR] Update Examples to Replace Hadoop Catalog with JDBC Catalog [iceberg]

2024-10-08 Thread via GitHub
kevinjqliu opened a new pull request, #11285: URL: https://github.com/apache/iceberg/pull/11285 Closes #11284 This PR changes examples of using the Hadoop catalog with the JDBC catalog

Re: [I] iceberg reports an error after upgrading to 1.4.2 [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #9018: URL: https://github.com/apache/iceberg/issues/9018#issuecomment-2401027215 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Core: Implement equals/hashCode method for RESTResponse [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed pull request #9049: Core: Implement equals/hashCode method for RESTResponse URL: https://github.com/apache/iceberg/pull/9049

Re: [PR] Core: Implement equals/hashCode method for RESTResponse [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on PR #9049: URL: https://github.com/apache/iceberg/pull/9049#issuecomment-2401027296 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed pull request #9050: Spark: Add serialzable isolation test for concurrent MERGE INTOs URL: https://github.com/apache/iceberg/pull/9050

Re: [I] Cannot create a V1 table with `CREATE OR REPLACE TABLE` [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8756: Cannot create a V1 table with `CREATE OR REPLACE TABLE` URL: https://github.com/apache/iceberg/issues/8756

Re: [I] How is iceberg compatible with hive's tez engine [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8757: URL: https://github.com/apache/iceberg/issues/8757#issuecomment-2401026687 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Flaky test/env TestFlinkParquetReader, TestFlinkParquetWriter, TestIcebergSourceBoundedSql [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8761: URL: https://github.com/apache/iceberg/issues/8761#issuecomment-2401026720 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] is there anyway to rewrite onto a specific branch? [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8762: is there anyway to rewrite onto a specific branch? URL: https://github.com/apache/iceberg/issues/8762

Re: [I] is there anyway to rewrite onto a specific branch? [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8762: URL: https://github.com/apache/iceberg/issues/8762#issuecomment-2401026750 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Parquet.write to S3 with GlueCatalog requires commit [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8767: Parquet.write to S3 with GlueCatalog requires commit URL: https://github.com/apache/iceberg/issues/8767

Re: [I] How can I quickly insert data into an iceberg table in a Python environment? [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8801: URL: https://github.com/apache/iceberg/issues/8801#issuecomment-2401026815 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] How can I quickly insert data into an iceberg table in a Python environment? [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8801: How can I quickly insert data into an iceberg table in a Python environment? URL: https://github.com/apache/iceberg/issues/8801

Re: [I] manifest lost [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8806: manifest lost URL: https://github.com/apache/iceberg/issues/8806

Re: [I] Distributed execution of DeleteReachableFilesSparkAction [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8862: URL: https://github.com/apache/iceberg/issues/8862#issuecomment-2401026987 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] support meta column query on staged scan [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #8866: support meta column query on staged scan URL: https://github.com/apache/iceberg/issues/8866

Re: [I] support meta column query on staged scan [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8866: URL: https://github.com/apache/iceberg/issues/8866#issuecomment-2401027033 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Pyiceberg support the query without provided snapshot_id [iceberg-python]

2024-10-08 Thread via GitHub
github-actions[bot] closed issue #553: Pyiceberg support the query without provided snapshot_id URL: https://github.com/apache/iceberg-python/issues/553

Re: [I] Implementation does not write `schema-id` into Manifest Avro headers [iceberg]

2024-10-08 Thread via GitHub
github-actions[bot] commented on issue #8745: URL: https://github.com/apache/iceberg/issues/8745#issuecomment-2401026587 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-08 Thread via GitHub
edgarRd commented on code in PR #11258: URL: https://github.com/apache/iceberg/pull/11258#discussion_r1792071351 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java: ## @@ -185,9 +190,17 @@ public List<Long> splitOffsets() { return null; } + private long