Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853417638 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } + @

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853420085 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } + @

Re: [PR] Document procedure for stats collection [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11606: URL: https://github.com/apache/iceberg/pull/11606#discussion_r1853410841 ## docs/docs/spark-procedures.md: ## @@ -936,3 +936,40 @@ as an `UPDATE_AFTER` image, resulting in the following pre/post update images: |-||-

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-11-21 Thread via GitHub
nastra commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1853408518 ## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java: ## @@ -118,4 +118,16 @@ public interface ExpireSnapshots extends PendingUpdate<List<Snapshot>> { * @return this for

Re: [PR] Spark: Add view support to SparkSessionCatalog [iceberg]

2024-11-21 Thread via GitHub
nastra commented on code in PR #11388: URL: https://github.com/apache/iceberg/pull/11388#discussion_r1853401637 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java: ## @@ -35,7 +35,9 @@ abstract class BaseCatalog ProcedureCatalog, Suppor

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
pvary commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853353750 ## hive-metastore/src/test/java/org/apache/iceberg/hive/HiveTableTest.java: ## @@ -386,6 +386,12 @@ public void testHiveTableAndIcebergTableWithSameName(TableType tabl

Re: [PR] fix `KeyError` raised by `add_files` when parquet file doe not have column stats [iceberg-python]

2024-11-21 Thread via GitHub
binayakd commented on code in PR #1354: URL: https://github.com/apache/iceberg-python/pull/1354#discussion_r1853343720 ## tests/io/test_pyarrow_stats.py: ## @@ -681,6 +681,73 @@ def test_stats_types(table_schema_nested: Schema) -> None: ] +def construct_test_table_witho

Re: [PR] fix `KeyError` raised by `add_files` when parquet file doe not have column stats [iceberg-python]

2024-11-21 Thread via GitHub
binayakd commented on PR #1354: URL: https://github.com/apache/iceberg-python/pull/1354#issuecomment-2492978374 @kevinjqliu, @Fokko, pushed the linting fix, and also updated the test based on the suggestions. Thank you! -- This is an automated message from the Apache Git Service. To respo

Re: [PR] fix `KeyError` raised by `add_files` when parquet file doe not have column stats [iceberg-python]

2024-11-21 Thread via GitHub
binayakd commented on code in PR #1354: URL: https://github.com/apache/iceberg-python/pull/1354#discussion_r1853355886 ## tests/io/test_pyarrow_stats.py: ## @@ -681,6 +681,73 @@ def test_stats_types(table_schema_nested: Schema) -> None: ] +def construct_test_table_witho

Re: [PR] fix `KeyError` raised by `add_files` when parquet file doe not have column stats [iceberg-python]

2024-11-21 Thread via GitHub
binayakd commented on code in PR #1354: URL: https://github.com/apache/iceberg-python/pull/1354#discussion_r1853344003 ## tests/io/test_pyarrow_stats.py: ## @@ -681,6 +681,73 @@ def test_stats_types(table_schema_nested: Schema) -> None: ] +def construct_test_table_witho

Re: [I] User ID information in Iceberg Table's snapshot [iceberg]

2024-11-21 Thread via GitHub
ArijitSinghEDA closed issue #11474: User ID information in Iceberg Table's snapshot URL: https://github.com/apache/iceberg/issues/11474 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] Document procedure for stats collection [iceberg]

2024-11-21 Thread via GitHub
manuzhang commented on code in PR #11606: URL: https://github.com/apache/iceberg/pull/11606#discussion_r1853228696 ## docs/docs/spark-procedures.md: ## @@ -936,3 +936,40 @@ as an `UPDATE_AFTER` image, resulting in the following pre/post update images: |-||-

Re: [I] java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: Required field 'num_values' was not found in serialized data [iceberg]

2024-11-21 Thread via GitHub
wardlican commented on issue #11614: URL: https://github.com/apache/iceberg/issues/11614#issuecomment-2492781743 > ``` > Required field num_values was not found in serialized data! > ``` > > What's the column of num_values ? https://github.com/user-attachments/assets/c7bf

Re: [I] java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: Required field 'num_values' was not found in serialized data [iceberg]

2024-11-21 Thread via GitHub
wardlican commented on issue #11614: URL: https://github.com/apache/iceberg/issues/11614#issuecomment-2492780298 > Thanks @wardlican for raising this. Do you happen to know which system produced the Parquet files (Spark, Arrow, etc)? We are using spark_catalog.system.rewrite_data_file

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853216938 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } + @

Re: [PR] Spark 3.4: IcebergSource extends SessionConfigSupport [iceberg]

2024-11-21 Thread via GitHub
pan3793 commented on code in PR #7732: URL: https://github.com/apache/iceberg/pull/7732#discussion_r1853194071 ## docs/docs/spark-configuration.md: ## @@ -167,16 +171,20 @@ spark.read ### Write options -Spark write options are passed when configuring the DataFrameWriter, li

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-11-21 Thread via GitHub
ajantha-bhat commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1853193408 ## data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java: ## @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
kevinjqliu commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853192695 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } +

Re: [PR] Spark 3.4: IcebergSource extends SessionConfigSupport [iceberg]

2024-11-21 Thread via GitHub
pan3793 commented on code in PR #7732: URL: https://github.com/apache/iceberg/pull/7732#discussion_r1853187626 ## docs/docs/spark-configuration.md: ## @@ -154,6 +154,10 @@ spark.read .table("catalog.db.table") ``` +Iceberg 1.8.0 and later support setting read options by

Re: [PR] API, Core: Add scan planning apis to REST Catalog [iceberg]

2024-11-21 Thread via GitHub
rahil-c commented on code in PR #11180: URL: https://github.com/apache/iceberg/pull/11180#discussion_r1852995146 ## core/src/main/java/org/apache/iceberg/rest/requests/FetchScanTasksRequest.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853169535 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,28 @@ private void validateTableIsIcebergTableOrView( } } + @

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853156033 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,26 @@ private void validateTableIsIcebergTableOrView( } } + @

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853138562 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,26 @@ private void validateTableIsIcebergTableOrView( } } +

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853134029 ## hive-metastore/src/test/java/org/apache/iceberg/hive/HiveTableTest.java: ## @@ -386,6 +386,12 @@ public void testHiveTableAndIcebergTableWithSameName(TableTyp

Re: [PR] Kafka Connect: Add config to route to tables using topic name [iceberg]

2024-11-21 Thread via GitHub
mun1r0b0t commented on PR #11313: URL: https://github.com/apache/iceberg/pull/11313#issuecomment-2492647778 I opened up https://github.com/apache/iceberg/issues/11163 since I did not hear back on this. -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853129053 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,26 @@ private void validateTableIsIcebergTableOrView( } } +

[PR] Kafka Connect: Add mechanisms for routing records by topic name [iceberg]

2024-11-21 Thread via GitHub
mun1r0b0t opened a new pull request, #11623: URL: https://github.com/apache/iceberg/pull/11623 Add 2 new routing mechanisms for records that use Kafka topic name for routing and update configuration for how to route records. The changes move the routing logic to a separate class with

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1853121710 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,26 @@ private void validateTableIsIcebergTableOrView( } } + @

Re: [PR] check mkdocs build strict in CI [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on PR #1360: URL: https://github.com/apache/iceberg-python/pull/1360#issuecomment-2492615027 verified that CI breaks with bad doc changes https://github.com/apache/iceberg-python/actions/runs/11963956201/job/33355402732?pr=1360 -- This is an automated message from the

Re: [PR] (AWS) Docs: List all AWS S3 properties from all language impl. [iceberg]

2024-11-21 Thread via GitHub
github-actions[bot] commented on PR #11321: URL: https://github.com/apache/iceberg/pull/11321#issuecomment-2492612691 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] [draft] HADOOP-18679. Add API for bulk/paged object deletion: Iceberg PoC [iceberg]

2024-11-21 Thread via GitHub
github-actions[bot] commented on PR #10233: URL: https://github.com/apache/iceberg/pull/10233#issuecomment-2492612372 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-11-21 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1853077549 ## core/src/main/java/org/apache/iceberg/UpdateRequirements.java: ## @@ -173,6 +175,26 @@ private void update(MetadataUpdate.SetDefaultSortOrder unused) {

Re: [PR] Support convert orc timestamptz [iceberg]

2024-11-21 Thread via GitHub
github-actions[bot] commented on PR #9905: URL: https://github.com/apache/iceberg/pull/9905#issuecomment-2492612301 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-11-21 Thread via GitHub
danielcweeks merged PR #11093: URL: https://github.com/apache/iceberg/pull/11093 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceb

Re: [PR] check mkdocs build strict in CI [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on code in PR #1360: URL: https://github.com/apache/iceberg-python/pull/1360#discussion_r1853108710 ## .github/workflows/python-ci.yml: ## @@ -49,5 +49,7 @@ jobs: run: make install-dependencies - name: Linters run: make lint +- name: C

Re: [PR] check mkdocs build strict in CI [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on code in PR #1360: URL: https://github.com/apache/iceberg-python/pull/1360#discussion_r1853107082 ## .github/workflows/python-ci.yml: ## @@ -49,5 +49,7 @@ jobs: run: make install-dependencies - name: Linters run: make lint +- name: C

Re: [PR] check mkdocs build strict in CI [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on code in PR #1360: URL: https://github.com/apache/iceberg-python/pull/1360#discussion_r1853106916 ## pyproject.toml: ## @@ -95,6 +95,17 @@ pyspark = "3.5.3" cython = "3.0.11" deptry = ">=0.14,<0.22" docutils = "!=0.21.post1" # https://github.com/pyth

Re: [PR] check mkdocs build strict in CI [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on code in PR #1360: URL: https://github.com/apache/iceberg-python/pull/1360#discussion_r1853105652 ## .github/workflows/python-ci-docs.yml: ## Review Comment: this will now run on every PR when there's a change in `mkdocs/docs/` ## .github/w

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-11-21 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1853082951 ## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java: ## @@ -118,4 +118,16 @@ public interface ExpireSnapshots extends PendingUpdate<List<Snapshot>> { * @retur

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-21 Thread via GitHub
karuppayya commented on PR #11597: URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2492535997 Another minor behavioral change: earlier, if the user had access to both the HMS table and storage, tableExists would pass. With the change, tableExists would pass with only ac

Re: [PR] Remove Python 3.13 upper bound restriction [iceberg-python]

2024-11-21 Thread via GitHub
bigluck commented on PR #1355: URL: https://github.com/apache/iceberg-python/pull/1355#issuecomment-2492493406 Ok, I know what's going on. ```bash $ make install Poetry is already installed. poetry install --all-extras Installing dependencies from lock file ...

[PR] Bump getdaft from 0.3.13 to 0.3.14 [iceberg-python]

2024-11-21 Thread via GitHub
dependabot[bot] opened a new pull request, #1361: URL: https://github.com/apache/iceberg-python/pull/1361 Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.3.13 to 0.3.14. Release notes Sourced from [getdaft's releases](https://github.com/Eventual-Inc/Daft/releases).

Re: [PR] Improve documentation for "how to release" [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on code in PR #1359: URL: https://github.com/apache/iceberg-python/pull/1359#discussion_r1853009964 ## mkdocs/docs/how-to-release.md: ## @@ -128,21 +144,51 @@ svn add $SVN_TMP_DIR_VERSIONED svn ci -m "PyIceberg ${VERSION}" ${SVN_TMP_DIR_VERSIONED} ``` -#

Re: [PR] API, Core: Add scan planning apis to REST Catalog [iceberg]

2024-11-21 Thread via GitHub
rahil-c commented on code in PR #11180: URL: https://github.com/apache/iceberg/pull/11180#discussion_r1852987342 ## core/src/test/java/org/apache/iceberg/rest/requests/TestFetchScanTasksRequest.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Allow leading underscore in column name used in row filter [iceberg-python]

2024-11-21 Thread via GitHub
vincenzon commented on PR #1358: URL: https://github.com/apache/iceberg-python/pull/1358#issuecomment-2492448992 I added a test. I didn't change the quote character to backtick though I think it should be done. Let me know if you'd like that change in this PR or a separate one or something

Re: [PR] Allow leading underscore in column name used in row filter [iceberg-python]

2024-11-21 Thread via GitHub
Fokko commented on PR #1358: URL: https://github.com/apache/iceberg-python/pull/1358#issuecomment-2492404898 @vincenzon Thanks for fixing this. Should we add a test to ensure that we don't break this in the future? :) -- This is an automated message from the Apache Git Service. To respond

Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

2024-11-21 Thread via GitHub
ZachDischner commented on PR #9731: URL: https://github.com/apache/iceberg/pull/9731#issuecomment-2492428364 I'm going to work on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-21 Thread via GitHub
rahil-c commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1852965609 ## api/src/main/java/org/apache/iceberg/exceptions/EntityNotFoundException.java: ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-21 Thread via GitHub
rahil-c commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1852964353 ## core/src/main/java/org/apache/iceberg/rest/responses/FetchPlanningResultResponse.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] Materialized View Spec [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1852962681 ## format/view-spec.md: ## @@ -42,12 +42,27 @@ An atomic swap of one view metadata file for another provides the basis for maki Writers create view metadata fil

Re: [PR] Improve documentation for "how to release" [iceberg-python]

2024-11-21 Thread via GitHub
Fokko commented on code in PR #1359: URL: https://github.com/apache/iceberg-python/pull/1359#discussion_r1852944966 ## mkdocs/docs/how-to-release.md: ## @@ -128,21 +144,51 @@ svn add $SVN_TMP_DIR_VERSIONED svn ci -m "PyIceberg ${VERSION}" ${SVN_TMP_DIR_VERSIONED} ``` -### Up

Re: [PR] Spark 3.4: IcebergSource extends SessionConfigSupport [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #7732: URL: https://github.com/apache/iceberg/pull/7732#discussion_r1852955315 ## docs/docs/spark-configuration.md: ## @@ -154,6 +154,10 @@ spark.read .table("catalog.db.table") ``` +Iceberg 1.8.0 and later support setting read options b

Re: [PR] Spark 3.4: IcebergSource extends SessionConfigSupport [iceberg]

2024-11-21 Thread via GitHub
szehon-ho commented on code in PR #7732: URL: https://github.com/apache/iceberg/pull/7732#discussion_r1852951219 ## docs/docs/spark-configuration.md: ## @@ -154,43 +154,51 @@ spark.read .table("catalog.db.table") ``` -| Spark option| Default | Descripti

Re: [PR] Improve documentation for "how to release" [iceberg-python]

2024-11-21 Thread via GitHub
Fokko commented on code in PR #1359: URL: https://github.com/apache/iceberg-python/pull/1359#discussion_r1852949430 ## mkdocs/docs/how-to-release.md: ## @@ -205,36 +258,46 @@ The release candidate has been accepted as PyIceberg . Thanks everyone, Kind regards, ``` -### Copy

Re: [PR] Improve documentation for "how to release" [iceberg-python]

2024-11-21 Thread via GitHub
Fokko commented on code in PR #1359: URL: https://github.com/apache/iceberg-python/pull/1359#discussion_r1852939525 ## mkdocs/docs/how-to-release.md: ## @@ -74,48 +89,49 @@ export VERSION_BRANCH=${VERSION_WITHOUT_RC//./-} export GIT_TAG=pyiceberg-${VERSION} git tag -s ${GIT_

Re: [PR] check mkdocs build strict in CI [iceberg-python]

2024-11-21 Thread via GitHub
Fokko commented on code in PR #1360: URL: https://github.com/apache/iceberg-python/pull/1360#discussion_r1852923680 ## pyproject.toml: ## @@ -95,6 +95,17 @@ pyspark = "3.5.3" cython = "3.0.11" deptry = ">=0.14,<0.22" docutils = "!=0.21.post1" # https://github.com/python-po

Re: [PR] 1.7.x apply PR #11220 [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer commented on PR #11622: URL: https://github.com/apache/iceberg/pull/11622#issuecomment-2492369726 Thanks @bryanck for the cherry pick and @Fokko for reviewing -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] 1.7.x apply PR #11220 [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer merged PR #11622: URL: https://github.com/apache/iceberg/pull/11622 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

Re: [PR] Improve documentation for "how to release" [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on PR #1359: URL: https://github.com/apache/iceberg-python/pull/1359#issuecomment-2492262104 Let's run this for the 0.8.1 patch release -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [I] Move mkdocs action/workflow into `docs` group [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on issue #923: URL: https://github.com/apache/iceberg-python/issues/923#issuecomment-2492241171 @jayceslesar yes, please! I just ran into this problem recently -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[PR] check mkdocs build strict in CI [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu opened a new pull request, #1360: URL: https://github.com/apache/iceberg-python/pull/1360 This PR adds a CI check for `mkdocs build --strict`. When releasing docs, the `python-ci-docs.yml` GitHub Action uses `--strict` https://github.com/apache/iceberg-python/blob/7a8369533

Re: [PR] Spec: add variant type [iceberg]

2024-11-21 Thread via GitHub
rdblue commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1852748196 ## format/spec.md: ## @@ -1110,6 +1125,7 @@ Maps with non-string keys must use an array representation with the `map` logica |**`struct`**|`record`|| |**`list`**|`a

Re: [PR] Revert "Core: Update TableMetadataParser to ensure all streams closed (#11220)" [iceberg]

2024-11-21 Thread via GitHub
amogh-jahagirdar merged PR #11621: URL: https://github.com/apache/iceberg/pull/11621 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@

Re: [PR] Revert "Core: Update TableMetadataParser to ensure all streams closed (#11220)" [iceberg]

2024-11-21 Thread via GitHub
amogh-jahagirdar commented on PR #11621: URL: https://github.com/apache/iceberg/pull/11621#issuecomment-2492166856 Thanks @hussein-awala, agree with just reverting this first. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Spec: add variant type [iceberg]

2024-11-21 Thread via GitHub
rdblue commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1852746749 ## format/spec.md: ## @@ -444,7 +459,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ | Transform name| Description

Re: [PR] Spec: add variant type [iceberg]

2024-11-21 Thread via GitHub
rdblue commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1852745191 ## format/spec.md: ## @@ -178,6 +178,21 @@ A **`list`** is a collection of values with some element type. The element field A **`map`** is a collection of key-valu

Re: [PR] Spec: add variant type [iceberg]

2024-11-21 Thread via GitHub
rdblue commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1852745826 ## format/spec.md: ## @@ -444,7 +459,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ | Transform name| Description

Re: [PR] Introduce `assign_fresh_ids` flag and allow skipping fresh assignment of IDs on Table creation [iceberg-python]

2024-11-21 Thread via GitHub
sungwy commented on code in PR #1304: URL: https://github.com/apache/iceberg-python/pull/1304#discussion_r1852726772 ## pyiceberg/catalog/__init__.py: ## @@ -754,9 +760,7 @@ def _load_file_io(self, properties: Properties = EMPTY_DICT, location: Optional[ return load_fi

Re: [PR] Spark : Derive Stats From Manifest on the Fly [iceberg]

2024-11-21 Thread via GitHub
saitharun15 commented on PR #11615: URL: https://github.com/apache/iceberg/pull/11615#issuecomment-2492078503 @RussellSpitzer, thanks for the review comments, I will address them soon. As per @huaxingao's implementation [here](https://github.com/apache/iceberg/blob/90be5d7360bc7ff274e7d00cb725

Re: [PR] Core,Open-API: Don't expose the `last-column-id` [iceberg]

2024-11-21 Thread via GitHub
rdblue commented on code in PR #11514: URL: https://github.com/apache/iceberg/pull/11514#discussion_r1852719762 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -1081,8 +1094,18 @@ public Builder setCurrentSchema(int schemaId) { return this; } +

Re: [PR] Core,Open-API: Don't expose the `last-column-id` [iceberg]

2024-11-21 Thread via GitHub
rdblue commented on code in PR #11514: URL: https://github.com/apache/iceberg/pull/11514#discussion_r1852718855 ## core/src/main/java/org/apache/iceberg/MetadataUpdateParser.java: ## @@ -462,6 +465,8 @@ private static MetadataUpdate readAddSchema(JsonNode node) { Schema sch

Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1852570677 ## core/src/main/java/org/apache/iceberg/TableUtil.java: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more con

Re: [I] java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: Required field 'num_values' was not found in serialized data [iceberg]

2024-11-21 Thread via GitHub
Fokko commented on issue #11614: URL: https://github.com/apache/iceberg/issues/11614#issuecomment-2491987466 Thanks @wardlican for raising this. Do you happen to know which system produced the Parquet files (Spark, Arrow, etc)? -- This is an automated message from the Apache Git Service.

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-11-21 Thread via GitHub
haizhou-zhao commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1852643846 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java: ## @@ -59,20 +87,29 @@ protected static Object[][] parameters() { }

Re: [PR] Core,Format: Deprecate embedded manifests [iceberg]

2024-11-21 Thread via GitHub
flyrain commented on code in PR #11586: URL: https://github.com/apache/iceberg/pull/11586#discussion_r1852683377 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -81,6 +84,9 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio

Re: [PR] Kafka Connect: Fix a bug in streams closing while read or write metadata files [iceberg]

2024-11-21 Thread via GitHub
hussein-awala commented on PR #11609: URL: https://github.com/apache/iceberg/pull/11609#issuecomment-2491995270 > For 1.7.1 I'd prefer to revert it, and follow up with a separate PR with a fix Sounds good, I opened https://github.com/apache/iceberg/pull/11621 to revert the commit.

Re: [I] Delete orphan files [iceberg-python]

2024-11-21 Thread via GitHub
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2491973911 `Orphan File Deletion in Iceberg Tables` Here's a step-by-step breakdown of the logic behind the process: 1. List All Files in Storage 2. Extract Referenced Files from
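
The first two steps named in the comment above (list all files in storage, extract the files referenced by the table) suggest a set difference. A minimal sketch of that idea follows, assuming both lists are already available as plain path strings. The function name `find_orphan_files` and the sample paths are hypothetical and are not part of PyIceberg; the remainder of the comment is truncated in this archive, so this may not match the steps it goes on to describe.

```python
def find_orphan_files(storage_files, referenced_files):
    """Return paths present in storage but absent from table metadata."""
    # Set difference: everything listed in storage minus everything referenced.
    return sorted(set(storage_files) - set(referenced_files))


# Hypothetical sample data, standing in for a real storage listing and
# for the file paths collected from a table's metadata.
storage_files = [
    "s3://bucket/tbl/data/a.parquet",
    "s3://bucket/tbl/data/b.parquet",
    "s3://bucket/tbl/data/stale.parquet",
]
referenced_files = [
    "s3://bucket/tbl/data/a.parquet",
    "s3://bucket/tbl/data/b.parquet",
]

orphans = find_orphan_files(storage_files, referenced_files)
```

A real cleanup would probably also want an age cutoff so that files from in-progress writes are not treated as orphans; that is an assumption here, not something stated in the visible part of the comment.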

Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-11-21 Thread via GitHub
c-thiel commented on code in PR #349: URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1852590840 ## crates/iceberg/src/transaction.rs: ## @@ -96,6 +109,60 @@ impl<'a> Transaction<'a> { Ok(self) } +fn generate_unique_snapshot_id(&self) -> i64

Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-11-21 Thread via GitHub
c-thiel commented on code in PR #349: URL: https://github.com/apache/iceberg-rust/pull/349#discussion_r1852078486 ## crates/e2e_test/Cargo.toml: ## @@ -0,0 +1,37 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [PR] Kafka Connect: Fix a bug in streams closing while read or write metadata files [iceberg]

2024-11-21 Thread via GitHub
bryanck commented on PR #11609: URL: https://github.com/apache/iceberg/pull/11609#issuecomment-2491959273 I think for 1.7.1 I'd prefer to revert it, and follow up with a separate PR with a fix, but I'm OK either way. -- This is an automated message from the Apache Git Service. To respond

Re: [I] When write.object-storage.enabled=true, it is difficult to gather information for individual partition of partitioned tables [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer commented on issue #11488: URL: https://github.com/apache/iceberg/issues/11488#issuecomment-2491942966 In general you shouldn't be using the pathing information for this, instead you should use the Files or Partitions Metadata tables. This is important because the storage lay

Re: [PR] Spark: remove ROW_POSITION from project schema [iceberg]

2024-11-21 Thread via GitHub
huaxingao commented on PR #11610: URL: https://github.com/apache/iceberg/pull/11610#issuecomment-2491897033 @flyrain Yes, we need the same change in the older version too. Just added the changes.

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-11-21 Thread via GitHub
haizhou-zhao commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1852627137 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java: ## @@ -59,20 +87,29 @@ protected static Object[][] parameters() { }

Re: [I] Is iceberg support "Predicate Pushdown" when spark read data from it? [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer commented on issue #11617: URL: https://github.com/apache/iceberg/issues/11617#issuecomment-2491925208 Depends on the query, there are some finicky details there but sometimes (especially in early versions of Iceberg and Spark) predicates don't translate correctly. Ice

Re: [I] Incorrect Deletion of Snapshot Metadata Due to OutOfMemoryError [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer closed issue #11575: Incorrect Deletion of Snapshot Metadata Due to OutOfMemoryError URL: https://github.com/apache/iceberg/issues/11575

Re: [I] Incorrect Deletion of Snapshot Metadata Due to OutOfMemoryError [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer commented on issue #11575: URL: https://github.com/apache/iceberg/issues/11575#issuecomment-2491927652 Fixed in #11576

Re: [PR] Kafka Connect: Fix a bug in streams closing while read or write metadata files [iceberg]

2024-11-21 Thread via GitHub
hussein-awala commented on PR #11609: URL: https://github.com/apache/iceberg/pull/11609#issuecomment-2491920362 > +1 to reverting the original change Yes, but there is indeed an edge case that leads to failure, and this fix is a revert + fix for the original problem. Manually closin

Re: [PR] Kafka Connect: Fix a bug in streams closing while read or write metadata files [iceberg]

2024-11-21 Thread via GitHub
hussein-awala commented on code in PR #11609: URL: https://github.com/apache/iceberg/pull/11609#discussion_r1852619296 ## core/src/main/java/org/apache/iceberg/TableMetadataParser.java: ## @@ -122,15 +122,25 @@ public static void write(TableMetadata metadata, OutputFile outputF

Re: [I] Row filter parse exception on column starting with underscore [iceberg-python]

2024-11-21 Thread via GitHub
vincenzon commented on issue #1357: URL: https://github.com/apache/iceberg-python/issues/1357#issuecomment-2491905990 According to this: https://spark.apache.org/docs/latest/sql-ref-identifier.html it is allowed. In fact, the way quoting is handled by pyiceberg is wrong on two levels:

Re: [PR] Add @override [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on PR #1312: URL: https://github.com/apache/iceberg-python/pull/1312#issuecomment-2491902906 what would the utility file look like? currently we use `TYPE_CHECKING` in a bunch of places already https://grep.app/search?q=TYPE_CHECKING&filter[repo.pattern][0]=apach

Re: [PR] REST: AuthManager API [iceberg]

2024-11-21 Thread via GitHub
danielcweeks commented on code in PR #10753: URL: https://github.com/apache/iceberg/pull/10753#discussion_r1852600808 ## core/src/main/java/org/apache/iceberg/rest/auth/AuthManager.java: ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Feature: Write to branches [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on PR #941: URL: https://github.com/apache/iceberg-python/pull/941#issuecomment-2491889472 cc @HonahX / @Fokko / @sungwy

Re: [PR] Feature: Write to branches [iceberg-python]

2024-11-21 Thread via GitHub
kevinjqliu commented on code in PR #941: URL: https://github.com/apache/iceberg-python/pull/941#discussion_r1852553109 ## tests/table/test_init.py: ## @@ -982,28 +982,43 @@ def test_assert_table_uuid(table_v2: Table) -> None: def test_assert_ref_snapshot_id(table_v2: Table) -

Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-11-21 Thread via GitHub
RussellSpitzer commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1852542188 ## core/src/main/java/org/apache/iceberg/TableUtil.java: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more con

Re: [PR] Kafka Connect: Fix a bug in streams closing while read or write metadata files [iceberg]

2024-11-21 Thread via GitHub
bryanck commented on PR #11609: URL: https://github.com/apache/iceberg/pull/11609#issuecomment-2491876055 +1 to reverting the original change

Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-11-21 Thread via GitHub
nastra commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1852565228 ## core/src/main/java/org/apache/iceberg/TableUtil.java: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor
