Re: [I] [bug] Cannot perform table scan on V1 table [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on issue #1194: URL: https://github.com/apache/iceberg-python/issues/1194#issuecomment-2570622923 Added a reproducible test in #1483; I had to save the BigLake Iceberg table locally. Please take a look. -- This is an automated message from the Apache Git Service. To r

[PR] add reproducible test [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu opened a new pull request, #1483: URL: https://github.com/apache/iceberg-python/pull/1483 Added reproducible test for #1194 Run `poetry run pytest tests/table/test_v1_table.py` Looking at the stack trace, one of the issues is `read_manifest_list` defaulting to use V2 man

Re: [PR] Spark 3.5: Implement RewriteTablePath [iceberg]

2025-01-03 Thread via GitHub
szehon-ho commented on code in PR #11555: URL: https://github.com/apache/iceberg/pull/11555#discussion_r1902646277 ## core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -60,28 +69,63 @@ public static CharSequence referencedDataFile(DeleteFile deleteFile) {

Re: [PR] Spark 3.5: Implement RewriteTablePath [iceberg]

2025-01-03 Thread via GitHub
szehon-ho commented on code in PR #11555: URL: https://github.com/apache/iceberg/pull/11555#discussion_r1902289753 ## core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -60,28 +69,63 @@ public static CharSequence referencedDataFile(DeleteFile deleteFile) {

Re: [PR] ci: use taiki-e/install-action to install tools from binary [iceberg-rust]

2025-01-03 Thread via GitHub
Xuanwo merged PR #852: URL: https://github.com/apache/iceberg-rust/pull/852

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-03 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1902525180 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGETS $

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-03 Thread via GitHub
wgtmac commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1902490200 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGET

Re: [PR] Implement column projection [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1902357065 ## tests/io/test_pyarrow.py: ## @@ -1122,6 +1127,63 @@ def test_projection_concat_files(schema_int: Schema, file_int: str) -> None: assert repr(result_t

Re: [PR] Implement column projection [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1902352957 ## pyiceberg/io/pyarrow.py: ## @@ -1286,14 +1310,20 @@ def _task_to_record_batches( continue output_batches = ar

Re: [PR] Parquet: Internal writer and reader [iceberg]

2025-01-03 Thread via GitHub
ajantha-bhat closed pull request #11904: Parquet: Internal writer and reader URL: https://github.com/apache/iceberg/pull/11904

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902287419 ## pyiceberg/io/pyarrow.py: ## @@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]: def _initialize_fs(self, scheme: str, net

Re: [PR] Spec: Support geo type [iceberg]

2025-01-03 Thread via GitHub
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1902285780 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-03 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902277899 ## pyiceberg/io/pyarrow.py: ## @@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]: def _initialize_fs(self, scheme: str, netl

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-03 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1902261165 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGETS $

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-03 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1902260652 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGETS $

Re: [PR] Add pyiceberg DataFusion e2e test [iceberg-rust]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #825: URL: https://github.com/apache/iceberg-rust/pull/825#discussion_r1902253197 ## crates/integration_tests/testdata/pyiceberg/provision.py: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contribu

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#issuecomment-2569957824 I ran the tests locally `ARN=arn:aws:s3tables:us-east-2:... poetry run pytest tests/catalog/test_s3tables.py` had to manually add `s3tables.region` to the catalog config ``

Re: [PR] Implement column projection [iceberg-python]

2025-01-03 Thread via GitHub
gabeiglio commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1902247982 ## pyiceberg/io/pyarrow.py: ## @@ -1286,14 +1310,20 @@ def _task_to_record_batches( continue output_batches = arr

Re: [PR] Purge RCK test entries in `afterEach` instead of `beforeEach` [iceberg]

2025-01-03 Thread via GitHub
github-actions[bot] commented on PR #11699: URL: https://github.com/apache/iceberg/pull/11699#issuecomment-2569954996 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1902235105 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,318 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +from

Re: [PR] Implement column projection [iceberg-python]

2025-01-03 Thread via GitHub
gabeiglio commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1902240839 ## tests/io/test_pyarrow.py: ## @@ -1122,6 +1127,63 @@ def test_projection_concat_files(schema_int: Schema, file_int: str) -> None: assert repr(result_ta

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2025-01-03 Thread via GitHub
RussellSpitzer commented on PR #11781: URL: https://github.com/apache/iceberg/pull/11781#issuecomment-2569914295 From a discussion I had with @sopel39 today: I think we can go forward with this solution, but I think it will basically re-introduce the memory usage issue that we saw previousl

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902219038 ## mkdocs/docs/configuration.md: ## @@ -102,21 +102,21 @@ For the FileIO there are several configuration options available: -| Key | E

[PR] [Core] Support Truncate(0) for metrics [iceberg]

2025-01-03 Thread via GitHub
KartikKapur opened a new pull request, #11905: URL: https://github.com/apache/iceberg/pull/11905 **Background** At Pinterest, we've started utilizing iceberg metrics considerably for offline validation as well as query speedups. Counts are consistently useful for all columns and upper/lo

Re: [PR] Remove unneeded metadata read during update event generation [iceberg]

2025-01-03 Thread via GitHub
grantatspothero commented on code in PR #11829: URL: https://github.com/apache/iceberg/pull/11829#discussion_r1893174793 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -475,10 +475,14 @@ public void commit() { } } + Object updateEvent(Snapshot c

Re: [PR] Remove unneeded metadata read during update event generation [iceberg]

2025-01-03 Thread via GitHub
grantatspothero commented on code in PR #11829: URL: https://github.com/apache/iceberg/pull/11829#discussion_r1902204257 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -475,10 +475,14 @@ public void commit() { } } + Object updateEvent(Snapshot c

Re: [PR] Spark 3.5: Implement RewriteTablePath [iceberg]

2025-01-03 Thread via GitHub
dramaticlly commented on code in PR #11555: URL: https://github.com/apache/iceberg/pull/11555#discussion_r1902174767 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java: ## @@ -0,0 +1,720 @@ +/* + * Licensed to the Apache Software

Re: [PR] Rest catalog integration testing [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1469: URL: https://github.com/apache/iceberg-python/pull/1469#discussion_r1902181377 ## tests/integration/test_rest_catalog.py: ## @@ -16,34 +16,788 @@ # under the License. # pylint:disable=redefined-outer-name + +from typing import Any, D

Re: [PR] Remove deprecation warnings in test [iceberg-python]

2025-01-03 Thread via GitHub
Fokko merged PR #1416: URL: https://github.com/apache/iceberg-python/pull/1416

Re: [PR] Remove deprecation warnings in test [iceberg-python]

2025-01-03 Thread via GitHub
Fokko commented on code in PR #1416: URL: https://github.com/apache/iceberg-python/pull/1416#discussion_r1902135055 ## tests/expressions/test_parser.py: ## @@ -70,7 +70,6 @@ def test_equals_false() -> None: def test_is_null() -> None: assert IsNull("foo") == parser.parse("

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-03 Thread via GitHub
HonahX commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1902124366 ## format/spec.md: ## @@ -1633,3 +1633,57 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries

Re: [PR] Bump Spark 3.5.4 [iceberg]

2025-01-03 Thread via GitHub
Fokko commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2569750175 Thanks @pan3793 for fixing this! And thanks @singhpk234, @huaxingao, @LuciferYang, and @viirya for reviewing!

Re: [PR] Bump Spark 3.5.4 [iceberg]

2025-01-03 Thread via GitHub
Fokko merged PR #11731: URL: https://github.com/apache/iceberg/pull/11731

Re: [PR] Revert "Hive: close the fileIO client when closing the hive catalog" [iceberg]

2025-01-03 Thread via GitHub
Fokko commented on PR #11858: URL: https://github.com/apache/iceberg/pull/11858#issuecomment-2569742820 Thanks for reviewing this @hussein-awala, @bryanck and @amogh-jahagirdar. I fully agree with Amogh's assessment. Let me cherry-pick this to the `1.7.x` branch.

Re: [PR] Tests: Set PySpark driver host to `localhost` [iceberg-python]

2025-01-03 Thread via GitHub
Fokko commented on PR #1466: URL: https://github.com/apache/iceberg-python/pull/1466#issuecomment-2569740575 I'm a bit torn on this one, it doesn't seem to cause any issues, but I'm inclined to think it is something with your local setup.

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-03 Thread via GitHub
HonahX commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1902114140 ## format/spec.md: ## @@ -1633,3 +1633,57 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries

Re: [PR] ci: configure codespell in pre-commit [iceberg-python]

2025-01-03 Thread via GitHub
Fokko merged PR #1478: URL: https://github.com/apache/iceberg-python/pull/1478

Re: [I] [Question] Why does plan_files not seem to get multi-threading improvement [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu closed issue #1479: [Question] Why does plan_files not seem to get multi-threading improvement URL: https://github.com/apache/iceberg-python/issues/1479

Re: [I] [Question] Why does plan_files not seem to get multi-threading improvement [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on issue #1479: URL: https://github.com/apache/iceberg-python/issues/1479#issuecomment-2569711529 Sounds good! Let's close this issue and move the discussion to #1229.

Re: [I] Data Loss in Flink Job with Iceberg Sink After Restart: How to Ensure Consistent Writes? [iceberg]

2025-01-03 Thread via GitHub
sanchay0 commented on issue #11894: URL: https://github.com/apache/iceberg/issues/11894#issuecomment-2569685261 Thanks @pvary, explaining my job setup will provide helpful context. Setup is straightforward: just reads from Kafka, deserializes it, and writes to Iceberg. The sink has this set

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-03 Thread via GitHub
huaxingao commented on PR #11799: URL: https://github.com/apache/iceberg/pull/11799#issuecomment-2569656539 > Looks like tests are not passing? Somehow the test failed 😟 The test passed on my local machine. I stepped into the test, and it worked as expected. I'm not sure why the te

Re: [PR] Implemented Remaining Catalog operations for REST catalog [iceberg-go]

2025-01-03 Thread via GitHub
zeroshade commented on code in PR #240: URL: https://github.com/apache/iceberg-go/pull/240#discussion_r1902041491 ## catalog/rest.go: ## @@ -710,3 +777,54 @@ func (r *RestCatalog) UpdateNamespaceProperties(ctx context.Context, namespace t return doPost[payload, Properti

Re: [PR] Implemented Remaining Catalog operations for REST catalog [iceberg-go]

2025-01-03 Thread via GitHub
zeroshade commented on code in PR #240: URL: https://github.com/apache/iceberg-go/pull/240#discussion_r1902040341 ## catalog/rest.go: ## @@ -626,11 +628,76 @@ func (r *RestCatalog) LoadTable(ctx context.Context, identifier table.Identifier } func (r *RestCatalog) DropTable(

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-03 Thread via GitHub
RussellSpitzer commented on PR #11799: URL: https://github.com/apache/iceberg/pull/11799#issuecomment-2569564361 Looks like tests are not passing? ```java TestBloomRowGroupFilter > testStructFieldEq() FAILED org.opentest4j.AssertionFailedError: [Should not read: value outside

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-03 Thread via GitHub
RussellSpitzer commented on code in PR #11799: URL: https://github.com/apache/iceberg/pull/11799#discussion_r1901996701 ## parquet/src/test/java/org/apache/iceberg/parquet/TestBloomRowGroupFilter.java: ## @@ -193,6 +195,7 @@ public void createInputFile() throws IOException {

Re: [PR] Parquet: Internal writer and reader [iceberg]

2025-01-03 Thread via GitHub
ajantha-bhat commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1901971027 ## parquet/src/test/java/org/apache/iceberg/parquet/TestInternalWriter.java: ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Parquet: Internal writer and reader [iceberg]

2025-01-03 Thread via GitHub
ajantha-bhat commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1901970283 ## parquet/src/main/java/org/apache/iceberg/data/parquet/InternalReader.java: ## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Spark: Change Delete granularity to file for Spark 3.5 [iceberg]

2025-01-03 Thread via GitHub
amogh-jahagirdar merged PR #11478: URL: https://github.com/apache/iceberg/pull/11478

Re: [PR] Spark: Change Delete granularity to file for Spark 3.5 [iceberg]

2025-01-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #11478: URL: https://github.com/apache/iceberg/pull/11478#discussion_r1901966486 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -1301,7 +1305,7 @@ public void testDeleteWithMultiple

[PR] Parquet: Internal writer and reader [iceberg]

2025-01-03 Thread via GitHub
ajantha-bhat opened a new pull request, #11904: URL: https://github.com/apache/iceberg/pull/11904 Split into 3 commits, a) Refactor BaseParquetWriter to only keep common functionality required for internal and generic writer. b) Refactor BaseParquetReaders to only keep common f

Re: [I] Streaming read from Iceberg table in S3 cause checkpoint related error [iceberg]

2025-01-03 Thread via GitHub
ismailsimsek commented on issue #3: URL: https://github.com/apache/iceberg/issues/3#issuecomment-2569421286 @singhpk234 does this mean the Iceberg docs should be updated? The examples heavily use `org.apache.iceberg.aws.s3.S3FileIO`. In which cases is it correct to use `S3FileIO

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-03 Thread via GitHub
RussellSpitzer commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1901893758 ## format/spec.md: ## @@ -1633,3 +1633,57 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-03 Thread via GitHub
RussellSpitzer commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1901891884 ## format/spec.md: ## @@ -1633,3 +1633,57 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time

Re: [PR] Metadata table scans as streams [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #870: URL: https://github.com/apache/iceberg-rust/pull/870#discussion_r1901890393 ## crates/iceberg/src/table.rs: ## @@ -203,7 +203,7 @@ impl Table { /// Creates a metadata table which provides table-like APIs for inspecting metadata. /

Re: [PR] Spec: Support geo type [iceberg]

2025-01-03 Thread via GitHub
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1901880814 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_

Re: [PR] Spec: Support geo type [iceberg]

2025-01-03 Thread via GitHub
paleolimbot commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1901860878 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional

[PR] Split metadata tables into separate modules [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv opened a new pull request, #872: URL: https://github.com/apache/iceberg-rust/pull/872 Split metadata tables into separate modules. Context for this is to address https://github.com/apache/iceberg-rust/pull/863#discussion_r1901533450 where the point was made that `metadata_scan.

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901857443 ## crates/iceberg/src/metadata_scan.rs: ## Review Comment: I opened https://github.com/apache/iceberg-rust/pull/872 if the above sounds good to you.

Re: [PR] fix(metadata): export iceberg schema in manifests table [iceberg-rust]

2025-01-03 Thread via GitHub
flaneur2020 commented on code in PR #871: URL: https://github.com/apache/iceberg-rust/pull/871#discussion_r1901839327 ## crates/iceberg/src/metadata_scan.rs: ## @@ -183,21 +264,21 @@ impl<'a> ManifestsTable<'a> { let mut existing_delete_files_count = PrimitiveBuilder::

Re: [PR] Spec: Support geo type [iceberg]

2025-01-03 Thread via GitHub
paleolimbot commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1901836601 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional

Re: [PR] fix(metadata): export iceberg schema in manifests table [iceberg-rust]

2025-01-03 Thread via GitHub
flaneur2020 commented on code in PR #871: URL: https://github.com/apache/iceberg-rust/pull/871#discussion_r1901836558 ## crates/iceberg/src/metadata_scan.rs: ## @@ -134,44 +137,122 @@ pub struct ManifestsTable<'a> { } impl<'a> ManifestsTable<'a> { -fn partition_summary_f

[PR] fix(metadata): export iceberg schema in manifests table [iceberg-rust]

2025-01-03 Thread via GitHub
flaneur2020 opened a new pull request, #871: URL: https://github.com/apache/iceberg-rust/pull/871 Fixes #868. The code is still very ugly; I hope I can get some advice on this 😲

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-03 Thread via GitHub
wgtmac commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1901824596 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGET

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901822370 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. +/

[PR] Metadata table scans as streams [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv opened a new pull request, #870: URL: https://github.com/apache/iceberg-rust/pull/870 This changes the metadata table APIs to have `scan()` return streams instead of a single `RecordBatch`. Context for this is https://github.com/apache/iceberg-rust/pull/863#discussion_r19015456
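
As a rough illustration of the API shape described in #870 (a scan that yields batches one at a time instead of materializing a single batch), here is a minimal Python sketch. It does not use iceberg-rust or Arrow; `scan_all`, `scan_stream`, `Row`, `Batch`, and the sample rows are hypothetical names invented for this example.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-ins for a record and a record batch.
Row = Dict[str, int]
Batch = List[Row]


def scan_all(rows: List[Row]) -> Batch:
    """Eager shape: build and return one batch holding every row."""
    return list(rows)


def scan_stream(rows: List[Row], batch_size: int = 2) -> Iterator[Batch]:
    """Streaming shape: lazily yield batches of at most `batch_size` rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Five made-up metadata rows, consumed batch by batch.
rows = [{"snapshot_id": i} for i in range(5)]
batches = list(scan_stream(rows, batch_size=2))
```

With the streaming shape a caller can stop early or process each batch as it arrives, whereas the eager shape must hold every row in memory before returning.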

Re: [PR] Spec: Support geo type [iceberg]

2025-01-03 Thread via GitHub
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1901819194 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_

[I] feat: Expose Iceberg table statistics in DataFusion interface(s) [iceberg-rust]

2025-01-03 Thread via GitHub
gruuya opened a new issue, #869: URL: https://github.com/apache/iceberg-rust/issues/869 At present the two key DataFusion interfaces for Iceberg lack statistics information, as they rely on default (i.e. missing/unknown) implementations for `TableProvider::statistics` and `ExecutionPlan::st

Re: [I] Flink Iceberg Writer : To be able to use copy-on-write mode to write the iceberg tables for batch jobs [iceberg]

2025-01-03 Thread via GitHub
pvary commented on issue #11893: URL: https://github.com/apache/iceberg/issues/11893#issuecomment-2569268308 Flink is primarily for streaming use-cases. For streaming updates copy-on-write is not a viable option as the user will easily end up rewriting the whole table for every checkpoint.

Re: [I] feat: Expose Iceberg table statistics in DataFusion interface(s) [iceberg-rust]

2025-01-03 Thread via GitHub
gruuya commented on issue #869: URL: https://github.com/apache/iceberg-rust/issues/869#issuecomment-2569253841 I'd be happy to work on developing this.

Re: [I] feat: Expose Iceberg table statistics in DataFusion interface(s) [iceberg-rust]

2025-01-03 Thread via GitHub
Xuanwo commented on issue #869: URL: https://github.com/apache/iceberg-rust/issues/869#issuecomment-2569257673 Thank you a lot for working on this!

Re: [I] HiveTableOperations may incorrectly consider a successful commit as failed [iceberg]

2025-01-03 Thread via GitHub
pvary commented on issue #11866: URL: https://github.com/apache/iceberg/issues/11866#issuecomment-2569257248 Answered on #11814. Basically, the `CommitStateUnknown` exception forces a `checkCommitStatus`, so this seems like a good direction.

Re: [I] Table corruption using lock-free Hive commits [iceberg]

2025-01-03 Thread via GitHub
pvary commented on issue #11814: URL: https://github.com/apache/iceberg/issues/11814#issuecomment-2569254811 +1 for @RussellSpitzer's suggestion. We have `CommitStateUnknown` exception for exactly these cases. And as the issue highlighted we can never be sure that an exception is happened b
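
The idea in this thread (an exception during commit does not prove the commit failed, so the outcome should be checked before anything is cleaned up) can be sketched in Python as follows. This is only an illustration under stated assumptions, not Iceberg's implementation: `commit_with_check`, `swap`, `read_current`, and both exception classes are hypothetical, and the check is simplified to comparing the current metadata location with the one we tried to write.

```python
from typing import Callable


class CommitFailed(Exception):
    """Hypothetical: the commit definitely did not land, so retrying is safe."""


class CommitStateUnknown(Exception):
    """Hypothetical: the commit may or may not have landed."""


def commit_with_check(
    swap: Callable[[str], None],
    read_current: Callable[[], str],
    new_location: str,
) -> str:
    """Try to swap in `new_location`; on an ambiguous error, verify the outcome."""
    try:
        swap(new_location)
    except TimeoutError as err:
        # The request may have been applied even though we got no response.
        try:
            current = read_current()
        except Exception:
            # Verification itself failed, so the outcome stays unknown.
            raise CommitStateUnknown("could not verify commit") from err
        # Simplification: a real check would also need to consider history,
        # because a later commit could already have replaced ours.
        if current != new_location:
            raise CommitFailed("commit did not land") from err
    return new_location


# Simulated store where the write lands but the response is lost.
state = {"location": "v1.metadata.json"}


def flaky_swap(new_location: str) -> None:
    state["location"] = new_location
    raise TimeoutError("response lost")


result = commit_with_check(flaky_swap, lambda: state["location"], "v2.metadata.json")
```

Here the caller treats the commit as successful because the verification step finds the new location in place, rather than assuming failure from the exception alone.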

[I] Support both adjust-to-utc and local-timestamp-micros in Iceberg [iceberg]

2025-01-03 Thread via GitHub
Gezi-lzq opened a new issue, #11903: URL: https://github.com/apache/iceberg/issues/11903 ### Feature Request / Improvement Hi folks, I would like to understand more about the statement "Avro type annotation adjust-to-utc is an Iceberg convention,” given that Avro already has th

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-03 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1901768298 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGETS $

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901745575 ## crates/iceberg/src/metadata_scan.rs: ## @@ -255,8 +345,513 @@ impl<'a> ManifestsTable<'a> { } } +/// Builds the struct describing data files listed in a tab

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901745235 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. +/

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901741027 ## crates/iceberg/src/metadata_scan.rs: ## Review Comment: I agree but wasn't sure if that's just me being a Java dev 😄 ## crates/iceberg/src/metadata

Re: [PR] fix: parse var len of decimal for parquet statistic [iceberg-rust]

2025-01-03 Thread via GitHub
Xuanwo merged PR #837: URL: https://github.com/apache/iceberg-rust/pull/837

Re: [PR] fix: parse var len of decimal for parquet statistic [iceberg-rust]

2025-01-03 Thread via GitHub
Xuanwo commented on PR #837: URL: https://github.com/apache/iceberg-rust/pull/837#issuecomment-2569133599 Thank you @ZENOTME for fixing this and thank you @liurenjie1024 for the review, let's merge!

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901726793 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. +/

Re: [PR] Doc:Hive 4.0 and later versions allow vectorized read and write opera… [iceberg]

2025-01-03 Thread via GitHub
pvary commented on code in PR #11877: URL: https://github.com/apache/iceberg/pull/11877#discussion_r1901705503 ## docs/docs/hive.md: ## @@ -138,7 +138,7 @@ For example, setting this in the `hive-site.xml` loaded by Spark will enable the by Spark. !!! danger -Starting wi

Re: [I] Manifests table scan should return iceberg schema rather arrow schema [iceberg-rust]

2025-01-03 Thread via GitHub
flaneur2020 commented on issue #868: URL: https://github.com/apache/iceberg-rust/issues/868#issuecomment-2569087910 :+1: let me fix this

Re: [PR] Auth Manager API part 3: OAuth2 Manager [iceberg]

2025-01-03 Thread via GitHub
johnnysohn commented on code in PR #11844: URL: https://github.com/apache/iceberg/pull/11844#discussion_r1901683532 ## core/src/main/java/org/apache/iceberg/rest/auth/AuthManagers.java: ## @@ -42,6 +57,9 @@ public static AuthManager loadAuthManager(String name, Map prope

Re: [PR] ParallelIterable: Queue Size w/ O(1) [iceberg]

2025-01-03 Thread via GitHub
shanielh commented on PR #11895: URL: https://github.com/apache/iceberg/pull/11895#issuecomment-2568987952 > I wonder if this is as important if we switch ParallelIterable to use the implementation suggested here https://github.com/apache/iceberg/issues/11768 which limits the queue depth si

Re: [PR] Introduce `MissingRequiredFilesToDeleteException` for Streaming Deletes [iceberg]

2025-01-03 Thread via GitHub
shanielh commented on PR #11887: URL: https://github.com/apache/iceberg/pull/11887#issuecomment-2568980615 > I think generally we wouldn't want to introduce new API concepts unless there is some usage of that API within the core library itself (Otherwise we are basically just opening up tec

Re: [PR] ci: use taiki-e/install-action to install tools from binary [iceberg-rust]

2025-01-03 Thread via GitHub
xxchan commented on code in PR #852: URL: https://github.com/apache/iceberg-rust/pull/852#discussion_r1901630538 ## .github/workflows/ci.yml: ## @@ -61,9 +62,17 @@ jobs: - name: Cargo clippy run: make check-clippy + - name: Install cargo-sort +uses

Re: [PR] Doc:Hive 4.0 and later versions allow vectorized read and write opera… [iceberg]

2025-01-03 Thread via GitHub
BsoBird commented on code in PR #11877: URL: https://github.com/apache/iceberg/pull/11877#discussion_r1901628786 ## docs/docs/hive.md: ## @@ -138,7 +138,7 @@ For example, setting this in the `hive-site.xml` loaded by Spark will enable the by Spark. !!! danger -Starting

Re: [I] Manifests table scan should return iceberg schema rather arrow schema [iceberg-rust]

2025-01-03 Thread via GitHub
Xuanwo commented on issue #868: URL: https://github.com/apache/iceberg-rust/issues/868#issuecomment-2568959304 Makes sense to me.

Re: [PR] ci: use taiki-e/install-action to install tools from binary [iceberg-rust]

2025-01-03 Thread via GitHub
xxchan commented on code in PR #852: URL: https://github.com/apache/iceberg-rust/pull/852#discussion_r1901620567 ## .github/workflows/ci.yml: ## @@ -46,11 +46,12 @@ jobs: - name: Check License Header uses: apache/skywalking-eyes/header@v0.6.0 - - name: Ins

Re: [PR] ci: use taiki-e/install-action to install tools from binary [iceberg-rust]

2025-01-03 Thread via GitHub
xxchan commented on PR #852: URL: https://github.com/apache/iceberg-rust/pull/852#issuecomment-2568954244 INFRA has approved the action

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2025-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#discussion_r1901611620 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +128,133 @@ impl<'a> SnapshotsTable<'a> { } } +/// Manifests table. +pub struct ManifestsTable<'a>
