[PR] Doc: Add DELETE ORPHAN-FILES example [iceberg]

2024-12-31 Thread via GitHub
ebyhr opened a new pull request, #11896: URL: https://github.com/apache/iceberg/pull/11896 Relates to https://github.com/apache/hive/pull/4897 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900339989 ## cmake_modules/BuildUtils.cmake: ## @@ -182,13 +183,7 @@ function(ADD_ICEBERG_LIB LIB_NAME) target_include_directories(${LIB_NAME}_static PRIVATE ${ARG_PRIVATE_INCL

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
wgtmac commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900328737 ## cmake_modules/BuildUtils.cmake: ## @@ -182,13 +183,7 @@ function(ADD_ICEBERG_LIB LIB_NAME) target_include_directories(${LIB_NAME}_static PRIVATE ${ARG_PRIVATE_I

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900327762 ## cmake_modules/BuildUtils.cmake: ## @@ -182,13 +183,7 @@ function(ADD_ICEBERG_LIB LIB_NAME) target_include_directories(${LIB_NAME}_static PRIVATE ${ARG_PRIVATE_INCL

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900327030 ## cmake_modules/ThirdpartyToolchain.cmake: ## @@ -0,0 +1,142 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
wgtmac commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900325544 ## cmake_modules/BuildUtils.cmake: ## @@ -182,13 +183,7 @@ function(ADD_ICEBERG_LIB LIB_NAME) target_include_directories(${LIB_NAME}_static PRIVATE ${ARG_PRIVATE_I

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2024-12-31 Thread via GitHub
flaneur2020 commented on code in PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#discussion_r1900324950 ## crates/iceberg/src/metadata_scan.rs: ## @@ -50,6 +52,13 @@ impl MetadataTable { } } +/// Get the manifests table. +pub fn manifests(&s

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2024-12-31 Thread via GitHub
flaneur2020 commented on code in PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#discussion_r1900324416 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +137,135 @@ impl<'a> SnapshotsTable<'a> { } } +/// Manifests table. +pub struct ManifestsTable<'a> {

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2024-12-31 Thread via GitHub
flaneur2020 commented on code in PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#discussion_r1900324391 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +137,135 @@ impl<'a> SnapshotsTable<'a> { } } +/// Manifests table. +pub struct ManifestsTable<'a> {

Re: [I] How to apply partition/bloom filter to old data? Does rewrite_data_files/rewrite_manifests procedure work? [iceberg]

2024-12-31 Thread via GitHub
hashmapybx commented on issue #11878: URL: https://github.com/apache/iceberg/issues/11878#issuecomment-2566842621 By the way, ALTER TABLE prod.db.sample SET TBLPROPERTIES . Have you run into any other problems?

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2024-12-31 Thread via GitHub
xxchan commented on code in PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#discussion_r1900309436 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +137,135 @@ impl<'a> SnapshotsTable<'a> { } } +/// Manifests table. +pub struct ManifestsTable<'a> { +

Re: [PR] Remove unneeded metadata read during update event generation [iceberg]

2024-12-31 Thread via GitHub
amogh-jahagirdar commented on code in PR #11829: URL: https://github.com/apache/iceberg/pull/11829#discussion_r1900307975 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -157,12 +158,16 @@ public List apply(TableMetadata base, Snapshot snapshot) { } @Ove

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2024-12-31 Thread via GitHub
xxchan commented on code in PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#discussion_r1900309256 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +137,135 @@ impl<'a> SnapshotsTable<'a> { } } +/// Manifests table. +pub struct ManifestsTable<'a> { +

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900309004 ## cmake_modules/BuildUtils.cmake: ## @@ -182,13 +183,7 @@ function(ADD_ICEBERG_LIB LIB_NAME) target_include_directories(${LIB_NAME}_static PRIVATE ${ARG_PRIVATE_INCL

Re: [PR] Core: Add list/map block sizes [iceberg]

2024-12-31 Thread via GitHub
rustyconover commented on PR #10973: URL: https://github.com/apache/iceberg/pull/10973#issuecomment-2566807700 Seems like it's still pending.

Re: [I] [SUPPORT] Support setting the maximum number of partitions for a table [iceberg]

2024-12-31 Thread via GitHub
melin closed issue #10628: [SUPPORT] Support setting the maximum number of partitions for a table URL: https://github.com/apache/iceberg/issues/10628

Re: [I] `parquet_path_to_id_mapping` generates incorrect path for List types [iceberg-python]

2024-12-31 Thread via GitHub
github-actions[bot] closed issue #716: `parquet_path_to_id_mapping` generates incorrect path for List types URL: https://github.com/apache/iceberg-python/issues/716

Re: [I] `parquet_path_to_id_mapping` generates incorrect path for List types [iceberg-python]

2024-12-31 Thread via GitHub
github-actions[bot] commented on issue #716: URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2566767375 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Crash when writing map type with unsigned types [iceberg-python]

2024-12-31 Thread via GitHub
github-actions[bot] commented on issue #837: URL: https://github.com/apache/iceberg-python/issues/837#issuecomment-2566767364 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Crash when writing map type with unsigned types [iceberg-python]

2024-12-31 Thread via GitHub
github-actions[bot] closed issue #837: Crash when writing map type with unsigned types URL: https://github.com/apache/iceberg-python/issues/837

Re: [PR] Core: Expose `added_rows_count`, `existing_rows_count` and `deleted_rows_count` fields in all_manifests and manifests tables [iceberg]

2024-12-31 Thread via GitHub
github-actions[bot] commented on PR #11679: URL: https://github.com/apache/iceberg/pull/11679#issuecomment-2566766304 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pul

Re: [PR] Core: Add list/map block sizes [iceberg]

2024-12-31 Thread via GitHub
github-actions[bot] commented on PR #10973: URL: https://github.com/apache/iceberg/pull/10973#issuecomment-2566766280 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pul

Re: [I] [SUPPORT] Support setting the maximum number of partitions for a table [iceberg]

2024-12-31 Thread via GitHub
github-actions[bot] commented on issue #10628: URL: https://github.com/apache/iceberg/issues/10628#issuecomment-2566766271 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] Custom s3 endpoint: Unable to execute HTTP request: Remote host terminated the handshake [iceberg]

2024-12-31 Thread via GitHub
github-actions[bot] closed issue #10490: Custom s3 endpoint: Unable to execute HTTP request: Remote host terminated the handshake URL: https://github.com/apache/iceberg/issues/10490

Re: [I] Custom s3 endpoint: Unable to execute HTTP request: Remote host terminated the handshake [iceberg]

2024-12-31 Thread via GitHub
github-actions[bot] commented on issue #10490: URL: https://github.com/apache/iceberg/issues/10490#issuecomment-2566766228 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[PR] Bump pyparsing from 3.2.0 to 3.2.1 [iceberg-python]

2024-12-31 Thread via GitHub
dependabot[bot] opened a new pull request, #1481: URL: https://github.com/apache/iceberg-python/pull/1481 Bumps [pyparsing](https://github.com/pyparsing/pyparsing) from 3.2.0 to 3.2.1. Changelog Sourced from https://github.com/pyparsing/pyparsing/blob/master/CHANGES pyparsing's

Re: [PR] Count rows as a metadata only operation [iceberg-python]

2024-12-31 Thread via GitHub
gli-chris-hao commented on code in PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388#discussion_r1900228680 ## pyiceberg/table/__init__.py: ## @@ -1594,6 +1609,29 @@ def to_ray(self) -> ray.data.dataset.Dataset: return ray.data.from_arrow(self.to_arrow

Re: [I] FileIO S3: Add support for Assume-Role-Arn and other AWS Client properties [iceberg-rust]

2024-12-31 Thread via GitHub
charlesdong1991 commented on issue #527: URL: https://github.com/apache/iceberg-rust/issues/527#issuecomment-2566649287 Hi, I am new to the project. If nobody has picked it up yet, can I give it a try to get to know the code base better?

Re: [I] Fields are out of order in equality delete files if equality fields are not together [iceberg]

2024-12-31 Thread via GitHub
singhpk234 commented on issue #11891: URL: https://github.com/apache/iceberg/issues/11891#issuecomment-2566633478 > But this equality delete file is out of order and this record can still be read in iceberg table The equality delete file written had ptr as **111** instead of **202412130**

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2024-12-31 Thread via GitHub
gli-chris-hao commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2566622940 We have the same use case and the same concerns about loading too much data into memory for counting. The way I'm doing it is to use `DataScan.to_arrow_batch_reader`, and then coun
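
The batch-wise counting described in this comment can be sketched roughly as follows. This is a minimal sketch, not the commenter's code: `count_rows` and `FakeBatch` are hypothetical names introduced here, and the function only assumes that each batch exposes a `num_rows` attribute, as PyArrow `RecordBatch` objects do.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasNumRows(Protocol):
    """Anything that reports how many rows it holds (assumed interface)."""

    num_rows: int


def count_rows(batches: Iterable[HasNumRows]) -> int:
    """Sum row counts batch by batch, so only one batch is alive at a time."""
    total = 0
    for batch in batches:
        total += batch.num_rows
    return total


@dataclass
class FakeBatch:
    """Hypothetical stand-in for a pyarrow.RecordBatch, for illustration only."""

    num_rows: int


if __name__ == "__main__":
    # Assumed usage with PyIceberg (not verified against the thread):
    #   count_rows(table.scan().to_arrow_batch_reader())
    print(count_rows(FakeBatch(n) for n in (3, 0, 5)))
```

Because the batches are consumed lazily, peak memory is bounded by the largest single batch rather than by the whole scan result, which is the concern raised in the issue.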

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2024-12-31 Thread via GitHub
Xuanwo commented on code in PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#discussion_r1900169821 ## crates/iceberg/src/metadata_scan.rs: ## @@ -50,6 +52,13 @@ impl MetadataTable { } } +/// Get the manifests table. +pub fn manifests(&self)

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
wgtmac commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900166523 ## cmake_modules/BuildUtils.cmake: ## @@ -182,13 +183,7 @@ function(ADD_ICEBERG_LIB LIB_NAME) target_include_directories(${LIB_NAME}_static PRIVATE ${ARG_PRIVATE_I

[PR] Updated Readme file to reflect Implemented operations [iceberg-go]

2024-12-31 Thread via GitHub
chil-pavn opened a new pull request, #242: URL: https://github.com/apache/iceberg-go/pull/242 Hey @zeroshade, I raised this PR because I thought it would be helpful for folks checking the Readme.md file to see the Roadmap.

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1900073077 ## README.md: ## @@ -44,9 +61,14 @@ After installing the core libraries, you can build the examples: ```bash cd iceberg-cpp/example -mkdir build && cd build -cmake .. -D

Re: [PR] Gh 1223 metadata only row count [iceberg-python]

2024-12-31 Thread via GitHub
tusharchou closed pull request #1480: Gh 1223 metadata only row count URL: https://github.com/apache/iceberg-python/pull/1480

[PR] Gh 1223 metadata only row count [iceberg-python]

2024-12-31 Thread via GitHub
tusharchou opened a new pull request, #1480: URL: https://github.com/apache/iceberg-python/pull/1480 (no comment)

Re: [PR] feat: Support metadata table "Manifests" [iceberg-rust]

2024-12-31 Thread via GitHub
flaneur2020 commented on PR #861: URL: https://github.com/apache/iceberg-rust/pull/861#issuecomment-2566334491 @Xuanwo merged the main branch, PTAL 🫑

[PR] ParallelIterable: Queue Size w/ O(1) [iceberg]

2024-12-31 Thread via GitHub
shanielh opened a new pull request, #11895: URL: https://github.com/apache/iceberg/pull/11895 Instead of calling ConcurrentLinkedQueue.size(), which traverses the linked queue to compute its size, maintain an AtomicInteger that tracks the size of the queue. ConcurrentLinke
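
The bookkeeping idea in this PR description, tracking the size in a counter instead of traversing the queue, can be illustrated with a small language-neutral sketch. This is not the PR's Java code: `CountedQueue` and its methods are hypothetical names, a lock stands in for Java's `AtomicInteger`, and Python's `deque` already has an O(1) `len`, so the example only demonstrates the pattern.

```python
import threading
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class CountedQueue(Generic[T]):
    """FIFO queue that keeps its size in a counter (hypothetical name)."""

    def __init__(self) -> None:
        self._items: Deque[T] = deque()
        self._size = 0  # updated on every offer/poll, never recomputed
        self._lock = threading.Lock()

    def offer(self, item: T) -> None:
        """Append an item and bump the counter under the same lock."""
        with self._lock:
            self._items.append(item)
            self._size += 1

    def poll(self) -> Optional[T]:
        """Remove and return the head, or None if the queue is empty."""
        with self._lock:
            if not self._items:
                return None
            self._size -= 1
            return self._items.popleft()

    def size(self) -> int:
        """Read the counter; no traversal of the underlying queue."""
        return self._size


if __name__ == "__main__":
    q: CountedQueue[int] = CountedQueue()
    q.offer(1)
    q.offer(2)
    print(q.size(), q.poll(), q.size())
```

The trade-off is one extra counter update per enqueue and dequeue in exchange for a constant-time size check, which matters when the size is polled frequently.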

[I] Data Loss in Flink Job with Iceberg Sink After Restart: How to Ensure Consistent Writes? [iceberg]

2024-12-31 Thread via GitHub
sanchay0 opened a new issue, #11894: URL: https://github.com/apache/iceberg/issues/11894 ### Query engine Flink ### Question I am running a Flink job that reads data from Kafka, processes it into a Flink [Row object](https://nightlies.apache.org/flink/flink-docs-master/

Re: [I] [Question] Why does plan_files not seem to get multi-threading improvement [iceberg-python]

2024-12-31 Thread via GitHub
gitzwz commented on issue #1479: URL: https://github.com/apache/iceberg-python/issues/1479#issuecomment-2566291942 Here is my test code: ` from pyiceberg.catalog import load_catalog from pyspark.sql import SparkSession from pyiceberg import expressions as pyi_expr import t

[I] [Question] Why does plan_files not seem to get multi-threading improvement [iceberg-python]

2024-12-31 Thread via GitHub
gitzwz opened a new issue, #1479: URL: https://github.com/apache/iceberg-python/issues/1479 ### Question I encountered a problem with table.scan().plan_files() where there is no noticeable time difference between single-threaded and multi-threaded execution. The total time is directly

Re: [PR] Add GitHub cpp-linter-action [iceberg-cpp]

2024-12-31 Thread via GitHub
wgtmac commented on PR #20: URL: https://github.com/apache/iceberg-cpp/pull/20#issuecomment-2566233077 > pre-commit only runs clang-format, not clang-tidy - I think it'd still be useful even without the PR comment? That makes sense! Let me enable it for now. We can improve or discard

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2024-12-31 Thread via GitHub
wgtmac commented on PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#issuecomment-2566218953 Thanks @kou for your suggestion! Now the CMake implementation is greatly simplified. FYI, the installed directory looks like below: ``` ├── include/ │ ├── iceberg/ │ │ ├── p