Re: [I] HiveTableOperations may incorrectly consider a successful commit as failed [iceberg]

2025-01-02 Thread via GitHub
sauliusvl commented on issue #11866: URL: https://github.com/apache/iceberg/issues/11866#issuecomment-2568810397 Looks like a duplicate of https://github.com/apache/iceberg/issues/11814 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] Data Loss in Flink Job with Iceberg Sink After Restart: How to Ensure Consistent Writes? [iceberg]

2025-01-02 Thread via GitHub
pvary commented on issue #11894: URL: https://github.com/apache/iceberg/issues/11894#issuecomment-2568809136 > @pvary I did restart from a savepoint. I mentioned that in the description > > > I restart the job from the latest savepoint, which is committed at the Kafka source So

Re: [I] when drop a non-Iceberg table , the directory associated with the table was not deleted [iceberg]

2025-01-02 Thread via GitHub
MonkeyCanCode commented on issue #11820: URL: https://github.com/apache/iceberg/issues/11820#issuecomment-2568808355 > > Drop table only removes metadata refs and not the actual data files since 0.14. For actual data file removal, you will need to add `purge` at the end. This is documented i

Re: [PR] Bump pyparsing from 3.2.0 to 3.2.1 [iceberg-python]

2025-01-02 Thread via GitHub
Fokko merged PR #1481: URL: https://github.com/apache/iceberg-python/pull/1481

Re: [PR] Hive: Add Hive 4 support and remove Hive runtime [iceberg]

2025-01-02 Thread via GitHub
pvary commented on PR #11750: URL: https://github.com/apache/iceberg/pull/11750#issuecomment-2568797214 > It's due to API changes like [this](https://github.com/apache/spark/pull/48823/files#diff-45c9b065d76b237bcfecda83b8ee08c1ff6592d6f85acca09c0fa01472e056afR182) and [this](https://githu

Re: [I] when MERGE INTO a merge-on-read table got NoSuchMethodError [iceberg]

2025-01-02 Thread via GitHub
lordk911 commented on issue #11821: URL: https://github.com/apache/iceberg/issues/11821#issuecomment-2568792490 ``` ll $SPARK_HOME/jars/*iceberg lrwxrwxrwx 1 bigtop bigtop 79 Dec 26 11:19 jars/iceberg-spark-runtime-3.4_2.12-1.6.1.jar -> /data/soft/extentions4spark3.4/iceberg/icebe

Re: [PR] Doc: Fix format of Hive [iceberg]

2025-01-02 Thread via GitHub
ebyhr commented on code in PR #11892: URL: https://github.com/apache/iceberg/pull/11892#discussion_r1901525658 ## docs/docs/hive.md: ## @@ -841,8 +842,6 @@ ALTER TABLE ice_t EXECUTE ROLLBACK(); ### Compaction Hive 4 supports full table compaction of Iceberg tables using

Re: [PR] feat: sql catalog support update table [iceberg-rust]

2025-01-02 Thread via GitHub
liurenjie1024 commented on code in PR #862: URL: https://github.com/apache/iceberg-rust/pull/862#discussion_r1901523435 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -626,6 +626,12 @@ impl TableMetadata { Ok(()) } + +/// Returns snapshot references. +

Re: [I] Field not found in source schema [iceberg]

2025-01-02 Thread via GitHub
MonkeyCanCode commented on issue #11843: URL: https://github.com/apache/iceberg/issues/11843#issuecomment-2568787480 @rohitanil as commented by @stym06, you are missing a couple of things, and one of them is the AWS jar. If you want to load hadoop-aws jars, you will need to load a couple of additional depe

Re: [PR] Doc: Fix format of Hive [iceberg]

2025-01-02 Thread via GitHub
ebyhr commented on code in PR #11892: URL: https://github.com/apache/iceberg/pull/11892#discussion_r1901525221 ## docs/docs/hive.md: ## @@ -300,6 +300,7 @@ The result is: | i | BUCKET\[2\]| NULL The supported transformations for Hive are

Re: [PR] Doc: Fix format of Hive [iceberg]

2025-01-02 Thread via GitHub
pvary commented on code in PR #11892: URL: https://github.com/apache/iceberg/pull/11892#discussion_r1901524177 ## docs/docs/hive.md: ## @@ -841,8 +842,6 @@ ALTER TABLE ice_t EXECUTE ROLLBACK(); ### Compaction Hive 4 supports full table compaction of Iceberg tables using

Re: [PR] Doc: Fix format of Hive [iceberg]

2025-01-02 Thread via GitHub
pvary commented on code in PR #11892: URL: https://github.com/apache/iceberg/pull/11892#discussion_r1901523737 ## docs/docs/hive.md: ## @@ -300,6 +300,7 @@ The result is: | i | BUCKET\[2\]| NULL The supported transformations for Hive are

Re: [PR] fix: valid identifier id in nested map fail [iceberg-rust]

2025-01-02 Thread via GitHub
liurenjie1024 merged PR #864: URL: https://github.com/apache/iceberg-rust/pull/864

Re: [I] When MAP and ARRAY are next to each other, changing the field name inside the nested field will cause an ERROR [iceberg]

2025-01-02 Thread via GitHub
MonkeyCanCode commented on issue #11872: URL: https://github.com/apache/iceberg/issues/11872#issuecomment-2568776618 @madeirak this seems to be working on iceberg runtime 1.5.0 and spark 3.5.1. Here is what I did: ``` # Setup demo infra ## use docker-compose.yaml from https:

Re: [I] when drop a non-Iceberg table , the directory associated with the table was not deleted [iceberg]

2025-01-02 Thread via GitHub
lordk911 commented on issue #11820: URL: https://github.com/apache/iceberg/issues/11820#issuecomment-2568775134 > Drop table only removes metadata refs and not the actual data files since 0.14. For actual data file removal, you will need to add `purge` at the end. This is documented in http

Re: [I] when drop a non-Iceberg table , the directory associated with the table was not deleted [iceberg]

2025-01-02 Thread via GitHub
MonkeyCanCode commented on issue #11820: URL: https://github.com/apache/iceberg/issues/11820#issuecomment-2568763829 @lordk911 that is expected. Drop table only removes metadata refs and not the actual data files since 0.14. For actual data file removal, you will need to add `purge` at the e

Re: [PR] Flink: Backport #11662 Fix range distribution npe when value is null to Flink 1.18 and 1.19 [iceberg]

2025-01-02 Thread via GitHub
Guosmilesmile commented on PR #11745: URL: https://github.com/apache/iceberg/pull/11745#issuecomment-2568757065 @pvary Hi Peter, if you have some time, could you please help me review it again? Thank you very much!

Re: [PR] Doc: Fix format of Hive [iceberg]

2025-01-02 Thread via GitHub
ebyhr commented on PR #11892: URL: https://github.com/apache/iceberg/pull/11892#issuecomment-2568737907 Rebased on main to resolve conflicts.

Re: [PR] Docs: Add history to Hive's metadata tables [iceberg]

2025-01-02 Thread via GitHub
okumin commented on PR #11902: URL: https://github.com/apache/iceberg/pull/11902#issuecomment-2568654161 Thanks for merging!

Re: [PR] fix: valid identifier id in nested map fail [iceberg-rust]

2025-01-02 Thread via GitHub
ZENOTME commented on PR #864: URL: https://github.com/apache/iceberg-rust/pull/864#issuecomment-2568640320 cc @liurenjie1024 @Xuanwo @Fokko @sdd

Re: [PR] Spark: Change Delete granularity to file for Spark 3.5 [iceberg]

2025-01-02 Thread via GitHub
amogh-jahagirdar commented on PR #11478: URL: https://github.com/apache/iceberg/pull/11478#issuecomment-2568629522 Unrelated test failure: ``` TestCopyOnWriteDelete > testDeleteWithSnapshotIsolation() > catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog,

Re: [PR] Spark 3.5: Refactor scanning changelog table with timestamps [iceberg]

2025-01-02 Thread via GitHub
flyrain commented on PR #11612: URL: https://github.com/apache/iceberg/pull/11612#issuecomment-2568621811 > 2. `endSnapshot` is `null` (`endTimestamp == null` ensures this is from calculation) Makes sense to check 2 within `if (startTimestamp != null || endTimestamp != null)` with th

Re: [PR] Implemented Remaining Catalog operations for REST catalog [iceberg-go]

2025-01-02 Thread via GitHub
chil-pavn commented on code in PR #240: URL: https://github.com/apache/iceberg-go/pull/240#discussion_r1901421555 ## catalog/rest.go: ## @@ -710,3 +777,54 @@ func (r *RestCatalog) UpdateNamespaceProperties(ctx context.Context, namespace t return doPost[payload, Properti

Re: [PR] Implemented Remaining Catalog operations for REST catalog [iceberg-go]

2025-01-02 Thread via GitHub
chil-pavn commented on code in PR #240: URL: https://github.com/apache/iceberg-go/pull/240#discussion_r1901420358 ## catalog/rest.go: ## @@ -626,11 +628,76 @@ func (r *RestCatalog) LoadTable(ctx context.Context, identifier table.Identifier } func (r *RestCatalog) DropTable(

Re: [PR] Spark 3.5: Implement RewriteTablePath [iceberg]

2025-01-02 Thread via GitHub
flyrain commented on code in PR #11555: URL: https://github.com/apache/iceberg/pull/11555#discussion_r1901388519 ## core/src/main/java/org/apache/iceberg/TableMetadataUtil.java: ## @@ -0,0 +1,23 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more co

Re: [PR] Implemented Remaining Catalog operations for REST catalog [iceberg-go]

2025-01-02 Thread via GitHub
chil-pavn commented on code in PR #240: URL: https://github.com/apache/iceberg-go/pull/240#discussion_r1901416497 ## README.md: ## @@ -1,38 +1,3 @@ - - -# Iceberg Golang - -[![Go Reference](https://pkg.go.dev/badge/github.com/apache/iceberg-go.svg)](https://pkg.go.dev/github.co

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-02 Thread via GitHub
wgtmac commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1901405319 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGET

Re: [PR] Hive: Add Hive 4 support and remove Hive runtime [iceberg]

2025-01-02 Thread via GitHub
manuzhang commented on PR #11750: URL: https://github.com/apache/iceberg/pull/11750#issuecomment-2568588698 It's due to API changes like [this](https://github.com/apache/spark/pull/48823/files#diff-45c9b065d76b237bcfecda83b8ee08c1ff6592d6f85acca09c0fa01472e056afR182) and [this](https://git

Re: [PR] [doc] Remove registry mirror recommendations [iceberg-rust]

2025-01-02 Thread via GitHub
liurenjie1024 merged PR #866: URL: https://github.com/apache/iceberg-rust/pull/866

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-02 Thread via GitHub
huaxingao commented on code in PR #11799: URL: https://github.com/apache/iceberg/pull/11799#discussion_r1901407198 ## parquet/src/test/java/org/apache/iceberg/parquet/TestBloomRowGroupFilter.java: ## @@ -109,7 +109,7 @@ public class TestBloomRowGroupFilter { optional(

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-02 Thread via GitHub
HonahX commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1901143637 ## format/spec.md: ## @@ -693,6 +686,64 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on The snapshot's `first-row-id` is the start

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-02 Thread via GitHub
HonahX commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1894512326 ## format/spec.md: ## @@ -693,6 +686,64 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on The snapshot's `first-row-id` is the start

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-02 Thread via GitHub
HonahX commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1901398996 ## format/spec.md: ## @@ -693,6 +686,64 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on The snapshot's `first-row-id` is the start

Re: [PR] Spark 3.5: Refactor scanning changelog table with timestamps [iceberg]

2025-01-02 Thread via GitHub
manuzhang commented on PR #11612: URL: https://github.com/apache/iceberg/pull/11612#issuecomment-2568552990 @flyrain Thanks for taking a look. I think what might be confusing is `startSnapshotId` and `endSnapshotId` could be passed in or calculated from `startTimestamp` and `endTimestamp`.

Re: [I] [Feature] Support Metrics Reporting [iceberg-python]

2025-01-02 Thread via GitHub
github-actions[bot] commented on issue #847: URL: https://github.com/apache/iceberg-python/issues/847#issuecomment-2568548257 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] [Feature] Support Metrics Reporting [iceberg-python]

2025-01-02 Thread via GitHub
github-actions[bot] closed issue #847: [Feature] Support Metrics Reporting URL: https://github.com/apache/iceberg-python/issues/847

Re: [PR] API: Remove deprecated `apply()` [iceberg]

2025-01-02 Thread via GitHub
github-actions[bot] commented on PR #11691: URL: https://github.com/apache/iceberg/pull/11691#issuecomment-2568546675 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Core: Fix a bug in streams closing while read or write metadata files [iceberg]

2025-01-02 Thread via GitHub
github-actions[bot] commented on PR #11609: URL: https://github.com/apache/iceberg/pull/11609#issuecomment-2568546651 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Core: Fix a bug in streams closing while read or write metadata files [iceberg]

2025-01-02 Thread via GitHub
github-actions[bot] closed pull request #11609: Core: Fix a bug in streams closing while read or write metadata files URL: https://github.com/apache/iceberg/pull/11609

Re: [PR] API: Align CharSequenceSet impl with Data/DeleteFileSet [iceberg]

2025-01-02 Thread via GitHub
github-actions[bot] closed pull request #11322: API: Align CharSequenceSet impl with Data/DeleteFileSet URL: https://github.com/apache/iceberg/pull/11322

Re: [PR] API: Align CharSequenceSet impl with Data/DeleteFileSet [iceberg]

2025-01-02 Thread via GitHub
github-actions[bot] commented on PR #11322: URL: https://github.com/apache/iceberg/pull/11322#issuecomment-2568546627 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Implement column projection [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1901382763 ## pyiceberg/io/pyarrow.py: ## @@ -1237,16 +1257,20 @@ def _task_to_record_batches( # When V3 support is introduced, we will update `downcast_ns_tim

Re: [PR] Spark 3.5: Refactor scanning changelog table with timestamps [iceberg]

2025-01-02 Thread via GitHub
flyrain commented on code in PR #11612: URL: https://github.com/apache/iceberg/pull/11612#discussion_r1901376613 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -560,20 +560,15 @@ public Scan buildChangelogScan() { } boo

Re: [I] [BUG] ArrowTypeError: "Could not convert" Error in inspect._files method [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #1477: URL: https://github.com/apache/iceberg-python/issues/1477#issuecomment-2568518425 > i believe the core issue is with the parsing on the column_sizes sub-directory I don't see anything out of the ordinary. Is there a particular reason you think it's

Re: [PR] Add iceberg_arrow library [iceberg-cpp]

2025-01-02 Thread via GitHub
kou commented on code in PR #6: URL: https://github.com/apache/iceberg-cpp/pull/6#discussion_r1901365957 ## cmake_modules/BuildUtils.cmake: ## @@ -201,17 +202,26 @@ function(ADD_ICEBERG_LIB LIB_NAME) PUBLIC "$") endif() -install(TARGETS $

Re: [I] Column Names in REST calls [iceberg]

2025-01-02 Thread via GitHub
kpkab commented on issue #11898: URL: https://github.com/apache/iceberg/issues/11898#issuecomment-2568495632 @RussellSpitzer - Would you be able to share more details about the "scan planning APIs", maybe the GitHub link? Also, we are building our custom Iceberg catalog using the API spec, an

Re: [PR] feat: search current working directory for config file [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on code in PR #1464: URL: https://github.com/apache/iceberg-python/pull/1464#discussion_r1901245337 ## mkdocs/docs/api.md: ## @@ -49,7 +49,7 @@ catalog: and loaded in python by calling `load_catalog(name="hive")` and `load_catalog(name="rest")`. -This

Re: [PR] Hive: Add Hive 4 support and remove Hive runtime [iceberg]

2025-01-02 Thread via GitHub
pvary commented on PR #11750: URL: https://github.com/apache/iceberg/pull/11750#issuecomment-2568457773 What is the root cause behind the strict version requirements of older Spark versions? The 4.0.0 HMS is supposed to be compatible with older clients.

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2568446859 Thank you, @xuanwo. Rebased and ready for review.

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901273157 ## crates/iceberg/src/arrow/schema.rs: ## @@ -814,6 +814,193 @@ get_parquet_stat_as_datum!(min); get_parquet_stat_as_datum!(max); +/// Utilities to deal with [arr

Re: [PR] ci: configure codespell in pre-commit [iceberg-python]

2025-01-02 Thread via GitHub
IndexSeek commented on PR #1478: URL: https://github.com/apache/iceberg-python/pull/1478#issuecomment-2568445857 > @IndexSeek thanks for the PR. could you fix the lint issue by running `make lint`? also we need to add the apache license header to the new file `.codespellrc` You're we

Re: [PR] Doc:Hive 4.0 and later versions allow vectorized read and write opera… [iceberg]

2025-01-02 Thread via GitHub
pvary commented on code in PR #11877: URL: https://github.com/apache/iceberg/pull/11877#discussion_r1901299979 ## docs/docs/hive.md: ## @@ -138,7 +138,7 @@ For example, setting this in the `hive-site.xml` loaded by Spark will enable the by Spark. !!! danger -Starting wi

Re: [I] [Question] Why does plan_files not seem to get multi-threading improvement [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #1479: URL: https://github.com/apache/iceberg-python/issues/1479#issuecomment-2568436698 > there is no noticeable time difference between single-threaded and multi-threaded execution. The total time is directly proportional to the number of manifest entries.

Re: [I] Some column statistics are missing after writing data to a table [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #1482: URL: https://github.com/apache/iceberg-python/issues/1482#issuecomment-2568422521 Difference between table 1 (add_files) and table 2 (write) * `data_size` for `location`: `null` vs `14001826255` * `nulls_fraction` for `location`: `null` vs `0`

Re: [I] [BUG] ArrowTypeError: "Could not convert" Error in inspect._files method [iceberg-python]

2025-01-02 Thread via GitHub
xsfa commented on issue #1477: URL: https://github.com/apache/iceberg-python/issues/1477#issuecomment-2568420709 ```json { "content": "DATA", "file_path": "s3a://dataplatform/silver/iceberg/spark/dbname/tablename/data/1-3933-e97b5082-3b9e-4c4e-b965-f290205bcf3a-0-1.parq

Re: [I] Some column statistics are missing after writing data to a table [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #1482: URL: https://github.com/apache/iceberg-python/issues/1482#issuecomment-2568418644 Thanks for reporting this issue! Both write and add_files use [`data_file_statistics_from_parquet_metadata`](https://github.com/apache/iceberg-python/blob/5da1f4d6b66cdc689

Re: [PR] Use ExternalTypeInfo in Rowconverter code instead of deprecated TableSchema.getFieldTypes [iceberg]

2025-01-02 Thread via GitHub
abharath9 commented on PR #11838: URL: https://github.com/apache/iceberg/pull/11838#issuecomment-2568410623 @stevenzwu can you review this PR? Once it is merged, I will work on the backporting.

Re: [PR] ci: configure codespell in pre-commit [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on PR #1478: URL: https://github.com/apache/iceberg-python/pull/1478#issuecomment-2568410279 @IndexSeek thanks for the PR. Could you fix the lint issue by running `make lint`? Also, we need to add the Apache license header to the new file `.codespellrc`.

Re: [I] [BUG] ArrowTypeError: "Could not convert" Error in inspect._files method [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #1477: URL: https://github.com/apache/iceberg-python/issues/1477#issuecomment-2568406804 Thanks for reporting this @xsfa it looks like the issue happens when the underlying data is transformed into an arrow table ``` return pa.Table.from_pylist

Re: [I] how to grant s3 temp permissions when using pyiceberg? [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #1463: URL: https://github.com/apache/iceberg-python/issues/1463#issuecomment-2568401155 pyiceberg passes those s3 configs to the underlying filesystem for [pyarrow we use pyarrow.fs.S3FileSystem](https://github.com/apache/iceberg-python/blob/main/pyiceberg/
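As a rough illustration of the point above about S3 configs being handed to the underlying filesystem, the sketch below builds a property dictionary for temporary (session) credentials. The `s3.*` keys follow the names in PyIceberg's FileIO configuration documentation; the helper name `s3_session_properties` and its parameters are hypothetical and not part of PyIceberg.

```python
def s3_session_properties(access_key_id, secret_access_key, session_token, region=None):
    """Build catalog properties carrying temporary S3 credentials.

    Hypothetical helper: only the "s3.*" keys come from PyIceberg's
    documented configuration; everything else is illustrative.
    """
    props = {
        "s3.access-key-id": access_key_id,
        "s3.secret-access-key": secret_access_key,
        "s3.session-token": session_token,
    }
    # The region is optional; leave it out so the filesystem can fall back
    # to its own resolution when the caller does not know it.
    if region is not None:
        props["s3.region"] = region
    return props


# With pyiceberg installed, the dictionary could be passed as keyword
# properties, for example: load_catalog("default", **props)
```

Whether the filesystem refreshes expired tokens is not covered by this sketch; it only shows where the credentials would be supplied.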

Re: [I] Parameter type is not org.apache.avro.Schema for AvroSchemaUtil.toIceberg() [iceberg]

2025-01-02 Thread via GitHub
pvary commented on issue #11884: URL: https://github.com/apache/iceberg/issues/11884#issuecomment-2568399795 @njalan: If you must, you can convert between the two Avro Schema classes using `toString` and `parse`. The reason behind this is that Iceberg uses a shaded Avro version for internal purposes.

Re: [I] Support for timestamp downcasting when loading data to iceberg tables [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #1045: URL: https://github.com/apache/iceberg-python/issues/1045#issuecomment-2568392156 @rotem-ad I don't see an active PR for this issue. Would you like to open one? Happy to review.

Re: [PR] release_rc.sh upload artifacts to apache dist [iceberg-go]

2025-01-02 Thread via GitHub
kevinjqliu commented on PR #237: URL: https://github.com/apache/iceberg-go/pull/237#issuecomment-2568387306 Thanks @chil-pavn for the PR! @zeroshade were you able to run `release_rc.sh` and test this PR?

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #11799: URL: https://github.com/apache/iceberg/pull/11799#discussion_r1901261131 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -266,6 +272,47 @@ private WriteBuilder createContextFunc( return this; }

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #11799: URL: https://github.com/apache/iceberg/pull/11799#discussion_r1901260644 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -266,6 +272,47 @@ private WriteBuilder createContextFunc( return this; }

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901255823 ## crates/iceberg/src/spec/manifest.rs: ## @@ -966,6 +966,12 @@ impl ManifestEntry { self.sequence_number } +/// File sequence number. +#[inlin

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #11799: URL: https://github.com/apache/iceberg/pull/11799#discussion_r1901255023 ## parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java: ## @@ -266,6 +272,47 @@ private WriteBuilder createContextFunc( return this; }

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901252514 ## crates/iceberg/src/arrow/schema.rs: ## @@ -814,6 +814,193 @@ get_parquet_stat_as_datum!(min); get_parquet_stat_as_datum!(max); +/// Utilities to deal with [arr

Re: [PR] [doc] Remove registry mirror recommendations [iceberg-rust]

2025-01-02 Thread via GitHub
kevinjqliu commented on PR #866: URL: https://github.com/apache/iceberg-rust/pull/866#issuecomment-2568364064 cc @liurenjie1024 / @lewiszlw following up https://github.com/apache/iceberg-rust/pull/856#discussion_r1899887407

Re: [PR] Add orbstack guide [iceberg-rust]

2025-01-02 Thread via GitHub
kevinjqliu commented on code in PR #856: URL: https://github.com/apache/iceberg-rust/pull/856#discussion_r1901251672 ## docs/contributing/orbstack.md: ## @@ -0,0 +1,39 @@ + + +# OrbStack as a docker alternative on macOS +1. Install OrbStack by downloading [installer](https://orb

[PR] [doc] Remove registry mirror recommendations [iceberg-rust]

2025-01-02 Thread via GitHub
kevinjqliu opened a new pull request, #866: URL: https://github.com/apache/iceberg-rust/pull/866 Following up https://github.com/apache/iceberg-rust/pull/856 Removes registry mirror recommendations

Re: [PR] Use compatible column name to set Parquet bloom filter [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #11799: URL: https://github.com/apache/iceberg/pull/11799#discussion_r1901247617 ## parquet/src/test/java/org/apache/iceberg/parquet/TestBloomRowGroupFilter.java: ## @@ -683,23 +684,23 @@ public void testBytesEq() { } @Test - publi

Re: [PR] Support Location Providers [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on code in PR #1452: URL: https://github.com/apache/iceberg-python/pull/1452#discussion_r1901229046 ## pyiceberg/io/pyarrow.py: ## @@ -2622,13 +2631,15 @@ def _dataframe_to_data_files( property_name=TableProperties.WRITE_TARGET_FILE_SIZE_BYTES,

Re: [I] Data Loss in Flink Job with Iceberg Sink After Restart: How to Ensure Consistent Writes? [iceberg]

2025-01-02 Thread via GitHub
sanchay0 commented on issue #11894: URL: https://github.com/apache/iceberg/issues/11894#issuecomment-2568342110 @pvary I did restart from a savepoint. I mentioned that in the description > I restart the job from the latest savepoint, which is committed at the Kafka source The

Re: [PR] Doc: Fix format of Hive [iceberg]

2025-01-02 Thread via GitHub
pvary commented on PR #11892: URL: https://github.com/apache/iceberg/pull/11892#issuecomment-2568320787 @ebyhr: Could you please rebase? Part of the fix is already merged.

Re: [I] BUG: Bug: partition name stored in partition data in data file contains special character [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu closed issue #175: BUG: Bug: partition name stored in partition data in data file contains special character URL: https://github.com/apache/iceberg-python/issues/175

Re: [I] (Potential Bug) Partition field names are not URL-encoded in file locations [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu closed issue #1458: (Potential Bug) Partition field names are not URL-encoded in file locations URL: https://github.com/apache/iceberg-python/issues/1458

Re: [PR] URL-encode partition field names in file locations [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu merged PR #1457: URL: https://github.com/apache/iceberg-python/pull/1457

Re: [PR] URL-encode partition field names in file locations [iceberg-python]

2025-01-02 Thread via GitHub
kevinjqliu commented on code in PR #1457: URL: https://github.com/apache/iceberg-python/pull/1457#discussion_r1901220393 ## tests/integration/test_partitioning_key.py: ## @@ -721,6 +753,27 @@ VALUES (CAST('2023-01-01 11:55:59.99' AS TIMESTAMP), CAS

Re: [I] when MERGE INTO a merge-on-read table got NoSuchMethodError [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on issue #11821: URL: https://github.com/apache/iceberg/issues/11821#issuecomment-2568305036 This usually signifies a version mismatch on the runtime classpath for Spark. Make sure there are no other iceberg-spark-runtime jars.
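One way to act on the advice above is to list every `iceberg-spark-runtime` jar in Spark's jars directory and flag the case where more than one is present. This is a minimal sketch, not an official tool: the function names are hypothetical, and the directory you pass in (for example `$SPARK_HOME/jars`) is an assumption about your installation.

```python
from pathlib import Path


def iceberg_runtime_jars(jars_dir):
    """Return the sorted file names of all iceberg-spark-runtime jars in jars_dir."""
    return sorted(p.name for p in Path(jars_dir).glob("iceberg-spark-runtime-*.jar"))


def has_runtime_conflict(jars_dir):
    """True when more than one runtime jar is present, a likely version mismatch."""
    return len(iceberg_runtime_jars(jars_dir)) > 1
```

The check only covers a single directory; jars added through `--jars`, `--packages`, or other classpath entries would need to be inspected separately.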

Re: [I] [pyiceberg_core] Expose `IcebergTableProvider` to python [iceberg-rust]

2025-01-02 Thread via GitHub
kevinjqliu commented on issue #865: URL: https://github.com/apache/iceberg-rust/issues/865#issuecomment-2568302236 Possibly blocked by https://github.com/apache/datafusion/issues/13851 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Implemented Remaining Catalog operations for REST catalog [iceberg-go]

2025-01-02 Thread via GitHub
zeroshade commented on code in PR #240: URL: https://github.com/apache/iceberg-go/pull/240#discussion_r1901207365 ## README.md: ## @@ -1,38 +1,3 @@ - - -# Iceberg Golang - -[![Go Reference](https://pkg.go.dev/badge/github.com/apache/iceberg-go.svg)](https://pkg.go.dev/github.co

Re: [I] SparkValue converter Timestamp Issue [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on issue #11840: URL: https://github.com/apache/iceberg/issues/11840#issuecomment-2568288012 I am not sure I understand the issue here. Iceberg defines Timestamp as microseconds from epoch. When we actually store it in files it is up to that file format to def

Re: [PR] Impl rest catalog + table updates & requirements [iceberg-go]

2025-01-02 Thread via GitHub
zeroshade commented on PR #146: URL: https://github.com/apache/iceberg-go/pull/146#issuecomment-2568286153 @jwtryg Is there anything else outstanding on this or is this ready for review again? -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [I] Data Loss in Flink Job with Iceberg Sink After Restart: How to Ensure Consistent Writes? [iceberg]

2025-01-02 Thread via GitHub
pvary commented on issue #11894: URL: https://github.com/apache/iceberg/issues/11894#issuecomment-2568285191 @sanchay0: you need to restart the job from a savepoint or checkpoint. That will make sure to restore your job in a consistent state. This is more like a Flink question,

Re: [PR] docs: fix prerequisites [iceberg-go]

2025-01-02 Thread via GitHub
zeroshade merged PR #241: URL: https://github.com/apache/iceberg-go/pull/241 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

Re: [PR] Docs: Add history to Hive's metadata tables [iceberg]

2025-01-02 Thread via GitHub
pvary commented on PR #11902: URL: https://github.com/apache/iceberg/pull/11902#issuecomment-2568278804 Thanks @okumin for the fix! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] Docs: Add history to Hive's metadata tables [iceberg]

2025-01-02 Thread via GitHub
pvary merged PR #11902: URL: https://github.com/apache/iceberg/pull/11902 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apa

Re: [I] Column Names in REST calls [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on issue #11898: URL: https://github.com/apache/iceberg/issues/11898#issuecomment-2568275546 I think this is only possible if folks go through the newer scan planning apis, the current "table" API removes the Catalog from the process as soon as the metadata.json is

Re: [I] [Java API] Rough edges when partitioning by time types [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on issue #11899: URL: https://github.com/apache/iceberg/issues/11899#issuecomment-2568271032 I think the issue here is that the Copy constructor here just do type checking so the accessor is failing because the Generic record has an illegal object in it. We should h

Re: [I] [Java API] Rough edges when recreating a DataFile that is partitioned by month or hour [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on issue #11900: URL: https://github.com/apache/iceberg/issues/11900#issuecomment-2568217084 I think we probably should be moving to deprecate .withPartitionPath and .withPartitionValues since they basically assume you are importing from a Hive table and only using

Re: [PR] feat(datafusion): Support cast operations [iceberg-rust]

2025-01-02 Thread via GitHub
ryzhyk commented on PR #821: URL: https://github.com/apache/iceberg-rust/pull/821#issuecomment-2568204177 @Fokko , thanks again for working on this. I am very much looking forward to this PR landing (due to #811). Any chance you could finalize it soon? Thanks! -- This is an automated mess

Re: [PR] Introduce `MissingRequiredFilesToDeleteException` for Streaming Deletes [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on PR #11887: URL: https://github.com/apache/iceberg/pull/11887#issuecomment-2568202058 I think generally we wouldn't want to introduce new API concepts unless there is some usage of that API within the core library itself (Otherwise we are basically just opening up

Re: [PR] ParallelIterable: Queue Size w/ O(1) [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on PR #11895: URL: https://github.com/apache/iceberg/pull/11895#issuecomment-2568194825 I wonder if this is as important if we switch ParallelIterable to use the implementation suggested here https://github.com/apache/iceberg/issues/11768 which limits the queue dept

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1901098696 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable {

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2025-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1901095259 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +263,21 @@ private static class Task implements Supplier>>, Closeable {

Re: [PR] Implement column projection [iceberg-python]

2025-01-02 Thread via GitHub
gabeiglio commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1901092863 ## pyiceberg/io/pyarrow.py: ## @@ -1216,6 +1216,25 @@ def _field_id(self, field: pa.Field) -> int: return -1 +def _get_column_projection_values( +
