Re: [PR] Config for deciding whether to use Iceberg Time type [iceberg]

2024-11-20 Thread via GitHub
kumarpritam863 commented on PR #11174: URL: https://github.com/apache/iceberg/pull/11174#issuecomment-2490311787 Hi Bryan, I hope you're well. I wanted to follow up on this. It's been a while, and I'd appreciate the opportunity to discuss it further. -- This is an automated message from t

Re: [PR] Added support for evolving the partition of the table [iceberg]

2024-11-20 Thread via GitHub
kumarpritam863 commented on PR #11275: URL: https://github.com/apache/iceberg/pull/11275#issuecomment-2490310518 Hi Bryan, I hope you're well. I wanted to follow up on this. It's been a while, and I'd appreciate the opportunity to discuss it further. We are already performing all the necess

Re: [PR] Handling NO Coordinator Scenario and Data Loss in the current Design [iceberg]

2024-11-20 Thread via GitHub
kumarpritam863 commented on PR #11298: URL: https://github.com/apache/iceberg/pull/11298#issuecomment-2490306516 Hi Bryan, I hope you're well. I wanted to follow up on [specific issue]. It's been a while, and I'd appreciate the opportunity to discuss it further.

Re: [PR] fix `KeyError` raised by `add_files` when parquet file doe not have column stats [iceberg-python]

2024-11-20 Thread via GitHub
binayakd commented on PR #1354: URL: https://github.com/apache/iceberg-python/pull/1354#issuecomment-2490301659 Thanks @Fokko! Yes, that did come to mind. I was also thinking it might be possible to create the stats on the fly, but thought that might be left as an enhancement. OK, let me try

Re: [I] REST Catalog S3 Signer Endpoint should be Catalog specific [iceberg]

2024-11-20 Thread via GitHub
c-thiel commented on issue #11608: URL: https://github.com/apache/iceberg/issues/11608#issuecomment-2490285559 This is not only a problem with spark but at least also affects starrocks. According to a user on our discord we see the same behavior as I describe for spark above: I can

Re: [PR] Core,Open-API: Don't expose the `last-column-id` [iceberg]

2024-11-20 Thread via GitHub
nastra commented on PR #11514: URL: https://github.com/apache/iceberg/pull/11514#issuecomment-2490255141 @hussein-awala the Schema API is part of `iceberg-core` and thus allows things to be deprecated and then removed in the next minor release (see also https://iceberg.apache.org/contribute

Re: [PR] Spark: Write DVs for V3 MoR tables [iceberg]

2024-11-20 Thread via GitHub
nastra commented on code in PR #11561: URL: https://github.com/apache/iceberg/pull/11561#discussion_r1851486037 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java: ## @@ -397,4 +450,9 @@ protected void assertAllB

Re: [PR] Spark: Write DVs for V3 MoR tables [iceberg]

2024-11-20 Thread via GitHub
nastra commented on code in PR #11561: URL: https://github.com/apache/iceberg/pull/11561#discussion_r1851471897 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -521,7 +528,10 @@ public void deleteSingleRecordProducesDeleteO

Re: [PR] Flink: Tests alignment for the Flink Sink v2-based implemenation (IcebergSink) [iceberg]

2024-11-20 Thread via GitHub
arkadius commented on PR #11219: URL: https://github.com/apache/iceberg/pull/11219#issuecomment-2490189961 Hi @rodmeneses. No, this change is not relevant. I'll close it.

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-11-20 Thread via GitHub
ajantha-bhat commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1851216659 ## data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java: ## @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Flink: Add table.exec.iceberg.use-v2-sink option [iceberg]

2024-11-20 Thread via GitHub
arkadius commented on PR #11244: URL: https://github.com/apache/iceberg/pull/11244#issuecomment-2490187518 I think that this option would benefit this project in a good transitional migration to the new interface in Table API / Flink SQL. What is the alternative plan for the migration betwe

[I] java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: Required field 'num_values' was not found in serialized data [iceberg]

2024-11-20 Thread via GitHub
wardlican opened a new issue, #11614: URL: https://github.com/apache/iceberg/issues/11614 ### Apache Iceberg version 1.4.3 ### Query engine Spark ### Please describe the bug 🐞 ``` CALL spark_catalog.system.rewrite_data_files( table => '${DATA

Re: [PR] Materialized View Spec [iceberg]

2024-11-20 Thread via GitHub
szehon-ho commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1851160036 ## format/view-spec.md: ## @@ -158,6 +175,57 @@ Each entry in `version-log` is a struct with the following fields: | _required_ | `timestamp-ms` | Timestamp when
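The review excerpt above touches the `version-log` entries of the view spec. In the published Iceberg view spec, each entry pairs `timestamp-ms` with a `version-id`; assuming that shape, a minimal sketch of resolving which view version was current at a given time might look like this (the function name `version_at` is hypothetical, not part of any Iceberg library):

```python
from typing import Dict, List, Optional


def version_at(version_log: List[Dict[str, int]], timestamp_ms: int) -> Optional[int]:
    """Return the `version-id` that was current at `timestamp_ms`, or None.

    Assumes each entry carries the view-spec fields `timestamp-ms` and
    `version-id`.
    """
    current = None
    # Replay changes in time order; the last one at or before the
    # requested time is the version that was current then.
    for entry in sorted(version_log, key=lambda e: e["timestamp-ms"]):
        if entry["timestamp-ms"] > timestamp_ms:
            break
        current = entry["version-id"]
    return current


log = [
    {"timestamp-ms": 1000, "version-id": 1},
    {"timestamp-ms": 2000, "version-id": 2},
]
print(version_at(log, 1500))  # 1
```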

Re: [PR] Spark: Fix changelog table bug for start time older than current snapshot [iceberg]

2024-11-20 Thread via GitHub
manuzhang commented on PR #11564: URL: https://github.com/apache/iceberg/pull/11564#issuecomment-2490115014 @Acehaidrey @flyrain The bug was that we were not checking whether the timestamp of `endSnapshot`, calculated from `endTimestamp`, is less than `startTimestamp`. I've submitted #
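The missing check described in this comment can be illustrated with a small Python sketch. It is not the merged Java change in `SparkScanBuilder`; the `(snapshot_id, committed_at_ms)` tuples and both function names are hypothetical stand-ins for the table's snapshot history:

```python
from typing import List, Optional, Tuple

# Hypothetical snapshot history: (snapshot_id, committed_at_ms), oldest first.
Snapshot = Tuple[int, int]


def snapshot_as_of(snapshots: List[Snapshot], ts_ms: int) -> Optional[Snapshot]:
    """Latest snapshot committed at or before ts_ms, or None."""
    found = None
    for snap in snapshots:
        if snap[1] > ts_ms:
            break
        found = snap
    return found


def changelog_end_snapshot(
    snapshots: List[Snapshot], start_ms: int, end_ms: int
) -> Optional[int]:
    """Return the end snapshot id for a changelog scan, or None if the range is empty."""
    end_snapshot = snapshot_as_of(snapshots, end_ms)
    # If the snapshot resolved from end_ms was committed before start_ms,
    # no snapshot falls inside [start_ms, end_ms], so there is nothing to scan.
    if end_snapshot is None or end_snapshot[1] < start_ms:
        return None
    return end_snapshot[0]
```

Without the `end_snapshot[1] < start_ms` guard, a window that lies entirely between two commits would resolve to an end snapshot older than the start of the window.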

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-11-20 Thread via GitHub
GTerrygo commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2490101705 I encountered the same error while attempting to load a large Iceberg table in AWS Glue. Can we prioritize the bug fix for version 1.6.2, since AWS Glue does not support Java 11?

[I] Using the Struct type as the primary key in equalDelete operation will cause data reading errors. [iceberg]

2024-11-20 Thread via GitHub
Leven2023 opened a new issue, #11611: URL: https://github.com/apache/iceberg/issues/11611 ### Apache Iceberg version 1.5.2 ### Query engine Other ### Please describe the bug 🐞 ### Steps to reproduce the bug: > 1)Create a new table, specify the storage

Re: [I] Handling Updates on Partition Columns in Iceberg with Flink CDC [iceberg]

2024-11-20 Thread via GitHub
a8356555 commented on issue #11573: URL: https://github.com/apache/iceberg/issues/11573#issuecomment-2489959870 > What are the records generated by the MySQL CDC connector? > > You are using upsert mode in FlinkSink. > > In upsert mode when an update happens, Flink expects an un

Re: [I] Handling Updates on Partition Columns in Iceberg with Flink CDC [iceberg]

2024-11-20 Thread via GitHub
a8356555 commented on issue #11573: URL: https://github.com/apache/iceberg/issues/11573#issuecomment-2489959318 But my use case requires upsert, so in this scenario, using status as the partition key is not suitable, right?
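The Iceberg Flink documentation states that in upsert mode, if the table is partitioned, the partition fields should be included in the equality fields. A minimal validation sketch of that rule, using plain column names and a hypothetical function name (this is not a Flink or Iceberg API):

```python
from typing import Iterable


def check_upsert_partitioning(
    partition_source_columns: Iterable[str], equality_columns: Iterable[str]
) -> None:
    """Raise ValueError if a partition source column is not an equality column."""
    missing = sorted(set(partition_source_columns) - set(equality_columns))
    if missing:
        raise ValueError(
            "upsert mode needs partition columns in the equality columns; missing: "
            + ", ".join(missing)
        )


check_upsert_partitioning(["id"], ["id"])  # passes
# check_upsert_partitioning(["status"], ["id"]) raises ValueError
```

Satisfying the check for a `status` partition means making `status` part of the key, so an update that changes `status` would likely no longer match the old row's key, which fits the conclusion in the question above.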

[PR] fix `KeyError` raised by `add_files` when parquet file doe not have column stats [iceberg-python]

2024-11-20 Thread via GitHub
binayakd opened a new pull request, #1354: URL: https://github.com/apache/iceberg-python/pull/1354 Resolves #1353 by replacing `del` with `pop` to prevent `KeyError`.
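The difference between the two operations can be shown in isolation. This sketch uses a hypothetical dict of per-column stats keyed by field id; the actual pyiceberg data structures touched by the PR are not shown in this thread:

```python
from typing import Any, Dict


def drop_stat_with_del(stats: Dict[int, Any], field_id: int) -> None:
    # Raises KeyError when the file has no stats for field_id.
    del stats[field_id]


def drop_stat_with_pop(stats: Dict[int, Any], field_id: int) -> None:
    # The default (None) makes pop a no-op for a missing key;
    # stats.pop(field_id) without a default would still raise KeyError.
    stats.pop(field_id, None)
```

Note that the default argument is what makes `pop` safe here; switching to `pop` without one would not fix the error.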

Re: [PR] Spark: Fix changelog table bug for start time older than current snapshot [iceberg]

2024-11-20 Thread via GitHub
flyrain merged PR #11564: URL: https://github.com/apache/iceberg/pull/11564

[PR] Spark: remove ROW_POSITION from project schema [iceberg]

2024-11-20 Thread via GitHub
huaxingao opened a new pull request, #11610: URL: https://github.com/apache/iceberg/pull/11610 Originally, we have `ReadConfig#generateOffsetToStartPos(Schema schema)` to compute the row offsets of the row groups. This method needs to check if the schema contains ROW_POSITION. https://githu
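Computing the row offsets of row groups amounts to a running sum of row counts. The Python sketch below shows the idea only, assuming each row group is described by its starting byte offset and row count; it is not Iceberg's Java implementation, and `offset_to_start_pos` is a hypothetical name. PR #11520 in this digest replaces this kind of calculation with Parquet's native `getRowIndexOffset` support.

```python
from typing import Dict, List, Tuple


def offset_to_start_pos(row_groups: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map each row group's starting byte offset to the position of its first row.

    row_groups: (starting_byte_offset, row_count) pairs in file order
    (a hypothetical shape chosen for illustration).
    """
    result: Dict[int, int] = {}
    next_row = 0
    for start_offset, row_count in row_groups:
        result[start_offset] = next_row
        next_row += row_count  # the following group starts after these rows
    return result


print(offset_to_start_pos([(4, 100), (2048, 50), (4096, 25)]))
# {4: 0, 2048: 100, 4096: 150}
```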

Re: [I] [Feature Request] Speed up InspectTable.files() [iceberg-python]

2024-11-20 Thread via GitHub
11xor6 commented on issue #1229: URL: https://github.com/apache/iceberg-python/issues/1229#issuecomment-2489912105 I'm encountering this as well, specifically with methods that rely on `plan_files`. If there's anything I can do to help or move this forward, please let me know.

Re: [PR] Parquet: Correctly prune nested columns [iceberg]

2024-11-20 Thread via GitHub
MichaelDeSteven commented on PR #11373: URL: https://github.com/apache/iceberg/pull/11373#issuecomment-2485102114 @RussellSpitzer Sorry for the late response, PTAL.

Re: [PR] Add Python Release Action to publish `pyiceberg_core` dist to Pypi [iceberg-rust]

2024-11-20 Thread via GitHub
sungwy commented on PR #705: URL: https://github.com/apache/iceberg-rust/pull/705#issuecomment-2485847390 @Fokko @Xuanwo - I put this together by referring to the [opendal release_python.yml](https://github.com/apache/opendal/blob/main/.github/workflows/release_python.yml) gh actions file.

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-11-20 Thread via GitHub
ajantha-bhat commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1851206461 ## data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java: ## @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Flink: Tests alignment for the Flink Sink v2-based implemenation (IcebergSink) [iceberg]

2024-11-20 Thread via GitHub
rodmeneses commented on PR #11219: URL: https://github.com/apache/iceberg/pull/11219#issuecomment-2489891209 Hi @arkadius, is this still needed/relevant? Please advise, as it will be closed due to inactivity soon. cc @pvary

Re: [PR] Parquet: Use native getRowIndexOffset support instead of calculating it [iceberg]

2024-11-20 Thread via GitHub
flyrain merged PR #11520: URL: https://github.com/apache/iceberg/pull/11520

Re: [PR] Parquet: Use native getRowIndexOffset support instead of calculating it [iceberg]

2024-11-20 Thread via GitHub
flyrain commented on PR #11520: URL: https://github.com/apache/iceberg/pull/11520#issuecomment-2489897035 Thanks @wypoon for working on it. Thanks @huaxingao @Fokko @szehon-ho for the review.

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-11-20 Thread via GitHub
ajantha-bhat commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1851202734 ## core/src/main/java/org/apache/iceberg/PartitionStats.java: ## @@ -205,6 +211,8 @@ public T get(int pos, Class javaClass) { public void set(int pos, T val

Re: [PR] Flink: Add table.exec.iceberg.use-v2-sink option [iceberg]

2024-11-20 Thread via GitHub
rodmeneses commented on PR #11244: URL: https://github.com/apache/iceberg/pull/11244#issuecomment-2489891920 Hi @arkadius, is this still needed/relevant? Please advise, as it will be closed due to inactivity soon. cc @pvary

Re: [I] .pyiceberg.yaml config files should be loaded from current dir instead of home folder [iceberg-python]

2024-11-20 Thread via GitHub
anentropic commented on issue #1333: URL: https://github.com/apache/iceberg-python/issues/1333#issuecomment-2489400660 True, I probably wouldn't check it in, but I'd never put it in my home folder either way; it seems like a project file.
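The lookup order this issue asks for (project directory first, home folder as a fallback) can be sketched as below. This is a hypothetical illustration of the proposal, not pyiceberg's actual config loader; `find_config` and `CONFIG_NAME` are names chosen for the example:

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".pyiceberg.yaml"


def find_config(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first existing config file, preferring the project directory."""
    for directory in (cwd, home):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


# Typical call: find_config(Path.cwd(), Path.home())
```

Passing both directories in explicitly keeps the search order visible and easy to test.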

Re: [PR] Replace reference of `Table.identifier` with `Table.name` [iceberg-python]

2024-11-20 Thread via GitHub
kevinjqliu commented on PR #1346: URL: https://github.com/apache/iceberg-python/pull/1346#issuecomment-2489368811 Opened #1349 and added back the warning filter; we can remove it once we upgrade PySpark to 4.0.

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.28.1 to 1.28.5 [iceberg-go]

2024-11-20 Thread via GitHub
dependabot[bot] opened a new pull request, #212: URL: https://github.com/apache/iceberg-go/pull/212 Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.28.1 to 1.28.5. Commits https://github.com/aws/aws-sdk-go-v2/commit/d125de3792b20980da07

[I] `add_files` raises `KeyError` if parquet file doe not have column stats [iceberg-python]

2024-11-20 Thread via GitHub
binayakd opened a new issue, #1353: URL: https://github.com/apache/iceberg-python/issues/1353 ### Apache Iceberg version 0.8.0 (latest release) ### Please describe the bug 🐞 Using the NYC taxi data set found [here](https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_t

Re: [PR] Document procedure for stats collection [iceberg]

2024-11-20 Thread via GitHub
manuzhang commented on code in PR #11606: URL: https://github.com/apache/iceberg/pull/11606#discussion_r1851183453 ## docs/docs/spark-procedures.md: ## @@ -936,3 +936,40 @@ as an `UPDATE_AFTER` image, resulting in the following pre/post update images: |-||-

Re: [PR] 1.7.1rc0 cherry picks [iceberg]

2024-11-20 Thread via GitHub
bryanck commented on PR #11593: URL: https://github.com/apache/iceberg/pull/11593#issuecomment-2488919924 New PR is here: https://github.com/apache/iceberg/pull/11603

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
Acehaidrey commented on code in PR #11564: URL: https://github.com/apache/iceberg/pull/11564#discussion_r1851153915 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -560,11 +560,13 @@ public Scan buildChangelogScan() { }

Re: [PR] Document procedure for stats collection [iceberg]

2024-11-20 Thread via GitHub
manuzhang commented on code in PR #11606: URL: https://github.com/apache/iceberg/pull/11606#discussion_r1851182827 ## docs/docs/spark-procedures.md: ## @@ -936,3 +936,40 @@ as an `UPDATE_AFTER` image, resulting in the following pre/post update images: |-||-

Re: [PR] FIX: Exception Handling in AWS Glue renameTable Method [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11165: FIX: Exception Handling in AWS Glue renameTable Method URL: https://github.com/apache/iceberg/pull/11165

Re: [I] remove orphan file question [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on issue #10363: URL: https://github.com/apache/iceberg/issues/10363#issuecomment-2489804671 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Data loss in the Incremental Co-operative Mode of Rebalancing [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11289: URL: https://github.com/apache/iceberg/pull/11289#issuecomment-2489805181 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Nessie: respect the nearest namespace's `location` property when creating a table or view [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11215: URL: https://github.com/apache/iceberg/pull/11215#issuecomment-2489805045 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If
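The behavior named in this PR's title, using the nearest ancestor namespace's `location` property as the base for a new table, can be sketched as a walk up the namespace hierarchy. Everything here is an assumption for illustration: the property map shape, the function name, appending the remaining namespace levels below the ancestor, and the warehouse fallback layout are not taken from the Nessie catalog code.

```python
from typing import Dict, Tuple

Namespace = Tuple[str, ...]


def default_table_location(
    namespace_props: Dict[Namespace, Dict[str, str]],
    namespace: Namespace,
    table_name: str,
    warehouse: str,
) -> str:
    """Resolve a table location from the closest ancestor namespace with `location`."""
    # Walk from the table's own namespace up to the root: ("a", "b") -> ("a",).
    for depth in range(len(namespace), 0, -1):
        ancestor = namespace[:depth]
        location = namespace_props.get(ancestor, {}).get("location")
        if location:
            # Assumed layout: ancestor location, then the remaining levels, then the table.
            return "/".join([location.rstrip("/"), *namespace[depth:], table_name])
    # Assumed fallback: warehouse/<namespace levels>/<table>.
    return "/".join([warehouse.rstrip("/"), *namespace, table_name])
```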

Re: [PR] FIX: Exception Handling in AWS Glue renameTable Method [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11165: URL: https://github.com/apache/iceberg/pull/11165#issuecomment-2489804888 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Always update table metadata when `refresh` is called [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11194: Always update table metadata when `refresh` is called URL: https://github.com/apache/iceberg/pull/11194

Re: [PR] Data loss in the Incremental Co-operative Mode of Rebalancing [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11289: Data loss in the Incremental Co-operative Mode of Rebalancing URL: https://github.com/apache/iceberg/pull/11289

Re: [PR] Update Examples to Replace Hadoop Catalog with JDBC Catalog [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11285: URL: https://github.com/apache/iceberg/pull/11285#issuecomment-2489805148 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Materialized View Spec [iceberg]

2024-11-20 Thread via GitHub
szehon-ho commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1851174083 ## format/view-spec.md: ## @@ -42,12 +42,25 @@ An atomic swap of one view metadata file for another provides the basis for maki Writers create view metadata fil

[PR] Materialized View Spec [iceberg]

2024-11-20 Thread via GitHub
JanKaul opened a new pull request, #11041: URL: https://github.com/apache/iceberg/pull/11041 This PR implements the Iceberg Materialized View Proposal #10043 by adding a section for Materialized Views to the View spec. It follows the design of the [proposal document](https://docs.google.co

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
Acehaidrey commented on PR #11564: URL: https://github.com/apache/iceberg/pull/11564#issuecomment-2489847192 Thank you. I had to fix a formatting issue, so I pushed another update.

Re: [PR] Always update table metadata when `refresh` is called [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11194: URL: https://github.com/apache/iceberg/pull/11194#issuecomment-2489804923 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] PoC: Add Variant type support in Iceberg [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11201: PoC: Add Variant type support in Iceberg URL: https://github.com/apache/iceberg/pull/11201

Re: [PR] PoC: Add Variant type support in Iceberg [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11201: URL: https://github.com/apache/iceberg/pull/11201#issuecomment-2489804982 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [I] Iceberg 1.7.0 java.lang.IllegalStateException: Connection pool shut down [iceberg]

2024-11-20 Thread via GitHub
hussein-awala commented on issue #11582: URL: https://github.com/apache/iceberg/issues/11582#issuecomment-2489819727 I believe #11609 will fix the issue.

Re: [PR] Core: Add support for view-override property in catalog [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11200: URL: https://github.com/apache/iceberg/pull/11200#issuecomment-2489804953 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Flink: Add table.exec.iceberg.use-v2-sink option [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11244: URL: https://github.com/apache/iceberg/pull/11244#issuecomment-2489805105 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
Acehaidrey commented on PR #11564: URL: https://github.com/apache/iceberg/pull/11564#issuecomment-2489826751 Thank you @flyrain for all the help here. I actually took your advice; I think it looks cleaner this way, if you don't mind taking another look.

Re: [I] Cannot access table endpoint in REST catalog when table name contains a slash character (`/`) [iceberg-python]

2024-11-20 Thread via GitHub
github-actions[bot] closed issue #710: Cannot access table endpoint in REST catalog when table name contains a slash character (`/`) URL: https://github.com/apache/iceberg-python/issues/710

Re: [I] Cannot access table endpoint in REST catalog when table name contains a slash character (`/`) [iceberg-python]

2024-11-20 Thread via GitHub
github-actions[bot] commented on issue #710: URL: https://github.com/apache/iceberg-python/issues/710#issuecomment-2489807576 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
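For the slash-in-table-name issue above, the usual client-side step is to percent-encode each path segment so that `/` becomes `%2F` instead of acting as a path separator. The sketch below assumes the REST catalog path layout `/v1/{prefix}/namespaces/{namespace}/tables/{table}` and the spec's use of the unit separator (0x1F) between multipart namespace levels; `table_path` itself is a hypothetical helper, not pyiceberg's client code.

```python
from typing import List
from urllib.parse import quote


def table_path(namespace: List[str], table: str, prefix: str = "") -> str:
    """Build a REST catalog table path with every segment percent-encoded."""
    # Multipart namespaces are joined with the unit separator (0x1F) -> "%1F".
    ns = quote("\x1f".join(namespace), safe="")
    # safe="" makes quote encode "/" as "%2F" rather than leaving it as a separator.
    name = quote(table, safe="")
    base = f"/v1/{quote(prefix, safe='')}/" if prefix else "/v1/"
    return f"{base}namespaces/{ns}/tables/{name}"


print(table_path(["db"], "a/b"))  # /v1/namespaces/db/tables/a%2Fb
```

Encoding alone may not be sufficient: some HTTP servers and proxies reject or decode encoded slashes by default, so the server side has to accept them too.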

Re: [PR] OpenAPI: Add planning-mode to loadTable response [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11156: URL: https://github.com/apache/iceberg/pull/11156#issuecomment-2489804811 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
Acehaidrey commented on code in PR #11564: URL: https://github.com/apache/iceberg/pull/11564#discussion_r1851157822 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -560,11 +560,13 @@ public Scan buildChangelogScan() { }

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
flyrain commented on code in PR #11564: URL: https://github.com/apache/iceberg/pull/11564#discussion_r1851145985 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -560,11 +560,13 @@ public Scan buildChangelogScan() { } boo

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-11-20 Thread via GitHub
aokolnychyi commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r185117 ## data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java: ## @@ -0,0 +1,332 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Parquet: Use native getRowIndexOffset support instead of calculating it [iceberg]

2024-11-20 Thread via GitHub
wypoon commented on code in PR #11520: URL: https://github.com/apache/iceberg/pull/11520#discussion_r1851154584 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReader.java: ## @@ -28,5 +28,14 @@ public interface ParquetValueReader { List> columns(); + /**

Re: [I] Support Snapshot Management Operations [iceberg-python]

2024-11-20 Thread via GitHub
github-actions[bot] commented on issue #737: URL: https://github.com/apache/iceberg-python/issues/737#issuecomment-2489807554 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [PR] Compatible with Spark4 (upgrade antlr4 to version 4.13.1 Compatible with jdk17  ) [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11204: Compatible with Spark4 (upgrade antlr4 to version 4.13.1 Compatible with jdk17  ) URL: https://github.com/apache/iceberg/pull/11204

Re: [PR] Nessie: respect the nearest namespace's `location` property when creating a table or view [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11215: Nessie: respect the nearest namespace's `location` property when creating a table or view URL: https://github.com/apache/iceberg/pull/11215

Re: [PR] Update Examples to Replace Hadoop Catalog with JDBC Catalog [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11285: Update Examples to Replace Hadoop Catalog with JDBC Catalog URL: https://github.com/apache/iceberg/pull/11285

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
flyrain commented on code in PR #11564: URL: https://github.com/apache/iceberg/pull/11564#discussion_r1851144167 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java: ## @@ -408,4 +408,53 @@ private List collect(DataFrameReader

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-11-20 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1851148973 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -327,62 +354,71 @@ private ManifestFile filterManifest(Schema tableSchema, Manife
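The optimization named in this PR's title, using referenced manifests to decide which manifests need rewriting, can be illustrated with a small set computation. This is a hypothetical Python sketch of the idea, not the `ManifestFilterManager` change under review: it assumes an index from each deleted data file to the manifest it was read from, and it falls back to checking every manifest when that index is incomplete.

```python
from typing import Dict, Iterable, Set


def candidate_manifests(
    deleted_files: Iterable[str],
    referenced_manifest: Dict[str, str],
    all_manifests: Iterable[str],
) -> Set[str]:
    """Return the manifests that may need rewriting to drop deleted_files.

    referenced_manifest maps a data file path to the manifest it was read
    from (a hypothetical index built when the files were planned).
    """
    deleted = list(deleted_files)
    if any(path not in referenced_manifest for path in deleted):
        # Without a reference for every file, no manifest can be ruled out.
        return set(all_manifests)
    # Only manifests that actually contain a deleted file must be rewritten;
    # all others can be carried over unchanged without being scanned.
    return {referenced_manifest[path] for path in deleted}
```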

Re: [PR] Flink: Tests alignment for the Flink Sink v2-based implemenation (IcebergSink) [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11219: URL: https://github.com/apache/iceberg/pull/11219#issuecomment-2489805067 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Compatible with Spark4 (upgrade antlr4 to version 4.13.1 Compatible with jdk17  ) [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11204: URL: https://github.com/apache/iceberg/pull/11204#issuecomment-2489805016 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Core: Add support for view-override property in catalog [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11200: Core: Add support for view-override property in catalog URL: https://github.com/apache/iceberg/pull/11200

Re: [PR] Docs: Add Bigquery Iceberg documentation, Update MRAP endpoint and add more docs [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11159: URL: https://github.com/apache/iceberg/pull/11159#issuecomment-2489804843 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Kafka Connect: add option to force columns to lowercase [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #11100: URL: https://github.com/apache/iceberg/pull/11100#issuecomment-2489804772 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If
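An option to force columns to lowercase, as this PR's title describes, has to handle names that collide once lowercased (for example `Id` and `ID`). The sketch below shows one hypothetical way to apply such an option to a flat record; it is not the Kafka Connect sink's code, it raises on collisions as an assumed policy, and it ignores nested struct fields.

```python
from typing import Any, Dict


def lowercase_columns(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with lowercased field names.

    Raises ValueError when two names collide after lowercasing (e.g. "Id" and "ID").
    """
    result: Dict[str, Any] = {}
    for name, value in record.items():
        lowered = name.lower()
        if lowered in result:
            raise ValueError(f"column name collision after lowercasing: {name!r}")
        result[lowered] = value
    return result
```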

Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on PR #9731: URL: https://github.com/apache/iceberg/pull/9731#issuecomment-2489804464 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Docs: Add Bigquery Iceberg documentation, Update MRAP endpoint and add more docs [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11159: Docs: Add Bigquery Iceberg documentation, Update MRAP endpoint and add more docs URL: https://github.com/apache/iceberg/pull/11159

Re: [PR] Kafka Connect: add option to force columns to lowercase [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11100: Kafka Connect: add option to force columns to lowercase URL: https://github.com/apache/iceberg/pull/11100

Re: [PR] OpenAPI: Add planning-mode to loadTable response [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11156: OpenAPI: Add planning-mode to loadTable response URL: https://github.com/apache/iceberg/pull/11156

Re: [PR] Test: JdbcCatalog should not drop child namespaces [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed pull request #11063: Test: JdbcCatalog should not drop child namespaces URL: https://github.com/apache/iceberg/pull/11063

Re: [I] how do you guys back up your iceberg table? [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] closed issue #10299: how do you guys back up your iceberg table? URL: https://github.com/apache/iceberg/issues/10299

Re: [I] how do you guys back up your iceberg table? [iceberg]

2024-11-20 Thread via GitHub
github-actions[bot] commented on issue #10299: URL: https://github.com/apache/iceberg/issues/10299#issuecomment-2489804588 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
Acehaidrey commented on PR #11564: URL: https://github.com/apache/iceberg/pull/11564#issuecomment-2489802477 Fixed! Sorry I missed that On Wed, Nov 20, 2024 at 7:10 PM Yufei Gu ***@***.***> wrote: > ***@***. commented on this pull request. > --

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-20 Thread via GitHub
Acehaidrey commented on PR #11564: URL: https://github.com/apache/iceberg/pull/11564#issuecomment-2489785701 Hey @sfc-gh-ygu @flyrain @RussellSpitzer @bryanck I have gone ahead and updated this. Please take a look if you can; the test passes now, as do the other tests. Sorry for

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-11-20 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1851138290 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -327,62 +354,71 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFil

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-11-20 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1851129804 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -69,6 +70,7 @@ public String partition() { private final Map specsById; private f

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-20 Thread via GitHub
aokolnychyi commented on code in PR #10986: URL: https://github.com/apache/iceberg/pull/10986#discussion_r1851007400 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java: ## @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] Spec: Support geo type [iceberg]

2024-11-20 Thread via GitHub
szehon-ho commented on PR #10981: URL: https://github.com/apache/iceberg/pull/10981#issuecomment-2489771653 Maybe we can do another type? Alternatively, put back the 'edges' property and better define the behavior of lower_bound and upper_bound.

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-20 Thread via GitHub
dramaticlly commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1851116414 ## hive-metastore/src/test/java/org/apache/iceberg/hive/HiveTableTest.java: ## @@ -386,6 +386,12 @@ public void testHiveTableAndIcebergTableWithSameName(TableTyp

Re: [PR] REST: AuthManager API [iceberg]

2024-11-20 Thread via GitHub
danielcweeks commented on code in PR #10753: URL: https://github.com/apache/iceberg/pull/10753#discussion_r1851108871 ## core/src/main/java/org/apache/iceberg/rest/AbstractHTTPClient.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Spec: Support geo type [iceberg]

2024-11-20 Thread via GitHub
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1851108117 ## format/spec.md: ## @@ -198,6 +199,9 @@ Notes: - Timestamp values _with time zone_ represent a point in time: values are stored as UTC and do not retain a s

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-20 Thread via GitHub
szehon-ho commented on PR #10986: URL: https://github.com/apache/iceberg/pull/10986#issuecomment-2489721338 Looks like all comments are addressed; can do a follow-up if more come up. Thanks @karuppayya, and also @aokolnychyi @ajantha-bhat @nastra for the additional reviews!

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-20 Thread via GitHub
szehon-ho merged PR #10986: URL: https://github.com/apache/iceberg/pull/10986

[I] REST Catalog S3 Signer Endpoint should be Catalog specific [iceberg]

2024-11-20 Thread via GitHub
c-thiel opened a new issue, #11608: URL: https://github.com/apache/iceberg/issues/11608 ### Apache Iceberg version 1.7.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 Currently when configuring two REST catalogs in spark, the `s3.sign

Re: [PR] Docs: Use the correct YAML text block indicator to prevent formatting issues [iceberg]

2024-11-20 Thread via GitHub
neodon commented on code in PR #11552: URL: https://github.com/apache/iceberg/pull/11552#discussion_r1851075235 ## site/docs/spark-quickstart.md: ## @@ -100,7 +100,7 @@ services: - AWS_ACCESS_KEY_ID=admin - AWS_SECRET_ACCESS_KEY=password - AWS_REGION=us-east

Re: [PR] API, Core: Add formatVersion() to Table [iceberg]

2024-11-20 Thread via GitHub
amogh-jahagirdar commented on code in PR #11587: URL: https://github.com/apache/iceberg/pull/11587#discussion_r1849121688 ## core/src/main/java/org/apache/iceberg/BaseMetadataTable.java: ## @@ -212,6 +212,11 @@ public String toString() { return name(); } + @Override +

Re: [PR] Bump aiohttp from 3.10.5 to 3.10.11 [iceberg-python]

2024-11-20 Thread via GitHub
Fokko merged PR #1338: URL: https://github.com/apache/iceberg-python/pull/1338

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating Spark tables [iceberg]

2024-11-20 Thread via GitHub
RussellSpitzer commented on code in PR #11157: URL: https://github.com/apache/iceberg/pull/11157#discussion_r1848650926 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -971,4 +979,109 @@ public int hashCode() { return Objects.hashCode
