Re: [I] PR Analysis and Follow-Up: Review of PR #5913 - Data Sequence Number in ManifestEntry [iceberg]

2025-01-21 Thread via GitHub
potiuk commented on issue #12039: URL: https://github.com/apache/iceberg/issues/12039#issuecomment-2606516361 This looks like an AI-generated, useless issue report that brings no value and makes no sense. We are generally blocking users who send a lot of spam AI reports generated by bots.. a

[PR] Doc: Fix expired link in vendor page [iceberg]

2025-01-21 Thread via GitHub
ebyhr opened a new pull request, #12045: URL: https://github.com/apache/iceberg/pull/12045 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-ma

Re: [PR] Manifest list encryption [iceberg]

2025-01-21 Thread via GitHub
ggershinsky commented on PR #7770: URL: https://github.com/apache/iceberg/pull/7770#issuecomment-2606514708 > I'd also suggest to make sure we rebase because it's been a while since this was open, and we should see CI pass with the rebased changes before merging. Sure, I'll rebase th

Re: [PR] Manifest list encryption [iceberg]

2025-01-21 Thread via GitHub
ggershinsky commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1924835904 ## core/src/main/java/org/apache/iceberg/encryption/StandardEncryptionManager.java: ## @@ -81,22 +136,75 @@ private SecureRandom workerRNG() { return lazyRNG;

Re: [I] Support for S3 catalog to work with S3 Tables [iceberg-python]

2025-01-21 Thread via GitHub
felixscherz commented on issue #1404: URL: https://github.com/apache/iceberg-python/issues/1404#issuecomment-2606505572 @soumilshah1995 once #1429 is merged, an example would be: ```python from pyiceberg.catalog.s3tables import S3TablesCatalog import pyarrow as pa tab

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
ajantha-bhat commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924830895 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -239,6 +269,12 @@ public Optional> visit( LogicalTypeAnnotation.

Re: [PR] Docs: Fix latest and nightly link on javadoc (according to site README.md) [iceberg]

2025-01-21 Thread via GitHub
jbonofre commented on PR #12023: URL: https://github.com/apache/iceberg/pull/12023#issuecomment-2606480709 Ah I found it and it's not related to this change. If you go to iceberg.apache.org (https://iceberg.apache.org/docs/nightly), you can see the Javadoc links are pointing to latest.

Re: [PR] OpenAPI: Deprecate snapshot-id of SetStatisticsUpdate [iceberg]

2025-01-21 Thread via GitHub
c-thiel commented on code in PR #12010: URL: https://github.com/apache/iceberg/pull/12010#discussion_r1924816978 ## api/src/main/java/org/apache/iceberg/UpdateStatistics.java: ## @@ -27,9 +27,22 @@ public interface UpdateStatistics extends PendingUpdate> { * the snapshot if

Re: [PR] Docs: Fix latest and nightly link on javadoc (according to site README.md) [iceberg]

2025-01-21 Thread via GitHub
jbonofre commented on PR #12023: URL: https://github.com/apache/iceberg/pull/12023#issuecomment-2606477916 @manuzhang I checked, and I see 1.7.1, 1.7.0 etc folders in javadoc (without sym link). I guess you are talking about the links in the generated site/index.html ? Can you point me to t

[PR] add RisingWave to the Vendors page [iceberg]

2025-01-21 Thread via GitHub
hengm3467 opened a new pull request, #12043: URL: https://github.com/apache/iceberg/pull/12043 Please let me know if you have any questions or any clarification you need. Thanks!

Re: [PR] Add data type/schema field/schema [iceberg-cpp]

2025-01-21 Thread via GitHub
lidavidm commented on PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#issuecomment-2606400574 I'll add some unit tests as long as the API looks good to everyone.

Re: [PR] WIP: Add headers for type/field/schema [iceberg-cpp]

2025-01-21 Thread via GitHub
lidavidm commented on PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#issuecomment-2606393374 Ok, hopefully that fixes macOS. It appears C++20 support on the version of clang shipped by macOS is still flaky: https://godbolt.org/z/W9rTexc13 Maybe we want to avoid C++2

Re: [PR] Docs: Fix latest and nightly link on javadoc (according to site README.md) [iceberg]

2025-01-21 Thread via GitHub
jbonofre commented on PR #12023: URL: https://github.com/apache/iceberg/pull/12023#issuecomment-2606373741 @manuzhang I didn't change the versions. Let me check.

Re: [PR] Docs: Fix latest and nightly link on javadoc (according to site README.md) [iceberg]

2025-01-21 Thread via GitHub
manuzhang commented on PR #12023: URL: https://github.com/apache/iceberg/pull/12023#issuecomment-2606361948 @jbonofre I built the site based on the PR locally. The Javadoc links from nightly / latest are working now, but links from 1.7.1 / 1.7.0 / 1.6.1 / 1.6.0 all point to latest. Link

[I] Issue in Reading Iceberg tables in Nessie + Minio using Pyiceberg [iceberg-python]

2025-01-21 Thread via GitHub
heman026 opened a new issue, #1560: URL: https://github.com/apache/iceberg-python/issues/1560 ### Question I am getting "Failed to read table metadata from s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/metadata/2-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata

Re: [I] SparkExecutorCache causes slowness of RewriteDataFilesSparkAction [iceberg]

2025-01-21 Thread via GitHub
anuragmantri commented on issue #11648: URL: https://github.com/apache/iceberg/issues/11648#issuecomment-2606336520 I will work on this. I will create a config for disabling executor cache. Thanks for the bug report and inputs.

[I] NullPointerException when writing to Iceberg table using Spark 3.4.0 [iceberg]

2025-01-21 Thread via GitHub
Bhhsaurabh opened a new issue, #12037: URL: https://github.com/apache/iceberg/issues/12037 ### Apache Iceberg version 1.3.0 ### Query engine Spark ### Please describe the bug 🐞 When attempting to write data to an Iceberg table using the Spark write API, a N

[I] # Title Feature Request / Improvement: Hardening Lock Mechanism for Retry and Interrupt Handling [iceberg]

2025-01-21 Thread via GitHub
Aatirhassanpir opened a new issue, #12034: URL: https://github.com/apache/iceberg/issues/12034 ### Feature Request / Improvement ## Description This improvement aims to enhance the lock mechanism in Apache Hive's Metastore by introducing: Retry Logic for Lock Acquisition: Re
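A minimal sketch of the retry idea named in the request above, assuming a generic lock whose acquisition attempt either succeeds or fails. The names `acquire_with_retry`, `try_acquire`, `max_attempts`, and `base_delay` are hypothetical and are not taken from Iceberg or the Hive Metastore; the sketch covers only retry with exponential backoff, not interrupt handling.

```python
import time


def acquire_with_retry(try_acquire, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call try_acquire() until it returns True or the attempts run out.

    try_acquire is a hypothetical zero-argument callable that returns True
    when the lock was obtained and False otherwise.
    """
    for attempt in range(max_attempts):
        if try_acquire():
            return True
        # Back off before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return False
```

Passing `sleep` as a parameter keeps the backoff schedule observable in tests without actually waiting.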

[I] UnsupportedOperationException: Unknown data file format during Spark query [iceberg]

2025-01-21 Thread via GitHub
atharv9017 opened a new issue, #12033: URL: https://github.com/apache/iceberg/issues/12033 ### Apache Iceberg version 1.4.1 ### Query engine Spark ### Please describe the bug 🐞 An error occurs when running a Spark query against a dataset managed by Apache I

Re: [PR] Spark3.5: Standardizing Error Handling in Iceberg Spark Module - TestViews [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11993: URL: https://github.com/apache/iceberg/pull/11993#discussion_r1924690853 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -213,10 +213,13 @@ public void readFromViewUsingNonExistingTa

Re: [I] SparkExecutorCache causes slowness of RewriteDataFilesSparkAction [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on issue #11648: URL: https://github.com/apache/iceberg/issues/11648#issuecomment-2606288708 Unfortunately, we haven't heard back. That said, I may have a guess. I believe it is related to the connection pool we use for reading deletes. The rewrite action submits multi

Re: [I] Feature Request / Improvement [iceberg]

2025-01-21 Thread via GitHub
Umesh7987 closed issue #12032: Feature Request / Improvement URL: https://github.com/apache/iceberg/issues/12032

Re: [I] Feature Request / Improvement [iceberg]

2025-01-21 Thread via GitHub
Umesh7987 commented on issue #12032: URL: https://github.com/apache/iceberg/issues/12032#issuecomment-2606287543 This feature request has already been addressed and implemented in [PR #6811](https://github.com/apache/iceberg/pull/6811). The lazy loading of snapshots in TableMetadata improve

[I] Feature Request / Improvement [iceberg]

2025-01-21 Thread via GitHub
Umesh7987 opened a new issue, #12032: URL: https://github.com/apache/iceberg/issues/12032 ### Feature Request / Improvement Please describe the feature and elaborate on the use case and motivation behind it: This feature introduces lazy loading of snapshots in TableMetadata. The go
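The lazy-loading idea described above can be illustrated with a small sketch. This is not Iceberg's `TableMetadata` implementation; the class `LazySnapshots` and its `loader` argument are hypothetical names used only to show the pattern of deferring an expensive load until the first access and caching the result.

```python
class LazySnapshots:
    """Defer loading a snapshot list until it is first requested."""

    def __init__(self, loader):
        # loader is a hypothetical zero-argument callable that returns
        # an iterable of snapshots (for example, by reading metadata).
        self._loader = loader
        self._snapshots = None

    def snapshots(self):
        # Load once on first access, then serve the cached list.
        if self._snapshots is None:
            self._snapshots = list(self._loader())
        return self._snapshots
```

Callers that never ask for snapshots never pay for the load, which is the motivation the request gives for the feature.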

[I] Issue with PositionDeletesTable in Apache Iceberg [iceberg]

2025-01-21 Thread via GitHub
jjavieralonso opened a new issue, #12031: URL: https://github.com/apache/iceberg/issues/12031 ### Apache Iceberg version 1.2.0 ### Query engine Other ### Please describe the bug 🐞 There is an issue related to the PositionDeletesTable implementation in the A

Re: [PR] Core, Spark: Rewrite data files with high delete ratio [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11825: URL: https://github.com/apache/iceberg/pull/11825#discussion_r1924686678 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java: ## @@ -84,13 +87,34 @@ private boolean shouldRewrite(List group) { return enough

Re: [PR] Core, Spark: Rewrite data files with high delete ratio [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11825: URL: https://github.com/apache/iceberg/pull/11825#discussion_r1924685635 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java: ## @@ -68,7 +70,8 @@ public void init(Map options) { @Override protected Iter

Re: [PR] Core, Spark: Rewrite data files with high delete ratio [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11825: URL: https://github.com/apache/iceberg/pull/11825#discussion_r1924684066 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java: ## @@ -84,13 +87,34 @@ private boolean shouldRewrite(List group) { return enough

Re: [PR] Core, Spark: Rewrite data files with high delete ratio [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11825: URL: https://github.com/apache/iceberg/pull/11825#discussion_r1924681185 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java: ## @@ -84,13 +87,34 @@ private boolean shouldRewrite(List group) { return enough

Re: [I] please add requires-python to pyproject.toml [iceberg-rust]

2025-01-21 Thread via GitHub
trim21 closed issue #896: please add requires-python to pyproject.toml URL: https://github.com/apache/iceberg-rust/issues/896

Re: [PR] Core, Spark: Rewrite data files with high delete ratio [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11825: URL: https://github.com/apache/iceberg/pull/11825#discussion_r1924680088 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java: ## @@ -84,13 +87,34 @@ private boolean shouldRewrite(List group) { return enough

Re: [PR] add python version support range to pyproject.toml [iceberg-rust]

2025-01-21 Thread via GitHub
Xuanwo merged PR #903: URL: https://github.com/apache/iceberg-rust/pull/903

Re: [PR] Adding Crunchy Data to Iceberg Vendors list [iceberg]

2025-01-21 Thread via GitHub
ebyhr commented on code in PR #12020: URL: https://github.com/apache/iceberg/pull/12020#discussion_r1924632862 ## site/docs/vendors.md: ## @@ -57,6 +57,10 @@ the same copy of data using Spark and run analytics or AI with our [Machine Learning](https://www.cloudera.com/products

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2025-01-21 Thread via GitHub
rahil-c commented on PR #11369: URL: https://github.com/apache/iceberg/pull/11369#issuecomment-2606180047 Reviving this pr.

Re: [PR] feat: support datetime objects in literal instantiation [iceberg-python]

2025-01-21 Thread via GitHub
kevinjqliu commented on PR #1542: URL: https://github.com/apache/iceberg-python/pull/1542#issuecomment-2606171821 Thanks for the contribution @jayceslesar! and thanks for the review @Fokko :)

Re: [I] `datetime` objects in `row_filter` expressions are not casted and raise an error [iceberg-python]

2025-01-21 Thread via GitHub
kevinjqliu closed issue #1456: `datetime` objects in `row_filter` expressions are not casted and raise an error URL: https://github.com/apache/iceberg-python/issues/1456

Re: [PR] feat: support datetime objects in literal instantiation [iceberg-python]

2025-01-21 Thread via GitHub
kevinjqliu merged PR #1542: URL: https://github.com/apache/iceberg-python/pull/1542

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11933: URL: https://github.com/apache/iceberg/pull/11933#discussion_r1924616379 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java: ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] Add `view_exists` method to REST Catalog [iceberg-python]

2025-01-21 Thread via GitHub
shiv-io commented on PR #1242: URL: https://github.com/apache/iceberg-python/pull/1242#issuecomment-2606155161 @sungwy thanks for the review :)

Re: [PR] Build: Bump actions/stale from 9.0.0 to 9.1.0 [iceberg-python]

2025-01-21 Thread via GitHub
kevinjqliu merged PR #1558: URL: https://github.com/apache/iceberg-python/pull/1558

Re: [PR] Build: Bump cachetools from 5.5.0 to 5.5.1 [iceberg-python]

2025-01-21 Thread via GitHub
kevinjqliu merged PR #1559: URL: https://github.com/apache/iceberg-python/pull/1559

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11933: URL: https://github.com/apache/iceberg/pull/11933#discussion_r1924605430 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java: ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11933: URL: https://github.com/apache/iceberg/pull/11933#discussion_r1924604772 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -88,7 +85,7 @@ public final ColumnarBatch read(ColumnarBa

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11933: URL: https://github.com/apache/iceberg/pull/11933#discussion_r1924605026 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java: ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11933: URL: https://github.com/apache/iceberg/pull/11933#discussion_r1924605201 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchUtil.java: ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11933: URL: https://github.com/apache/iceberg/pull/11933#discussion_r1924604659 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java: ## @@ -88,7 +85,7 @@ public final ColumnarBatch read(ColumnarBa

Re: [PR] Spark 3.5: Refactor delete logic in batch reading [iceberg]

2025-01-21 Thread via GitHub
huaxingao commented on code in PR #11933: URL: https://github.com/apache/iceberg/pull/11933#discussion_r1924604557 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnVectorBuilder.java: ## @@ -26,13 +26,6 @@ class ColumnVectorBuilder { private

Re: [PR] Add `view_exists` method to REST Catalog [iceberg-python]

2025-01-21 Thread via GitHub
sungwy commented on PR #1242: URL: https://github.com/apache/iceberg-python/pull/1242#issuecomment-2606139140 This looks good to me @shiv-io ! Thank you very much for including the integration test to test the API :) Could we run `make lint` to make the CI pass? Other than that, this

Re: [I] Support for S3 catalog to work with S3 Tables [iceberg-python]

2025-01-21 Thread via GitHub
soumilshah1995 commented on issue #1404: URL: https://github.com/apache/iceberg-python/issues/1404#issuecomment-2606062410 can we have full example code ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] API: Add `UnknownType` [iceberg]

2025-01-21 Thread via GitHub
HonahX commented on code in PR #12012: URL: https://github.com/apache/iceberg/pull/12012#discussion_r1924533722 ## api/src/test/java/org/apache/iceberg/transforms/TestBucketing.java: ## @@ -431,6 +431,20 @@ public void testVariantUnsupported() { assertThat(bucket.canTransfo

Re: [PR] API: Add `UnknownType` [iceberg]

2025-01-21 Thread via GitHub
HonahX commented on code in PR #12012: URL: https://github.com/apache/iceberg/pull/12012#discussion_r1924520051 ## api/src/main/java/org/apache/iceberg/transforms/Identity.java: ## @@ -93,6 +95,10 @@ public SerializableFunction bind(Type type) { @Override public boolean

Re: [PR] ORC: Fail when initial default support is required [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #12026: URL: https://github.com/apache/iceberg/pull/12026#discussion_r1924520182 ## orc/src/main/java/org/apache/iceberg/orc/ORCSchemaUtil.java: ## @@ -326,13 +327,20 @@ private static TypeDescription buildOrcProjection( orcType = orig

Re: [I] Parquet column array> with null value is read in as empty list [iceberg-python]

2025-01-21 Thread via GitHub
github-actions[bot] commented on issue #251: URL: https://github.com/apache/iceberg-python/issues/251#issuecomment-2606006540 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [I] Formal verification discovers potential consistency issue [iceberg]

2025-01-21 Thread via GitHub
github-actions[bot] commented on issue #10720: URL: https://github.com/apache/iceberg/issues/10720#issuecomment-2606003572 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] On droping table with shared location the data got deleted which should not be case after version 0.14.0 of Apache iceberg [iceberg]

2025-01-21 Thread via GitHub
github-actions[bot] commented on issue #10779: URL: https://github.com/apache/iceberg/issues/10779#issuecomment-2606003632 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Core: Parsing and Writing Tests for V3 Metadata [iceberg]

2025-01-21 Thread via GitHub
HonahX commented on PR #11947: URL: https://github.com/apache/iceberg/pull/11947#issuecomment-2605982667 Hi everyone, thanks for all the review and suggestions! Since there is an ongoing [discussion](https://github.com/apache/iceberg/pull/11947#discussion_r1917865349) on whether to add tes

[PR] Core, Test: Tests for V3 Table Metadata [iceberg]

2025-01-21 Thread via GitHub
HonahX opened a new pull request, #12025: URL: https://github.com/apache/iceberg/pull/12025 Split out from #11947 - Refactor TableMetadata tests by parametrizing table format versions in all tests - Add example metadata files for V3 table - Add constants for several features/

Re: [I] Oauth token flow uses hardcoded scope [iceberg-go]

2025-01-21 Thread via GitHub
zeroshade commented on issue #263: URL: https://github.com/apache/iceberg-go/issues/263#issuecomment-2605969035 Absolutely, I'll happily review the PR. Thanks!

[PR] Improvement to unittest cases in catalog/registry.go [iceberg-go]

2025-01-21 Thread via GitHub
dttung2905 opened a new pull request, #264: URL: https://github.com/apache/iceberg-go/pull/264 Hi team, I'm new to iceberg and iceberg-go and working on the adoption at my current company. I was digging at the code and found some areas to contribute to. Not saying achieving 100% code c

[PR] Build: Bump actions/stale from 9.0.0 to 9.1.0 [iceberg-python]

2025-01-21 Thread via GitHub
dependabot[bot] opened a new pull request, #1558: URL: https://github.com/apache/iceberg-python/pull/1558 Bumps [actions/stale](https://github.com/actions/stale) from 9.0.0 to 9.1.0. Release notes Sourced from [actions/stale's releases](https://github.com/actions/stale/releases).

[PR] Build: Bump cachetools from 5.5.0 to 5.5.1 [iceberg-python]

2025-01-21 Thread via GitHub
dependabot[bot] opened a new pull request, #1559: URL: https://github.com/apache/iceberg-python/pull/1559 Bumps [cachetools](https://github.com/tkem/cachetools) from 5.5.0 to 5.5.1. Changelog Sourced from https://github.com/tkem/cachetools/blob/master/CHANGELOG.rst cachetools's c

Re: [PR] Spark 3.3: Backport support for default values [iceberg]

2025-01-21 Thread via GitHub
amogh-jahagirdar merged PR #11988: URL: https://github.com/apache/iceberg/pull/11988

Re: [PR] Core, Spark: Include content offset/size in PositionDeletesTable [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11675: URL: https://github.com/apache/iceberg/pull/11675#discussion_r1924429613 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeletesRewrite.java: ## @@ -224,34 +228,49 @@ public DataWriter createWriter(int p

Re: [PR] Core, Spark: Include content offset/size in PositionDeletesTable [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11675: URL: https://github.com/apache/iceberg/pull/11675#discussion_r1924427229 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -110,32 +110,47 @@ public Map properties() { } private Schema calculateSchema()

Re: [PR] Core, Spark: Include content offset/size in PositionDeletesTable [iceberg]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #11675: URL: https://github.com/apache/iceberg/pull/11675#discussion_r1924424811 ## core/src/main/java/org/apache/iceberg/MetadataColumns.java: ## @@ -92,6 +92,20 @@ private MetadataColumns() {} Types.LongType.get(), "Com

Re: [PR] Spark 3.3: Backport support for default values [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on PR #11988: URL: https://github.com/apache/iceberg/pull/11988#issuecomment-2605803634 @manuzhang, this could be a correctness issue with Spark 3.3 and v3 tables, so I think it is an important fix. The language you're referencing is also trying to set expectations for othe

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924391656 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java: ## @@ -330,6 +361,37 @@ public void write(int repetitionLevel, CharSequence value) {

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924389663 ## parquet/src/main/java/org/apache/iceberg/data/parquet/GenericParquetReaders.java: ## @@ -92,4 +151,124 @@ protected void set(Record struct, int pos, Object value) {

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924387728 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -50,6 +54,32 @@ protected ParquetValueWriter createWriter(MessageType type) {

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924386754 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -76,6 +67,31 @@ protected ParquetValueReader createReader( protected abstrac

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924384924 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -397,7 +404,7 @@ public ParquetValueReader primitive( case INT96:

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924379496 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -50,6 +54,32 @@ protected ParquetValueWriter createWriter(MessageType type) {

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924377935 ## parquet/src/main/java/org/apache/iceberg/data/parquet/InternalReader.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924348066 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java: ## @@ -106,6 +114,10 @@ public static PrimitiveWriter byteBuffers(ColumnDescriptor desc)

Re: [I] Support reading V3 metadata [iceberg-python]

2025-01-21 Thread via GitHub
Fokko closed issue #1550: Support reading V3 metadata URL: https://github.com/apache/iceberg-python/issues/1550

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924346731 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java: ## @@ -63,6 +69,18 @@ public static ParquetValueReader position() { return new Posit

Re: [PR] Core, API, Spec: Metadata Row Lineage [iceberg]

2025-01-21 Thread via GitHub
RussellSpitzer commented on PR #11948: URL: https://github.com/apache/iceberg/pull/11948#issuecomment-2605711498 @nastra @amogh-jahagirdar @HonahX @stevenzwu @nastra , Could you please take another pass? I'd like to get the next chunk of work done as well which is all based on this.

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924343948 ## parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java: ## @@ -63,6 +69,18 @@ public static ParquetValueReader position() { return new Posit

Re: [PR] Add V3 read support [iceberg-python]

2025-01-21 Thread via GitHub
Fokko commented on code in PR #1554: URL: https://github.com/apache/iceberg-python/pull/1554#discussion_r1924335034 ## tests/table/test_partitioning.py: ## @@ -151,3 +151,17 @@ def test_partition_type(table_schema_simple: Schema) -> None: NestedField(field_id=1000, nam

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924342038 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -239,6 +269,12 @@ public Optional> visit( LogicalTypeAnnotation.BsonLo

Re: [PR] Modified exception objects being thrown when converting Pyarrow tables [iceberg-python]

2025-01-21 Thread via GitHub
Fokko commented on PR #1498: URL: https://github.com/apache/iceberg-python/pull/1498#issuecomment-2605695304 @DevChrisCross Please don't apologize and always feel free to ping me. Importing PyArrow outside of `pyarrow.py` was my only concern. This looks good, thanks for adding this, and tha

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924339986 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java: ## @@ -192,13 +222,17 @@ public Optional> visit( @Override public Optional>

Re: [PR] Modified exception objects being thrown when converting Pyarrow tables [iceberg-python]

2025-01-21 Thread via GitHub
Fokko merged PR #1498: URL: https://github.com/apache/iceberg-python/pull/1498

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-21 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1924336176 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -76,6 +80,46 @@ protected ParquetValueReader createReader( protected abstrac

Re: [I] Better error messages when creating a table with unsupported types [iceberg-python]

2025-01-21 Thread via GitHub
Fokko closed issue #860: Better error messages when creating a table with unsupported types URL: https://github.com/apache/iceberg-python/issues/860

Re: [PR] Add V3 read support [iceberg-python]

2025-01-21 Thread via GitHub
Fokko merged PR #1554: URL: https://github.com/apache/iceberg-python/pull/1554

Re: [PR] PyArrow: Avoid buffer-overflow by avoid doing a sort [iceberg-python]

2025-01-21 Thread via GitHub
kevinjqliu commented on PR #1555: URL: https://github.com/apache/iceberg-python/pull/1555#issuecomment-2605648387 2^32 (4_294_967_296) is around 4GB, we just need to test a scenario greater than that

Re: [PR] Add V3 read support [iceberg-python]

2025-01-21 Thread via GitHub
kevinjqliu commented on code in PR #1554: URL: https://github.com/apache/iceberg-python/pull/1554#discussion_r1924304974 ## tests/table/test_partitioning.py: ## @@ -151,3 +151,17 @@ def test_partition_type(table_schema_simple: Schema) -> None: NestedField(field_id=1000

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-21 Thread via GitHub
HonahX commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1924290468 ## format/spec.md: ## @@ -1633,3 +1633,47 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries

Re: [PR] PyArrow: Avoid buffer-overflow by avoid doing a sort [iceberg-python]

2025-01-21 Thread via GitHub
Fokko commented on PR #1555: URL: https://github.com/apache/iceberg-python/pull/1555#issuecomment-2605603737 @kevinjqliu I think the test is a bit too much, according to your comment here https://github.com/apache/iceberg-python/pull/1539#discussion_r1922705843 the test allocates almost 5gb

Re: [PR] Core, API, Spec: Metadata Row Lineage [iceberg]

2025-01-21 Thread via GitHub
nastra commented on code in PR #11948: URL: https://github.com/apache/iceberg/pull/11948#discussion_r1924046601 ## core/src/test/java/org/apache/iceberg/TestRowLineageMetadata.java: ## @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo
