Re: [PR] Spark 3.5: Adapt to Spark 3.5.4 [iceberg]

2024-12-17 Thread via GitHub
nastra commented on code in PR #11802: URL: https://github.com/apache/iceberg/pull/11802#discussion_r1889779301 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java: ## @@ -59,6 +59,10 @@ public void close() { accessor.clo

Re: [I] java.lang.IllegalStateException: Connection pool shut down in Spark [iceberg]

2024-12-17 Thread via GitHub
SandeepSinghGahir commented on issue #11633: URL: https://github.com/apache/iceberg/issues/11633#issuecomment-2550594353 I have been facing this issue for the past 6 months now. https://github.com/apache/iceberg/issues/10340#issuecomment-2550591724 -- This is an automated message from the Ap

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-12-17 Thread via GitHub
SandeepSinghGahir commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2550591724 Hi @amogh-jahagirdar, This issue isn't resolved yet. Upon Glue 5.0 release, I tested a job with Iceberg 1.7.0 and I'm still seeing the same error with just differen

Re: [PR] chore: more consistently use type conversions [iceberg-rust]

2024-12-17 Thread via GitHub
roeap commented on PR #815: URL: https://github.com/apache/iceberg-rust/pull/815#issuecomment-2550581317 @Xuanwo @liurenjie1024 - understood.

Re: [PR] chore: more consistently use type conversions [iceberg-rust]

2024-12-17 Thread via GitHub
roeap closed pull request #815: chore: more consistently use type conversions URL: https://github.com/apache/iceberg-rust/pull/815

Re: [PR] Spark3.5 deprecate a few SparkCatalog APIs [iceberg]

2024-12-17 Thread via GitHub
huaxingao commented on PR #11807: URL: https://github.com/apache/iceberg/pull/11807#issuecomment-2550565522 cc @szehon-ho

Re: [PR] Hive: Remove Hive runtime [iceberg]

2024-12-17 Thread via GitHub
manuzhang commented on code in PR #11801: URL: https://github.com/apache/iceberg/pull/11801#discussion_r1889749129 ## mr/build.gradle: ## @@ -41,21 +41,15 @@ project(':iceberg-mr') { exclude group: 'org.apache.avro', module: 'avro' } - compileOnly("${libs.hive2

Re: [PR] Doc: Add status page for different implementations. [iceberg]

2024-12-17 Thread via GitHub
liurenjie1024 commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1889710485 ## site/docs/status.md: ## @@ -0,0 +1,367 @@ +--- +title: "Implementation Status" +--- + + +# Implementation Status + +Apache iceberg's spec is implemented in m

Re: [PR] Hive: Remove Hive runtime [iceberg]

2024-12-17 Thread via GitHub
nastra commented on code in PR #11801: URL: https://github.com/apache/iceberg/pull/11801#discussion_r1889702292 ## mr/build.gradle: ## @@ -41,21 +41,15 @@ project(':iceberg-mr') { exclude group: 'org.apache.avro', module: 'avro' } - compileOnly("${libs.hive2.ex

Re: [PR] Doc: Add status page for different implementations. [iceberg]

2024-12-17 Thread via GitHub
liurenjie1024 commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1889699799 ## site/docs/status.md: ## @@ -0,0 +1,367 @@ +--- +title: "Implementation Status" +--- + + +# Implementation Status + +Apache iceberg's spec is implemented in m

Re: [PR] Spark3.4,3.5: In describe extended view command: fix wrong view catal… [iceberg]

2024-12-17 Thread via GitHub
nastra merged PR #11751: URL: https://github.com/apache/iceberg/pull/11751

Re: [PR] Doc: Add status page for different implementations. [iceberg]

2024-12-17 Thread via GitHub
liurenjie1024 commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1889692562 ## site/docs/status.md: ## @@ -0,0 +1,367 @@ +--- +title: "Implementation Status" +--- + + +# Implementation Status + +Apache iceberg's spec is implemented in m

Re: [PR] Flink: make `StatisticsOrRecord` to be correctly serialized and deser… [iceberg]

2024-12-17 Thread via GitHub
stevenzwu commented on PR #11557: URL: https://github.com/apache/iceberg/pull/11557#issuecomment-2550436383 Thanks @huyuanfeng2018. Can you create the backport PR too?

Re: [PR] Flink: make `StatisticsOrRecord` to be correctly serialized and deser… [iceberg]

2024-12-17 Thread via GitHub
stevenzwu merged PR #11557: URL: https://github.com/apache/iceberg/pull/11557

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-17 Thread via GitHub
samster25 commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1889650976 ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries

Re: [I] [SPJ] Skweded partitions harm merge performances [iceberg]

2024-12-17 Thread via GitHub
szehon-ho commented on issue #11800: URL: https://github.com/apache/iceberg/issues/11800#issuecomment-2550367687 Yes, unfortunately that optimization is a bit limited: it splits the big side and replicates the small side, so it is only correct to do for inner join. I think in this case, you hav

Re: [PR] Spark 3.5: Support default values in Parquet reader [iceberg]

2024-12-17 Thread via GitHub
manuzhang commented on PR #11803: URL: https://github.com/apache/iceberg/pull/11803#issuecomment-2550263174 For context, is this PR (and previous PRs) resolving https://github.com/apache/iceberg/issues/10761?

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-12-17 Thread via GitHub
wypoon commented on PR #10935: URL: https://github.com/apache/iceberg/pull/10935#issuecomment-2550247726 @flyrain you had indicated that you were interested in this. Can you please review?

Re: [PR] Spark3.4,3.5: In describe extended view command: fix wrong view catal… [iceberg]

2024-12-17 Thread via GitHub
Ppei-Wang commented on code in PR #11751: URL: https://github.com/apache/iceberg/pull/11751#discussion_r1889532977 ## gradle.properties: ## @@ -20,7 +20,7 @@ systemProp.defaultFlinkVersions=1.20 systemProp.knownFlinkVersions=1.18,1.19,1.20 systemProp.defaultHiveVersions=2 sys

Re: [PR] Reduce code duplication in VectorizedParquetDefinitionLevelReader [iceberg]

2024-12-17 Thread via GitHub
wypoon commented on PR #11661: URL: https://github.com/apache/iceberg/pull/11661#issuecomment-2550192120 @nastra ping.

Re: [PR] chore: more consistently use type conversions [iceberg-rust]

2024-12-17 Thread via GitHub
liurenjie1024 commented on PR #815: URL: https://github.com/apache/iceberg-rust/pull/815#issuecomment-2550191862 Thanks @roeap for contributing! About this pr, I agree with @Xuanwo that explicit function call makes code easier to read, and there is no strong motivation to change it to `TryF

Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]

2024-12-17 Thread via GitHub
wypoon commented on code in PR #11738: URL: https://github.com/apache/iceberg/pull/11738#discussion_r1889531352 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -144,14 +144,16 @@ public Table loadTable(TableIdentifier ident) { return cached; } -

Re: [PR] Spark3.4,3.5: In describe extended view command: fix wrong view catal… [iceberg]

2024-12-17 Thread via GitHub
Ppei-Wang commented on code in PR #11751: URL: https://github.com/apache/iceberg/pull/11751#discussion_r1889529327 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1414,7 +1414,42 @@ public void describeExtendedView() {

Re: [PR] Spark 3.5: Adapt to Spark 3.5.4 [iceberg]

2024-12-17 Thread via GitHub
pan3793 commented on code in PR #11802: URL: https://github.com/apache/iceberg/pull/11802#discussion_r1889513913 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java: ## @@ -59,6 +59,10 @@ public void close() { accessor.cl

Re: [PR] chore: more consistently use type conversions [iceberg-rust]

2024-12-17 Thread via GitHub
Xuanwo commented on PR #815: URL: https://github.com/apache/iceberg-rust/pull/815#issuecomment-2550171155 Hi, thank you @roeap for working on this. It's intentional that we provide a clear API for the conversion. This ensures an easy-to-find and discoverable API for users. In the near futur

Re: [PR] Bump Spark 3.5.4 RC2 [iceberg]

2024-12-17 Thread via GitHub
pan3793 commented on code in PR #11731: URL: https://github.com/apache/iceberg/pull/11731#discussion_r1889519453 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java: ## @@ -59,6 +59,11 @@ public void close() { accessor.cl

Re: [PR] Spark 3.5: Adapt to Spark 3.5.4 [iceberg]

2024-12-17 Thread via GitHub
pan3793 commented on code in PR #11802: URL: https://github.com/apache/iceberg/pull/11802#discussion_r1889513913 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java: ## @@ -59,6 +59,10 @@ public void close() { accessor.cl

Re: [PR] Spark 3.5: Adapt to Spark 3.5.4 [iceberg]

2024-12-17 Thread via GitHub
pan3793 commented on code in PR #11802: URL: https://github.com/apache/iceberg/pull/11802#discussion_r1889502230 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java: ## @@ -59,6 +59,10 @@ public void close() { accessor.cl

Re: [PR] Add pre-commit config [iceberg-cpp]

2024-12-17 Thread via GitHub
zhjwpku commented on PR #16: URL: https://github.com/apache/iceberg-cpp/pull/16#issuecomment-2550137994 > Should we also add some instructions on how to use `pre-commit`? Similar to https://py.iceberg.apache.org/contributing/#linting I've added some linting instructions to Contribute

Re: [PR] Add pre-commit config [iceberg-cpp]

2024-12-17 Thread via GitHub
zhjwpku commented on PR #16: URL: https://github.com/apache/iceberg-cpp/pull/16#issuecomment-2550134922 > > Cool, I've raised an issue: https://issues.apache.org/jira/browse/INFRA-26378 > > It got approved :) Still got a startup error: https://github.com/apache/iceberg-cpp/act

Re: [PR] Iceberg/Comet integration POC [iceberg]

2024-12-17 Thread via GitHub
huaxingao commented on code in PR #9841: URL: https://github.com/apache/iceberg/pull/9841#discussion_r1889490436 ## spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java: ## @@ -214,7 +215,7 @@ public void testWriteWithCaseSensitiveOption() th

Re: [PR] Flink: make `StatisticsOrRecord` to be correctly serialized and deser… [iceberg]

2024-12-17 Thread via GitHub
huyuanfeng2018 commented on PR #11557: URL: https://github.com/apache/iceberg/pull/11557#issuecomment-2550119165 > Just a nit comment on style Thanks for review, fixed.

Re: [PR] chore: chmod +x on `verify.py` script [iceberg-rust]

2024-12-17 Thread via GitHub
Xuanwo merged PR #817: URL: https://github.com/apache/iceberg-rust/pull/817

[PR] chore: chmod +x on `verify.py` script [iceberg-rust]

2024-12-17 Thread via GitHub
sungwy opened a new pull request, #817: URL: https://github.com/apache/iceberg-rust/pull/817 Make `./scripts/verify.py` executable

Re: [PR] Hive: Add Hive 4 support and remove Hive runtime [iceberg]

2024-12-17 Thread via GitHub
manuzhang commented on PR #11750: URL: https://github.com/apache/iceberg/pull/11750#issuecomment-2550030934 I've created a separate PR #11801 to remove Hive runtime since we can't upgrade hive-metastore dependency until Spark 4.

Re: [I] `parquet_path_to_id_mapping` generates incorrect path for List types [iceberg-python]

2024-12-17 Thread via GitHub
github-actions[bot] commented on issue #716: URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2549967377 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [I] ERROR: Could not build wheels for mmhash3, which is required to install pyproject.toml-based projects [iceberg-python]

2024-12-17 Thread via GitHub
github-actions[bot] commented on issue #836: URL: https://github.com/apache/iceberg-python/issues/836#issuecomment-2549967316 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [I] Bump `HiveCatalog` hive-metastore dependency to Hive 4 [iceberg]

2024-12-17 Thread via GitHub
github-actions[bot] commented on issue #10429: URL: https://github.com/apache/iceberg/issues/10429#issuecomment-2549962952 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Bump `HiveCatalog` hive-metastore dependency to Hive 4 [iceberg]

2024-12-17 Thread via GitHub
github-actions[bot] closed issue #10429: Bump `HiveCatalog` hive-metastore dependency to Hive 4 URL: https://github.com/apache/iceberg/issues/10429

Re: [I] Improve remove_orphan_files performance by using "inventory listing" [iceberg]

2024-12-17 Thread via GitHub
github-actions[bot] closed issue #10426: Improve remove_orphan_files performance by using "inventory listing" URL: https://github.com/apache/iceberg/issues/10426

Re: [I] Improve remove_orphan_files performance by using "inventory listing" [iceberg]

2024-12-17 Thread via GitHub
github-actions[bot] commented on issue #10426: URL: https://github.com/apache/iceberg/issues/10426#issuecomment-2549962905 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] fix: field id in name mapping should be optional [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on code in PR #1426: URL: https://github.com/apache/iceberg-python/pull/1426#discussion_r1889384130 ## pyiceberg/table/name_mapping.py: ## @@ -232,7 +234,9 @@ def mapping(self, nm: NameMapping, field_results: List[MappedField]) -> List[Map def fields

Re: [PR] fix: field id in name mapping should be optional [iceberg-python]

2024-12-17 Thread via GitHub
barronw commented on code in PR #1426: URL: https://github.com/apache/iceberg-python/pull/1426#discussion_r1889379616 ## pyiceberg/table/name_mapping.py: ## @@ -232,7 +234,9 @@ def mapping(self, nm: NameMapping, field_results: List[MappedField]) -> List[Map def fields(se

Re: [PR] fix: field id in name mapping should be optional [iceberg-python]

2024-12-17 Thread via GitHub
barronw commented on code in PR #1426: URL: https://github.com/apache/iceberg-python/pull/1426#discussion_r1889379616 ## pyiceberg/table/name_mapping.py: ## @@ -232,7 +234,9 @@ def mapping(self, nm: NameMapping, field_results: List[MappedField]) -> List[Map def fields(se

Re: [PR] fix: field id in name mapping should be optional [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on code in PR #1426: URL: https://github.com/apache/iceberg-python/pull/1426#discussion_r1889378149 ## pyiceberg/table/name_mapping.py: ## @@ -232,7 +234,9 @@ def mapping(self, nm: NameMapping, field_results: List[MappedField]) -> List[Map def fields

Re: [I] Add REST catalog integration tests [iceberg-python]

2024-12-17 Thread via GitHub
AhmedNader42 commented on issue #1439: URL: https://github.com/apache/iceberg-python/issues/1439#issuecomment-2549892936 Sounds good! Yes, I'll work on it

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-17 Thread via GitHub
danielcweeks merged PR #11769: URL: https://github.com/apache/iceberg/pull/11769

Re: [I] Add REST catalog integration tests [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on issue #1439: URL: https://github.com/apache/iceberg-python/issues/1439#issuecomment-2549887131 > My current approach would be to migrate the test_rest.py unit tests from mock to the integration test I think that's a great first step! Generally, testing via mock

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-17 Thread via GitHub
danielcweeks commented on PR #11769: URL: https://github.com/apache/iceberg/pull/11769#issuecomment-2549851330 Thanks @adutra!, first one down.

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-17 Thread via GitHub
danielcweeks commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1889357046 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Flink: make `StatisticsOrRecord` to be correctly serialized and deser… [iceberg]

2024-12-17 Thread via GitHub
stevenzwu commented on code in PR #11557: URL: https://github.com/apache/iceberg/pull/11557#discussion_r1888968104 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsOrRecordTypeInformation.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache

Re: [PR] fix: field id in name mapping should be optional [iceberg-python]

2024-12-17 Thread via GitHub
barronw commented on code in PR #1426: URL: https://github.com/apache/iceberg-python/pull/1426#discussion_r1889308898 ## pyiceberg/table/name_mapping.py: ## @@ -232,7 +234,9 @@ def mapping(self, nm: NameMapping, field_results: List[MappedField]) -> List[Map def fields(se

[PR] Bump boto3 from 1.35.36 to 1.35.81 [iceberg-python]

2024-12-17 Thread via GitHub
dependabot[bot] opened a new pull request, #1440: URL: https://github.com/apache/iceberg-python/pull/1440 Bumps [boto3](https://github.com/boto/boto3) from 1.35.36 to 1.35.81. Commits [1297fdd](https://github.com/boto/boto3/commit/1297fdd88cbea30cb77a5f104f2617392c57b057) Merge

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-12-17 Thread via GitHub
rdblue commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1889283765 ## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java: ## @@ -118,4 +118,17 @@ public interface ExpireSnapshots extends PendingUpdate> { * @return this for

Re: [I] Allow for a `ManifestList` to have a different snapshot ID than the `ManifestFile`s it points to [iceberg-rust]

2024-12-17 Thread via GitHub
Sl1mb0 commented on issue #816: URL: https://github.com/apache/iceberg-rust/issues/816#issuecomment-2549735223 Having written this I now realize this is only a problem if the `Manifest` has **not** been assigned a sequence number. Closing.

Re: [I] Allow for a `ManifestList` to have a different snapshot ID than the `ManifestFile`s it points to [iceberg-rust]

2024-12-17 Thread via GitHub
Sl1mb0 closed issue #816: Allow for a `ManifestList` to have a different snapshot ID than the `ManifestFile`s it points to URL: https://github.com/apache/iceberg-rust/issues/816

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-12-17 Thread via GitHub
rdblue commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1889284613 ## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ## @@ -273,7 +275,7 @@ private Set findFilesToDelete( Set manifestsToScan, Set mani

Re: [I] Implement `namespace_exists` function on the REST Catalog [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu closed issue #1430: Implement `namespace_exists` function on the REST Catalog URL: https://github.com/apache/iceberg-python/issues/1430

[I] Allow for a `ManifestList` to have a different snapshot ID than the `ManifestFile`s it points to [iceberg-rust]

2024-12-17 Thread via GitHub
Sl1mb0 opened a new issue, #816: URL: https://github.com/apache/iceberg-rust/issues/816 In the [Iceberg specification](https://iceberg.apache.org/spec) it is implied that a `ManifestList` `A` and a `ManifestList` `B` may contain similar entries. Note that in the following diagram the (from l

[PR] Open-API: Fix compilation errors in generated Java classes due to mismatched return types [iceberg]

2024-12-17 Thread via GitHub
VladimirYushkevich opened a new pull request, #11806: URL: https://github.com/apache/iceberg/pull/11806 **Description** This PR addresses a compilation issue in the Java classes generated from the `rest-catalog-open-api.yaml` specification. **Problem** The generated Java classes fai

Re: [PR] Test: Bump Iceberg-Java to 1.7.1 [iceberg-python]

2024-12-17 Thread via GitHub
Fokko commented on code in PR #1323: URL: https://github.com/apache/iceberg-python/pull/1323#discussion_r1889232014 ## dev/Dockerfile: ## @@ -36,9 +36,9 @@ ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$ RUN mkdir -p ${HADOOP_HOME} && mkdir -p

Re: [PR] Test: Bump Iceberg-Java to 1.7.1 [iceberg-python]

2024-12-17 Thread via GitHub
Fokko commented on code in PR #1323: URL: https://github.com/apache/iceberg-python/pull/1323#discussion_r1889227649 ## dev/Dockerfile: ## @@ -38,7 +38,7 @@ WORKDIR ${SPARK_HOME} ENV SPARK_VERSION=3.5.3 Review Comment: ```suggestion ENV SPARK_VERSION=3.5.2 ``` --

Re: [PR] Remove `version` from `docker-compose` [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu merged PR #1438: URL: https://github.com/apache/iceberg-python/pull/1438

Re: [PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on PR #1434: URL: https://github.com/apache/iceberg-python/pull/1434#issuecomment-2549620946 Thanks for the contribution, @AhmedNader42 ! I created #1439 so that we can add more integration tests for the REST catalog. I also started this [devlist discussion](http

Re: [PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu merged PR #1434: URL: https://github.com/apache/iceberg-python/pull/1434

[I] Add REST catalog integration tests [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu opened a new issue, #1439: URL: https://github.com/apache/iceberg-python/issues/1439 ### Feature Request / Improvement #1434 created `tests/integration/test_rest_catalog.py` for integration tests with the REST catalog. Previously REST catalog tests were using mocked request

Re: [PR] Snapshot: Make manifest-list required [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu merged PR #1385: URL: https://github.com/apache/iceberg-python/pull/1385

[PR] Remove `version` from `docker-compose` [iceberg-python]

2024-12-17 Thread via GitHub
Fokko opened a new pull request, #1438: URL: https://github.com/apache/iceberg-python/pull/1438 Seeing this in the logs: ``` time="2024-12-09T14:41:56Z" level=warning msg="/home/runner/work/iceberg-python/iceberg-python/dev/docker-compose-gcs-server.yml: `version` is obsolete"

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-17 Thread via GitHub
Fokko commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1889190270 ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ]

Re: [PR] Snapshot: Make manifest-list required [iceberg-python]

2024-12-17 Thread via GitHub
Fokko commented on PR #1385: URL: https://github.com/apache/iceberg-python/pull/1385#issuecomment-2549561323 @kevinjqliu Thanks, some bad tests. They didn't have the `manifest-list` in the snapshot, which is invalid according to the spec.

[PR] chore: more consistently use type conversions [iceberg-rust]

2024-12-17 Thread via GitHub
roeap opened a new pull request, #815: URL: https://github.com/apache/iceberg-rust/pull/815 Hey iceberg-rs! Looking to start contributing more actively to the Iceberg ecosystem and trying to grok the codebase, I thought maybe some small contributions might be welcome :). Th

Re: [PR] Snapshot: Make manifest-list required [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on PR #1385: URL: https://github.com/apache/iceberg-python/pull/1385#issuecomment-2549553584 `tests/table/test_metadata.py::test_serialize_v1` and `tests/table/test_metadata.py::test_v1_write_metadata_for_v2` also fail

Re: [PR] Snapshot: Make manifest-list required [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on PR #1385: URL: https://github.com/apache/iceberg-python/pull/1385#issuecomment-2549549009 looks like `manifest-list` is missing in our example https://github.com/apache/iceberg-python/blob/b0ea716c91f19281d3d9cd7b6965d5d01f6cc3d5/tests/conftest.py#L628

Re: [PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]

2024-12-17 Thread via GitHub
Fokko commented on code in PR #1434: URL: https://github.com/apache/iceberg-python/pull/1434#discussion_r1889169039 ## tests/integration/test_rest_catalog.py: ## @@ -0,0 +1,63 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreeme

Re: [PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on PR #1434: URL: https://github.com/apache/iceberg-python/pull/1434#issuecomment-2549530592 @AhmedNader42 looks like the RAT check failed. For new files, we need to include the ASF license on top, like https://github.com/apache/iceberg-python/blob/ceffe08ad90a0d150c6f1

Re: [PR] Snapshot: Make manifest-list required [iceberg-python]

2024-12-17 Thread via GitHub
Fokko commented on code in PR #1385: URL: https://github.com/apache/iceberg-python/pull/1385#discussion_r1889157560 ## pyiceberg/table/snapshots.py: ## @@ -239,9 +239,7 @@ class Snapshot(IcebergBaseModel): parent_snapshot_id: Optional[int] = Field(alias="parent-snapshot-id"

Re: [PR] Add pre-commit config [iceberg-cpp]

2024-12-17 Thread via GitHub
Fokko commented on PR #16: URL: https://github.com/apache/iceberg-cpp/pull/16#issuecomment-2549497161 > Cool, I've raised an issue: https://issues.apache.org/jira/browse/INFRA-26378 It got approved :)

Re: [PR] Core, Spark3.5: Fix tests failure due to timeout [iceberg]

2024-12-17 Thread via GitHub
nastra merged PR #11654: URL: https://github.com/apache/iceberg/pull/11654

Re: [PR] Doc: Add status page for different implementations. [iceberg]

2024-12-17 Thread via GitHub
kevinjqliu commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1889060089 ## site/docs/status.md: ## @@ -0,0 +1,367 @@ +--- +title: "Implementation Status" +--- + + +# Implementation Status + +Apache iceberg's spec is implemented in mult

Re: [PR] Spark 3.5: Adapt to Spark 3.5.4 [iceberg]

2024-12-17 Thread via GitHub
Fokko commented on code in PR #11802: URL: https://github.com/apache/iceberg/pull/11802#discussion_r1889117177 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java: ## @@ -59,6 +59,10 @@ public void close() { accessor.clos

[I] Sink-managed consumer group expires after 7 days of no activity in the topic [iceberg]

2024-12-17 Thread via GitHub
fenil25 opened a new issue, #11805: URL: https://github.com/apache/iceberg/issues/11805 ### Apache Iceberg version 1.7.1 (latest release) ### Query engine Kafka Connect ### Please describe the bug 🐞 For the Kafka Connect Iceberg connector, sink-managed c

Re: [I] how to pass where clause predicate to rewrite_data_files which uses year of a timestamp column [iceberg]

2024-12-17 Thread via GitHub
salimpadela commented on issue #11789: URL: https://github.com/apache/iceberg/issues/11789#issuecomment-2549387121 If I have multiple jobs running in parallel and they only compact the data that they have written, based on the criteria in the where clause, will that create conflict errors?

[I] correct sequence for running maintenance steps on an iceberg table [iceberg]

2024-12-17 Thread via GitHub
salimpadela opened a new issue, #11804: URL: https://github.com/apache/iceberg/issues/11804 ### Query engine Spark, AWS Glue ### Question What is the correct sequence of maintenance steps to run on an Iceberg table? Our tables are write-once-read-many so I am not sure if

Re: [PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]

2024-12-17 Thread via GitHub
AhmedNader42 commented on PR #1434: URL: https://github.com/apache/iceberg-python/pull/1434#issuecomment-2549362930 I created a new file for REST Catalog integration tests `tests/integration/test_rest_catalog.py` as suggested by @kevinjqliu and included some tests covering the `namespace_e

Re: [PR] Spec: Support geo type [iceberg]

2024-12-17 Thread via GitHub
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1889044843 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_

Re: [PR] Snapshot: Make manifest-list required [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on code in PR #1385: URL: https://github.com/apache/iceberg-python/pull/1385#discussion_r1889038870 ## pyiceberg/table/snapshots.py: ## @@ -239,9 +239,7 @@ class Snapshot(IcebergBaseModel): parent_snapshot_id: Optional[int] = Field(alias="parent-snapsho

Re: [PR] Remove support for catalog_name in table identifier string [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on code in PR #963: URL: https://github.com/apache/iceberg-python/pull/963#discussion_r1889024364 ## tests/catalog/test_base.py: ## @@ -514,7 +514,7 @@ def test_rename_table(catalog: InMemoryCatalog) -> None: # Then assert table._identifier == Ca

[PR] Spark 3.5: Support default values in Parquet reader [iceberg]

2024-12-17 Thread via GitHub
rdblue opened a new pull request, #11803: URL: https://github.com/apache/iceberg/pull/11803 This is similar to #11785 but updates the Spark readers.

Re: [PR] Change dot notation in add column documentation to tuple [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on PR #1433: URL: https://github.com/apache/iceberg-python/pull/1433#issuecomment-2549215540 Here's where the error happens https://github.com/apache/iceberg-python/blob/b0ea716c91f19281d3d9cd7b6965d5d01f6cc3d5/pyiceberg/table/update/schema.py#L184-L192 And

Re: [PR] Flink: make `StatisticsOrRecord` to be correctly serialized and deser… [iceberg]

2024-12-17 Thread via GitHub
stevenzwu commented on code in PR #11557: URL: https://github.com/apache/iceberg/pull/11557#discussion_r1888968104 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsOrRecordTypeInformation.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-17 Thread via GitHub
danielcweeks commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1888950284 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] fix: field id in name mapping should be optional [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on code in PR #1426: URL: https://github.com/apache/iceberg-python/pull/1426#discussion_r1888945215 ## pyiceberg/table/name_mapping.py: ## @@ -232,7 +234,9 @@ def mapping(self, nm: NameMapping, field_results: List[MappedField]) -> List[Map def fields

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-17 Thread via GitHub
danielcweeks commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1888947570 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [I] PyIceberg appending data creates snapshots incompatible with Athena/Spark [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on issue #1424: URL: https://github.com/apache/iceberg-python/issues/1424#issuecomment-2549099056 I think the snapshot id is generated on the client side. So it's possible only if Glue is also committing the table. If you can share the metadata json, that would be

Re: [I] [Investigate] Whether `data_files` metadata table requires both pyarrow and s3fs [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu closed issue #1317: [Investigate] Whether `data_files` metadata table requires both pyarrow and s3fs URL: https://github.com/apache/iceberg-python/issues/1317

Re: [I] [Investigate] Whether `data_files` metadata table requires both pyarrow and s3fs [iceberg-python]

2024-12-17 Thread via GitHub
jiakai-li commented on issue #1317: URL: https://github.com/apache/iceberg-python/issues/1317#issuecomment-2549089683 Cool, thanks @kevinjqliu. Let's close this one then. In the meantime, as you mentioned, I'll verify the `fsspec` and open another issue if I find any problem with that.

Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu commented on PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#issuecomment-2549088372 Thanks for the great catch @paulcichonski

Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu merged PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432

Re: [I] Schema Deserialization Ignores Field initial-default and write-default Values [iceberg-python]

2024-12-17 Thread via GitHub
kevinjqliu closed issue #1431: Schema Deserialization Ignores Field initial-default and write-default Values URL: https://github.com/apache/iceberg-python/issues/1431
