Re: [PR] Arrow: add support for null vectors [iceberg]

2024-09-25 Thread via GitHub
slessard commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1776485623 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java: ## @@ -140,12 +141,18 @@ public static class ConstantVectorHolder extends VectorHolder

Re: [PR] Core: Add a util to compute partition stats [iceberg]

2024-09-25 Thread via GitHub
ajantha-bhat commented on PR #11146: URL: https://github.com/apache/iceberg/pull/11146#issuecomment-2376070543 @aokolnychyi: Thanks for the review and guidance. I have addressed the final nits. -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] Upgrade to Gradle 8.10.2 [iceberg]

2024-09-25 Thread via GitHub
jbonofre commented on code in PR #11212: URL: https://github.com/apache/iceberg/pull/11212#discussion_r1776464839 ## gradle/wrapper/gradle-wrapper.properties: ## @@ -1,7 +1,7 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists -distributionSha256Sum=1541fa36599

Re: [PR] Upgrade to Gradle 8.10.2 [iceberg]

2024-09-25 Thread via GitHub
jbonofre commented on code in PR #11212: URL: https://github.com/apache/iceberg/pull/11212#discussion_r1776457256 ## gradle/wrapper/gradle-wrapper.properties: ## @@ -1,7 +1,7 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists -distributionSha256Sum=1541fa36599

Re: [PR] Upgrade to Gradle 8.10.2 [iceberg]

2024-09-25 Thread via GitHub
nastra commented on code in PR #11212: URL: https://github.com/apache/iceberg/pull/11212#discussion_r1776455680 ## gradle/wrapper/gradle-wrapper.properties: ## @@ -1,7 +1,7 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists -distributionSha256Sum=1541fa36599e1

Re: [PR] feat: Safer PartitionSpec & SchemalessPartitionSpec [iceberg-rust]

2024-09-25 Thread via GitHub
c-thiel commented on PR #645: URL: https://github.com/apache/iceberg-rust/pull/645#issuecomment-2376026264 Introducing `SchemalessPartitionSpec` might be our way to avoid https://github.com/apache/iceberg/issues/4563.

Re: [PR] Arrow: add support for null vectors [iceberg]

2024-09-25 Thread via GitHub
slessard commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1776421822 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java: ## @@ -140,12 +141,18 @@ public static class ConstantVectorHolder extends VectorHolder

Re: [I] Why not use the profile name when initialising the S3FileSystem class? [iceberg-python]

2024-09-25 Thread via GitHub
wudihero2 commented on issue #1207: URL: https://github.com/apache/iceberg-python/issues/1207#issuecomment-2375828069 Hello, I am interested in this. Do I need to tag the person who will assign this task to me?

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#issuecomment-2375810823 I like that it's converted, it's more readable! Do you know where the transform happens? Is it only for the metadata table?

Re: [I] Enabling schema evolution feature using spark configuration like we have in Delta Lake [iceberg]

2024-09-25 Thread via GitHub
aleenamg21-1 closed issue #9651: Enabling schema evolution feature using spark configuration like we have in Delta Lake URL: https://github.com/apache/iceberg/issues/9651

Re: [I] Enabling schema evolution feature using spark configuration like we have in Delta Lake [iceberg]

2024-09-25 Thread via GitHub
aleenamg21-1 commented on issue #9651: URL: https://github.com/apache/iceberg/issues/9651#issuecomment-2375808924 Closing this issue since [#9640] got merged.

Re: [PR] PR #1169 [iceberg-python]

2024-09-25 Thread via GitHub
JE-Chen commented on code in PR #1206: URL: https://github.com/apache/iceberg-python/pull/1206#discussion_r1776310431 ## pyiceberg/io/pyarrow.py: ## @@ -1068,20 +1068,13 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return StringType() e

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinzwang commented on PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#issuecomment-2375801600 > I'm not 100% sure, perhaps the metadata table does the transformation. > > https://iceberg.apache.org/docs/latest/spark-queries/#partitions I think you are correct

Re: [PR] OpenAPI: Add AppendDataFile models to openapi spec for fine grained metadata commits [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #10202: URL: https://github.com/apache/iceberg/pull/10202#discussion_r1776236397 ## open-api/rest-catalog-open-api.yaml: ## @@ -2893,6 +3003,37 @@ components: additionalProperties: type: string +AppendDataFil

Re: [PR] Core: Add a util to compute partition stats [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on code in PR #11146: URL: https://github.com/apache/iceberg/pull/11146#discussion_r1776149156 ## core/src/main/java/org/apache/iceberg/PartitionStatsUtil.java: ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [I] What's the use of old metadata file, why not delete by default? [iceberg]

2024-09-25 Thread via GitHub
madeirak commented on issue #11206: URL: https://github.com/apache/iceberg/issues/11206#issuecomment-2375784911 > Keeping old metadata helps support [rollback & time travel](https://iceberg.apache.org/docs/latest/spark-queries/#time-travel). It's often useful to know what the state of the t
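Iceberg can prune old metadata files automatically through the table properties `write.metadata.delete-after-commit.enabled` (off by default, so every previous file is kept) and `write.metadata.previous-versions-max` (default 100). The Python sketch below is a simplified illustration of the retention rule those properties express, assuming the metadata log is a plain oldest-first list of file names; `metadata_files_to_delete` is a hypothetical helper, not Iceberg's implementation.

```python
from typing import List


def metadata_files_to_delete(
    previous_files: List[str],
    previous_versions_max: int = 100,
    delete_after_commit: bool = False,
) -> List[str]:
    """Return the oldest previous metadata files outside the retention window.

    Hypothetical helper: previous_files is assumed to be ordered oldest-first,
    like a table's metadata log.
    """
    if not delete_after_commit:
        # Default behaviour: keep every previous metadata file.
        return []
    excess = len(previous_files) - previous_versions_max
    # Keep the newest `previous_versions_max` entries; anything older is eligible.
    return previous_files[:excess] if excess > 0 else []


log = [f"v{i}.metadata.json" for i in range(1, 6)]
print(metadata_files_to_delete(log, previous_versions_max=3, delete_after_commit=True))
```

With deletion left off, nothing is removed, which matches the thread's point that old metadata is kept unless a user opts in.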

Re: [PR] Build: Bump Spark 3.5 to 3.5.3 [iceberg]

2024-09-25 Thread via GitHub
manuzhang commented on PR #11160: URL: https://github.com/apache/iceberg/pull/11160#issuecomment-2375773222 The Spark community is [reverting changes](https://github.com/apache/spark/pull/48257), so we will skip `3.5.3` and wait for the next Spark 3.5 release.

Re: [PR] DO NOT MERGE WILL BREAK - Change BaseCatalog to Interface [iceberg]

2024-09-25 Thread via GitHub
manuzhang commented on PR #11210: URL: https://github.com/apache/iceberg/pull/11210#issuecomment-2375771669 Thanks @RussellSpitzer. The Spark community is [reverting the changes](https://github.com/apache/spark/pull/48257), so we don't need to change `BaseCatalog` now.

[I] Why does executing a sql "desc tableA" in hive command line report a error on a iceberg table with decimal(2,2) field type [iceberg]

2024-09-25 Thread via GitHub
denghaiy opened a new issue, #11211: URL: https://github.com/apache/iceberg/issues/11211 ### Apache Iceberg version 1.0.0 ### Query engine Spark ### Please describe the bug 🐞 We have created an Iceberg table named "test.tableA" with a column type decimal(

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on PR #10179: URL: https://github.com/apache/iceberg/pull/10179#issuecomment-2375754209 yes, we should have a config to determine which sink implementation is used for Table API/SQL. Default should be using the old `FlinkSink`. When the new v2 sink implementation becomes st

Re: [PR] OpenAPI: Add AppendDataFile models to openapi spec for fine grained metadata commits [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #10202: URL: https://github.com/apache/iceberg/pull/10202#discussion_r1776260258 ## open-api/rest-catalog-open-api.yaml: ## @@ -2893,6 +3003,37 @@ components: additionalProperties: type: string +AppendDataFil

Re: [I] bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema [iceberg-rust]

2024-09-25 Thread via GitHub
liurenjie1024 commented on issue #627: URL: https://github.com/apache/iceberg-rust/issues/627#issuecomment-2375681475 I think this could be solved together with other problems like type promotion.
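One way to picture the issue: a reader may return a batch whose columns follow the file's order, while the scan task's `project_field_ids` lists the order the caller asked for. The hypothetical Python sketch below (not iceberg-rust code; `reorder_to_projection` is an illustrative name, and columns are plain lists) realigns columns by field ID rather than by position.

```python
from typing import Dict, List, Sequence


def reorder_to_projection(
    batch_field_ids: Sequence[int],
    columns: Sequence[list],
    project_field_ids: Sequence[int],
) -> List[list]:
    """Return the batch's columns in the order given by project_field_ids."""
    # Map each field ID to its position in the batch as it came off disk.
    position: Dict[int, int] = {fid: i for i, fid in enumerate(batch_field_ids)}
    missing = [fid for fid in project_field_ids if fid not in position]
    if missing:
        # A fuller reader might fill these from defaults or promoted types instead.
        raise KeyError(f"field ids not present in batch: {missing}")
    return [list(columns[position[fid]]) for fid in project_field_ids]


# File order is (1, 3, 2); the caller projected (2, 1).
print(reorder_to_projection([1, 3, 2], [["a"], ["c"], ["b"]], [2, 1]))
```

Matching on field IDs instead of column positions is also the hook where type promotion could later be applied, which is consistent with the suggestion to solve both together.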

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#issuecomment-2375566385 I'm not 100% sure, perhaps the metadata table does the transformation. https://iceberg.apache.org/docs/latest/spark-queries/#partitions

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1776175392 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, Manifes

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinzwang commented on PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#issuecomment-2375530341 Ok so interesting... Spark actually does store day transforms as date type in the metadata, which is why the integration test is failing. This is probably why this library had t
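The Iceberg spec defines the `day` transform's result as an `int` counting days from 1970-01-01, while Spark renders the same partition value as a date. The minimal stdlib sketch below shows that the two carry the same information; `day_transform` and `as_date` are hypothetical names for illustration, not pyiceberg's API.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)


def day_transform(value: date) -> int:
    """Days from 1970-01-01: the int result the spec defines for day()."""
    if isinstance(value, datetime):
        # datetime subclasses date, so check it first; tz-aware values go to UTC.
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        value = value.date()
    return (value - EPOCH).days


def as_date(days: int) -> date:
    """Render an int day ordinal the way a date-typed partition column shows it."""
    return EPOCH + timedelta(days=days)


print(day_transform(date(2024, 9, 25)))
print(as_date(day_transform(datetime(1969, 12, 31, 23, 0))))
```

Because the mapping is a bijection, a reader can accept either representation and convert; the open question in the thread is only which type the metadata should declare.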

Re: [PR] Compatible with Spark4 (upgrade antlr4 to version 4.13.1 Compatible with jdk17  ) [iceberg]

2024-09-25 Thread via GitHub
awol2005ex commented on PR #11204: URL: https://github.com/apache/iceberg/pull/11204#issuecomment-2375517182 > Have you checked out #10622? No, I just saw that.

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1776132332 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DeleteFilesProcessor.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Soft

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on code in PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#discussion_r1776147728 ## pyiceberg/transforms.py: ## @@ -517,9 +517,6 @@ def day_func(v: Any) -> int: def can_transform(self, source: IcebergType) -> bool: return isi

Re: [PR] Compatible with Spark4 (upgrade antlr4 to version 4.13.1 Compatible with jdk17  ) [iceberg]

2024-09-25 Thread via GitHub
manuzhang commented on PR #11204: URL: https://github.com/apache/iceberg/pull/11204#issuecomment-2375511027 Have you checked out https://github.com/apache/iceberg/pull/10622?

Re: [PR] Core: Remove unused code for streaming position deletes [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar merged PR #11175: URL: https://github.com/apache/iceberg/pull/11175

Re: [PR] Core: Remove unused code for streaming position deletes [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar commented on PR #11175: URL: https://github.com/apache/iceberg/pull/11175#issuecomment-2375505084 Thanks @wypoon, merging!

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinzwang commented on code in PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#discussion_r1776140899 ## pyiceberg/transforms.py: ## @@ -517,9 +517,6 @@ def day_func(v: Any) -> int: def can_transform(self, source: IcebergType) -> bool: return isi

Re: [I] DELETE fails with "java.lang.IllegalArgumentException: info must be ExtendedLogicalWriteInfo" [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on issue #8926: URL: https://github.com/apache/iceberg/issues/8926#issuecomment-2375486924 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Iceberg streaming using checkpoint does not ignore the stream-from-timestamp option [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on issue #8921: URL: https://github.com/apache/iceberg/issues/8921#issuecomment-2375486850 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Spark 3.5: Fix Migrate procedure renaming issue for custom catalog [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on PR #8931: URL: https://github.com/apache/iceberg/pull/8931#issuecomment-2375486985 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] Spark write abort result in table miss metadata location file [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on issue #8927: URL: https://github.com/apache/iceberg/issues/8927#issuecomment-2375486939 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Missing serialVersionUID in Serializable implementation [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on issue #8929: URL: https://github.com/apache/iceberg/issues/8929#issuecomment-2375486956 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Vulnerabilities found on latest version - jackson, avro, openssl [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on issue #8923: URL: https://github.com/apache/iceberg/issues/8923#issuecomment-2375486901 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-2375486872 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on code in PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#discussion_r1776111931 ## pyiceberg/transforms.py: ## @@ -517,9 +517,6 @@ def day_func(v: Any) -> int: def can_transform(self, source: IcebergType) -> bool: return isi

Re: [PR] Flink: Avoid metaspace memory leak by not registering ShutdownHook for ExecutorService in Flink [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on PR #11073: URL: https://github.com/apache/iceberg/pull/11073#issuecomment-2375462905 I think we should first add Javadoc to `ThreadPools.newWorkerPool` noting that it adds a shutdown hook. It is not obvious from the method name. Regarding `ThreadPools.newNonExitingWorker

Re: [PR] Core: Replace use of CharSequenceMap in DeleteFileIndex with Map [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on code in PR #11199: URL: https://github.com/apache/iceberg/pull/11199#discussion_r1776090858 ## core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -49,7 +50,21 @@ public static , K> K copy( } } + /** + * @deprecated since

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1776099559 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/maintenance/operator/TestExpireSnapshotsProcessor.java: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apac

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1776097229 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -78,9 +78,11 @@ public String partition() { private boolean failMissingDeletePa

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinzwang commented on PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#issuecomment-2375442625 > is this the source of truth? https://iceberg.apache.org/spec/#partition-transforms Yup, precisely

Re: [PR] Core: Support merging in PositionDeleteIndex [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on PR #11208: URL: https://github.com/apache/iceberg/pull/11208#issuecomment-2375438906 Thank you, @singhpk234 @anuragmantri @amogh-jahagirdar!

Re: [PR] Core: Support merging in PositionDeleteIndex [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi merged PR #11208: URL: https://github.com/apache/iceberg/pull/11208

Re: [PR] Core: Replace use of CharSequenceMap in DeleteFileIndex with Map [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on code in PR #11199: URL: https://github.com/apache/iceberg/pull/11199#discussion_r1776091909 ## core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -49,7 +50,21 @@ public static , K> K copy( } } + /** + * @deprecated since

Re: [PR] Core: Replace use of CharSequenceMap in DeleteFileIndex with Map [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on code in PR #11199: URL: https://github.com/apache/iceberg/pull/11199#discussion_r1776092317 ## core/src/main/java/org/apache/iceberg/DeleteFileIndex.java: ## @@ -458,14 +457,14 @@ DeleteFileIndex build() { } private void add( -CharSeq

Re: [PR] Core: Replace use of CharSequenceMap in DeleteFileIndex with Map [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on code in PR #11199: URL: https://github.com/apache/iceberg/pull/11199#discussion_r1776090183 ## core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -49,7 +50,21 @@ public static , K> K copy( } } + /** + * @deprecated since

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1776062427 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -325,7 +341,15 @@ private ManifestFile filterManifest(Schema tableSchema, Manifes

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1774350481 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -81,6 +81,7 @@ public String partition() { // cache filtered manifests to avo

Re: [PR] Bump getdaft from 0.3.2 to 0.3.3 [iceberg-python]

2024-09-25 Thread via GitHub
dependabot[bot] commented on PR #1204: URL: https://github.com/apache/iceberg-python/pull/1204#issuecomment-2375358770 Superseded by #1209.

Re: [PR] Bump getdaft from 0.3.2 to 0.3.3 [iceberg-python]

2024-09-25 Thread via GitHub
dependabot[bot] closed pull request #1204: Bump getdaft from 0.3.2 to 0.3.3 URL: https://github.com/apache/iceberg-python/pull/1204

[PR] Bump getdaft from 0.3.2 to 0.3.4 [iceberg-python]

2024-09-25 Thread via GitHub
dependabot[bot] opened a new pull request, #1209: URL: https://github.com/apache/iceberg-python/pull/1209 Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.3.2 to 0.3.4. Release notes: Sourced from [getdaft's releases](https://github.com/Eventual-Inc/Daft/releases).

Re: [PR] Spark: Added merge schema as spark configuration [iceberg]

2024-09-25 Thread via GitHub
RussellSpitzer commented on PR #9640: URL: https://github.com/apache/iceberg/pull/9640#issuecomment-2375349511 Thanks for the PR @aleenamg21-1

Re: [PR] Spark: Added merge schema as spark configuration [iceberg]

2024-09-25 Thread via GitHub
RussellSpitzer merged PR #9640: URL: https://github.com/apache/iceberg/pull/9640

Re: [PR] Core: Add ContentFileSet and ContentFileWrapper [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on code in PR #11195: URL: https://github.com/apache/iceberg/pull/11195#discussion_r1776041636 ## api/src/main/java/org/apache/iceberg/util/ContentFileSet.java: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Core: Add ContentFileSet and ContentFileWrapper [iceberg]

2024-09-25 Thread via GitHub
aokolnychyi commented on code in PR #11195: URL: https://github.com/apache/iceberg/pull/11195#discussion_r1776041008 ## api/src/main/java/org/apache/iceberg/util/ContentFileSet.java: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] fix: DayTransform result type override and docs [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on PR #1208: URL: https://github.com/apache/iceberg-python/pull/1208#issuecomment-2375337828 is this the source of truth? https://iceberg.apache.org/spec/#partition-transforms

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1776031911 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/maintenance/stream/ScheduledBuilderTestBase.java: ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Sof

Re: [PR] DO NOT MERGE WILL BREAK [iceberg]

2024-09-25 Thread via GitHub
RussellSpitzer commented on code in PR #11210: URL: https://github.com/apache/iceberg/pull/11210#discussion_r1776029888 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java: ## @@ -193,7 +148,7 @@ public StagedTable stageCreate( } cat

Re: [PR] Build: Bump Spark 3.5 to 3.5.3 [iceberg]

2024-09-25 Thread via GitHub
RussellSpitzer commented on PR #11160: URL: https://github.com/apache/iceberg/pull/11160#issuecomment-2375318216 Theoretical patch for changing to DelegatingCatalogExtension - Note this breaks a bunch of stuff (staging is broken and init has to be skipped so configuration is broken)

Re: [PR] Add Files metadata table [iceberg-python]

2024-09-25 Thread via GitHub
DieHertz commented on PR #614: URL: https://github.com/apache/iceberg-python/pull/614#issuecomment-2375318027 Will do

Re: [PR] PR #1169 [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on code in PR #1206: URL: https://github.com/apache/iceberg-python/pull/1206#discussion_r1776008439 ## pyiceberg/io/pyarrow.py: ## @@ -1068,20 +1068,13 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return StringType()

Re: [PR] Add Files metadata table [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on PR #614: URL: https://github.com/apache/iceberg-python/pull/614#issuecomment-2375285704 I think there's definitely room for improvement. @DieHertz do you mind opening an issue for this?

Re: [I] `ALTER TABLE ... DROP COLUMN` allows dropping a column used by old PartitionSpecs [iceberg]

2024-09-25 Thread via GitHub
osscm commented on issue #4563: URL: https://github.com/apache/iceberg/issues/4563#issuecomment-2375198579 @hashhar @rdblue any conclusion on this issue? We saw this with 421 and 438.
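Issue #4563 concerns a column drop that leaves older partition specs pointing at a source field that no longer exists. The Python sketch below shows one conservative guard, refusing the drop while any spec, current or historical, still references the field. `PartitionField`, `specs_referencing`, and `check_drop_column` are hypothetical names for illustration, not Iceberg's API, and specs are modelled as a plain dict from spec ID to fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartitionField:
    source_id: int  # field ID of the source column in the table schema
    transform: str  # e.g. "identity", "day", "bucket[16]"


def specs_referencing(field_id: int, specs: Dict[int, List[PartitionField]]) -> List[int]:
    """IDs of every partition spec whose fields use field_id as a source."""
    return sorted(
        spec_id
        for spec_id, fields in specs.items()
        if any(f.source_id == field_id for f in fields)
    )


def check_drop_column(field_id: int, specs: Dict[int, List[PartitionField]]) -> None:
    """Raise if dropping field_id would leave a partition spec without its source."""
    referencing = specs_referencing(field_id, specs)
    if referencing:
        raise ValueError(
            f"cannot drop field {field_id}: used by partition spec(s) {referencing}"
        )


specs = {
    0: [PartitionField(source_id=3, transform="day")],       # historical spec
    1: [PartitionField(source_id=4, transform="identity")],  # current spec
}
check_drop_column(5, specs)  # no spec uses field 5, so this does not raise
```

The trade-off of this design is that a column that was ever used for partitioning can never be dropped, even after its spec is no longer current.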

Re: [PR] Add Files metadata table [iceberg-python]

2024-09-25 Thread via GitHub
DieHertz commented on PR #614: URL: https://github.com/apache/iceberg-python/pull/614#issuecomment-2375186118 Hi guys, sorry if it's not the right place to ask this question. Do you know of a viable way to speed up `table.inspect.files()` for large tables? Maybe something in mind that

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-09-25 Thread via GitHub
SandeepSinghGahir commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2375168340 @danielcweeks thanks a lot for the update and prioritizing the fix. Looking forward to the 1.7 release. @amogh-jahagirdar thanks for all the hard work 🙌

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-09-25 Thread via GitHub
rodmeneses commented on PR #10179: URL: https://github.com/apache/iceberg/pull/10179#issuecomment-2375119421 > > Hi @rodmeneses, by everything works I meant that I did some manual tests and the results were the same as with the old one. Probably "everything" was an overkill here ;-) Yes, I

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-09-25 Thread via GitHub
rodmeneses commented on PR #10179: URL: https://github.com/apache/iceberg/pull/10179#issuecomment-2375118465 > Hi @rodmeneses, by everything works I meant that I did some manual tests and the results were the same as with the old one. Probably "everything" was an overkill here ;-) Yes, I ca

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-09-25 Thread via GitHub
arkadius commented on PR #10179: URL: https://github.com/apache/iceberg/pull/10179#issuecomment-2375109539 Hi @rodmeneses, by everything works I meant that I did some manual tests and the results were the same as with the old one. Probably "everything" was an overkill here ;-) Yes, I can ta

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r177578 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/stream/MaintenanceTaskBuilder.java: ## @@ -0,0 +1,238 @@ +/* + * Licensed to the Apache Soft

Re: [PR] AWS: Set better defaults for S3 retry behaviour [iceberg]

2024-09-25 Thread via GitHub
jackye1995 commented on PR #11052: URL: https://github.com/apache/iceberg/pull/11052#issuecomment-2374896767 @ookumuso looks like CI failed for some unrelated reason, can you rebase the PR to retrigger the CI?

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1775726921 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/api/ExpireSnapshots.java: ## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Flink: id generation for schema starts from 1 [iceberg]

2024-09-25 Thread via GitHub
pvary commented on PR #11209: URL: https://github.com/apache/iceberg/pull/11209#issuecomment-2374893887 Why would we need this? The ID generated by Iceberg should be internal

Re: [PR] AWS: Set better defaults for S3 retry behaviour [iceberg]

2024-09-25 Thread via GitHub
jackye1995 commented on PR #11052: URL: https://github.com/apache/iceberg/pull/11052#issuecomment-2374873258 @amogh-jahagirdar @nastra for the concerns regarding the new config values, to give some additional data points here, we have similar configs internally for quite some time now in Ic

Re: [PR] PR #1169 [iceberg-python]

2024-09-25 Thread via GitHub
JE-Chen commented on code in PR #1206: URL: https://github.com/apache/iceberg-python/pull/1206#discussion_r1775623776 ## pyiceberg/io/pyarrow.py: ## @@ -1068,20 +1068,13 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return StringType() e

Re: [PR] AWS: Introduce opt-in S3LocationProvider which is optimized for S3 performance [iceberg]

2024-09-25 Thread via GitHub
jackye1995 commented on PR #2: URL: https://github.com/apache/iceberg/pull/2#issuecomment-2374861417 Sorry for the late review, was busy with some internal work... > it also seems like it would fit cleanly into the existing ObjectStoreLocationProvider as opposed to a separate

Re: [PR] AWS: Introduce opt-in S3LocationProvider which is optimized for S3 performance [iceberg]

2024-09-25 Thread via GitHub
jackye1995 commented on code in PR #2: URL: https://github.com/apache/iceberg/pull/2#discussion_r1775753318 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3LocationProvider.java: ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [I] Why not use the profile name when initialising the S3FileSystem class? [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu commented on issue #1207: URL: https://github.com/apache/iceberg-python/issues/1207#issuecomment-2374840462 I think this is a feature gap on the S3 FileIO. It makes sense to support `profile_name`. We would need to support both [fsspec](https://github.com/apache/iceberg-pyth

Re: [PR] PR #1169 [iceberg-python]

2024-09-25 Thread via GitHub
JE-Chen commented on code in PR #1206: URL: https://github.com/apache/iceberg-python/pull/1206#discussion_r1775734036 ## pyiceberg/io/pyarrow.py: ## @@ -1068,20 +1068,13 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return StringType() e

Re: [I] What's the use of old metadata file, why not delete by default? [iceberg]

2024-09-25 Thread via GitHub
eric-maynard commented on issue #11206: URL: https://github.com/apache/iceberg/issues/11206#issuecomment-2374795119 Keeping old metadata helps support [rollback & time travel](https://iceberg.apache.org/docs/latest/spark-queries/#time-travel). It's often useful to know what the state of the

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1775706927 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/stream/ExpireSnapshots.java: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] HA HMS support [iceberg-python]

2024-09-25 Thread via GitHub
kevinjqliu merged PR #752: URL: https://github.com/apache/iceberg-python/pull/752

Re: [PR] PR #1169 [iceberg-python]

2024-09-25 Thread via GitHub
JE-Chen commented on code in PR #1206: URL: https://github.com/apache/iceberg-python/pull/1206#discussion_r1775672026 ## pyiceberg/io/pyarrow.py: ## @@ -1068,20 +1068,13 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return StringType() e

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-25 Thread via GitHub
stevenzwu commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1775689568 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/ExpireSnapshotsProcessor.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache

Re: [PR] Core: Remove unused code for streaming position deletes [iceberg]

2024-09-25 Thread via GitHub
wypoon commented on PR #11175: URL: https://github.com/apache/iceberg/pull/11175#issuecomment-2374694585 The Flink CI issues are unrelated to this change. All tests passed prior to my last commit, and the only change in the last commit was updating javadoc comments.

Re: [PR] OpenAPI: Standardize credentials in loadTable/loadView responses [iceberg]

2024-09-25 Thread via GitHub
jackye1995 commented on code in PR #10722: URL: https://github.com/apache/iceberg/pull/10722#discussion_r1775675415 ## open-api/rest-catalog-open-api.yaml: ## @@ -3103,6 +3103,95 @@ components: uuid: type: string +ADLSCredential: + type: object +
