Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11254: URL: https://github.com/apache/iceberg/pull/11254#discussion_r1800578445 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -68,9 +68,9 @@ public String partition() { private final Map specsById; private fina

Re: [I] Manifest List/Entry Creation [iceberg-go]

2024-10-15 Thread via GitHub
zeroshade commented on issue #172: URL: https://github.com/apache/iceberg-go/issues/172#issuecomment-2414433498 Thanks for filing this! > I understand that table creation through a catalog is one of the design goals, but direct creation of manifests (snapshots, manifest lists/entries,

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
tomtongue commented on PR #11262: URL: https://github.com/apache/iceberg/pull/11262#issuecomment-2414157721 @nastra Reflected your review. Could you review new changes when you have a chance? -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
tomtongue commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1800721063 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/ExtensionsTestBase.java: ## @@ -43,6 +43,8 @@ public static void startMetastoreAndS

Re: [PR] Spec: Fix table of content generation [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11067: URL: https://github.com/apache/iceberg/pull/11067#discussion_r1801414926 ## format/spec.md: ## @@ -158,27 +158,27 @@ Readers should be more permissive because v1 metadata files are allowed in v2 ta Readers may be more strict for

Re: [I] [Feature Request] Speed up InspectTable.files() [iceberg-python]

2024-10-15 Thread via GitHub
DieHertz commented on issue #1229: URL: https://github.com/apache/iceberg-python/issues/1229#issuecomment-2414395991 > There's already an ExecutorFactory, do you think we can use that instead of ProcessPoolExecutor? The issue with the `ExecutorFactory` is it's using a `ThreadPool

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-15 Thread via GitHub
emkornfield commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1801477641 ## format/puffin-spec.md: ## @@ -123,6 +123,49 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct va

Re: [PR] Remove unnecessary copying of FileScanTask [iceberg]

2024-10-15 Thread via GitHub
huaxingao commented on PR #11319: URL: https://github.com/apache/iceberg/pull/11319#issuecomment-2414404248 Thanks, everyone!

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-15 Thread via GitHub
rdblue commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1801559797 ## format/puffin-spec.md: ## @@ -123,6 +123,54 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values,

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
aokolnychyi commented on code in PR #11254: URL: https://github.com/apache/iceberg/pull/11254#discussion_r1801573447 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -82,11 +82,9 @@ abstract class MergingSnapshotProducer extends SnapshotProducer {

Re: [PR] AWS: Introduce opt-in S3LocationProvider which is optimized for S3 performance [iceberg]

2024-10-15 Thread via GitHub
ookumuso commented on code in PR #2: URL: https://github.com/apache/iceberg/pull/2#discussion_r1801578557 ## docs/docs/configuration.md: ## @@ -39,52 +39,53 @@ Iceberg tables support table properties to configure table behavior, like the de ### Write properties -|

Re: [PR] Core: Align CharSequenceSet impl with Data/DeleteFileSet [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11322: URL: https://github.com/apache/iceberg/pull/11322#discussion_r1801116091 ## api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java: ## @@ -18,183 +18,53 @@ */ package org.apache.iceberg.util; -import java.io.Serializable; -impo

Re: [I] flink:FlinkSink support dynamically changed schema [iceberg]

2024-10-15 Thread via GitHub
ottomata commented on issue #4190: URL: https://github.com/apache/iceberg/issues/4190#issuecomment-2413847042 How does [flink-cdc do it](https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/core-concept/schema-evolution/)?

Re: [I] [feat] `add_files` support parquet files with field ids [iceberg-python]

2024-10-15 Thread via GitHub
sungwy commented on issue #1227: URL: https://github.com/apache/iceberg-python/issues/1227#issuecomment-2413868478 Thanks for raising this @MrDerecho - in the initial version of add_files, we wanted to limit it to just parquet files that were created in an external system. The assumpt

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-10-15 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1801716441 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -829,6 +833,11 @@ protected Map summary() { @Override public List apply(

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1801739303 ## format/puffin-spec.md: ## @@ -123,6 +123,54 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct

Re: [PR] API: Add Variant data type [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11324: URL: https://github.com/apache/iceberg/pull/11324#discussion_r1801750718 ## api/src/main/java/org/apache/iceberg/VariantLike.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more co

Re: [PR] API: Add Variant data type [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11324: URL: https://github.com/apache/iceberg/pull/11324#discussion_r1801754339 ## api/src/main/java/org/apache/iceberg/VariantLike.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more co

Re: [PR] API: Add Variant data type [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11324: URL: https://github.com/apache/iceberg/pull/11324#discussion_r1801763229 ## api/src/test/java/org/apache/iceberg/types/TestSerializableTypes.java: ## @@ -112,13 +113,13 @@ public void testMaps() throws Exception { @Test publ

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-10-15 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1801762698 ## core/src/test/java/org/apache/iceberg/TestRowDelta.java: ## @@ -1519,6 +1574,69 @@ public void testConcurrentDeletesRewriteSameDeleteFile() { st

Re: [PR] API: Add Variant data type [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11324: URL: https://github.com/apache/iceberg/pull/11324#discussion_r1801762819 ## api/src/test/java/org/apache/iceberg/util/RandomUtil.java: ## @@ -225,4 +229,8 @@ private static BigInteger randomUnscaled(int precision, Random random) {

Re: [PR] Support wasb[s] paths in ADLSFileIO [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on PR #11294: URL: https://github.com/apache/iceberg/pull/11294#issuecomment-2414799200 @ashvina could you do another pass?

Re: [I] [Feature Request] Speed up InspectTable.files() [iceberg-python]

2024-10-15 Thread via GitHub
kevinjqliu commented on issue #1229: URL: https://github.com/apache/iceberg-python/issues/1229#issuecomment-2414816052 That's interesting. I thought the `ThreadPoolExecutor` is good for I/O-bound tasks such as reading from the Avro manifest files. If you have a PoC, it's something I'd wa

Re: [I] [feat] `add_files` support parquet files with field ids [iceberg-python]

2024-10-15 Thread via GitHub
kevinjqliu commented on issue #1227: URL: https://github.com/apache/iceberg-python/issues/1227#issuecomment-241482 @MrDerecho would you like to contribute this feature?

Re: [I] [feat] `add_files` support parquet files with field ids [iceberg-python]

2024-10-15 Thread via GitHub
kevinjqliu commented on issue #1227: URL: https://github.com/apache/iceberg-python/issues/1227#issuecomment-2414821679 I think we can relax the constraints for `add_files` to allow field ids that are aligned, such as one written by an external engine like Trino. A use case I can thin

Re: [PR] OpenAPI: Add endpoint for refreshing vended credentials [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11281: URL: https://github.com/apache/iceberg/pull/11281#discussion_r1800738992 ## open-api/rest-catalog-open-api.yaml: ## @@ -3103,6 +3141,32 @@ components: uuid: type: string +Credential: + type: object + requ

Re: [PR] OpenAPI: Add endpoint for refreshing vended credentials [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11281: URL: https://github.com/apache/iceberg/pull/11281#discussion_r1800740656 ## open-api/rest-catalog-open-api.yaml: ## @@ -3103,6 +3141,32 @@ components: uuid: type: string +Credential: + type: object + requ

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
tomtongue commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1800718584 ## api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java: ## @@ -225,7 +222,11 @@ private Stream createContextForParameters( Stream parameter

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
tomtongue commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1800744929 ## api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java: ## @@ -225,7 +222,11 @@ private Stream createContextForParameters( Stream parameter

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1800706912 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/ExtensionsTestBase.java: ## @@ -43,6 +43,8 @@ public static void startMetastoreAndSpar

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11254: URL: https://github.com/apache/iceberg/pull/11254#discussion_r1800720749 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -82,11 +82,9 @@ abstract class MergingSnapshotProducer extends SnapshotProducer { priv

Re: [PR] Spark: add property to disable client-side purging in spark [iceberg]

2024-10-15 Thread via GitHub
twuebi commented on PR #11317: URL: https://github.com/apache/iceberg/pull/11317#issuecomment-2413354958 > The problem is that table properties will only be respected by clients which know how to use it, so although you may set this property, you have no guarantee clients will follow the pr

Re: [I] [feat] `add_files` support parquet files with field ids [iceberg-python]

2024-10-15 Thread via GitHub
MrDerecho commented on issue #1227: URL: https://github.com/apache/iceberg-python/issues/1227#issuecomment-2413900767 For context: right now I manage a very large data lake of time partitioned data. The use case has to do with the archival process put into place wherein after a rollin

Re: [I] [Feature Request] Speed up InspectTable.files() [iceberg-python]

2024-10-15 Thread via GitHub
sungwy commented on issue #1229: URL: https://github.com/apache/iceberg-python/issues/1229#issuecomment-2413901958 Hi @DieHertz - thank you for raising this issue, and for sharing your benchmarks. I think this is a great idea, that I think we should also consider applying to other `Inspect

Re: [PR] Core: fix NPE with HadoopFileIO because FileIOParser doesn't serialize Hadoop configuration [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #10926: URL: https://github.com/apache/iceberg/pull/10926#discussion_r1800809705 ## core/src/test/java/org/apache/iceberg/hadoop/HadoopFileIOTest.java: ## @@ -176,6 +178,52 @@ public void testResolvingFileIOLoad() { assertThat(result).isInstan

Re: [PR] Revert "feat: Add equality delete writer (#372)" [iceberg-rust]

2024-10-15 Thread via GitHub
Xuanwo commented on PR #672: URL: https://github.com/apache/iceberg-rust/pull/672#issuecomment-2413998124 > @Xuanwo approved since this is a revert PR Thank you!

Re: [PR] Revert "feat: Add equality delete writer (#372)" [iceberg-rust]

2024-10-15 Thread via GitHub
Xuanwo merged PR #672: URL: https://github.com/apache/iceberg-rust/pull/672

Re: [PR] Revert "feat: Add equality delete writer (#372)" [iceberg-rust]

2024-10-15 Thread via GitHub
kevinjqliu commented on PR #672: URL: https://github.com/apache/iceberg-rust/pull/672#issuecomment-2413989340 @Xuanwo approved since this is a revert PR

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
tomtongue commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1801225247 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2Branch.java: ## @@ -46,12 +46,19 @@ public class TestFlinkIcebergSinkV2Branch

Re: [I] [Feature Request] Speed up InspectTable.files() [iceberg-python]

2024-10-15 Thread via GitHub
DieHertz commented on issue #1229: URL: https://github.com/apache/iceberg-python/issues/1229#issuecomment-2414054569 > Would you be interested in working on this issue? Yes, I'd be happy to contribute back

Re: [I] Improve Memory Use in SparkScanBuilder [iceberg]

2024-10-15 Thread via GitHub
nastra commented on issue #11245: URL: https://github.com/apache/iceberg/issues/11245#issuecomment-2413444074 fixed by https://github.com/apache/iceberg/pull/11319

Re: [PR] Remove unnecessary copying of FileScanTask [iceberg]

2024-10-15 Thread via GitHub
nastra merged PR #11319: URL: https://github.com/apache/iceberg/pull/11319

Re: [PR] Workers gets stuck as there is no-coordinator for emitting Start_Commit request in Incremental Cooperative Rebalancing[ICR] Mode [iceberg]

2024-10-15 Thread via GitHub
kumarpritam863 commented on PR #11288: URL: https://github.com/apache/iceberg/pull/11288#issuecomment-2413451166 @bryanck @fqaiser94 @ajantha-bhat We are seeing this issue in ICR mode. Above is our analysis as to why it is happening. Can you please review this analysis?

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1800732927 ## api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java: ## @@ -225,7 +222,11 @@ private Stream createContextForParameters( Stream parameterVal

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-15 Thread via GitHub
ajantha-bhat commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2413545697 Any more suggestions for this? PR looks to be ready.

Re: [PR] Core: Deprecate ContentCache.invalidateAll [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #10494: URL: https://github.com/apache/iceberg/pull/10494#discussion_r1800595996 ## core/src/main/java/org/apache/iceberg/io/ContentCache.java: ## @@ -147,10 +147,23 @@ public InputFile tryCache(InputFile input) { return input; } + /** +

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11254: URL: https://github.com/apache/iceberg/pull/11254#discussion_r1800592740 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -595,20 +596,20 @@ private List writeDataFileGroup( } protected List writeDeleteManifests

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
tomtongue commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1800706543 ## api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java: ## @@ -225,7 +222,11 @@ private Stream createContextForParameters( Stream parameter

Re: [PR] docs: fix spec sidebar [iceberg]

2024-10-15 Thread via GitHub
jdockerty commented on PR #11316: URL: https://github.com/apache/iceberg/pull/11316#issuecomment-2413508562 Closing in favour of https://github.com/apache/iceberg/pull/11067

Re: [PR] docs: fix spec sidebar [iceberg]

2024-10-15 Thread via GitHub
jdockerty closed pull request #11316: docs: fix spec sidebar URL: https://github.com/apache/iceberg/pull/11316

Re: [PR] Spec: Fix table of content generation [iceberg]

2024-10-15 Thread via GitHub
ajantha-bhat commented on code in PR #11067: URL: https://github.com/apache/iceberg/pull/11067#discussion_r1800898330 ## format/spec.md: ## @@ -158,27 +158,27 @@ Readers should be more permissive because v1 metadata files are allowed in v2 ta Readers may be more strict for m

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11254: URL: https://github.com/apache/iceberg/pull/11254#discussion_r1800594126 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -162,10 +160,10 @@ protected Expression rowFilter() { protected List addedDataFiles()

Re: [PR] Flink: Add IcebergSinkBuilder interface allowed unification of most of operations on FlinkSink and IcebergSink Builders [iceberg]

2024-10-15 Thread via GitHub
pvary commented on PR #11305: URL: https://github.com/apache/iceberg/pull/11305#issuecomment-2413178657 @stevenzwu: The failure for `TestFlinkIcebergSinkRangeDistributionBucketing > testBucketNumberHigherThanWriterParallelismNotDivisible()` should not be related. Do we know if it is sti

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11262: URL: https://github.com/apache/iceberg/pull/11262#discussion_r1800693298 ## api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java: ## @@ -225,7 +222,11 @@ private Stream createContextForParameters( Stream parameterVal

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-15 Thread via GitHub
rdblue commented on PR #11238: URL: https://github.com/apache/iceberg/pull/11238#issuecomment-2414554731 > > I’m not sure it’s worth drawing a line in the sand over this particular issue and I’d like to talk about it a bit more as a community before we merge this. I don’t want to set a prec

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
aokolnychyi commented on code in PR #11254: URL: https://github.com/apache/iceberg/pull/11254#discussion_r1801592765 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -595,20 +596,22 @@ private List writeDataFileGroup( } protected List writeDeleteMani

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11254: URL: https://github.com/apache/iceberg/pull/11254#discussion_r1801607994 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -595,20 +596,22 @@ private List writeDataFileGroup( } protected List writeDeleteManifests

[I] Document table properties [iceberg-python]

2024-10-15 Thread via GitHub
kevinjqliu opened a new issue, #1231: URL: https://github.com/apache/iceberg-python/issues/1231 ### Feature Request / Improvement While debugging, we noticed that certain table properties are not documented in the [configurations page](https://py.iceberg.apache.org/configuration/#wri

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1801639837 ## format/puffin-spec.md: ## @@ -123,6 +123,54 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct

Re: [I] Support Nessie catalog [iceberg-python]

2024-10-15 Thread via GitHub
sean-pasabi commented on issue #19: URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2414642909 @cee-shubham I am having a similar issue. If someone has managed to load a Nessie catalog using pyiceberg's `RestCatalog`, that would be greatly appreciated.

Re: [PR] feat(catalog/glue): add support for list namespaces [iceberg-go]

2024-10-15 Thread via GitHub
nastra merged PR #169: URL: https://github.com/apache/iceberg-go/pull/169

Re: [I] [Feature Request] Speed up InspectTable.files() [iceberg-python]

2024-10-15 Thread via GitHub
kevinjqliu commented on issue #1229: URL: https://github.com/apache/iceberg-python/issues/1229#issuecomment-2414173534 As an aside, I think reading multiple manifests in parallel is something we'd want to reuse in other parts of the program

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-15 Thread via GitHub
rdblue commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1801556767 ## format/puffin-spec.md: ## @@ -123,6 +123,54 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values,

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
nastra merged PR #11254: URL: https://github.com/apache/iceberg/pull/11254

Re: [PR] Core: Track data files by spec id instead of full PartitionSpec [iceberg]

2024-10-15 Thread via GitHub
nastra commented on code in PR #11323: URL: https://github.com/apache/iceberg/pull/11323#discussion_r1801684210 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -138,20 +138,16 @@ protected boolean isCaseSensitive() { } protected PartitionSpec

Re: [PR] Core: Rename DeleteFileHolder to PendingDeleteFile / Optimize duplicate data/delete file detection [iceberg]

2024-10-15 Thread via GitHub
nastra commented on PR #11254: URL: https://github.com/apache/iceberg/pull/11254#issuecomment-2414677896 thanks @aokolnychyi for the review

Re: [PR] Spark: Add RewriteTablePath action interface [iceberg]

2024-10-15 Thread via GitHub
szehon-ho commented on code in PR #10920: URL: https://github.com/apache/iceberg/pull/10920#discussion_r1801690970 ## api/src/main/java/org/apache/iceberg/actions/RewriteTablePath.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [PR] API, Core: Add scan planning apis to REST Catalog [iceberg]

2024-10-15 Thread via GitHub
rahil-c commented on code in PR #11180: URL: https://github.com/apache/iceberg/pull/11180#discussion_r1801692407 ## core/src/main/java/org/apache/iceberg/rest/RESTContentFileParser.java: ## @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Updating SparkScan to only read Apache DataSketches [iceberg]

2024-10-15 Thread via GitHub
jeesou commented on code in PR #11035: URL: https://github.com/apache/iceberg/pull/11035#discussion_r1802349064 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkScan.java: ## @@ -911,9 +1027,17 @@ private void checkColStatisticsReported( assertThat

[PR] Spark-3.5: Refactor BaseProcedure to support views [iceberg]

2024-10-15 Thread via GitHub
ajantha-bhat opened a new pull request, #11326: URL: https://github.com/apache/iceberg/pull/11326 Would like to add some procedures for Iceberg views and the current framework doesn't support it. Hence, the refactor.

Re: [PR] Spark-3.5: Refactor BaseProcedure to support views [iceberg]

2024-10-15 Thread via GitHub
ajantha-bhat commented on code in PR #11326: URL: https://github.com/apache/iceberg/pull/11326#discussion_r1802439972 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java: ## @@ -397,4 +401,61 @@ public UnboundFunction loadFunction(Identifier ident

Re: [PR] Spark-3.5: Refactor BaseProcedure to support views [iceberg]

2024-10-15 Thread via GitHub
ajantha-bhat commented on code in PR #11326: URL: https://github.com/apache/iceberg/pull/11326#discussion_r1802441910 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java: ## @@ -30,12 +30,14 @@ import org.apache.spark.sql.connector.iceberg.catalog.Proced

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/glue from 1.99.2 to 1.100.2 [iceberg-go]

2024-10-15 Thread via GitHub
nastra merged PR #171: URL: https://github.com/apache/iceberg-go/pull/171

Re: [PR] build(deps): bump github.com/aws/smithy-go from 1.21.0 to 1.22.0 [iceberg-go]

2024-10-15 Thread via GitHub
dependabot[bot] closed pull request #163: build(deps): bump github.com/aws/smithy-go from 1.21.0 to 1.22.0 URL: https://github.com/apache/iceberg-go/pull/163

Re: [PR] build(deps): bump github.com/aws/smithy-go from 1.21.0 to 1.22.0 [iceberg-go]

2024-10-15 Thread via GitHub
dependabot[bot] commented on PR #163: URL: https://github.com/apache/iceberg-go/pull/163#issuecomment-2415848281 Looks like github.com/aws/smithy-go is up-to-date now, so this is no longer needed.

Re: [PR] Spec: Fix table of content generation [iceberg]

2024-10-15 Thread via GitHub
ajantha-bhat commented on PR #11067: URL: https://github.com/apache/iceberg/pull/11067#issuecomment-2415625928 @danielcweeks: Thanks for the feedback. I have updated it accordingly. New TOC looks like this. https://github.com/user-attachments/assets/bd43a672-d3aa-49a3-b643-2d

Re: [PR] Flink: Add RowConverter for Iceberg Source [iceberg]

2024-10-15 Thread via GitHub
abharath9 commented on PR #11301: URL: https://github.com/apache/iceberg/pull/11301#issuecomment-2415654755 @stevenzwu can I get a review for this?

Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

2024-10-15 Thread via GitHub
ZachDischner commented on code in PR #9731: URL: https://github.com/apache/iceberg/pull/9731#discussion_r1802296599 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java: ## @@ -160,6 +172,32 @@ public RewriteManifestsSparkAction sta

Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

2024-10-15 Thread via GitHub
ZachDischner commented on code in PR #9731: URL: https://github.com/apache/iceberg/pull/9731#discussion_r1802296391 ## api/src/main/java/org/apache/iceberg/actions/RewriteManifests.java: ## @@ -44,6 +47,43 @@ public interface RewriteManifests */ RewriteManifests rewriteIf

Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

2024-10-15 Thread via GitHub
ZachDischner commented on code in PR #9731: URL: https://github.com/apache/iceberg/pull/9731#discussion_r1802299884 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java: ## @@ -160,6 +172,32 @@ public RewriteManifestsSparkAction sta

Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

2024-10-15 Thread via GitHub
ZachDischner commented on code in PR #9731: URL: https://github.com/apache/iceberg/pull/9731#discussion_r1802300273 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java: ## @@ -160,6 +172,32 @@ public RewriteManifestsSparkAction sta

Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

2024-10-15 Thread via GitHub
ZachDischner commented on code in PR #9731: URL: https://github.com/apache/iceberg/pull/9731#discussion_r1802300866 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java: ## @@ -250,12 +288,59 @@ private List writeUnpartitionedManife

Re: [PR] Updating SparkScan to only read Apache DataSketches [iceberg]

2024-10-15 Thread via GitHub
jeesou commented on code in PR #11035: URL: https://github.com/apache/iceberg/pull/11035#discussion_r1802355766 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -198,25 +198,31 @@ protected Statistics estimateStatistics(Snapshot snapshot)

Re: [PR] Spark-3.5: Refactor BaseProcedure to support views [iceberg]

2024-10-15 Thread via GitHub
ajantha-bhat commented on code in PR #11326: URL: https://github.com/apache/iceberg/pull/11326#discussion_r1802441910 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java: ## @@ -30,12 +30,14 @@ import org.apache.spark.sql.connector.iceberg.catalog.Proced

Re: [PR] Core: Fix version number in deprecation note for invalidateAll [iceberg]

2024-10-15 Thread via GitHub
findepi merged PR #11325: URL: https://github.com/apache/iceberg/pull/11325

Re: [PR] Flink: Add IcebergSinkBuilder interface allowed unification of most of operations on FlinkSink and IcebergSink Builders [iceberg]

2024-10-15 Thread via GitHub
arkadius commented on PR #11305: URL: https://github.com/apache/iceberg/pull/11305#issuecomment-2414951010 > @stevenzwu: The failure for `TestFlinkIcebergSinkRangeDistributionBucketing > testBucketNumberHigherThanWriterParallelismNotDivisible()` should not be related. Do we know if it is s

Re: [PR] Core: fix NPE with HadoopFileIO because FileIOParser doesn't serialize Hadoop configuration [iceberg]

2024-10-15 Thread via GitHub
stevenzwu commented on code in PR #10926: URL: https://github.com/apache/iceberg/pull/10926#discussion_r1801897324 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java: ## @@ -120,6 +120,12 @@ public void setConf(Configuration conf) { @Override public Config

[PR] Core: Fix version number in deprecation note for invalidateAll [iceberg]

2024-10-15 Thread via GitHub
findepi opened a new pull request, #11325: URL: https://github.com/apache/iceberg/pull/11325 (no comment)

Re: [PR] Updating SparkScan to only read Apache DataSketches [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11035: URL: https://github.com/apache/iceberg/pull/11035#discussion_r1802049314 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -198,25 +198,31 @@ protected Statistics estimateStatistics(Snapshot sn

Re: [PR] Core: Deprecate ContentCache.invalidateAll [iceberg]

2024-10-15 Thread via GitHub
findepi commented on code in PR #10494: URL: https://github.com/apache/iceberg/pull/10494#discussion_r1802050253 ## core/src/main/java/org/apache/iceberg/io/ContentCache.java: ## @@ -147,10 +147,23 @@ public InputFile tryCache(InputFile input) { return input; } + /**

Re: [PR] API: Add Variant data type [iceberg]

2024-10-15 Thread via GitHub
RussellSpitzer commented on code in PR #11324: URL: https://github.com/apache/iceberg/pull/11324#discussion_r1802058539 ## api/src/main/java/org/apache/iceberg/VariantLike.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more co

Re: [PR] Spec: Support geo type [iceberg]

2024-10-15 Thread via GitHub
jiayuasu commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1802140800 ## format/spec.md: ## @@ -483,6 +485,8 @@ Notes: 2. For `float` and `double`, the value `-0.0` must precede `+0.0`, as in the IEEE 754 `totalOrder` predicate. NaNs

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-10-15 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1802167002 ## core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java: ## @@ -163,6 +163,12 @@ protected void validate(TableMetadata base, Snapshot parent) { }

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-10-15 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1802165212 ## core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java: ## @@ -163,6 +163,12 @@ protected void validate(TableMetadata base, Snapshot parent) { }
