Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-10 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2533696010 > Thanks @Guosmilesmile for the PR, and @mxm and @stevenzwu for the review! > > @Guosmilesmile please prepare the backport commits to the other Flink versions. > > Th

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-10 Thread via GitHub
pvary commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2532940822 Thanks @Guosmilesmile for the PR, and @mxm and @stevenzwu for the review! @Guosmilesmile please prepare the backport commits to the other Flink versions. Thanks, Peter --

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-10 Thread via GitHub
pvary merged PR #11662: URL: https://github.com/apache/iceberg/pull/11662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apa

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-09 Thread via GitHub
pvary commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2530496119 @stevenzwu: any last minute comments? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-09 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876608916 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSeria

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-09 Thread via GitHub
pvary commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876362946 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSerializer {

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-09 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876059320 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSeria

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-09 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876059320 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSeria

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-09 Thread via GitHub
pvary commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876033356 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSerializer {

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-05 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-252050 > I think this looks good, but I'll defer to @stevenzwu and @pvary for the final approval. @mxm Thank you very much for taking the time to review my code. The relevant comm

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-05 Thread via GitHub
mxm commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1871472107 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsUtil.java: ## @@ -73,12 +73,25 @@ static byte[] serializeCompletedStatistics( }

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-04 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1870045831 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsUtil.java: ## @@ -73,12 +73,24 @@ static byte[] serializeCompletedStatistics(

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-04 Thread via GitHub
pvary commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1869958746 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsUtil.java: ## @@ -73,12 +73,24 @@ static byte[] serializeCompletedStatistics( }

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-04 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2517084700 @mxm @pvary @stevenzwu Thank you all very much for your guidance. I would appreciate it if you could take another look and let me know what needs to be done next to keep things mo

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511169797 @mxm Thank you very much for your suggestions. I have made the necessary modifications, and I appreciate you taking the time out of your busy schedule to review it again. I am ver

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511130152 @mxm Thank you very much for your suggestions. I have made the necessary modifications, and I appreciate you taking the time out of your busy schedule to review it again. I am ver

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
mxm commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1865523100 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -310,10 +366,16 @@ public void writeSnapshot(DataOutputView out) th

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-12-02 Thread via GitHub
mxm commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511041771 Thanks for the update @Guosmilesmile! Unfortunately, we don't have a way to encode the serializer version for all serializers, so a best-effort approach to retry with a different serializer

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-29 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2507769750 Hi @mxm! Thank you very much for your suggestions. Following your advice, I added a version number to SortKeySerializer through restoreSerializer, implementing different serializa

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-29 Thread via GitHub
mxm commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1863289530 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -310,10 +324,16 @@ public void writeSnapshot(DataOutputView out) th

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-29 Thread via GitHub
mxm commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1863289530 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -310,10 +324,16 @@ public void writeSnapshot(DataOutputView out) th

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-28 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2507255659 @mxm I'm sorry, I'm a bit confused and would like to ask for your advice. Which part do I need to modify? Based on the current modifications, simply change TypeSerializerSc

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-28 Thread via GitHub
mxm commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2507208103 @Guosmilesmile Ideally, we want to return compatibleAfterMigration() and make sure the old and new serializer can be instantiated. -- This is an automated message from the Apache Git Ser

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-28 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2506978587 @mxm @stevenzwu @pvary Thank you all for your suggestions. I have submitted a version that mainly modifies the SortKeySerializerSnapshot and implements version detection. If the v

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-28 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2505917957 @mxm Thank you very much for your suggestions. I need to add a version check in SortKeySerializerSnapshot. If the state is restored from an old version, I will directly return Typ

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-28 Thread via GitHub
mxm commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861944660 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,14 @@ public void serialize(SortKey record, DataOutputV

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2505426989 @pvary Because a new flag has been added, the testSerializationSize test case needs to be modified to reflect the corresponding size value + 1byte for the flag . It has been adjus

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
pvary commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861570865 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,14 @@ public void serialize(SortKey record, DataOutpu

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861359801 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,14 @@ public void serialize(SortKey record, D

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861436171 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,14 @@ public void serialize(SortKey record, D

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861359801 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,14 @@ public void serialize(SortKey record, D

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861359801 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,14 @@ public void serialize(SortKey record, D

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
stevenzwu commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861329534 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,14 @@ public void serialize(SortKey record, DataO

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2504787892 @pvary I'm sorry, I just re-uploaded the missing parts. Please help me trigger the test again. I apologize for the inconvenience. -- This is an automated message from the Apache

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860984047 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsCoordinator.java: ## @@ -152,6 +152,80 @@ public void testDataStatist

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860974486 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsCoordinator.java: ## @@ -152,6 +152,80 @@ public void testDataStatist

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
pvary commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860967334 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsCoordinator.java: ## @@ -152,6 +152,80 @@ public void testDataStatisticsEvent

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2504108856 @pvary I have also added a unit test for the scenario of building from the table to the sink, which includes data containing null values. Could you please review it as well? Thank

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503954677 @pvary I have added test cases for DataStatisticsOperator and DataStatisticsCoordinator, specifically for the null scenario. I verified the ProcessElement and EventHand scenarios,

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503806486 @pvary Thank you very much for your guidance. I have made the changes according to your suggestions. Please take a look and see if they are feasible. -- This is an automated me

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
pvary commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503849860 Thanks for the changes @Guosmilesmile! One question remains from my side: > It would be nice to have an end2end test for null values too. > Currently we only tests that the sta

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
pvary commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503747694 @Guosmilesmile: Thanks for the changes. Left some comments, but started the tests to see if this change cause any other test failures. Please remove the 1.19, 1.18 changes for now -

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
pvary commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860565686 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsOperator.java: ## @@ -136,6 +137,27 @@ public void testProcessElement(Statist

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-27 Thread via GitHub
pvary commented on code in PR #11662: URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860563456 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java: ## @@ -124,6 +124,13 @@ public void serialize(SortKey record, DataOutpu

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503066336 @pvary Yes, you are right. We changed our approach to handle null values instead of filtering them out. Before serialization, we added a flag for each field, where true indicates

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
Guosmilesmile commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2502780003 @ConeyLiu yes,I have added UT to cover the relevant changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
ConeyLiu commented on PR #11662: URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2502609743 Thanks for the contributions. Should we better handle the null value instead of skipping it? Could you also add the UT to cover these changes? cc @stevenzwu @pvary -- This is an

[PR] Flink: Fix range distribution npe when value is null [iceberg]

2024-11-26 Thread via GitHub
Guosmilesmile opened a new pull request, #11662: URL: https://github.com/apache/iceberg/pull/11662 When configuring the distribution mode to RANGE, if the partition field in the data contains null values, it will cause the SortKey serialization to fail, resulting in the job continuously res