Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2533696010
> Thanks @Guosmilesmile for the PR, and @mxm and @stevenzwu for the review!
>
> @Guosmilesmile please prepare the backport commits to the other Flink
versions.
>
> Th
pvary commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2532940822
Thanks @Guosmilesmile for the PR, and @mxm and @stevenzwu for the review!
@Guosmilesmile please prepare the backport commits to the other Flink
versions.
Thanks, Peter
--
pvary merged PR #11662:
URL: https://github.com/apache/iceberg/pull/11662
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apa
pvary commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2530496119
@stevenzwu: any last minute comments?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the s
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876608916
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSeria
pvary commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876362946
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSerializer {
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876059320
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSeria
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876059320
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSeria
pvary commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1876033356
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -52,8 +52,32 @@ class SortKeySerializer extends TypeSerializer {
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-252050
> I think this looks good, but I'll defer to @stevenzwu and @pvary for the
final approval.
@mxm Thank you very much for taking the time to review my code. The relevant
comm
mxm commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1871472107
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsUtil.java:
##
@@ -73,12 +73,25 @@ static byte[] serializeCompletedStatistics(
}
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1870045831
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsUtil.java:
##
@@ -73,12 +73,24 @@ static byte[] serializeCompletedStatistics(
pvary commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1869958746
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsUtil.java:
##
@@ -73,12 +73,24 @@ static byte[] serializeCompletedStatistics(
}
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2517084700
@mxm @pvary @stevenzwu Thank you all very much for your guidance. I would
appreciate it if you could take another look and let me know what needs to be
done next to keep things mo
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511169797
@mxm Thank you very much for your suggestions. I have made the necessary
modifications, and I appreciate you taking the time out of your busy schedule
to review it again. I am ver
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511130152
@mxm Thank you very much for your suggestions. I have made the necessary
modifications, and I appreciate you taking the time out of your busy schedule
to review it again. I am ver
mxm commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1865523100
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -310,10 +366,16 @@ public void writeSnapshot(DataOutputView out) th
mxm commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2511041771
Thanks for the update @Guosmilesmile! Unfortunately, we don't have a way to
encode the serializer version for all serializers, so a best-effort approach to
retry with a different serializer
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2507769750
Hi @mxm! Thank you very much for your suggestions. Following your advice, I
added a version number to SortKeySerializer through restoreSerializer,
implementing different serializa
mxm commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1863289530
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -310,10 +324,16 @@ public void writeSnapshot(DataOutputView out) th
mxm commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1863289530
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -310,10 +324,16 @@ public void writeSnapshot(DataOutputView out) th
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2507255659
@mxm I'm sorry, I'm a bit confused and would like to ask for your advice.
Which part do I need to modify?
Based on the current modifications, simply change
TypeSerializerSc
mxm commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2507208103
@Guosmilesmile Ideally, we want to return compatibleAfterMigration() and
make sure the old and new serializer can be instantiated.
--
This is an automated message from the Apache Git Ser
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2506978587
@mxm @stevenzwu @pvary Thank you all for your suggestions. I have submitted
a version that mainly modifies the SortKeySerializerSnapshot and implements
version detection. If the v
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2505917957
@mxm Thank you very much for your suggestions. I need to add a version check
in SortKeySerializerSnapshot. If the state is restored from an old version, I
will directly return Typ
mxm commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861944660
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,14 @@ public void serialize(SortKey record, DataOutputV
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2505426989
@pvary Because a new flag has been added, the testSerializationSize test
case needs to be modified to reflect the corresponding size value + 1byte for
the flag . It has been adjus
pvary commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861570865
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,14 @@ public void serialize(SortKey record, DataOutpu
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861359801
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,14 @@ public void serialize(SortKey record, D
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861436171
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,14 @@ public void serialize(SortKey record, D
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861359801
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,14 @@ public void serialize(SortKey record, D
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861359801
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,14 @@ public void serialize(SortKey record, D
stevenzwu commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1861329534
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,14 @@ public void serialize(SortKey record, DataO
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2504787892
@pvary I'm sorry, I just re-uploaded the missing parts. Please help me
trigger the test again. I apologize for the inconvenience.
--
This is an automated message from the Apache
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860984047
##
flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsCoordinator.java:
##
@@ -152,6 +152,80 @@ public void testDataStatist
Guosmilesmile commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860974486
##
flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsCoordinator.java:
##
@@ -152,6 +152,80 @@ public void testDataStatist
pvary commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860967334
##
flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsCoordinator.java:
##
@@ -152,6 +152,80 @@ public void testDataStatisticsEvent
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2504108856
@pvary I have also added a unit test for the scenario of building from the
table to the sink, which includes data containing null values. Could you please
review it as well? Thank
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503954677
@pvary I have added test cases for DataStatisticsOperator and
DataStatisticsCoordinator, specifically for the null scenario. I verified the
ProcessElement and EventHand scenarios,
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503806486
@pvary Thank you very much for your guidance. I have made the changes
according to your suggestions. Please take a look and see if they are feasible.
--
This is an automated me
pvary commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503849860
Thanks for the changes @Guosmilesmile!
One question remains from my side:
> It would be nice to have an end2end test for null values too.
> Currently we only tests that the sta
pvary commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503747694
@Guosmilesmile: Thanks for the changes. Left some comments, but started the
tests to see if this change cause any other test failures.
Please remove the 1.19, 1.18 changes for now -
pvary commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860565686
##
flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsOperator.java:
##
@@ -136,6 +137,27 @@ public void testProcessElement(Statist
pvary commented on code in PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#discussion_r1860563456
##
flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java:
##
@@ -124,6 +124,13 @@ public void serialize(SortKey record, DataOutpu
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2503066336
@pvary Yes, you are right. We changed our approach to handle null values
instead of filtering them out. Before serialization, we added a flag for each
field, where true indicates
Guosmilesmile commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2502780003
@ConeyLiu yes,I have added UT to cover the relevant changes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and u
ConeyLiu commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2502609743
Thanks for the contributions. Should we better handle the null value instead
of skipping it? Could you also add the UT to cover these changes?
cc @stevenzwu @pvary
--
This is an
Guosmilesmile opened a new pull request, #11662:
URL: https://github.com/apache/iceberg/pull/11662
When configuring the distribution mode to RANGE, if the partition field in
the data contains null values, it will cause the SortKey serialization to fail,
resulting in the job continuously res
48 matches
Mail list logo