javrasya opened a new issue, #9410:
URL: https://github.com/apache/iceberg/issues/9410

   ### Apache Iceberg version
   
   1.4.2 (latest release)
   
   ### Query engine
   
   Flink
   
   ### Please describe the bug 🐞
   
   Hi there, I am trying to consume records from an Iceberg table in my Flink 
application and I am running into the following issue;
   
   ```
   Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to serialize 
splits.
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$4(SourceCoordinatorContext.java:223)
        at java.base/java.util.HashMap.forEach(HashMap.java:1337)
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$5(SourceCoordinatorContext.java:213)
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.callInCoordinatorThread(SourceCoordinatorContext.java:428)
        ... 14 more
   Caused by: java.io.UTFDataFormatException: Encoded string is too long: 123214
        at 
org.apache.flink.core.memory.DataOutputSerializer.writeUTF(DataOutputSerializer.java:257)
        at 
org.apache.iceberg.flink.source.split.IcebergSourceSplit.serializeV2(IcebergSourceSplit.java:150)
        at 
org.apache.iceberg.flink.source.split.IcebergSourceSplitSerializer.serialize(IcebergSourceSplitSerializer.java:42)
        at 
org.apache.iceberg.flink.source.split.IcebergSourceSplitSerializer.serialize(IcebergSourceSplitSerializer.java:25)
        at 
org.apache.flink.runtime.source.event.AddSplitEvent.<init>(AddSplitEvent.java:44)
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$4(SourceCoordinatorContext.java:220)
        ... 17 more
   ```
   
   I'm not really sure why the serialized split gets so big, but when I looked at the source code [here](https://github.com/apache/iceberg/blob/27e8c421358378bd80bed8b328d5b69e884b7484/flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/IcebergSourceSplit.java#L148-L151), it might be because there are too many file scan tasks in one split, and that is why this is happening.
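   For context, the limit the stack trace is hitting is not specific to Iceberg: `writeUTF` (in `java.io.DataOutputStream`, whose contract Flink's `DataOutputSerializer.writeUTF` appears to follow) prefixes the string with an unsigned 16-bit length, so the modified-UTF-8 encoding cannot exceed 65535 bytes. A minimal sketch with plain JDK classes (the 123214 length is taken from the exception message above; everything else here is illustrative, not Iceberg code):

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.UTFDataFormatException;

   public class WriteUtfLimitDemo {
       public static void main(String[] args) throws Exception {
           // writeUTF writes a 2-byte unsigned length prefix, so the
           // modified-UTF-8 encoding of the string is capped at 65535 bytes.
           String big = "x".repeat(123214); // same length as in the stack trace
           DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
           try {
               out.writeUTF(big);
               System.out.println("serialized ok");
           } catch (UTFDataFormatException e) {
               // e.g. "encoded string too long: 123214 bytes"
               System.out.println("UTFDataFormatException: " + e.getMessage());
           }
       }
   }
   ```

   So any split whose serialized task description grows past 64 KiB (e.g. many file scan tasks packed into one split) would fail the same way, regardless of what the string actually contains.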
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

