pvary commented on issue #9410:
URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1877233309

   @stevenzwu: After a quick check, I have found this:
   
https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/main/java/org/apache/flink/core/memory/DataOutputSerializer.java#L256-L260
   ```
           if (utflen > 65535) {
               throw new UTFDataFormatException("Encoded string is too long: " + utflen);
           } else if (this.position > this.buffer.length - utflen - 2) {
               resize(utflen + 2);
           }
   ```
   
   This means that `DataOutputSerializer.writeUTF` cannot serialize any string
whose encoded form is longer than 65535 bytes (64k), which seems like a rather
arbitrary limit to me.
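
The same ceiling can be reproduced without Flink. The sketch below uses the JDK's `java.io.DataOutputStream` as a stand-in for `DataOutputSerializer`, on the assumption that both follow the `DataOutput.writeUTF` contract of a two-byte length prefix (which is what the `utflen + 2` in the snippet above suggests). The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class WriteUtfLimit {

    /** Returns true if writeUTF accepts a string of {@code length} ASCII characters. */
    static boolean canWriteUtf(int length) throws IOException {
        char[] chars = new char[length];
        Arrays.fill(chars, 'a'); // each 'a' encodes to exactly one byte
        String value = new String(chars);

        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(value);
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded length does not fit in the 2-byte prefix.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canWriteUtf(65535)); // largest encoded length that fits
        System.out.println(canWriteUtf(65536)); // one byte too many
    }
}
```

With the JDK class this prints `true` and then `false`; the Flink serializer is expected to behave the same way given the check quoted above.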
   
   We could use `DataOutputSerializer.writeChars`, which calls
`DataOutputSerializer.writeChar` for each character. The downside is that it is
less efficient for simple chars (between `0x0001` and `0x007F`): `writeChar`
always writes two bytes, whereas `writeUTF` encodes these chars in one byte. See:
https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/main/java/org/apache/flink/core/memory/DataOutputSerializer.java#L245C1-L254C10
   ```
           for (int i = 0; i < strlen; i++) {
               c = str.charAt(i);
               if ((c >= 0x0001) && (c <= 0x007F)) {
                   utflen++;
               } else if (c > 0x07FF) {
                   utflen += 3;
               } else {
                   utflen += 2;
               }
           }
   ```
   
   The upside is that it should work regardless of the size of the string.
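
A minimal sketch of what the `writeChars` route could look like, again using the JDK's `DataOutputStream`/`DataInputStream` as stand-ins for Flink's serializer classes. One assumption worth flagging: `writeChars` does not record the string length, so this sketch writes an explicit `int` length first so the reader knows how many chars to consume. `LongStringCodec`, `writeLongString` and `readLongString` are hypothetical names, not existing Iceberg or Flink APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class LongStringCodec {

    /** Writes a 4-byte char count followed by two bytes per char. */
    static byte[] writeLongString(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value.length()); // explicit prefix; writeChars adds none
            out.writeChars(value);        // two bytes per char, no 64k ceiling
        }
        return bytes.toByteArray();
    }

    /** Reads back a string written by writeLongString. */
    static String readLongString(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int length = in.readInt();
            char[] chars = new char[length];
            for (int i = 0; i < length; i++) {
                chars[i] = in.readChar();
            }
            return new String(chars);
        }
    }

    public static void main(String[] args) throws IOException {
        char[] chars = new char[100_000]; // well beyond the writeUTF limit
        Arrays.fill(chars, 'a');
        String big = new String(chars);

        byte[] data = writeLongString(big);
        System.out.println(data.length);                       // 4 + 2 * 100000 bytes
        System.out.println(readLongString(data).equals(big));  // round trip check
    }
}
```

For ASCII-heavy strings this takes roughly twice the space of `writeUTF`, which is the trade-off described above.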


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

