pvary commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1890335313
Let's take a step back before rushing to a solution. Here are some things we have to solve:
- Serializing long Strings
- Serializing extra chars, like Chinese chars
- Backward compatibility to read old serialized splits
- Performance will become even more important, as we have long buffers, and potentially many splits

My current ideas:
- Compatibility: We might have to introduce SerializerV3
- Extra chars: Is 2 bytes enough for all chars? For me, some research would be needed
- Performance: if possible, reusing buffers

Thanks for starting the work on this @javrasya!
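
To make the "long Strings" and "is 2 bytes enough" questions concrete, here is a minimal sketch using only the JDK. It assumes the old format relies on a 2-byte length prefix in the style of `java.io.DataOutputStream.writeUTF`; that is an assumption for illustration, not something confirmed in this thread. The class `StringEncodingSketch` and the helpers `writeLongUtf` / `readLongUtf` are hypothetical names, not Iceberg or Flink APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringEncodingSketch {

  // Hypothetical helper: 4-byte length prefix followed by standard UTF-8 bytes,
  // so the encoded payload is not capped at 65535 bytes.
  static void writeLongUtf(DataOutputStream out, String value) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Hypothetical counterpart of writeLongUtf.
  static String readLongUtf(DataInputStream in) throws IOException {
    int length = in.readInt();
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Returns true when DataOutputStream.writeUTF rejects the value because its
  // encoded form does not fit in the 2-byte length prefix.
  static boolean writeUtfFails(String value) {
    try {
      new DataOutputStream(new ByteArrayOutputStream()).writeUTF(value);
      return false;
    } catch (UTFDataFormatException e) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Writes the value with the 4-byte prefix and reads it back.
  static String roundTrip(String value) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      writeLongUtf(new DataOutputStream(buffer), value);
      return readLongUtf(
          new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Builds an ASCII string of the requested length.
  static String repeatA(int count) {
    char[] chars = new char[count];
    Arrays.fill(chars, 'a');
    return new String(chars);
  }

  public static void main(String[] args) {
    String chinese = "\u4E2D";        // a CJK character inside the Basic Multilingual Plane
    String emoji = "\uD83D\uDE00";    // U+1F600, outside the BMP (one surrogate pair)

    System.out.println(chinese.getBytes(StandardCharsets.UTF_8).length); // 3
    System.out.println(emoji.length());                                  // 2 Java chars
    System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);   // 4

    String longValue = repeatA(70_000);
    System.out.println(writeUtfFails(longValue));                 // true
    System.out.println(roundTrip(longValue).equals(longValue));   // true
    System.out.println(roundTrip(chinese + emoji).equals(chinese + emoji)); // true
  }
}
```

Two things follow from this. One Java `char` (2 bytes) covers common Chinese characters but not every Unicode code point, since characters outside the BMP need a surrogate pair. And any new length-prefixed layout like the one sketched here changes the byte format, which is why a separate serializer version would be needed to keep reading old splits.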