javrasya commented on PR #9464:
URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1890390386

   Good catches @pvary, thank you. What if we take full inspiration from 
writeUTF and have our own writer that supports longer JSON? Btw, the reason 
writeUTF limits the size to 65K is that the first 2 bytes of the serialized 
value hold the length of the encoded string as an unsigned short, which can be 
at most 65,535. I have introduced my own version of writeUTF and called it 
writeLongUTF/readLongUTF. It writes the length prefix as an int (4 bytes) 
instead of an unsigned short (2 bytes). 
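   
   To make the idea concrete, here is a minimal sketch of a 4-byte 
length-prefixed writer/reader pair. It is not the code from my branch: the 
class name `LongUtfSketch` and the `roundTrip` helper are made up for 
illustration, and it assumes standard UTF-8 via `String.getBytes` rather than 
the modified UTF-8 encoding that writeUTF uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class LongUtfSketch {

  // Writes a 4-byte int length prefix followed by the encoded bytes.
  // Assumption: standard UTF-8, not the modified UTF-8 used by writeUTF.
  public static void writeLongUTF(DataOutputStream out, String value) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Reads the 4-byte length prefix, then exactly that many bytes.
  public static String readLongUTF(DataInputStream in) throws IOException {
    int length = in.readInt();
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Serializes and deserializes a value in memory.
  public static String roundTrip(String value) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      writeLongUTF(out, value);
    }
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return readLongUTF(in);
    }
  }

  public static void main(String[] args) throws IOException {
    // 70,000 ASCII characters: longer than the 65,535 bytes writeUTF accepts.
    String json = new String(new char[70_000]).replace('\0', 'x');
    System.out.println(roundTrip(json).equals(json));
  }
}
```

   With a 2-byte prefix, the same 70,000-character string would make 
`DataOutputStream.writeUTF` throw a `UTFDataFormatException`.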
   
   Do you mind taking a look at [those changes 
here](https://github.com/apache/iceberg/compare/main...javrasya:iceberg:issue-9410-implement-custom-utf-serde)
 and letting me know what you think? I didn't want to update this PR directly 
without talking to you about it. If you think it is a good idea, I can go ahead 
and merge it into this branch first, and we can continue the discussion here.
   
   But it is not compatible with V2, since V2 uses the initial 2 bytes to 
indicate the length. Introducing V3 is a good idea, as you suggested. But I am 
not really sure how we would be able to distinguish a split serialized earlier 
with V2 from one serialized with V3 🤔 Do you know how this was done from V1 to 
V2? Can you help me there?
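   
   To illustrate why the two layouts cannot be mixed, here is a small sketch 
that writes a string with the 2-byte prefix of writeUTF and then reads the 
first 4 bytes as if they were an int length. The class and method names are 
hypothetical, and this only demonstrates the mismatch, not a fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class PrefixMismatchSketch {

  // Writes the value with writeUTF (2-byte length prefix), then misreads
  // the first 4 bytes as an int length, as a 4-byte-prefix reader would.
  // The value must encode to at least 2 bytes so that 4 bytes are available.
  public static int misreadLength(String value) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeUTF(value);
    }
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return in.readInt();
    }
  }

  public static void main(String[] args) throws IOException {
    // "abc" is stored as 00 03 61 62 63, so the int read is 0x00036162.
    System.out.println(misreadLength("abc")); // prints 221538, not 3
  }
}
```

   The reader would then try to consume 221,538 bytes from a 5-byte payload, 
so the deserializer has to know which layout it is reading before it touches 
the length prefix.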


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

