I think there is value in having a single string encoding.

Sarge

> On 1 Dec, 2017, at 17:35, Jacob Barrett <jbarr...@pivotal.io> wrote:
> 
> On Fri, Dec 1, 2017 at 4:59 PM Dan Smith <dsm...@pivotal.io> wrote:
> 
>> I think I'm kinda with Mike on this one. The existing string format does
>> seem pretty gnarly. But the complexity of implementing and testing all of
>> the backwards compatibility transcoding that would be required in order to
>> move to the new proposed format seems to be way more work with much more
>> possibility for errors. Do we really expect people to be writing new
>> clients that use DataSerializable? It hasn't happened yet, and we're
>> working on a new protocol that uses protobuf right now.
>> 
> 
> Consider that any new clients written would have to implement all these
> encodings. This is going to make writing new clients using the upcoming new
> protocol laborious. The new protocol does not define object encoding, it
> strictly defines message encoding. Objects sent over the protocol will have
> to be serialized in some format, like PDX or data serializer. We could
> always develop a better serialization format than what we have now. If we
> don't develop something new then we have to use the old. Wouldn't it be
> nice if the new clients didn't have to deal with legacy encodings?
> 
>> If the issue is really the complexity of serialization from the C++ client,
>> maybe the C++ client could always write UTF-16 strings?
>> 
> 
> You can't assume that a client in one language will only be serializing
> strings for its own consumption. We have many people using strings in PDX
> to transform between C++, .NET and Java.
> 
> The risk of not removing this debt is high. If I am developing a new Ruby
> client I am forced to deal with all 4 of these encodings. Am I really going
> to want to build a Ruby client for Geode? Am I going to get these encodings
> correct? I can tell you that getting them correct may be a challenge if the
> current C++ client is any indication: it makes a few incorrect assumptions
> in its encoding of ASCII and modified UTF-8.
> 
> I am fine with a compromise that deprecates but doesn't remove the old
> encodings for a few releases. This would give time for users to update. New
> clients written would not be able to read this old data but could read
> and write new data.
> 
> 
> 
> -Jake
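
To make the modified UTF-8 pitfall concrete, here is a minimal Python sketch, assuming the "modified UTF-8" discussed above is the variant Java's DataOutput.writeUTF produces. The function name encode_modified_utf8 is hypothetical and is not part of Geode or any client; it encodes only the payload and leaves out the 2-byte length prefix that writeUTF adds.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode text as Java-style modified UTF-8 (payload only, no length prefix).

    Hypothetical helper for illustration; not a Geode API.
    """
    out = bytearray()
    # Modified UTF-8 works on UTF-16 code units, not on code points,
    # so supplementary characters are seen as two surrogates.
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if 0x0001 <= unit <= 0x007F:
            # Plain ASCII except NUL: one byte.
            out.append(unit)
        elif unit <= 0x07FF:
            # Two bytes. This branch also catches U+0000, which becomes C0 80.
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Three bytes, including each half of a surrogate pair.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    for sample in ["A", "\u00e9", "\x00", "\U0001F600"]:
        standard = sample.encode("utf-8")
        modified = encode_modified_utf8(sample)
        print(repr(sample), standard.hex(), modified.hex())
```

The two encodings agree for ASCII and for most of the Basic Multilingual Plane, and differ for NUL (two bytes instead of one) and for characters outside the BMP (six bytes instead of four). A client that treats the payload as standard UTF-8 will get exactly those cases wrong.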
