The new protocol is currently translating from PDX->JSON before sending results to the clients so the client doesn't have to understand PDX or DataSerializable.
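For context, a minimal sketch in Java of the kind of translation being described, using Geode's built-in JSONFormatter; the ResultTranslator class and toWireFormat method here are hypothetical illustrations, not the actual protocol handler code:

```java
import org.apache.geode.pdx.JSONFormatter;
import org.apache.geode.pdx.PdxInstance;

// Hypothetical server-side hook: before a result is written to the new
// protocol's message stream, any PDX value is rendered as a JSON document
// so the client never has to understand PDX bytes.
public class ResultTranslator {

  // Assumed entry point; the real protocol code is not named here.
  public static String toWireFormat(Object result) {
    if (result instanceof PdxInstance) {
      // JSONFormatter is Geode's built-in PDX <-> JSON converter.
      return JSONFormatter.toJSON((PdxInstance) result);
    }
    return String.valueOf(result); // non-PDX values pass through as text
  }
}
```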
There is a lot more to DataSerializable than just how a String is serialized, and as far as I am aware it isn't even documented. Just tweaking the string format is not going to make that much better; your hypothetical Ruby developer is in trouble with or without this proposed change. Breaking compatibility is a huge PITA for our users. We should only do it when we are actually giving them real benefits. If we were switching to some newer PDX format whose deserialization logic was actually easy to implement, I could see the argument for breaking compatibility. Just changing the string format without fixing the rest of the issues around DataSerializable doesn't provide real benefits.

> You can't assume that a client in one language will only be serializing
> strings for its own consumption.

I wasn't making that assumption. The suggestion is that the C++ client would have to deserialize all 4 valid formats, but it could just always serialize data using the valid UTF-16 format. All other clients should be able to deserialize that.

-Dan

On Fri, Dec 1, 2017 at 5:35 PM, Jacob Barrett <jbarr...@pivotal.io> wrote:

> On Fri, Dec 1, 2017 at 4:59 PM Dan Smith <dsm...@pivotal.io> wrote:
>
>> I think I'm kinda with Mike on this one. The existing string format does
>> seem pretty gnarly, but the complexity of implementing and testing all of
>> the backwards-compatibility transcoding required to move to the new
>> proposed format seems like far more work, with far more possibility for
>> errors. Do we really expect people to be writing new clients that use
>> DataSerializable? It hasn't happened yet, and we're working on a new
>> protocol that uses protobuf right now.
>
> Consider that any new clients written would have to implement all of
> these encodings. This is going to make writing new clients for the
> upcoming protocol laborious. The new protocol does not define object
> encoding; it strictly defines message encoding. Objects sent over the
> protocol will still have to be serialized in some format, like PDX or
> DataSerializable. We could always develop a better serialization format
> than what we have now, but if we don't develop something new then we have
> to use the old. Wouldn't it be nice if new clients didn't have to deal
> with legacy encodings?
>
>> If the issue is really the complexity of serialization from the C++
>> client, maybe the C++ client could always write UTF-16 strings?
>
> You can't assume that a client in one language will only be serializing
> strings for its own consumption. We have many people using strings in PDX
> to move data between C++, .NET, and Java.
>
> The risk of not removing this debt is high. If I am developing a new Ruby
> client, I am forced to deal with all 4 of these encodings. Am I really
> going to want to build a Ruby client for Geode, and am I going to get
> these encodings correct? Getting them correct may well be a challenge if
> the current C++ client is any indication: it has a few incorrect
> assumptions in its encoding of ASCII and modified UTF-8.
>
> I am fine with a compromise that deprecates, but doesn't remove, the old
> encodings for a few releases. This would give users time to update. New
> clients would not be able to read the old data, but could read and write
> new data.
>
> -Jake
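To make Dan's compromise concrete, here is a rough Java sketch of a client codec that reads all four legacy string encodings but only ever writes the UTF-16 ("huge string") form. The one-byte type codes and the exact length framing below are illustrative placeholders, not Geode's real DSCODE values or wire layout:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch: write one unambiguous format, read everything legacy clients send.
public final class StringCodec {
  static final byte ASCII = 1;          // placeholder type code
  static final byte ASCII_HUGE = 2;     // placeholder type code
  static final byte MODIFIED_UTF8 = 3;  // placeholder type code
  static final byte UTF16 = 4;          // placeholder type code

  // Writing: always the UTF-16 form, so every other client can decode it.
  public static void write(String s, DataOutput out) throws IOException {
    out.writeByte(UTF16);
    out.writeInt(s.length()); // length in UTF-16 code units
    out.writeChars(s);        // raw UTF-16 code units
  }

  // Reading: still must understand all four legacy encodings.
  public static String read(DataInput in) throws IOException {
    byte code = in.readByte();
    switch (code) {
      case MODIFIED_UTF8:
        return in.readUTF(); // Java's modified UTF-8, <= 64K bytes
      case ASCII: {
        byte[] b = new byte[in.readUnsignedShort()];
        in.readFully(b);
        return new String(b, StandardCharsets.ISO_8859_1);
      }
      case ASCII_HUGE: {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return new String(b, StandardCharsets.ISO_8859_1);
      }
      case UTF16: {
        char[] c = new char[in.readInt()];
        for (int i = 0; i < c.length; i++) c[i] = in.readChar();
        return new String(c);
      }
      default:
        throw new IOException("unknown string encoding: " + code);
    }
  }
}
```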
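On Jake's point about incorrect ASCII and modified-UTF-8 assumptions: the two places where Java's modified UTF-8 (what DataOutputStream.writeUTF emits, and what the DataSerializable string format uses) diverges from standard UTF-8 are easy for a non-Java client to get wrong. A small self-contained demo:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Shows the two divergences: U+0000 becomes the two-byte sequence C0 80,
// and supplementary characters are encoded as two 3-byte surrogates
// (CESU-8 style) rather than a single 4-byte UTF-8 sequence.
public class ModifiedUtf8Demo {
  public static void main(String[] args) throws Exception {
    String nul = "\u0000";         // standard UTF-8: 00
    String emoji = "\uD83D\uDE00"; // U+1F600, standard UTF-8: f0 9f 98 80

    System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));   // 00
    System.out.println(hex(modifiedUtf8(nul)));                      // c0 80
    System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8))); // f0 9f 98 80
    System.out.println(hex(modifiedUtf8(emoji))); // ed a0 bd ed b8 80
  }

  static byte[] modifiedUtf8(String s) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new DataOutputStream(bytes).writeUTF(s);
    byte[] withLength = bytes.toByteArray();
    // writeUTF prefixes a two-byte length; strip it to compare payloads.
    return Arrays.copyOfRange(withLength, 2, withLength.length);
  }

  static String hex(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte x : b) sb.append(String.format("%02x ", x));
    return sb.toString().trim();
  }
}
```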