The new protocol is currently translating from PDX->JSON before sending results to the clients so the client doesn't have to understand PDX or DataSerializable.
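For context, a minimal sketch in Java of the kind of translation being described, using Geode's built-in JSONFormatter; the ResultTranslator class and toWireFormat method here are hypothetical illustrations, not the actual protocol handler code:

```java
import org.apache.geode.pdx.JSONFormatter;
import org.apache.geode.pdx.PdxInstance;

// Hypothetical server-side hook: before a result is written to the new
// protocol's message stream, any PDX value is rendered as a JSON document
// so the client never has to understand PDX bytes.
public class ResultTranslator {

  // Assumed entry point; the real protocol code is not named here.
  public static String toWireFormat(Object result) {
    if (result instanceof PdxInstance) {
      // JSONFormatter is Geode's built-in PDX <-> JSON converter.
      return JSONFormatter.toJSON((PdxInstance) result);
    }
    return String.valueOf(result); // non-PDX values pass through as text
  }
}
```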
There is a lot more to DataSerializable than just how a String is serialized, and as far as I am aware it isn't even documented. Just tweaking the string format is not going to make that much better; your hypothetical Ruby developer is in trouble with or without this proposed change. Breaking compatibility is a huge PITA for our users. We should only do it when we are actually giving them real benefits. If we were switching to some newer PDX format whose deserialization logic was actually easy to implement, I could see the argument for breaking compatibility. Just changing the string format without fixing the rest of the issues around DataSerializable doesn't provide real benefits.

> You can't assume that a client in one language will only be serializing
> strings for its own consumption.

I wasn't making that assumption. The suggestion is that the C++ client would have to deserialize all 4 valid formats, but it could just always serialize data using the valid UTF-16 format. All other clients should be able to deserialize that.

-Dan

On Fri, Dec 1, 2017 at 5:35 PM, Jacob Barrett <jbarr...@pivotal.io> wrote:

> On Fri, Dec 1, 2017 at 4:59 PM Dan Smith <dsm...@pivotal.io> wrote:
>
>> I think I'm kinda with Mike on this one. The existing string format does
>> seem pretty gnarly, but the complexity of implementing and testing all of
>> the backwards-compatibility transcoding required to move to the new
>> proposed format seems like far more work, with far more possibility for
>> errors. Do we really expect people to be writing new clients that use
>> DataSerializable? It hasn't happened yet, and we're working on a new
>> protocol that uses protobuf right now.
>
> Consider that any new clients written would have to implement all of
> these encodings. This is going to make writing new clients for the
> upcoming protocol laborious. The new protocol does not define object
> encoding; it strictly defines message encoding. Objects sent over the
> protocol will still have to be serialized in some format, like PDX or
> DataSerializable. We could always develop a better serialization format
> than what we have now, but if we don't develop something new then we have
> to use the old. Wouldn't it be nice if new clients didn't have to deal
> with legacy encodings?
>
>> If the issue is really the complexity of serialization from the C++
>> client, maybe the C++ client could always write UTF-16 strings?
>
> You can't assume that a client in one language will only be serializing
> strings for its own consumption. We have many people using strings in PDX
> to move data between C++, .NET, and Java.
>
> The risk of not removing this debt is high. If I am developing a new Ruby
> client, I am forced to deal with all 4 of these encodings. Am I really
> going to want to build a Ruby client for Geode, and am I going to get
> these encodings correct? Getting them correct may well be a challenge if
> the current C++ client is any indication: it has a few incorrect
> assumptions in its encoding of ASCII and modified UTF-8.
>
> I am fine with a compromise that deprecates, but doesn't remove, the old
> encodings for a few releases. This would give users time to update. New
> clients would not be able to read the old data, but could read and write
> new data.
>
> -Jake
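To make Dan's compromise concrete, here is a rough Java sketch of a client codec that reads all four legacy string encodings but only ever writes the UTF-16 ("huge string") form. The one-byte type codes and the exact length framing below are illustrative placeholders, not Geode's real DSCODE values or wire layout:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch: write one unambiguous format, read everything legacy clients send.
public final class StringCodec {
  static final byte ASCII = 1;          // placeholder type code
  static final byte ASCII_HUGE = 2;     // placeholder type code
  static final byte MODIFIED_UTF8 = 3;  // placeholder type code
  static final byte UTF16 = 4;          // placeholder type code

  // Writing: always the UTF-16 form, so every other client can decode it.
  public static void write(String s, DataOutput out) throws IOException {
    out.writeByte(UTF16);
    out.writeInt(s.length()); // length in UTF-16 code units
    out.writeChars(s);        // raw UTF-16 code units
  }

  // Reading: still must understand all four legacy encodings.
  public static String read(DataInput in) throws IOException {
    byte code = in.readByte();
    switch (code) {
      case MODIFIED_UTF8:
        return in.readUTF(); // Java's modified UTF-8, <= 64K bytes
      case ASCII: {
        byte[] b = new byte[in.readUnsignedShort()];
        in.readFully(b);
        return new String(b, StandardCharsets.ISO_8859_1);
      }
      case ASCII_HUGE: {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return new String(b, StandardCharsets.ISO_8859_1);
      }
      case UTF16: {
        char[] c = new char[in.readInt()];
        for (int i = 0; i < c.length; i++) c[i] = in.readChar();
        return new String(c);
      }
      default:
        throw new IOException("unknown string encoding: " + code);
    }
  }
}
```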
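On Jake's point about incorrect ASCII and modified-UTF-8 assumptions: the two places where Java's modified UTF-8 (what DataOutputStream.writeUTF emits, and what the DataSerializable string format uses) diverges from standard UTF-8 are easy for a non-Java client to get wrong. A small self-contained demo:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Shows the two divergences: U+0000 becomes the two-byte sequence C0 80,
// and supplementary characters are encoded as two 3-byte surrogates
// (CESU-8 style) rather than a single 4-byte UTF-8 sequence.
public class ModifiedUtf8Demo {
  public static void main(String[] args) throws Exception {
    String nul = "\u0000";         // standard UTF-8: 00
    String emoji = "\uD83D\uDE00"; // U+1F600, standard UTF-8: f0 9f 98 80

    System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));   // 00
    System.out.println(hex(modifiedUtf8(nul)));                      // c0 80
    System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8))); // f0 9f 98 80
    System.out.println(hex(modifiedUtf8(emoji))); // ed a0 bd ed b8 80
  }

  static byte[] modifiedUtf8(String s) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new DataOutputStream(bytes).writeUTF(s);
    byte[] withLength = bytes.toByteArray();
    // writeUTF prefixes a two-byte length; strip it to compare payloads.
    return Arrays.copyOfRange(withLength, 2, withLength.length);
  }

  static String hex(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte x : b) sb.append(String.format("%02x ", x));
    return sb.toString().trim();
  }
}
```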