Anything that breaks data on disk is also a big PITA. This change would break data on disk.
--
Mike Stolz
Principal Engineer, GemFire Product Lead
Mobile: +1-631-835-4771

On Mon, Dec 4, 2017 at 1:52 PM, Dan Smith <dsm...@pivotal.io> wrote:

> The new protocol is currently translating from PDX->JSON before sending
> results to the clients so the client doesn't have to understand PDX or
> DataSerializable.
>
> There is a lot more to DataSerializable than just how a String is
> serialized. And it's not even documented that I am aware of. Just tweaking
> the string format is not going to make that much better. Your hypothetical
> ruby developer is in trouble with or without this proposed change.
>
> Breaking compatibility is a huge PITA for our users. We should do that
> when we are actually giving them real benefits. In this case, if we were
> switching to some newer PDX format that actually made deserialization
> logic easy to implement, I could see the argument for breaking
> compatibility. Just changing the string format without fixing the rest of
> the issues around DataSerializable isn't providing real benefits.
>
> > You can't assume that a client in one language will only be serializing
> > strings for its own consumption.
>
> I wasn't making that assumption. The suggestion is that the C++ client
> would have to deserialize all 4 valid formats, but it could just always
> serialize data using the valid UTF-16 format. All other clients should be
> able to deserialize that.
>
> -Dan
>
> On Fri, Dec 1, 2017 at 5:35 PM, Jacob Barrett <jbarr...@pivotal.io> wrote:
>
> > On Fri, Dec 1, 2017 at 4:59 PM Dan Smith <dsm...@pivotal.io> wrote:
> >
> > > I think I'm kinda with Mike on this one. The existing string format
> > > does seem pretty gnarly. But the complexity of implementing and
> > > testing all of the backwards compatibility transcoding that would be
> > > required in order to move to the new proposed format seems to be way
> > > more work with much more possibility for errors. Do we really expect
> > > people to be writing new clients that use DataSerializable? It hasn't
> > > happened yet, and we're working on a new protocol that uses protobuf
> > > right now.
> >
> > Consider that any new clients written would have to implement all these
> > encodings. This is going to make writing new clients using the upcoming
> > new protocol laborious. The new protocol does not define object
> > encoding; it strictly defines message encoding. Objects sent over the
> > protocol will have to be serialized in some format, like PDX or
> > DataSerializable. We could always develop a better serialization format
> > than what we have now. If we don't develop something new then we have
> > to use the old. Wouldn't it be nice if the new clients didn't have to
> > deal with legacy encodings?
> >
> > > If the issue is really the complexity of serialization from the C++
> > > client, maybe the C++ client could always write UTF-16 strings?
> >
> > You can't assume that a client in one language will only be serializing
> > strings for its own consumption. We have many people using strings in
> > PDX to transform between C++, .NET and Java.
> >
> > The risk of not removing this debt is high. If I am developing a new
> > Ruby client, I am forced to deal with all 4 of these encodings. Am I
> > really going to want to build a Ruby client for Geode? Am I going to
> > get these encodings correct? I can tell you that getting them correct
> > may be a challenge if the current C++ client is any indication; it has
> > a few incorrect assumptions in its encoding of ASCII and modified
> > UTF-8.
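To make the modified UTF-8 point concrete: Java's DataOutput.writeUTF
diverges from standard UTF-8 in at least two places, which is a plausible
source of the kind of incorrect assumptions Jacob describes. A minimal,
standalone Java sketch (plain JDK, not Geode code) that shows the raw
bytes:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public class ModifiedUtf8Demo {
        public static void main(String[] args) throws IOException {
            dump("\u0000");       // standard UTF-8: 00
                                  // writeUTF:       00 02 C0 80
            dump("\uD83D\uDE00"); // standard UTF-8: F0 9F 98 80 (U+1F600)
                                  // writeUTF:       00 06 ED A0 BD ED B8 80
        }

        static void dump(String s) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            new DataOutputStream(out).writeUTF(s);
            StringBuilder hex = new StringBuilder();
            for (byte b : out.toByteArray()) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            // The first two bytes are writeUTF's unsigned 16-bit length
            // prefix, counting encoded bytes (which also caps each
            // writeUTF call at 65535 encoded bytes).
            System.out.println(hex);
        }
    }

A client that emits a bare 0x00 for NUL, or a 4-byte sequence for a
supplementary character, is writing standard UTF-8, not the modified UTF-8
that Java-based members expect to read back.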
> >
> > I am fine with a compromise that deprecates but doesn't remove the old
> > encodings for a few releases. This would give time for users to update.
> > New clients written would not be able to read this old data but could
> > read and write new data.
> >
> > -Jake
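For anyone digging this thread out of the archive later: the "4 valid
formats" above are the four string encodings a DataSerializable reader has
to handle. The dispatch in a new client ends up looking roughly like the
sketch below (Java for brevity; the tag byte values are placeholders, not
Geode's actual DSCODE constants, and the width rules are from memory, so
check the Geode sources before relying on them):

    import java.io.DataInput;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // Sketch only: the real tag values and length rules live in Geode's
    // DSCODE and InternalDataSerializer, not in this example.
    final class StringDecoderSketch {
        static final byte STRING = 1;            // modified UTF-8, 2-byte length
        static final byte STRING_BYTES = 2;      // one byte per char, 2-byte length
        static final byte HUGE_STRING = 3;       // UTF-16 chars, 4-byte length
        static final byte HUGE_STRING_BYTES = 4; // one byte per char, 4-byte length

        static String readString(DataInput in) throws IOException {
            byte tag = in.readByte();
            switch (tag) {
                case STRING:
                    return in.readUTF(); // modified UTF-8 caveats apply here
                case STRING_BYTES:
                    return readSingleByteChars(in, in.readUnsignedShort());
                case HUGE_STRING: {
                    int length = in.readInt();
                    char[] chars = new char[length];
                    for (int i = 0; i < length; i++) {
                        chars[i] = in.readChar();
                    }
                    return new String(chars);
                }
                case HUGE_STRING_BYTES:
                    return readSingleByteChars(in, in.readInt());
                default:
                    throw new IOException("unknown string tag: " + tag);
            }
        }

        private static String readSingleByteChars(DataInput in, int length)
                throws IOException {
            byte[] buf = new byte[length];
            in.readFully(buf);
            return new String(buf, StandardCharsets.ISO_8859_1);
        }
    }

Dan's suggestion amounts to always taking the UTF-16 (HUGE_STRING-style)
branch on the write side while still implementing all four branches on the
read side.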