+1 to making PDX more usable by adding an API rather than the current flag.
I also think we should make the client-side usage of PDX simpler by always
using ReflectionBasedAutoSerializer when no serialization mechanism has been
specified; see GEODE-2722 <https://issues.apache.org/jira/browse/GEODE-2722>.

On Thu, Jan 25, 2018 at 11:07 AM John Blum <jb...@pivotal.io> wrote:

> I have always thought/wondered, why not just store the data in serialized
> form always. There are several reasons to do so...
>
> 1. Whenever data is transferred between client & server, between peers,
>    over the WAN, overflowed to disk or persisted to disk, it must be
>    serialized.
> 2. Naturally it follows that if the data is always stored in serialized
>    form, it cuts down on de/serialization overhead.
> 3. Additionally, it removes or reduces the need for flags and other
>    configuration settings for serialization, making it simpler to
>    understand and simpler to use.
> 4. When using PDX, Apache Geode is immediately interoperable between
>    multiple language clients, primarily Java and .NET/C++, but even other
>    language clients, e.g. JavaScript, Ruby, etc, where JSON is serialized
>    to PDX.
> 5. PDX is queryable without deserialization. This is HUGE and maybe the
>    most important reason!
>
> The last 2 points suggest that the default serialization format should be
> PDX, and truthfully, I am not really opposed to that. Although, there are
> some problems with this.
>
> A. PDX does not handle cyclic dependencies unlike Java Serialization.
> However, Java Serialization has massive overhead and is not interoperable
> with native language and other language clients (e.g. JavaScript).
>
> B. PDX does not handle Deltas unlike DataSerialization. However, even when
> using Deltas with DataSerialization, you must deserialize the data to apply
> the delta. Quite frankly and ironically, PDX seems better suited to
> handling Deltas than DataSerialization, and without deserializing.
>
> So, I would double down on PDX and forget DataSerialization and Java
> Serialization.
> And by "forget", I mean that Apache Geode never "stores"
> DataSerialized or Java Serialized bytes; only PDX!
>
> Therefore, solve the cyclic dependency problem and introduce proper Delta
> handling without deserialization. Then, optimize it! Make PDX the best
> serialization option for Java, and specifically for Apache Geode. With 1
> serialization format to worry about there is less to maintain, less data to
> convert if the user needs to switch. Flexibility is not always a good
> thing. It is easier to build up than to build down if you know what I
> mean.
>
> I have made PDX a first class citizen in *Spring Data for Apache Geode*, in
> multiple key functional areas of the framework (e.g. Repositories) and dead
> simple to use/enable (i.e. @EnablePdx).
>
> Regarding .NET/C++. Truthfully, I don't really buy the argument that
> .NET/C++ users shouldn't have to write Java types. If the data is always
> kept serialized, then technically they shouldn't have to, but they are
> already writing their Functions in Java. Besides, it is not like every
> type needs a Java type, only types that need to be deserialized, if at all.
>
> If the application consists of both Java and .NET/C++ clients, and the Java
> devs want to work with high-level Java types, then they don't really have a
> choice. However, we can keep the de/serialization overhead at the point of
> access (e.g. in the Function, executed on a particular node, at the time of
> access), to a minimum.
>
> A simple API like...
>
> JavaType object = pdxInstance.getObject(Class<?> type);
>
> ... would do the trick.
>
> The type argument does not need to be the original type that the PDX type
> meta-data was created from, either. It could be a "projection". The only
> concern Apache Geode has is mapping PDX fields to an instance of
> "JavaType", where PDX fields are mapped to writable "JavaType" properties
> (perhaps using Reflection here).
>
> If the JavaType does not contain a property matching a PDX field, no big
> deal. This is the basis for our versioned type handling anyhow
> (adding/removing a field/property). However, the inverse is a more
> interesting problem: the JavaType has a field/property that is not
> currently stored in PDX. Perhaps throw an error, or provide a default
> value, or whatever. That could be configurable.
>
> Maybe, just maybe, a user has the ability to provide their own Converter,
> with its own custom behavior...
>
> interface Converter<T> {
>
>   T convert(PdxInstance pdxInstance);
>
> }
>
> class JavaTypeConverter implements Converter<JavaType> {
>
>   public JavaType convert(PdxInstance pdxInstance) { ... }
>
> }
>
> Then...
>
> Converter<JavaType> javaTypeConverter = new JavaTypeConverter();
> ...
> JavaType object = pdxInstance.getObject(javaTypeConverter);
>
> *One final thought...*
>
> Ultimately, I'd like to see Apache Geode introduce a common
> framework/interface for serialization, so that different serialization
> strategies, or "providers", could be introduced and used by our users based
> on their preferences and/or application's needs.
>
> Keep in mind, the user's data might not just live in Apache Geode, which is
> particularly true in an increasingly Microservices world. Other
> technologies (e.g. Messaging Buses/Queues) are not going to know PDX. PDX
> would be the default, enabled serialization strategy/provider for Apache
> Geode, provided by Apache Geode OOTB. This may be one reason to still
> support Java Serialization, given it is a universal serialization format
> between disparate technologies, but Apache Geode should never store Java
> Serialized bytes, only PDX.
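To make the Converter and "projection" idea quoted above concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions and not Geode code: `PdxFields` is a hypothetical stand-in for the read side of a PDX instance, and `CustomerName`, `CustomerNameConverter` and `MapBackedPdx` are made-up names. The `getObject(Class)` and `getObject(Converter)` overloads discussed in the message are a proposal and are not used here.

```java
import java.util.HashMap;
import java.util.Map;

class ConverterSketch {

    // Hypothetical stand-in for the read side of a PDX instance (not Geode's PdxInstance).
    interface PdxFields {
        boolean hasField(String name);
        Object getField(String name);
    }

    // Converter as proposed in the quoted message, retargeted at the stand-in type.
    interface Converter<T> {
        T convert(PdxFields pdx);
    }

    // Hypothetical projection: only the properties this application needs.
    static final class CustomerName {
        final String name;
        final String nickname;

        CustomerName(String name, String nickname) {
            this.name = name;
            this.nickname = nickname;
        }
    }

    static final class CustomerNameConverter implements Converter<CustomerName> {
        @Override
        public CustomerName convert(PdxFields pdx) {
            String name = (String) pdx.getField("name");
            // The target has a property that is not stored: supply a default instead of failing.
            String nickname = pdx.hasField("nickname") ? (String) pdx.getField("nickname") : "";
            return new CustomerName(name, nickname);
        }
    }

    // Map-backed test double so the sketch runs without a cache.
    static final class MapBackedPdx implements PdxFields {
        private final Map<String, Object> fields;

        MapBackedPdx(Map<String, Object> fields) {
            this.fields = new HashMap<>(fields);
        }

        @Override
        public boolean hasField(String name) {
            return fields.containsKey(name);
        }

        @Override
        public Object getField(String name) {
            return fields.get(name);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "Ada");
        stored.put("age", 36); // stored field with no matching property: ignored by the projection

        CustomerName projected = new CustomerNameConverter().convert(new MapBackedPdx(stored));
        System.out.println(projected.name + " / [" + projected.nickname + "]");
    }
}
```

The converter covers both directions mentioned in the message: a stored field with no matching property is simply ignored, and a property with no stored field gets a default. Throwing an error instead of defaulting would be a one-line change in `convert`.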
>
> Anyway, if you are still with me (sorry about length, just dumping all my
> thoughts over the past few years) take all this with a grain of salt (and
> maybe a slice of lemon, ;-). I was just thinking out loud and long term, as
> both (previously) an engineer on Apache Geode as well as a user.
>
> Food for thought.
>
> Regards,
> John
>
> On Thu, Jan 25, 2018 at 9:55 AM, Anilkumar Gingade <aging...@pivotal.io>
> wrote:
>
> > Internally, there is an option to override the read-serialized flag (to
> > true); the query engine and other components use this to keep the data
> > in serialized form and work with PdxInstance...
> >
> > public static void setPdxReadSerialized(Cache cache, boolean readSerialized);
> >
> > We had discussed making this a public API, so that any thread that can
> > work on PdxInstance can take advantage of it...
> >
> > -Anil.
> >
> > On Thu, Jan 25, 2018 at 9:42 AM, Jacob Barrett <jbarr...@pivotal.io>
> > wrote:
> >
> > > Bruce, the flag only applies to values serialized with PDX,
> > > DataSerializable objects are not affected by this property.
> > >
> > > I think there is some real value here as a stop gap until we have a
> > > better solution in Geode 2 where the user can have a per request
> > > context that specifies what return type they would like. Consider the
> > > user that has an existing application that uses domain objects but
> > > wants to deploy another application that doesn't to the same Geode
> > > cluster. The only option is to either have all PDX deserialize to
> > > domain objects or all returned as PdxInstance. One of the two
> > > applications will not work without modification. Changing the behavior
> > > described by Addison splits the difference. If the application is,
> > > like it is by default, configured to deserialize PDX to the domain
> > > object but the domain object is not deployed it will now give back the
> > > PDX instance rather than failing.
> > >
> > > An explicit use case is a user that has both a Java and .NET
> > > application. The .NET application does not have any Java domain
> > > objects to deploy to the server but does want to query or run server
> > > side functions. The Java application has deployed the domain objects
> > > to the server and distributed functions are written expecting those
> > > domain objects on the server. The user would have to create Java
> > > domain objects for the .NET application or modify their Java
> > > application to expect PdxInstance.
> > >
> > > -Jake
> > >
> > > On Thu, Jan 25, 2018 at 7:38 AM Bruce Schuchardt <bschucha...@apache.org>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > I've found the current read-serialized property to be pretty useless.
> > > >
> > > > Having said that, what if the value isn't actually in serialized
> > > > form in the local cache? Is Geode supposed to serialize it & return
> > > > it? What if it isn't PDX-serialized? Do we return a byte array?
> > > >
> > > > On 1/24/18 12:21 PM, Dan Smith wrote:
> > > > > Is this really just a workaround for the fact that the
> > > > > read-serialized flag applies to the whole cache? I can see that if
> > > > > you have a mix of regions with and without domain classes on the
> > > > > server you might want this feature. Can you provide some more
> > > > > background on your use case?
> > > > >
> > > > > IMO we should get rid of read-serialized in favor of APIs that let
> > > > > the user decide whether they get a domain class or a PdxInstance.
> > > > >
> > > > > -Dan
> > > > >
> > > > > On Wed, Jan 24, 2018 at 9:58 AM, Galen O'Sullivan <gosulli...@pivotal.io>
> > > > > wrote:
> > > > >
> > > > >> Hi Addison,
> > > > >>
> > > > >> What kind of setup do you have that is causing you to have PDX
> > > > >> serialized objects that cannot be deserialized?
> > > > >> Do you have classes that are present on some servers but not
> > > > >> others?
> > > > >>
> > > > >> This change would break the contract of region.get(), which is
> > > > >> that it returns a value of a type that can be put into the region.
> > > > >>
> > > > >> Returning something that isn't what the user is expecting to be in
> > > > >> the region would require users to check for PdxInstances every
> > > > >> time they get a value -- even though the type annotations say that
> > > > >> you can't get a PdxInstance back (except for Region<Object,Object>).
> > > > >>
> > > > >> I think it would be better to have a second API that allows users
> > > > >> to get and put PdxInstance objects in regions. That way, if they
> > > > >> want to handle the exceptional case where they have a serialized
> > > > >> object that can't be deserialized, they can catch the
> > > > >> ClassNotFound exception and get the underlying PdxInstance.
> > > > >>
> > > > >> I do think that the possibility of a ClassNotFoundException should
> > > > >> be documented in the Region API.
> > > > >>
> > > > >> Cheers,
> > > > >> Galen
> > > > >>
> > > > >> On Tue, Jan 23, 2018 at 2:56 PM, Addison Huddy <ahu...@pivotal.io>
> > > > >> wrote:
> > > > >>
> > > > >>> Hi Geode Devs,
> > > > >>>
> > > > >>> I'm proposing the following change to how we handle
> > > > >>> deserialization when Domain Objects can't be found and
> > > > >>> pdx-serialize=false.
> > > > >>>
> > > > >>> https://issues.apache.org/jira/browse/GEODE-4367
> > > > >>>
> > > > >>> Looking forward to the discussion.
> > > > >>>
> > > > >>> \ah
>
> --
> -John
> john.blum10101 (skype)
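For readers skimming the thread, the behavior proposed in GEODE-4367 (hand back the serialized form when the domain class cannot be found, instead of failing) can be sketched in plain Java. This is only an illustration under stated assumptions: `SerializedValue` is a hypothetical stand-in for a PDX-serialized value, `Customer` and `read` are made-up names, and the reflection-based population is a simplification rather than how Geode deserializes.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class FallbackSketch {

    // Hypothetical stand-in for a PDX-serialized value: a class name plus named fields.
    static final class SerializedValue {
        final String className;
        final Map<String, Object> fields;

        SerializedValue(String className, Map<String, Object> fields) {
            this.className = className;
            this.fields = new HashMap<>(fields);
        }
    }

    // Example domain class; in the scenario from the thread it may be missing on the server.
    static final class Customer {
        String name;
    }

    // Returns a domain object when the class can be loaded, otherwise the serialized form itself.
    static Object read(SerializedValue value) {
        Class<?> type;
        try {
            type = Class.forName(value.className);
        } catch (ClassNotFoundException e) {
            return value; // domain class not deployed: hand back the serialized form
        }
        try {
            Object instance = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : value.fields.entrySet()) {
                Field field = type.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(instance, entry.getValue());
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + value.className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("name", "Ada");

        Object known = read(new SerializedValue(Customer.class.getName(), fields));
        Object unknown = read(new SerializedValue("com.example.NotDeployed", fields));

        System.out.println(known instanceof Customer);
        System.out.println(unknown instanceof SerializedValue);
    }
}
```

Note that `read` has to return `Object`, so callers need an `instanceof` check on every value. That is the contract concern Galen raises above, and it is why several replies prefer an explicit API for requesting the serialized form.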