Re: Customizable Serialization

Russell Keith-Magee Tue, 05 Apr 2011 07:49:52 -0700

On Thu, Mar 31, 2011 at 7:38 PM, Vivek Narayanan <m...@vivekn.co.cc> wrote:
> Hi Russ,
>
> Thanks again for the suggestions. I've thought about changing the
> model for handling nested fields.
>
> Each model can have a number of serializers, and they can be plugged
> into other serializers. In this way, nested models can be handled
> instead of cycling through a tree of arbitrary depth.
> Each serializer object would have a ``dump_object()`` method that
> returns a Python object. This method is called when a serializer is
> plugged into another in a nested model.
>
>>My point is that it should be *possible* to define a "generic"
>>serialization strategy -- after all, that's what Django does right
>>now. If arguments like this do exist, they should essentially be
>>arguments used to instantiate a specific serialization strategy,
>>rather than something baked into the serialization API.
>
> Yes, I'll be adding ``attrs`` and ``exclude`` arguments to the
> ``__init__()`` method of the serializer.
>
> Here is how the API could be used to generate the formats in those two
> examples:
>
> """
> For the first option.
> """
>
> class OriginalOutput(JSONSerializer):
>    wrap_fields = "fields"
>    indent = 4
>
>    def meta2_pk(self, obj):
>        # get pk
>        return obj.pk
>
>    def meta2_model(self, obj):
>        # get model name ("app_label.modelname", as Django's serializers emit)
>        return str(obj._meta)


So, by my count, here's a list of your implicit assumptions in this
syntactical expression:

 * the fields are embedded in a substructure within the main model serialization
 * ... and this is something that is sufficiently common that it
deserves first-class representation in your serializer syntax
 * The pk is serialized as a top-level attribute
 * ... and it's serialized using the attribute name 'pk'
 * The model name is serialized as a top-level attribute
 * ... and it's serialized using the attribute name 'model'
 * indentation is something that has been defined as an attribute of
the serializer, rather than as a configurable item decided at
rendering time (like it is currently)

There are also a bunch of implicit rules about the ways foreign keys,
m2ms and so on are rendered. These aren't so concerning, since there
needs to be some defaults if the syntax is going to be even remotely
terse. However, it bears repeating that they are implicit.
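To make the list above concrete, here is a minimal sketch in which each of those implicit choices becomes an explicit constructor argument, and indentation is passed at rendering time instead of living on the serializer. ``ExplicitSerializer``, its arguments, ``render_json()`` and the "library.book" label are hypothetical names for illustration, not part of Django or of the proposal; plain objects stand in for model instances.

```python
import json
from types import SimpleNamespace


class ExplicitSerializer:
    """Hypothetical serializer: every layout decision is an explicit argument."""

    def __init__(self, model_label, fields, pk_key="pk", model_key="model",
                 fields_key="fields"):
        self.model_label = model_label  # e.g. "library.book" (hypothetical)
        self.fields = fields            # attribute names to serialize
        self.pk_key = pk_key            # output name for the primary key
        self.model_key = model_key      # output name for the model label
        self.fields_key = fields_key    # None: no nested "fields" wrapper

    def to_python(self, obj):
        values = {name: getattr(obj, name) for name in self.fields}
        data = {self.pk_key: obj.pk, self.model_key: self.model_label}
        if self.fields_key is None:
            data.update(values)         # fields sit next to pk and model
        else:
            data[self.fields_key] = values
        return data


def render_json(data, indent=None):
    # Indentation is decided when rendering, not by the serializer.
    return json.dumps(data, indent=indent)


book = SimpleNamespace(pk=1, title="Example Title")
default = ExplicitSerializer("library.book", ["title"])
print(render_json(default.to_python(book), indent=4))
```

Passing ``fields_key=None`` or ``pk_key="id"`` changes the layout without adding a new special-purpose keyword to the serializer syntax.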

> """
> The second option.
> """
>
> class Base(JSONSerializer):
>    """
>    A base serializer with formatting options.
>    """
>    indent = 4
>
> class Editor(Base):
>    """
>    This can be used to serialize a Person model.
>    Note the company field: if no such Serializer object is specified,
>    the foreign key is returned.
>
>    By specifying the ``attrs`` argument while initializing the object,
>    one can restrict the fields being serialized. Otherwise, all
>    available fields and metadata will be serialized.
>    """
>    company = Company(from_field='company', attrs=['name', 'founded'])

ok - so:
 * Where is "Company" defined?
 * Why aren't the attributes of Company embedded one level deep?
 * Why is the attribute named company, and then an argument is passed
to the Company class with the value of 'company'?

> class Author(Base):
>    """
>    The dump_object method is called when a serializer is plugged into
>    another. It returns a Python object, which is normally a dict, but
>    this can be overridden.
>    """
>    def dump_object(self, obj):
>        return obj.first_name + ' ' + obj.last_name

 * Why is there a difference between a serializer that returns a list,
a serializer that returns a dictionary, and a serializer that returns
a string?

> class BookDetails(Base):
>    """
>    This is the serializer that will yield the final output.
>
>    Since the 'authors' field is M2M, the Author serializer will be
>    applied over the list of authors.
>
>    Aliases is a dict that maps attribute names to their labels during
>    serialization.
>    """
>    wrap_all = "book details"
>    authors = Author(from_field='authors')
>    editor = Editor(from_field='editor', attrs=['firstname', 'lastname',
>                                                'coworkers', 'phone', 'company'])
>    aliases = {
>        'title': 'book_title',
>        'acount': 'author_count'
>    }
>
>    def meta2_acount(self, obj):
>        # return count of authors
>        return obj.authors.count()
>
>
> The serialized data can be obtained by calling BookDetails.serialize()
> or OriginalOutput.serialize().
>
> Some of the options I didn't cover in the above example are:
>
> * A ``fieldset`` option, which is a list of fields to be serialized.
> If it is not set, all fields are part of the serializer. This can be
> additionally restricted during initialization, as shown above.
>
> * Reverse relations can be used in the same way as other pluggable
> serializers, by specifying the ``from_field`` argument as the related
> name during initialization.
>
> I hope this will add more flexibility.
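One way to read the pluggable-serializer idea quoted above is sketched below, assuming plain Python objects in place of model instances and lists in place of related managers. The names ``Serializer``, ``from_field``, ``attrs``, ``exclude`` and ``dump_object()`` come from the proposal; how they interact here (class-level serializers found by scanning the class, lists treated as to-many relations, the hypothetical ``publisher`` field) is an assumption made for illustration, not an existing Django API.

```python
from types import SimpleNamespace


class Serializer:
    """Hypothetical pluggable serializer, loosely following the proposal."""

    def __init__(self, from_field=None, attrs=None, exclude=()):
        self.from_field = from_field  # attribute to read on the parent object
        self.attrs = attrs            # None means "every attribute"
        self.exclude = set(exclude)

    def plugged(self):
        # Collect serializers declared as class attributes, base classes
        # first so that subclasses can override them.
        found = {}
        for cls in reversed(type(self).__mro__):
            for name, value in vars(cls).items():
                if isinstance(value, Serializer):
                    found[name] = value
        return found

    def dump_object(self, obj):
        names = self.attrs if self.attrs is not None else list(vars(obj))
        plugged = self.plugged()
        data = {}
        for name in names:
            if name in self.exclude:
                continue
            if name in plugged:
                inner = plugged[name]
                value = getattr(obj, inner.from_field or name)
                if isinstance(value, (list, tuple)):  # to-many: map over items
                    data[name] = [inner.dump_object(item) for item in value]
                else:
                    data[name] = inner.dump_object(value)
            else:
                data[name] = getattr(obj, name)
        return data


class Company(Serializer):
    pass


class Author(Serializer):
    def dump_object(self, obj):
        # Returns a string, not a dict, as in the quoted Author example.
        return obj.first_name + " " + obj.last_name


class BookDetails(Serializer):
    authors = Author(from_field="authors")
    publisher = Company(from_field="publisher", attrs=["name"])


book = SimpleNamespace(
    title="Example Title",
    authors=[SimpleNamespace(first_name="Ada", last_name="Lovelace")],
    publisher=SimpleNamespace(name="Acme", founded=1990),
    isbn="0000",
)
print(BookDetails(exclude=["isbn"]).dump_object(book))
```

Even in this small sketch, ``Author.dump_object()`` returns a string while the base class returns a dict, which is the asymmetry questioned earlier in this reply.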

I'm sorry, but this seems extremely confused to me.

You have fields with names like "wrap_all" and "wrap_fields". These
are presumably fixed names that will be interpreted by the parser in a
particular way.

You have fields like "authors" and "editor" in the same namespace that
correspond to model attributes.

In the same namespace, you also have functions like meta2_acount --
names derived from model attributes.

You also have functions like dump_object().

So - four different types of attribute, all in the same namespace, and
all with different behaviors. What happens with name collisions (e.g.,
a model field named "aliases")? What precedence rules exist?

This proposal had a promising start, but to me, it seems to have gone
massively off the rails. At its core, serialization should be a very
simple process:

 * I have an object.
 * That object has attributes.
 * Turn that list of attributes into a list of serialized properties.
 * Each of those serialized properties may be:
    - a flat value
    - a nested value whose structure is determined by the object itself
    - a nested value whose structure is determined by a related object.
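The steps above can be sketched as a single function with one recursion rule. This is a minimal sketch, assuming plain objects stand in for model instances and lists stand in for to-many relations; ``serialize()``, ``author_to_python()`` and the ``flat``, ``groups`` and ``related`` parameters are hypothetical names, not an existing API. ``groups`` covers nesting chosen by the object's own serializer, and ``related`` covers nesting chosen by a related object's serializer.

```python
from types import SimpleNamespace


def serialize(obj, flat=(), groups=None, related=None):
    """Turn an object's attributes into a dict of serialized properties.

    flat    -- attribute names emitted as flat values
    groups  -- output key -> attribute names nested under that key
               (structure chosen by this object's serializer)
    related -- attribute name -> callable that serializes the related
               object(s) (structure chosen by the related serializer)
    """
    data = {name: getattr(obj, name) for name in flat}
    for key, names in (groups or {}).items():
        data[key] = {name: getattr(obj, name) for name in names}
    for name, serialize_related in (related or {}).items():
        value = getattr(obj, name)
        if isinstance(value, (list, tuple)):  # to-many relation
            data[name] = [serialize_related(item) for item in value]
        else:
            data[name] = serialize_related(value)
    return data


def author_to_python(author):
    # The related object's own serializer decides its structure.
    return serialize(author, flat=["pk"],
                     groups={"fields": ["first_name", "last_name"]})


author = SimpleNamespace(pk=7, first_name="Ada", last_name="Lovelace")
book = SimpleNamespace(pk=1, title="Example Title", pages=100, authors=[author])
book_data = serialize(book, flat=["pk"],
                      groups={"fields": ["title", "pages"]},
                      related={"authors": author_to_python})
print(book_data)
```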

And that's it. There are some minor complications in dealing with the
fact that XML has two different ways of rendering attributes, but that
should be a fairly minor extension of the "get me a list of
attributes" use case. I don't see the mapping between the examples
you've given and this basic set of principles.
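To illustrate how small that extension can be, here is a minimal sketch that takes an already-serialized list of properties and renders each one either as an XML attribute or as a child element, using only the standard library's ``xml.etree.ElementTree``. ``to_xml()``, its ``as_attributes`` parameter and the sample data are hypothetical, and the element layout is simpler than the one Django's own XML serializer produces.

```python
import xml.etree.ElementTree as ET


def to_xml(tag, properties, as_attributes=()):
    """Render a flat dict of serialized properties as one XML element.

    Names listed in ``as_attributes`` become XML attributes; every other
    property becomes a child element with the value as its text.
    """
    element = ET.Element(tag)
    for name, value in properties.items():
        if name in as_attributes:
            element.set(name, str(value))
        else:
            ET.SubElement(element, name).text = str(value)
    return ET.tostring(element, encoding="unicode")


properties = {"pk": 1, "model": "library.book", "title": "Example Title"}
print(to_xml("object", properties, as_attributes=("pk", "model")))
print(to_xml("object", properties))  # everything as child elements
```

The list of properties is the same in both calls; only the rendering decision differs.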

It feels to me like you've been looking at a specific example that I've
given you, and every time you've found an exception, you've invented a
new keyword or access mechanism to resolve that use case. What I don't
see is a coherent whole -- a thread of similarity in the way you've
approached the problem.

The deadline for GSoC applications is rapidly approaching; while we
don't need to see an absolutely final API proposal, we do at least
need to see promise that you're moving in the right direction. If you
want to pursue this project, I suggest you take another pass at the
test case I gave you, and try to dramatically simplify your proposed
API.

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.
