Re: #7052 - Fixing serialization for content types and auth

Russell Keith-Magee Thu, 05 Nov 2009 20:39:45 -0800

On Thu, Nov 5, 2009 at 11:46 PM, Eric Holscher <eric.holsc...@gmail.com> wrote:
> On Thu, Nov 5, 2009 at 9:29 AM, Russell Keith-Magee <freakboy3...@gmail.com>
> wrote:
>>
>> Hi all,
>>
>> Next on my pony list for v1.2: #7052 - fixing the serializers to work
>> around the problem of serializing dynamically created objects, such as
>> those produced by contrib.auth and contrib.contenttypes. I need some
>> feedback on how much of this solution we need, want, and are
>> comfortable seeing in trunk. Apologies in advance for the long post.
>
> No worries. This is one of my ponies as well.
>>
>> Option 1: Ignore the problem
>> -----------------------------------------
>>
>> Implement the deserializer, but don't try and solve the serialization
>> problem. Treat the lookup syntax for primary keys as a nifty extra you
>> can exploit by hand if need be. Serialization generates integer
>> primary keys, and you can hand modify fixtures to use lookup syntax if
>> you want to.
>
> I think this is a viable option, as you say just to get the code checked in.
> A surprising number of people write fixtures by hand, and this would allow
> them to get nice control over loading. Editing generated ones automatically
> or by hand afterwards also isn't that hard. That said, I think there is a
> better middle ground.
>
>>
>> Option 2: Add a Meta argument for serialization
>> --------------------------------------------------------------------
>>
>> The problem with this approach is that hard-codes a single aspect of
>> serialization into the model. If someone has a different set of
>> requirements for serializing content types under particular
>> circumstances, they will be out of luck.
>
> I agree that this approach feels a bit heavy. I don't know if we really need
> something on the model that tells us how to dump something. Don't we already
> have that data? unique_together and unique=True should provide us enough
> information for making at least a naive implementation.
>
>>
>> Option 3: Add flags/arguments to the serializer to control dynamic dumping
>>
>> ------------------------------------------------------------------------------------------------------------
>>
>> i.e.,
>> ./manage.py dumpdata myapp --format=json --indent=2
>> --lookup=contenttypes.contenttype(app_label,model)
>>
>> It might be possible to simplify this a little by saying that when
>> --lookup=contenttypes.contenttype is specfied, the first
>> unique_together tuple will be used to construct the lookup.
>>
>> This puts complete control in the hand of the user at serialization
>> time. However, the syntax isn't especially elegant, especially given
>> that every single serialization of contenttypes and permissions will,
>> in practice, need to use the --lookup argument.
>
> This is somewhere along the lines of the approach that I was considering.
> I'll explain below.
>
>>
>> Option 4: An all-singing, all-dancing serialization framework rework
>>
>> ------------------------------------------------------------------------------------------------
>>
>> Django's serialization format is fairly limited, and there have been
>> many proposals to add features to the output format (serializing
>> non-model properties, reverse relations, deep relations, etc). I've
>> been holding off on these in favour of a larger rework of the
>> serialization framework.
>>
>> In my minds eye, I have a vision of a serialization framework that
>> would allow for registration of different serialization formats - not
>> just JSON/XML, but the fields and internal structure of a JSON
>> fixture, etc. Describing which fields should be rendered as lookups,
>> how the lookup would be determined, and under what conditions a lookup
>> should be used would all just be a configuration items on a
>> serialization definition.
>>
>> This is obviously a much larger body of work, and certainly wouldn't
>> get done for v1.2 - if only because I haven't done any planning,
>> prototype implementation, or community review.
>
> Yes please, but lets get something usable into 1.2.
>
>>
>> Option 5: Something else
>> -------------------------------------
>>
>> I'm open to any other suggestion.
>>
>> The good news in all this is that Option 1 isn't mutually exclusive to
>> the other options - we can land Option 1 right now and get the
>> advantages of dynamic lookups, and then worry about how to close the
>> loop as a second problem.
>>
>> So - feedback welcome. Which option should we pursue?
>>
>
> Well, since you asked for it ;)
>
> I have solved this in a proof of concept way in my playground code[0]. Pinax
> also took this implementation and adapted it to work with json[1].
>
> The basic approach, is that on serialization, you introspect the model, and
> see if it has any unique or unique_together fields. If there are these
> fields present, then those should be queried and used as the unique
> constraints when outputting the fixture.
>
> This gives us a neat fallback, where if the only unique constraint is the
> Autofield generated, then it automatically just outputs the query as the pk
> query that is used now.
>
> I think this gives us the most amount of automatic ability, outputting
> things where we obviously know they are unique. I think combining this with
> some version of Option 3, allowing users to specify on the command line
> which fields they wish to output, gets us most of the rest of the way.


I can see what you're driving at here, but I have some reservations.

Firstly, not all models with unique fields need to be serialized as
lookups. This introduces a potential efficiency issue. Assigning a
literal PK is much faster than a database lookup. Fixture loading is
already one of the two weak points in testing speed. Introducing
lookups that aren't strictly necessary into fixtures will just make
the matter worse.

There is also potentially a complexity issue. At the moment, circular
dependencies are easy to resolve - assign a numerical PK, and defer
integrity checks to the end of the transaction. Since FK values can
themselves be part of unique_together constraints, you could end up
with a situation where object 1 needs to lookup object 2, but object 2
needs to lookup object 1.

Secondly, not all models that need to be serialized as lookups have
unique fields. It's easy to create a management.py trigger to create
dynamic content, but there's no requirement that the models involved
have unique non-primary keys.

In summary - I think the 'first unique/unique together field'
technique is an excellent way to determine which fields should be used
for serialization once you have determined that a model should be
serialized as a lookup. However, I don't think it's a very good
criterion for determining that a lookup is required in the first
place.

Manually specifying models at the command line (--lookup) is a
longwinded approach, and it's easy to accidentally forget which models
need to be serialized as lookups. However, it doesn't suffer from the
flaws that the automatic unique-based criterion has. It's not pretty,
but it might be enough to tide us over until a full serialization
rewrite can happen.

Yours,
Russ Magee %-)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: #7052 - Fixing serialization for content types and auth

Reply via email to