#33937: Optimize m2m serialization to avoid loading full model instances
--------------------------------------+------------------------------------
     Reporter:  Gordon Wrigley        |                    Owner:  nobody
         Type:  Cleanup/optimization  |                   Status:  new
    Component:  Core (Serialization)  |                  Version:  4.0
     Severity:  Normal                |               Resolution:
     Keywords:  performance           |             Triage Stage:  Accepted
    Has patch:  0                     |      Needs documentation:  0
  Needs tests:  0                     |  Patch needs improvement:  0
Easy pickings:  0                     |                    UI/UX:  0
--------------------------------------+------------------------------------
Changes (by Adam Johnson):

 * type:  Uncategorized => Cleanup/optimization
 * stage:  Unreviewed => Accepted


Old description:

> When not using natural keys, this function
> https://github.com/django/django/blob/main/django/core/serializers/python.py#L64
> loads the full object for every entry in the m2m, when it only actually
> wants the pks which it could get off the m2m intermediate table without
> even joining to the target table.
>
> In my case the table we are m2m'ing to has files in it, so that's a
> weighty fetch.
> We are using django-reversion which stores a serialized version of each
> save.
> On the workload that flagged this up enabling reversion incurs a 300x
> performance hit (from half a second to 2.5 minutes) and it's almost
> entirely because of this.

New description:

 When not using natural keys, the `handle_m2m_field` function
 ([https://github.com/django/django/blob/aed60aee38215e293d6ec2f3c96ec55bb9a62fc2/django/core/serializers/python.py#L64 source])
 loads the full object for every entry in the m2m relation, when it
 only needs the pks. The pks can even be obtained from the m2m
 intermediate table, without joining the target table.
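
 To illustrate the difference described above, here is a minimal sketch that uses plain `sqlite3` rather than Django's ORM or serializer code. The schema (`article`, `attachment`, `article_attachments`) and both helper functions are hypothetical and exist only for this example. The first helper fetches whole target rows, including a large column, just to read their ids. The second reads the ids directly from the intermediate table with no join.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical m2m layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE attachment (id INTEGER PRIMARY KEY, payload BLOB);
        -- Intermediate (through) table linking the two sides.
        CREATE TABLE article_attachments (
            id INTEGER PRIMARY KEY,
            article_id INTEGER REFERENCES article(id),
            attachment_id INTEGER REFERENCES attachment(id)
        );
        """
    )
    conn.execute("INSERT INTO article (id, title) VALUES (1, 'demo')")
    conn.executemany(
        "INSERT INTO attachment (id, payload) VALUES (?, ?)",
        [(pk, b"x" * 1024) for pk in (10, 11, 12)],
    )
    conn.executemany(
        "INSERT INTO article_attachments (article_id, attachment_id) VALUES (?, ?)",
        [(1, 10), (1, 12)],
    )
    return conn


def pks_via_full_rows(conn, article_id):
    """Join to the target table and load every column, then keep only the id."""
    rows = conn.execute(
        "SELECT t.id, t.payload FROM attachment t "
        "JOIN article_attachments m ON m.attachment_id = t.id "
        "WHERE m.article_id = ? ORDER BY t.id",
        (article_id,),
    ).fetchall()
    return [row[0] for row in rows]


def pks_via_through_table(conn, article_id):
    """Read the target ids straight from the intermediate table, no join."""
    rows = conn.execute(
        "SELECT attachment_id FROM article_attachments "
        "WHERE article_id = ? ORDER BY attachment_id",
        (article_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

 Both helpers return the same list of ids, but only the first one transfers the `payload` column. In Django ORM terms, the second query would roughly correspond to filtering the relation's `through` model and reading only the target id column. Whether the serializer can do that in every case (for example with custom through models) is not settled by this sketch.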

 In my case, the table we are m2m'ing to has files in it, so that's a
 weighty fetch.
 We are using django-reversion, which stores a serialized version of each
 save.
 On the workload that flagged this up, enabling reversion incurs a 300x
 performance hit (from half a second to 2.5 minutes), and it's almost
 entirely because of this.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33937#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
