#36526: bulk_update uses more memory than expected
----------------------------+-----------------------------------------
     Reporter:  Anže Pečar  |                     Type:  Uncategorized
       Status:  new         |                Component:  Uncategorized
      Version:  5.2         |                 Severity:  Normal
     Keywords:              |             Triage Stage:  Unreviewed
    Has patch:  0           |      Needs documentation:  0
  Needs tests:  0           |  Patch needs improvement:  0
Easy pickings:  0           |                    UI/UX:  0
----------------------------+-----------------------------------------
 I recently tried to update a large number of objects with:

 {{{
 # A large number of objects, e.g. > 1_000_000
 things = list(Thing.objects.all())
 Thing.objects.bulk_update(things, ["description"], batch_size=300)
 }}}

 The first line above fits into the available memory (~2GB in my case), but
 the second line caused a SIGTERM, even though I had an additional 2GB of
 available memory. This was a bit surprising as I wasn't expecting
 bulk_update to use this much memory since all the objects to update were
 already loaded.

 My solution was:

 {{{
 from itertools import batched  # Python 3.12+
 for batch in batched(things, 300):
     Thing.objects.bulk_update(batch, ["description"], batch_size=300)
 }}}

 In the first example, `bulk_update` used 2.8GB of memory; in the second
 example, it used only 62MB.

 [https://github.com/anze3db/django-bulk-update-memory A GitHub repository
 that reproduces the problem with memray results.]

 Looking at the source code of `bulk_update`, the issue seems to be that
 Django builds the entire `updates` list before it starts executing any
 queries. I'd be happy to contribute a patch that makes the `updates` list
 lazy. Are there any concerns that doing more computation between update
 calls would make the transaction longer?
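
 To illustrate the difference between building every batch up front and
 building batches lazily, here is a minimal sketch in plain Python. It is a
 simplified model and not Django's actual code: `chunks`, `prepare_batch`,
 `eager_updates`, `lazy_updates`, `run` and `execute` are hypothetical names,
 and plain dicts stand in for model instances.

```python
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def prepare_batch(batch):
    # Hypothetical stand-in for the per-batch work done before a query
    # (for example, collecting primary keys and new values for one UPDATE).
    return {
        "pks": [obj["pk"] for obj in batch],
        "values": [obj["description"] for obj in batch],
    }


def eager_updates(objs, batch_size):
    # Every prepared batch exists in memory before the first query runs.
    return [prepare_batch(b) for b in chunks(objs, batch_size)]


def lazy_updates(objs, batch_size):
    # Each prepared batch is built only when the consumer asks for it,
    # so the previous one can be garbage collected.
    for b in chunks(objs, batch_size):
        yield prepare_batch(b)


def run(updates, execute):
    # `execute` is a hypothetical callback standing in for one UPDATE query;
    # it returns the number of rows it touched.
    return sum(execute(update) for update in updates)


things = [{"pk": i, "description": f"thing {i}"} for i in range(1000)]


def execute(update):
    return len(update["pks"])


eager_rows = run(eager_updates(things, 300), execute)
lazy_rows = run(lazy_updates(things, 300), execute)
```

 Both variants touch the same rows; the difference is only in how many
 prepared batches are alive at once. The trade-off raised above still
 applies: with the lazy variant, `prepare_batch` runs between queries rather
 than before the first one.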

 This might be related to https://code.djangoproject.com/ticket/31202. I
 decided to open a new issue because I wouldn't mind waiting longer for
 bulk_update to complete; it was the SIGTERM that surprised me.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36526>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
