#36526: bulk_update uses more memory than expected
-------------------------------+--------------------------------------
     Reporter:  Anže Pečar     |                    Owner:  (none)
         Type:  Uncategorized  |                   Status:  new
    Component:  Uncategorized  |                  Version:  5.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------
Description changed by Anže Pečar:

Old description:

> I recently tried to update a large number of objects with:
>
> {{{
> things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
> Thing.objects.bulk_update(things, ["description"], batch_size=300)
> }}}
>
> The first line above fits into the available memory (~2GB in my case),
> but the second line caused a SIGTERM, even though I had an additional 2GB
> of available memory. This was a bit surprising as I wasn't expecting
> bulk_update to use this much memory since all the objects to update were
> already loaded.
>
> My solution was:
>
> {{{
> from itertools import batched  # Python 3.12+
>
> for batch in batched(things, 300):
>     Thing.objects.bulk_update(batch, ["description"], batch_size=300)
> }}}
>
> The first example's `bulk_update` call used 2.8GB of memory, while the
> second used only 62MB.
>
> [https://github.com/anze3db/django-bulk-update-memory A GitHub repository
> that reproduces the problem with memray results.]
>
> Looking at the source code of `bulk_update`, the issue seems to be that
> Django builds the `updates` list before starting to execute the queries.
> I'd be happy to contribute a patch that builds the `updates` list
> lazily, unless there are concerns that adding more computation between
> each UPDATE query would make the transaction longer.
>
> This might be related to https://code.djangoproject.com/ticket/31202,
> but I decided to open a new issue: I wouldn't mind waiting longer for
> bulk_update to complete, whereas the SIGTERM surprised me.

New description:

 I recently tried to update a large number of objects with:

 {{{
 things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
 Thing.objects.bulk_update(things, ["description"], batch_size=300)
 }}}

 The first line above fits into the available memory (~2GB in my case), but
 the second line caused a SIGTERM, even though I had an additional 2GB of
 available memory. This was a bit surprising as I wasn't expecting
 bulk_update to use this much memory since all the objects to update were
 already loaded.
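
 A minimal sketch of the eager-vs-lazy pattern behind this (illustrative
 only; `eager_batches` and `lazy_batches` are hypothetical helpers, not
 Django code): building every batch up front retains all of them at once,
 while a generator keeps only one batch alive at a time.

 {{{
 from itertools import islice

 def eager_batches(rows, size):
     # All batches are built up front and retained simultaneously,
     # so peak memory grows with the total number of rows.
     it = iter(rows)
     batches = []
     while batch := list(islice(it, size)):
         batches.append(batch)
     return batches

 def lazy_batches(rows, size):
     # Yields one batch at a time; earlier batches can be
     # garbage-collected as soon as the caller is done with them.
     it = iter(rows)
     while batch := list(islice(it, size)):
         yield batch
 }}}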

 My solution was:

 {{{
 from itertools import batched  # Python 3.12+

 for batch in batched(things, 300):
     Thing.objects.bulk_update(batch, ["description"], batch_size=300)
 }}}
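
 Note that `itertools.batched` only exists on Python 3.12+; on older
 versions an equivalent helper can be sketched as follows (`batched_compat`
 is a hypothetical name for a plain backport, not part of Django or the
 standard library before 3.12):

 {{{
 from itertools import islice

 def batched_compat(iterable, n):
     # Backport of itertools.batched for Python < 3.12:
     # yields successive n-sized tuples from iterable.
     if n < 1:
         raise ValueError("n must be at least one")
     it = iter(iterable)
     while batch := tuple(islice(it, n)):
         yield batch
 }}}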

 The first example's `bulk_update` call used 2.8GB of memory, while the
 second used only 62MB.

 [https://github.com/anze3db/django-bulk-update-memory A GitHub repository
 that reproduces the problem with memray results.]

 This might be related to https://code.djangoproject.com/ticket/31202, but
 I decided to open a new issue because I wouldn't mind waiting longer for
 bulk_update to complete, but the SIGTERM surprised me.

--
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
