#36526: bulk_update uses more memory than expected
-------------------------------+--------------------------------------
Reporter: Anže Pečar | Owner: (none)
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Description changed by Anže Pečar:
Old description:
> I recently tried to update a large number of objects with:
>
> {{{
> things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
> Thing.objects.bulk_update(things, ["description"], batch_size=300)
> }}}
>
> The first line above fits into the available memory (~2 GB in my case),
> but the second line caused a SIGTERM, even though I had an additional
> 2 GB of available memory. This was a bit surprising, as I wasn't
> expecting `bulk_update` to use this much memory when all the objects to
> update were already loaded.
>
> My solution was:
>
> {{{
> from itertools import batched  # Python 3.12+
>
> for batch in batched(things, 300):
>     Thing.objects.bulk_update(batch, ["description"], batch_size=300)
> }}}
>
> In the first example, `bulk_update` used 2.8 GB of memory; in the
> second, it used only 62 MB.
>
> [https://github.com/anze3db/django-bulk-update-memory A GitHub repository
> that reproduces the problem with memray results.]
>
> Looking at the source code of `bulk_update`, the issue seems to be that
> Django builds the entire `updates` list before executing any of the
> queries. I'd be happy to contribute a patch that builds the list lazily,
> unless there are concerns that doing more computation between the
> individual update queries would keep the transaction open longer.
>
> This might be related to https://code.djangoproject.com/ticket/31202, but
> I decided to open a new issue because I wouldn't mind waiting longer for
> `bulk_update` to complete; the SIGTERM, however, caught me by surprise.
New description:
I recently tried to update a large number of objects with:
{{{
things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)
}}}
The first line above fits into the available memory (~2 GB in my case), but
the second line caused a SIGTERM, even though I had an additional 2 GB of
available memory. This was a bit surprising, as I wasn't expecting
`bulk_update` to use this much memory when all the objects to update were
already loaded.
My solution was:
{{{
from itertools import batched  # Python 3.12+

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)
}}}
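(Note: `batched` here is `itertools.batched`, which is only available on
Python 3.12+. On older versions, a minimal stand-in along the lines of the
well-known itertools recipe works; a sketch:)
{{{
from itertools import islice


def batched(iterable, n):
    # Minimal stand-in for itertools.batched (Python 3.12+): yields
    # successive tuples of at most n items from the iterable.
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch
}}}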
In the first example, `bulk_update` used 2.8 GB of memory; in the second, it
used only 62 MB.
[https://github.com/anze3db/django-bulk-update-memory A GitHub repository
that reproduces the problem with memray results.]
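To reproduce the measurement locally, something along these lines captures an
allocation profile of the eager call (a sketch using memray's `Tracker` API;
`eager.bin` is an arbitrary output file name, and `things` is the list loaded
above):
{{{
from memray import Tracker

# Profile the eager bulk_update call; the resulting eager.bin can then be
# rendered with `memray flamegraph eager.bin`.
with Tracker("eager.bin"):
    Thing.objects.bulk_update(things, ["description"], batch_size=300)
}}}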
This might be related to https://code.djangoproject.com/ticket/31202, but I
decided to open a new issue because I wouldn't mind waiting longer for
`bulk_update` to complete; the SIGTERM, however, caught me by surprise.
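For context, the original description pointed at the likely cause:
`bulk_update` materializes the update expressions for every batch before
executing the first query. A heavily simplified sketch of the eager pattern
versus a lazy variant, where `build_update_kwargs` is a hypothetical stand-in
for the per-batch Case/When construction that `bulk_update` performs:
{{{
from itertools import batched  # Python 3.12+ (see the fallback above)

from django.db import transaction


def eager_bulk_update(qs, objs, fields, batch_size):
    # All batches' update expressions are built before the first query
    # runs, so peak memory grows with len(objs), not with batch_size.
    updates = [
        (batch, build_update_kwargs(batch, fields))  # hypothetical helper
        for batch in batched(objs, batch_size)
    ]
    with transaction.atomic():
        for batch, kwargs in updates:
            qs.filter(pk__in=[o.pk for o in batch]).update(**kwargs)


def lazy_bulk_update(qs, objs, fields, batch_size):
    # Each batch's expressions are built just before its query executes,
    # keeping peak memory proportional to a single batch.
    with transaction.atomic():
        for batch in batched(objs, batch_size):
            qs.filter(pk__in=[o.pk for o in batch]).update(
                **build_update_kwargs(batch, fields)  # hypothetical helper
            )
}}}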
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.