#35399: Reduce the "Case-When" sequence for a bulk_update when the values for a
certain field are the same.
-------------------------------------+-------------------------------------
Reporter: Willem | Owner: nobody
Van Onsem |
Type: New | Status: new
feature |
Component: Database | Version: 5.0
layer (models, ORM) | Keywords: db, bulk_update,
Severity: Normal | case, when
Triage Stage: | Has patch: 1
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Django's `.bulk_update(..)` seems to work with sequences of `Case-When`
*per* field and *per* record to update. In other words, if we have three
items we want to update with `pk` being 1, 4 and 5, and we have a field
`yn` and the records for pk `1` and `5` have value `y`, whereas the one
for `pk=4` it is `n`, then this will make a query that looks like:
{{{
UPDATE my_table
SET yn=CASE WHEN id=1 THEN 'y' WHEN id=4 THEN 'n' WHEN id=5 THEN 'y' END
WHERE id IN (1,4,5)
}}}
It thus builds a long sequence of `Case`-`When`s, which is not efficient.
There are for most databases solutions with a temporary table that is then
upserted into the main table. The Django ORM could probably use existent
tools for this where we first make a temporary model, create it with the
migration model, insert in bulk, then upsert with a query, and finally
remove the temporary table. But a problem with this is, we might not have
privileges to create an extra table in that database.
But a low-hanging optimization is to first *group* the values together
that we can group together. This can be done with a `defaultdict` that
maps the value to a list of primary keys for that value. In case the value
is not hashable, we can fallback on the original case-when logic. But for
values like strings, this would reduce the query to:
{{{
UPDATE my_table
SET yn=CASE WHEN id IN (1, 5) THEN 'y' WHEN id=4 THEN 'n' END
WHERE id IN (1,4,5)
}}}
That being said, I think bulk_updates should still be done in a more
efficient manner, perhaps moving it to the engines, since a lot of
databases allow the creation of temporary tables to make upserts a lot
more efficient.
--
Ticket URL: <https://code.djangoproject.com/ticket/35399>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/0107018f0b9ceb1d-0d14d9d3-bb8e-4f4b-adb3-9eafa6ac02af-000000%40eu-central-1.amazonses.com.