#35399: Reduce the "Case-When" sequence for a bulk_update when the values for a
certain field are the same.
-------------------------------------+-------------------------------------
Reporter: Willem Van Onsem | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: db, bulk_update, | Triage Stage:
case, when | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Willem Van Onsem):
Well, typically hashing runs linear in the size of the data, so `Value(1)`
should indeed take more time to hash than `1`, but not dramatically more.
(By the way, the patch already hashes `Value(1)`: it first checks `if not
hasattr(attr, "resolve_expression")` and wraps the plain value into a
`Value`, so that is where the current benchmarks originate from. If we had
used `=Value(random.randint(0, 10))` for this benchmark, it would make no
difference.) From the moment it encounters a hashing error, it sets the
dictionary to `None` and no longer hashes, saving those cycles: it thus
stops looking for hashes as soon as one of the items cannot be hashed.
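The fallback described above can be sketched in plain Python (an
illustrative sketch only; `group_by_value` is a hypothetical name, not the
actual patch): group objects by their new value, and abort grouping
entirely as soon as a value turns out to be unhashable.

```python
def group_by_value(pairs):
    """Group (pk, value) pairs by value so duplicates share one bucket.

    Returns None as soon as any value is unhashable, signalling the
    caller to fall back to the ungrouped one-When-per-row behaviour.
    """
    groups = {}
    for pk, value in pairs:
        try:
            # setdefault hashes `value`; duplicates land in the same list
            groups.setdefault(value, []).append(pk)
        except TypeError:
            # unhashable value: stop hashing altogether, save the cycles
            return None
    return groups

# Three rows share the value 1, so they collapse into a single group.
print(group_by_value([(1, 1), (2, 1), (3, 2), (4, 1)]))
# -> {1: [1, 2, 4], 2: [3]}
# An unhashable value (here a list) aborts the grouping.
print(group_by_value([(1, [1]), (2, 2)]))
# -> None
```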
But probably the main reason it would be very strange for the hashing to
increase the time significantly is that generating the SQL counterpart of
an expression (like `F('a') + Value(1)`) *also* takes linear time. So
essentially *building* the SQL query with *all* the `Case` / `When` items
will always take approximately the same effort as hashing all those items,
since both run linear in the "size" of the SQL expressions (or at least,
that is a reasonable assumption), and we spend that effort anyway. If
there are (close to) no duplicate expressions, it will thus at most
roughly *double* the effort of generating the query: once to compute the
hashes, and once to generate the SQL; and in my experience, building the
query itself is rarely the bottleneck. If there *are* duplicate
expressions, it also saves on generating parts of the SQL query, which
again will probably not have much impact in a positive or negative way,
since that is almost never the bottleneck.
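To make the saving concrete, here is a hedged sketch (pure Python string
building, not Django's actual SQL compiler; `case_sql` is a hypothetical
helper) of how grouped values shrink the generated `CASE` / `WHEN`
sequence: several primary keys share one `WHEN ... IN (...)` branch
instead of getting one branch each.

```python
def case_sql(column, groups):
    """Build a CASE expression with one WHEN per *distinct* value.

    `groups` maps each value to the list of pks that should receive it,
    e.g. the output of a hash-based grouping step.
    """
    whens = " ".join(
        f"WHEN id IN ({', '.join(map(str, pks))}) THEN {value}"
        for value, pks in groups.items()
    )
    return f"{column} = CASE {whens} END"

# Four rows but only two distinct values -> only two WHEN branches.
print(case_sql("counter", {1: [1, 2, 4], 2: [3]}))
# -> counter = CASE WHEN id IN (1, 2, 4) THEN 1 WHEN id IN (3) THEN 2 END
```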
--
Ticket URL: <https://code.djangoproject.com/ticket/35399#comment:8>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.