#33015: QuerySet.extra Use Case: filtering on large lists of tuples
-------------------------------------+-------------------------------------
Reporter: kris-swann | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 2.2
Severity: Normal | Resolution:
Keywords: QuerySet.extra, Documentation | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Keryn Knight):
So, I'm not against documenting the most efficient way to generate a big
old Q (it's ''usually'' an `OR`, in my experience); I suggested it as an
aside in #32946. Mariusz would prefer not to promote using `_connector`
(see comment 1 in that ticket), and though the reasons aren't given, I
think I kind of get them, and it boils down to: ideally, we shouldn't need
to. In an ideal world, throwing Q objects around even at a large scale
''should'' be fast enough to be usable without such tricks.

But even with attempts to improve the surrounding machinery (#32948
doesn't make a huge amount of difference here, as most of the time is in
`add`, and avoiding the cost of kwargs doesn't help because they're being
artificially inflated into `Q` objects, etc.), there's an unfortunate
inflexion point where building the query alone exceeds most
request/response time budgets, before you even ''run'' it. The query given
above is a perfect example: the query itself may take ~2ms (on sqlite
locally for me), but evaluating the QuerySet into a tuple can take many
times that (`100ms` for `30, 30` `items_to_fetch`; any bigger and sqlite
doesn't like the expression length exceeding 1000). For a change of pace,
that time is ''not'' being spent in `Model.__init__` or any of the usual
suspects (cloning methods); it's entirely in building the `Query` (if you
copy the `.query` once it's been built and paste it onto a new QuerySet,
the whole thing is `34ms`).

As mentioned, I ''have'' seen this personally in production (luckily in an
out-of-band celery task, where time mattered less), and that's how I
learnt to mitigate it. I don't ''like'' it and I don't ''like'' that it's
something others might have to learn, but even cutting 50% off the cost of
`build_filter` and of handling Q objects would probably still leave you
searching for additional optimisations to make it feasible in a normal
request/response cycle (at the given `100, 100` example).
I don't have a good answer for it, really. Documenting that "yeah, it's
just not good at huge condition lists" doesn't seem great, but nor does
leaving users without an escape hatch ''if'' they encounter such things.
If Simon's suggested alternative via the new functionality is well
documented, this use case could serve as a good example of that new
hotness (though, caveat: I tried running it on `main` and got
`AttributeError: 'F' object has no attribute '_output_field_or_none'`; as
I know nothing about that functionality (it's a neat surprise), I don't
know whether that's because it was an untested ''sketch'' or whether it
should work but doesn't).
--
Ticket URL: <https://code.djangoproject.com/ticket/33015#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.