#33015: QuerySet.extra Use Case: filtering on large lists of tuples
-------------------------------------+-------------------------------------
     Reporter:  kris-swann           |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  2.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  QuerySet.extra,      |             Triage Stage:  Accepted
  Documentation                      |
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Keryn Knight):

 So, I'm not against documenting the most efficient way to generate a big
 old Q (it's ''usually'' an `OR` IME); I suggested it as an aside in
 #32946. Mariusz would prefer not to promote using `_connector` (see
 comment 1 in that ticket), and though the reasons aren't given, I think I
 kind of get them, and it boils down to: ideally, we shouldn't need to...

 In an ideal world, throwing Q objects around even at a large scale
 ''should'' be fast enough to be usable without needing ... tricks. But even
 with attempts to improve other bits around that (#32948 ... doesn't make a
 huge amount of difference here, as most of the time is in `add` rather than
 in the kwargs costs it avoids, which are artificially inflated for `Q`
 etc), there's an unfortunate inflexion point where building the query
 becomes bigger than most request/response time budgets before you even
 ''run'' it. The query given above is a perfect example: the query itself
 may take ~2ms (on sqlite, locally, for me) but evaluating the QuerySet into
 a tuple can take many times longer (`100ms` for `30, 30` `items_to_fetch`;
 any larger and sqlite doesn't like the expression length being more than
 1000) ... and for a change of pace that time is ''not'' being spent in
 `Model.__init__` or any of the usual suspects (cloning methods). It's
 entirely in the `Query` building: if you copy the `.query` once it's been
 built and paste it onto a new QuerySet, the whole thing is `34ms`. As
 mentioned, I ''have'' seen this personally in production (luckily in an
 out-of-band celery task, where time mattered less), and that's how I learnt
 to mitigate it. I don't ''like'' it, and I don't ''like'' that it's
 something others might have to learn, but even cutting 50% off the costs of
 `build_filter` and of handling Q objects would probably still leave you
 searching for additional optimisations to make it feasible in a normal
 request/response cycle (at the given `100, 100` example).

 I don't have a good answer for it, really. Documenting that "yeah, it's
 just not good at huge condition lists" doesn't seem great, but nor does
 leaving users without an escape hatch ''if'' they encounter such things. If
 Simon's suggested alternative via the new functionality is well documented,
 this use case would make a good example of that new hotness (though,
 caveat, I tried running it on `main` and got `AttributeError: 'F' object
 has no attribute '_output_field_or_none'` ... as I know nothing about that
 functionality (it's a neat surprise), I dunno if that's because it was an
 untested ''sketch'' or if it should work but doesn't).

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33015#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/068.ba0970d5bf223413a81087f88ec9a731%40djangoproject.com.

Reply via email to