#33015: QuerySet.extra Use Case: filtering on large lists of tuples
-------------------------------------+-------------------------------------
     Reporter:  kris-swann           |                    Owner:  nobody
         Type:  Uncategorized        |                   Status:  new
    Component:  Database layer       |                  Version:  2.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  QuerySet.extra       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):

 * cc: Keryn Knight (added)


Comment:

 Having been bitten by this same thing multiple times over the years, I'm
 interested.

 The performance profile of this is reproducible for me on the `2.2.9` and
 `3.2.6` tags:
 {{{
 In [5]: %%timeit
    ...: query = Q()
    ...: for v1, v2 in items_to_fetch:
    ...:     query |= Q(val1=v1, val2=v2)
 21.9 s ± 1.01 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
 In [7]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
 48.2 s ± 1.54 s per loop (mean ± std. dev. of 3 runs, 1 loop each)
 }}}
 but doesn't appear to be a ''big'' issue on `main` since #32940; my
 suspicion is that both `WhereNode` and `Q` were hitting the pathological
 case for `data in self.children` being O(n), in case that rings a bell
 Simon; below is main as of `8208381ba6`; the problem was present at
 `501a8db465` and basically gone at `ff661dbd50`:
 {{{
 In [5]: %%timeit
    ...: query = Q()
    ...: for v1, v2 in items_to_fetch:
    ...:     query |= Q(val1=v1, val2=v2)
 209 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 In [7]: %timeit x = ExampleModel.objects.filter(query)
 644 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 }}}
 Though it's still faster to avoid combining Q objects and kwargs:
 {{{
 In [14]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in
 items_to_fetch), _connector=Q.OR)
 15.1 ms ± 761 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
 }}}
 That's true for `2.2.9` too, but regrettably has less impact on the query
 compilation itself:
 {{{
 In [4]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in
 items_to_fetch), _connector=Q.OR)
 22.9 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
 In [5]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
 20.6 s ± 874 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
 }}}

 I don't know that the simple Q based version will ever be as fast as
 `extra` (as used above) or the `ExpressionTuple` (also above), but it
 begins to look ''almost'' acceptable, certainly compared to the previous.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33015#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/068.7d3d0b1a2d3d23558ba619a9aa3218a8%40djangoproject.com.

Reply via email to