#33015: QuerySet.extra Use Case: filtering on large lists of tuples
-------------------------------------+-------------------------------------
Reporter: kris-swann | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: QuerySet.extra | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Keryn Knight):
* cc: Keryn Knight (added)
Comment:
Having been bitten by this same thing multiple times over the years, I'm
interested.
The performance profile of this is reproducible for me on the `2.2.9` and
`3.2.6` tags:
{{{
In [5]: %%timeit
...: query = Q()
...: for v1, v2 in items_to_fetch:
...: query |= Q(val1=v1, val2=v2)
21.9 s ± 1.01 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
48.2 s ± 1.54 s per loop (mean ± std. dev. of 3 runs, 1 loop each)
}}}
but doesn't appear to be a ''big'' issue on `main` since #32940; my
suspicion is that both `WhereNode` and `Q` were hitting the pathological
case for `data in self.children` being O(n), in case that rings a bell
Simon; below is main as of `8208381ba6`; the problem was present at
`501a8db465` and basically gone at `ff661dbd50`:
{{{
In [5]: %%timeit
...: query = Q()
...: for v1, v2 in items_to_fetch:
...: query |= Q(val1=v1, val2=v2)
209 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit x = ExampleModel.objects.filter(query)
644 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
}}}
Though it's still faster to avoid combining Q objects and kwargs:
{{{
In [14]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in
items_to_fetch), _connector=Q.OR)
15.1 ms ± 761 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
}}}
That's true for `2.2.9` too, but regrettably has less impact on the query
compilation itself:
{{{
In [4]: %timeit Q(*(Q(('val1', v1), ('val2', v2)) for v1, v2 in
items_to_fetch), _connector=Q.OR)
22.9 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit -r3 -n1 x = ExampleModel.objects.filter(query)
20.6 s ± 874 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
}}}
I don't know that the simple Q based version will ever be as fast as
`extra` (as used above) or the `ExpressionTuple` (also above), but it
begins to look ''almost'' acceptable, certainly compared to the previous.
--
Ticket URL: <https://code.djangoproject.com/ticket/33015#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/068.7d3d0b1a2d3d23558ba619a9aa3218a8%40djangoproject.com.