#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):
Replying to [comment:4 Márton Salomváry]:
> They might indeed be written more complex than necessary, but they used
to generate SQL we had no performance or other issues with.
>
> I understand you are suggesting rewriting the queries before upgrading
Django from 4.x to 5.0 but given the size and complexity of our code base,
identifying the queries in a 100% reliable way is nearly impossible,
meaning that we have to keep fighting runaway queries and rewriting until
we no longer have production failures.
The thing is that the commit you reference fixes a bug. Using your
examples:
{{{
Alpha <--M2M--> Bravo <--FK-- CharlieBravo --FK--> Charlie
Delta <--M2M--> Bravo
}}}
* Django `4.2x`: `Alpha --> Bravo --> CharlieBravo` (single INNER JOIN)
* Since `5.0.x`:
* `Alpha --> Bravo --> CharlieBravo` (INNER JOIN for first filter)
* `Alpha --> T6 Bravo --> T7 CharlieBravo` (LEFT OUTER JOIN path for
second filter)
> [...] repeating `filter()` calls is often handy if some is only applied
conditionally, like this:
>
> {{{
> qs = qs.filter(some_unconditional_query...)
>
> if some_condition:
> qs = qs.filter(some_conditional_query...)
> }}}
As a side note, this example of chaining multiple `.filter()` calls is
common and supported, but if you need to ensure joins are combined as
efficiently as possible, the equivalent pattern is to build up a list of
conditions and combine them with `&` into a single `filter`, for example:
{{{#!python
filters = [Q(some_unconditional_query)]
if some_condition:
filters.append(Q(some_conditional_query))
result = qs.filter(reduce(operator.and_, filters))
}}}
This is functionally equivalent to chaining, but ensures the ORM generates
queries consistent with the documented behavior for multi-valued
relationships.
> I also wonder why repeating `filter()` and combining expressions like
`Q(...) & Q()` do not result in equivalent queries. Ie. shouldn't these
three always be the same? If not, what is the difference in the semantics?
This is documented in this
[https://docs.djangoproject.com/en/5.2/topics/db/queries/#spanning-multi-
valued-relationships section].
In fact, prior to the change in Django 5.0, usage of expressions passed to
`.filter()` sometimes contradicted the documentation around multi-valued
relationships. The docs have long stated that when spanning multi-valued
relationships, repeated filters can lead to additional joins, and that
conditions should be combined in a single `.filter()` if you want to avoid
that. The recent change aligns behavior with that documented expectation,
even if it changes some queries that happened to work differently before.
> I would appreciate feedback on how the original query does **not** align
with Django's ORM patterns. The documentation does not discourage repeat-
calls to `filter()` in order to narrow down results.
Chaining `.filter()` is a normal pattern, but when multi-valued
relationships are involved the docs note that it can change semantics and
have query/performance implications. If `JOIN` reuse is necessary due to
performance requirements, combining conditions in a single call is the ORM
pattern to follow.
> In any case, I will try to provide more input on how this breaks the
**results** of our queries.
That would certainly help, thank you!
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:5>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/01070198be3a04d1-35252dff-460a-4efb-91b8-a23b090d3b65-000000%40eu-central-1.amazonses.com.