#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
     Reporter:  Márton Salomváry     |                    Owner:  (none)
         Type:                       |                   Status:  closed
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  5.0
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:  invalid
     Keywords:  chained-filters      |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):

 Replying to [comment:4 Márton Salomváry]:

 > They might indeed be written more complex than necessary, but they used
 to generate SQL we had no performance or other issues with.
 >
 > I understand you are suggesting rewriting the queries before upgrading
 Django from 4.x to 5.0 but given the size and complexity of our code base,
 identifying the queries in a 100% reliable way is nearly impossible,
 meaning that we have to keep fighting runaway queries and rewriting until
 we no longer have production failures.

 The thing is that the commit you reference fixes a bug. Using your
 examples:

 {{{
 Alpha <--M2M--> Bravo <--FK-- CharlieBravo --FK--> Charlie
 Delta <--M2M--> Bravo
 }}}

 * Django `4.2x`: `Alpha --> Bravo --> CharlieBravo` (single INNER JOIN)

 * Since `5.0.x`:
   * `Alpha --> Bravo --> CharlieBravo`  (INNER JOIN for first filter)
   * `Alpha --> T6 Bravo --> T7 CharlieBravo` (LEFT OUTER JOIN path for
 second filter)

 > [...] repeating `filter()` calls is often handy if some is only applied
 conditionally, like this:
 >
 > {{{
 > qs = qs.filter(some_unconditional_query...)
 >
 > if some_condition:
 >   qs = qs.filter(some_conditional_query...)
 > }}}

 As a side note, this example of chaining multiple `.filter()` calls is
 common and supported, but if you need to ensure joins are combined as
 efficiently as possible, the equivalent pattern is to build up a list of
 conditions and combine them with `&` into a single `filter`, for example:

 {{{#!python
 filters = [Q(some_unconditional_query)]
 if some_condition:
     filters.append(Q(some_conditional_query))
 result = qs.filter(reduce(operator.and_, filters))
 }}}

 This is functionally equivalent to chaining, but ensures the ORM generates
 queries consistent with the documented behavior for multi-valued
 relationships.

 > I also wonder why repeating `filter()` and combining expressions like
 `Q(...) & Q()` do not result in equivalent queries. Ie. shouldn't these
 three always be the same? If not, what is the difference in the semantics?

 This is documented in this
 [https://docs.djangoproject.com/en/5.2/topics/db/queries/#spanning-multi-
 valued-relationships section].

 In fact, prior to the change in Django 5.0, usage of expressions passed to
 `.filter()` sometimes contradicted the documentation around multi-valued
 relationships. The docs have long stated that when spanning multi-valued
 relationships, repeated filters can lead to additional joins, and that
 conditions should be combined in a single `.filter()` if you want to avoid
 that. The recent change aligns behavior with that documented expectation,
 even if it changes some queries that happened to work differently before.

 > I would appreciate feedback on how the original query does **not** align
 with Django's ORM patterns. The documentation does not discourage repeat-
 calls to `filter()` in order to narrow down results.

 Chaining `.filter()` is a normal pattern, but when multi-valued
 relationships are involved the docs note that it can change semantics and
 have query/performance implications. If `JOIN` reuse is necessary due to
 performance requirements, combining conditions in a single call is the ORM
 pattern to follow.

 > In any case, I will try to provide more input on how this breaks the
 **results** of our queries.

 That would certainly help, thank you!
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:5>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/01070198be3a04d1-35252dff-460a-4efb-91b8-a23b090d3b65-000000%40eu-central-1.amazonses.com.

Reply via email to