#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Márton Salomváry):
Hi Natalia,
> In general, Django does not make guarantees about the exact SQL that is
generated. This means SQL can change from version to version, and we would
not consider that a bug as long as the semantics of the query remain the
same (which, based on my testing, is the case here).
I think the semantics did change, maybe the example I provided is not
obvious enough to showcase that. Will provide some more cases where we had
to change code.
> That said, we do care about potential performance impacts and/or
regressions.
In our case the performance impact was extreme. We got from queries that
take a few dozens of milliseconds to ones that practically never complete
and had to kill queries manually in order to recover the PostgreSQL
database server.
> I do wonder, however, whether the way the queries are currently written
might be more complex than necessary, which could contribute to these
differences in execution plans.
They might indeed be written more complex than necessary, but they used to
generate SQL we had no performance or other issues with.
I understand you are suggesting rewriting the queries before upgrading
Django from 4.x to 5.0 but given the size and complexity of our code base,
identifying the queries in a 100% reliable way is nearly impossible,
meaning that we have to keep fighting runaway queries and rewriting until
we no longer have production failures.
> For example, this alternative queryset is, IMHO, equivalent and much
more straightfoward:
>
>
> {{{
> Alpha.objects.filter(
> Q(bravos__charlie_bravos__charlie=9999)
> & Exists(
> Delta.objects.filter(
> bravos__id=OuterRef("bravos__charlie_bravos__bravo_id")
> )
> )
>)
> }}}
Sure it might be more straightforward, but repeating `filter()` calls is
often handy if some is only applied conditionally, like this:
{{{
qs = qs.filter(some_unconditional_query...)
if some_condition:
qs = qs.filter(some_conditional_query...)
}}}
I also wonder why repeating `filter()` and combining expressions like
`Q(...) & Q()` do not result in equivalent queries. Ie. shouldn't these
three always be the same? If not, what is the difference in the semantics?
{{{
qs1 = qs.filter(foo=bar).filter(baz=qux)
qs2 = qs.filter(Q(foo=bar) & Q(baz=qux))
qs3 = qs.filter(foo=bar, baz=qux)
}}}
> If you encounter a case where a query written in a way that aligns with
Django's ORM patterns (as shown above) returns incorrect results or
performs poorly, please feel free to reopen it with details.
I would appreciate feedback on how the original query does **not** align
with Django's ORM patterns. The documentation does not discourage repeat-
calls to `filter()` in order to narrow down results.
In any case, I will try to provide more input on how this breaks the
**results** of our queries.
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/01070198be0be0d6-b720ec88-5f53-46e2-8add-7f9aa92e83d6-000000%40eu-central-1.amazonses.com.