#36552: Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
-------------------------------------+-------------------------------------
Reporter: Márton Salomváry | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: chained-filters | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Márton Salomváry):
Replying to [comment:5 Natalia Bidart]:
> The thing is that the commit you reference fixes a bug.
I understand that. What I can't decide is whether what I am seeing is
another bug introduced by that bugfix, or I have been successfully relying
on the existence of bug for quite long :)
>
> This is documented in this
[https://docs.djangoproject.com/en/5.2/topics/db/queries/#spanning-multi-
valued-relationships section].
>
> > I would appreciate feedback on how the original query does **not**
align with Django's ORM patterns. The documentation does not discourage
repeat-calls to `filter()` in order to narrow down results.
>
> Chaining `.filter()` is a normal pattern, but when multi-valued
relationships are involved the docs note that it can change semantics and
have query/performance implications. If `JOIN` reuse is necessary due to
performance requirements, combining conditions in a single call is the ORM
pattern to follow.
Oh thanks, now I remember having been bitten by this in the past, but not
often and painfully enough to remember the peculiarities of joins with
multi-valued relationships!
>
> > In any case, I will try to provide more input on how this breaks the
**results** of our queries.
>
> That would certainly help, thank you!
In this example, count is wrongly `2` in Django 5.0 and correctly `1` in
Django 4.x:
https://github.com/salomvary/django-5-filter-
regression/commit/f3cabd4e8987210e9aa17c99bfa622cfc7c580d8
{{{
alpha = Alpha.objects.create(id=7777)
bravo = alpha.bravos.create()
charlie1 = Charlie.objects.create(id=9999)
charlie2 = Charlie.objects.create(id=5555)
CharlieBravo.objects.create(bravo=bravo, charlie=charlie1)
CharlieBravo.objects.create(bravo=bravo, charlie=charlie2)
bravo.deltas.create()
queryset = (
Alpha.objects.filter(
bravos__charlie_bravos__charlie=9999,
)
.filter(
Exists(
Delta.objects.filter(
bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"),
)
)
)
.annotate(count=Count(
"bravos__charlie_bravos__bravo"
))
.values_list("pk", "count")
)
self.assertEqual(
[(7777, 1)],
list(queryset)
)
}}}
Now I wonder how to spot these in an existing code base, and prevent
shooting myself in the foot in the future. I guess I can monkey-patch
QuerySets to log a warning if repeated `filter()` is used with multi-
valued relationships and force using a single `filter()` and combining
conditions with `&` or `|`.
--
Ticket URL: <https://code.djangoproject.com/ticket/36552#comment:6>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/01070198be4fe44c-f819a80b-c9e5-4e1e-95ce-ef4d581f0d7b-000000%40eu-central-1.amazonses.com.