#34597: Django ORM query SQL generation changed (and is apparently a lot slower)
-------------------------------------+-------------------------------------
Reporter: Lorand Varga | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Lorand Varga):
* status: closed => new
* resolution: needsinfo =>
Comment:
Adding more information:
Django 2.2 behavior:
{{{
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published" = True
AND "test_app_blog"."api_has_translation" = False
AND NOT (
"test_app_blog"."id" IN (
SELECT
U0."id"
FROM
"test_app_blog" U0
LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
U1."blog_id")
WHERE
(
U1."id" IS NULL
AND U0."id" = ("test_app_blog"."id")
)
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC
Sort (cost=1047131.99..1047357.72 rows=90293 width=1985) (actual
time=888.530..888.533 rows=4 loops=1)
Sort Key: test_app_blog.updated_at DESC
Sort Method: quicksort Memory: 33kB
-> Index Scan using test_app_blog_is_published_4b47c652_uniq on
test_app_blog (cost=0.42..995257.82 rows=90293 width=1985) (actual
time=72.414..888.496 rows=4 loops=1)
Index Cond: (is_published = true)
Filter: ((NOT api_has_translation) AND (NOT (SubPlan 1)))
Rows Removed by Filter: 191941
SubPlan 1
-> Nested Loop Left Join (cost=0.71..8.57 rows=1 width=4)
(actual time=0.003..0.003 rows=1 loops=179902)
Join Filter: (u0.id = u1.blog_id)
Filter: (u1.id IS NULL)
Rows Removed by Filter: 0
-> Index Only Scan using test_app_blog_pkey on
test_app_blog u0 (cost=0.42..2.44 rows=1 width=4) (actual
time=0.002..0.002 rows=1 loops=179902)
Index Cond: (id = test_app_blog.id)
Heap Fetches: 27765
-> Index Scan using test_app_translation_51c6d5db on
test_app_translation u1 (cost=0.29..6.07 rows=5 width=8) (actual
time=0.001..0.001 rows=0 loops=179902)
Index Cond: (blog_id = test_app_blog.id)
Planning Time: 0.500 ms
Execution Time: 888.606 ms
}}}
Django 3.2 (and 4.2) behavior:
{{{
EXPLAIN ANALYZE
SELECT
"test_app_blog"."id",
"test_app_blog"."created_at",
"test_app_blog"."updated_at",
"test_app_blog"."is_published",
"test_app_blog"."api_has_translation"
FROM
"test_app_blog"
WHERE
(
"test_app_blog"."is_published"
AND NOT "test_app_blog"."api_has_translation"
AND NOT (
EXISTS(
SELECT
(1) AS "a"
FROM
"test_app_blog" U0
LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
U1."blog_id")
WHERE
(
U1."id" IS NULL
AND U0."id" = "test_app_blog"."id"
AND "test_app_blog"."id" = "test_app_blog"."id"
)
LIMIT
1
)
)
)
ORDER BY
"test_app_blog"."updated_at" DESC
Nested Loop Anti Join (cost=1001.15..131927.02 rows=180586 width=1985)
(actual time=208264.409..3034435.256 rows=4 loops=1)
Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id =
test_app_blog.id))
Rows Removed by Join Filter: 16853800739
-> Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on
test_app_blog (cost=0.42..107115.43 rows=180586 width=1985) (actual
time=1.080..17454.750 rows=179902 loops=1)
Filter: (is_published AND (NOT api_has_translation))
Rows Removed by Filter: 27456
-> Materialize (cost=1000.73..21651.35 rows=1 width=4) (actual
time=0.002..9.009 rows=93684 loops=179902)
-> Gather Merge (cost=1000.73..21651.34 rows=1 width=4) (actual
time=6.024..192.937 rows=194102 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Merge Left Join (cost=0.71..20651.20 rows=1 width=4)
(actual time=0.046..87.573 rows=64701 loops=3)
Merge Cond: (u0.id = u1.blog_id)
Filter: (u1.id IS NULL)
Rows Removed by Filter: 19931
-> Parallel Index Only Scan using test_app_blog_pkey
on test_app_blog u0 (cost=0.42..13961.37 rows=86361 width=4) (actual
time=0.018..32.796 rows=69119 loops=3)
Heap Fetches: 33004
-> Index Scan using test_app_translation_51c6d5db on
test_app_translation u1 (cost=0.29..6082.91 rows=60175 width=8) (actual
time=0.025..32.500 rows=58545 loops=3)
Planning Time: 0.882 ms
Execution Time: 3034436.784 ms
}}}
As can be seen, there is a huge increase in execution time.
I can't vouch there isn't something else going on here also but currently
the only thing changed is the django version.
Anything else in the environment is left untouched. (And django2.2
generated query is performant, while django 3.2 query is awfull).
I've managed to use your advice and modify the query to force the subquery
to use IN but that leads to synthax that is more complex, while the django
2.2 synthax is a lot simpler: ~Q(translation=None)
The only synthax that manages to replicate the query correctly (and
performance wise) seems to be
{{{qs.exclude(id__in=Blog.objects.filter(translation=None,
id=OuterRef("id")))}}}}
While not terrible I do think something has been broken in django since
the ~Q(translation=None) seems a lot cleaner and more ORMish.
Thanks,
Lorand.
--
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/0107018862eff40b-49dabfd3-5832-432a-a027-42fa088f1f7d-000000%40eu-central-1.amazonses.com.