#34597: Queryset (split) exclude's usage of Exists is significantly slower than
subquery
-------------------------------------+-------------------------------------
     Reporter:  Lorand Varga         |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  3.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Lorand Varga):

 Simon, I want to say that I really appreciate your support and patience
 on this. If I've been too annoying in my comments, it was not on
 purpose - I'm genuinely curious whether or not there is an underlying
 issue in Django (since there are a lot of interactions going on). I'm
 not insisting on any revert unless one is needed; your arguments were
 solid and I'm very grateful for your support.

 Getting back to the nitty-gritty details: turning off the parallel
 workers does not improve anything.
 The NOT (EXISTS) query:
 {{{
 SET max_parallel_workers_per_gather = 0;
 EXPLAIN ANALYZE
 SELECT
   "test_app_blog"."id",
   "test_app_blog"."created_at",
   "test_app_blog"."updated_at",
   "test_app_blog"."is_published",
   "test_app_blog"."api_has_translation"
 FROM
   "test_app_blog"
 WHERE
   (
     "test_app_blog"."is_published"
     AND NOT "test_app_blog"."api_has_translation"
     AND NOT (
       EXISTS(
         SELECT
           (1) AS "a"
         FROM
           "test_app_blog" U0
           LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
 U1."blog_id")
         WHERE
           (
             U1."id" IS NULL
             AND U0."id" = "test_app_blog"."id"
           )
         LIMIT
           1
       )
     )
   )
 ORDER BY
   "test_app_blog"."updated_at" DESC


  Nested Loop Anti Join  (cost=1.13..132780.57 rows=180586 width=1985)
 (actual time=211403.273..3034890.176 rows=4 loops=1)
    Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id =
 test_app_blog.id))
    Rows Removed by Join Filter: 16853800739
    ->  Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on
 test_app_blog  (cost=0.42..107115.43 rows=180586 width=1985) (actual
 time=1.080..1356.688 rows=179902 loops=1)
          Filter: (is_published AND (NOT api_has_translation))
          Rows Removed by Filter: 27456
    ->  Materialize  (cost=0.71..22504.89 rows=1 width=4) (actual
 time=0.002..9.083 rows=93684 loops=179902)
          ->  Merge Left Join  (cost=0.71..22504.88 rows=1 width=4) (actual
 time=0.016..140.731 rows=194102 loops=1)
                Merge Cond: (u0.id = u1.blog_id)
                Filter: (u1.id IS NULL)
                Rows Removed by Filter: 59794
                ->  Index Only Scan using test_app_blog_pkey on
 test_app_blog u0  (cost=0.42..15170.42 rows=207267 width=4) (actual
 time=0.007..67.559 rows=207358 loops=1)
                      Heap Fetches: 33518
                ->  Index Scan using test_app_translation_51c6d5db on
 test_app_translation u1  (cost=0.29..6077.43 rows=59809 width=8) (actual
 time=0.005..30.515 rows=59794 loops=1)
  Planning Time: 0.936 ms
  Execution Time: 3034891.393 ms
 (16 rows)
 }}}
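 For what it's worth, the plan's own numbers explain the ~50-minute
 runtime: the Materialize node is rescanned once per outer row, so the
 join filter is evaluated on the order of 1.7e10 times. A quick
 back-of-the-envelope check (plain Python, figures copied from the plan
 above):

```python
# Figures from the EXPLAIN ANALYZE output above: the Materialize node
# returns on average ~93,684 rows for each of the 179,902 outer rows,
# so the join filter runs roughly outer * inner times.
outer_rows = 179_902          # rows from the backward index scan
inner_rows_per_loop = 93_684  # average rows per Materialize rescan
comparisons = outer_rows * inner_rows_per_loop
print(f"{comparisons:,}")  # 16,853,938,968 -- in line with
                           # "Rows Removed by Join Filter: 16853800739"
```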

 The plain EXISTS query, on the other hand, performs well:
 {{{
 SET max_parallel_workers_per_gather = 0;
 EXPLAIN ANALYZE
 SELECT
   "test_app_blog"."id",
   "test_app_blog"."created_at",
   "test_app_blog"."updated_at",
   "test_app_blog"."is_published",
   "test_app_blog"."api_has_translation"
 FROM
   "test_app_blog"
 WHERE
   (
     "test_app_blog"."is_published"
     AND NOT "test_app_blog"."api_has_translation"
     AND EXISTS(
         SELECT
           (1) AS "a"
         FROM
           "test_app_translation" U1
         WHERE
           U1."blog_id" = "test_app_blog"."id"
         LIMIT
           1
       )
     )
 ORDER BY
   "test_app_blog"."updated_at" DESC

 Sort  (cost=21037.38..21066.08 rows=11479 width=22) (actual
 time=63.818..63.820 rows=4 loops=1)
   Sort Key: test_app_blog.updated_at DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=1501.37..20263.31 rows=11479 width=22) (actual
 time=17.974..63.805 rows=4 loops=1)
         ->  HashAggregate  (cost=1500.95..1632.70 rows=13175 width=4)
 (actual time=17.018..20.483 rows=13256 loops=1)
               Group Key: u1.blog_id
               Batches: 1  Memory Usage: 1425kB
               ->  Index Only Scan using test_app_translation_51c6d5db on
 test_app_translation u1  (cost=0.29..1351.42 rows=59809 width=4) (actual
 time=0.007..6.721 rows=59794 loops=1)
                     Heap Fetches: 735
         ->  Index Scan using test_app_blog_pkey on test_app_blog
 (cost=0.42..1.47 rows=1 width=22) (actual time=0.003..0.003 rows=0
 loops=13256)
               Index Cond: (id = u1.blog_id)
               Filter: (is_published AND (NOT api_has_translation))
               Rows Removed by Filter: 1
 Planning Time: 0.703 ms
 Execution Time: 63.872 ms
 }}}
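 As a sanity check that the two shapes really are equivalent (the NOT
 EXISTS over a LEFT JOIN versus the plain EXISTS), here is a small
 self-contained sketch against SQLite with toy data. Table and column
 names are simplified, and SQLite's planner of course says nothing
 about PostgreSQL's - this only checks that both predicates select the
 same rows:

```python
import sqlite3

# Toy schema mirroring the tables above (simplified names; a sanity
# check of the two WHERE shapes, not a statement about the planner).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, is_published BOOL,
                       api_has_translation BOOL);
    CREATE TABLE translation (id INTEGER PRIMARY KEY, blog_id INTEGER);
    INSERT INTO blog VALUES (1, 1, 0), (2, 1, 0), (3, 1, 0), (4, 0, 0);
    INSERT INTO translation VALUES (10, 1), (11, 3);
""")

# Shape 1: NOT EXISTS over the LEFT JOIN / IS NULL anti-pattern.
not_exists = con.execute("""
    SELECT b.id FROM blog b
    WHERE b.is_published AND NOT b.api_has_translation
      AND NOT EXISTS (
        SELECT 1 FROM blog u0
        LEFT OUTER JOIN translation u1 ON u0.id = u1.blog_id
        WHERE u1.id IS NULL AND u0.id = b.id)
    ORDER BY b.id
""").fetchall()

# Shape 2: the plain hand-written EXISTS.
exists = con.execute("""
    SELECT b.id FROM blog b
    WHERE b.is_published AND NOT b.api_has_translation
      AND EXISTS (SELECT 1 FROM translation u1 WHERE u1.blog_id = b.id)
    ORDER BY b.id
""").fetchall()

print(not_exists, exists)  # both: [(1,), (3,)]
assert not_exists == exists
```

 So, assuming the queryset shapes match the ticket's, only the plans
 should differ, never the result sets.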

 I also played with various values for work_mem and haven't seen any
 change in the Postgres plans.

 I also want to mention that David is on to something - disabling
 materialization changes the plan entirely:
 {{{
 SET enable_material='off';
 EXPLAIN ANALYZE
 SELECT
   "test_app_blog"."id",
   "test_app_blog"."created_at",
   "test_app_blog"."updated_at",
   "test_app_blog"."is_published",
   "test_app_blog"."api_has_translation"
 FROM
   "test_app_blog"
 WHERE
   (
     "test_app_blog"."is_published"
     AND NOT "test_app_blog"."api_has_translation"
     AND NOT (
       EXISTS(
         SELECT
           (1) AS "a"
         FROM
           "test_app_blog" U0
           LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
 U1."blog_id")
         WHERE
           (
             U1."id" IS NULL
             AND U0."id" = "test_app_blog"."id"
           )
         LIMIT
           1
       )
     )
   )
 ORDER BY
   "test_app_blog"."updated_at" DESC;


  Sort  (cost=143578.39..144029.85 rows=180585 width=22) (actual
 time=691.697..691.802 rows=4 loops=1)
    Sort Key: test_app_blog.updated_at DESC
    Sort Method: quicksort  Memory: 25kB
    ->  Hash Anti Join  (cost=21645.24..125693.23 rows=180585 width=22)
 (actual time=306.930..691.769 rows=4 loops=1)
          Hash Cond: (test_app_blog.id = u0.id)
          ->  Index Scan using test_app_blog_is_published_4b47c652_uniq on
 test_app_blog  (cost=0.42..101768.51 rows=180586 width=22) (actual
 time=0.684..304.445 rows=179902 loops=1)
                Index Cond: (is_published = true)
                Filter: (NOT api_has_translation)
                Rows Removed by Filter: 12043
          ->  Hash  (cost=21644.81..21644.81 rows=1 width=4) (actual
 time=283.265..283.369 rows=194102 loops=1)
                Buckets: 262144 (originally 1024)  Batches: 2 (originally
 1)  Memory Usage: 6145kB
                ->  Gather Merge  (cost=1000.73..21644.81 rows=1 width=4)
 (actual time=6.321..245.263 rows=194102 loops=1)
                      Workers Planned: 2
                      Workers Launched: 2
                      ->  Merge Left Join  (cost=0.71..20644.67 rows=1
 width=4) (actual time=0.067..105.796 rows=64701 loops=3)
                            Merge Cond: (u0.id = u1.blog_id)
                            Filter: (u1.id IS NULL)
                            Rows Removed by Filter: 19931
                            ->  Parallel Index Only Scan using
 test_app_blog_pkey on test_app_blog u0  (cost=0.42..13961.37 rows=86361
 width=4) (actual time=0.030..40.520 rows=69119 loops=3)
                                  Heap Fetches: 33632
                            ->  Index Scan using
 test_app_translation_51c6d5db on test_app_translation u1
 (cost=0.29..6077.43 rows=59809 width=8) (actual time=0.032..42.470
 rows=59627 loops=3)
  Planning Time: 2.751 ms
  Execution Time: 692.507 ms
 }}}
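 Put differently, the same logical query goes from roughly 50 minutes
 to well under a second once the planner switches from the materialized
 nested-loop anti join to a hash anti join - a quick ratio from the two
 Execution Time lines above:

```python
nested_loop_ms = 3_034_891.393  # Execution Time, materialized plan
hash_anti_ms = 692.507          # Execution Time, enable_material = off
speedup = nested_loop_ms / hash_anti_ms
print(f"{speedup:.0f}x")  # 4382x
```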

 I'm not sure yet why my Postgres seems memory-restricted (I remember
 it having "enough" RAM), but I'll follow up with another comment once
 I find out more.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:9>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
