#35865: Queryset aggregation keeps unnecessary SQL joins
-------------------------------------+-------------------------------------
     Reporter:  Ruslan               |                    Owner:  (none)
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

 * easy:  1 => 0
 * stage:  Unreviewed => Accepted
 * summary:  Query.get_count() keeps unnecessary SQL joins => Queryset
     aggregation keeps unnecessary SQL joins

Comment:

 To give you a bit of context here the ORM use to not prune unused
 annotations before Django 4.2 (#28477) and the lack post annotation
 pruning left-over `JOIN` pruning was identified as
 [https://github.com/django/django/pull/16263#issuecomment-1311134646 a
 potential optimization at the time].

 To give a concrete example say you do

 {{{#!python
 Book.objects.annotate(
     author_name=Concat("author__first_name", V(" "), "author_last_name"),
 ).count()
 }}}

 then prior to 59bea9efd2768102fc9d3aedda469502c218e9b7 the generated SQL
 would have been

 {{{#!sql
 SELECT COUNT(*) FROM (
     SELECT book.id, (author.first_name || ' ' || author.last_name)
 author_name
     FROM book
     LEFT JOIN author ON (book.author_id = author.id)
 )
 }}}

 and after it is

 {{{#!sql
 SELECT COUNT(*)
 FROM book
 LEFT JOIN author ON (book.author_id = author.id)
 }}}

 Now obviously in this case the M:1 join against author is not necessary in
 this case but it's not always trivial to determine. Take the following
 example

 {{{#!python
 author_qs = Author.objects.annotate(
     book_title=F("books__title")
 )
 author_qs.count()
 }}}

 which results in

 {{{#!sql
 SELECT COUNT(*)
 FROM author
 LEFT JOIN book ON (book.author_id = author.id)
 }}}

 Then in this case we can't prune the 1:M join as it's multi-valued
 (possibly many books for each author) and would return a different value
 from `len(author_qs)`.

 The problem then becomes that JOINs can be only be pruned if these two
 conditions are met

 1. They are not referenced anymore (could be done by decrementing
 reference counts on annotation pruning)
 2. They are not involved in multi-valued relationships (AKA many-to-many
 or reverse many-to-one)

 I'm tentatively accepting as this is an already identified desired
 optimization but it is **far** from being an easy picking, it's in the
 realm of close to wont-fix **very hard** to do correctly.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/35865#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/01070192c3e0776c-06da7d8f-21f1-46a4-8ec2-f6346ea67be3-000000%40eu-central-1.amazonses.com.

Reply via email to