#34750: Group by removal optimization generates in correct count query
-----------------------------------------+------------------------
Reporter: Toan Vuong | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 4.2
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+------------------------
I believe this is related to the work in
https://code.djangoproject.com/ticket/28477 (and follow-up issues/changes
due to that ticket).
Tested on:
Django 4.2.2
OS X 13.4.1
Python 3.9.16
For the Oracle backend:
cx-Oracle 8.3.0 with instantclient 19.8
For the Postgres backend:
psycopg/psycopg-binary 3.1.9
Attached is a sample project, and the relevant query is below:
{{{
def populate_data():
q1 = Question(question_text='t1')
q1.save()
for i in range(10):
c = Choice(question=q1, choice_text='c1', votes=i)
c.save()
# Need to populate the data only once, so comment it out on subsequent
runs
populate_data()
qs = (
Question.objects.values('id', 'question_text')
.annotate(mysum=Sum('choice__votes'))
.annotate(choice__votes_threshold=Case(
When(choice__votes__gt=1, then=Value(1000)),
default=Value(-1)))
)
print(qs.count() == len(qs))
}}}
When issuing the count query (`qs.count()`), Django generates this:
{{{
SELECT COUNT(*) FROM (SELECT "polls_question"."id" AS "col1",
"polls_question"."question_text" AS "col2" FROM "polls_question" LEFT
OUTER JOIN "polls_choice" ON ("polls_question"."id" =
"polls_choice"."question_id") GROUP BY 1
}}}
I've chased it down to
[https://github.com/django/django/blob/main/django/db/models/sql/query.py#L490
this optimization]. This count would return 1 because it's just a simple
join.
However, the actual query is this:
{{{
SELECT "polls_question"."id", "polls_question"."question_text",
SUM("polls_choice"."votes") AS "mysum", CASE WHEN "polls_choice"."votes" >
1 THEN 1000 ELSE -1 END AS "choice__votes_threshold" FROM
"polls_question" LEFT OUTER JOIN "polls_choice" ON ("polls_question"."id"
= "polls_choice"."question_id") GROUP BY "polls_question"."id", 4
}}}
Due to the group by and the varying `choice__votes_threshold`, there
should be 2 rows in the result set.
This _did_ used to work in 3.2.18, and I didn't dig too much into it but I
suspect the optimization was introduced after that version, hence why it
worked.
--
Ticket URL: <https://code.djangoproject.com/ticket/34750>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/010701899d37ec2d-69d77806-5755-43d5-b899-580452e7994d-000000%40eu-central-1.amazonses.com.