#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Florian Apolloner):
> What if we performed a subquery pushdown instead to make compilation
more predictable so we can have the order by clause refer to the union
columns by name?
Maybe, in hindsight that would probably be have the way to implement
unions (etc) in the first place. Maybe now is the time to switch to that
:) What does a subquery mean in terms of performance -- I'd hope the
database wouldn't care to much…
> I don't see `OrderBy.as_sql` usage of template for compilation is an
abuse at all; it's in charge of it's own compilation unit.
That is fair, but then again it still requires help from the outside
because we have to pull it into the select clause when used with distinct.
Which then results in horrible queries like the one outlined by Tim in
https://github.com/django/django/pull/15687#issuecomment-1133798092 (see
"The old SQL") where we have the initial select value and then once with
`IS NULL` and once without because we can't match it up to the initial
select value that is aliased to `last_date` anymore. In practice part of
this is caused by
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L466
where we use a regex to parse the generated SQL from `OrderBy` and match
it back to the select clause, which then obviously fails because the
compilation of `OrderBy` generated two clauses. Imo this is all rather
fragile…
Also, and this is where #33767 comes back into play, we should try as hard
as possible to reuse aliases where possible. Tim's old SQL example
executed on PG nowadays (his is CockroachDB) looks like this:
{{{
SELECT DISTINCT "ordering_author"."id",
"ordering_author"."name",
"ordering_author"."editor_id",
(SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE Upper(
'%Article%') )
GROUP BY U0."author_id") AS "last_date",
FROM "ordering_author"
ORDER BY (SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE
Upper('%Article%') )
GROUP BY U0."author_id") DESC NULLS FIRST
}}}
when it could be
{{{
SELECT DISTINCT "ordering_author"."id",
"ordering_author"."name",
"ordering_author"."editor_id",
(SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE Upper(
'%Article%') )
GROUP BY U0."author_id") AS "last_date",
FROM "ordering_author"
ORDER BY "last_date" DESC NULLS FIRST
}}}
Reusing aliases and references seems like a good idea. It makes the query
imo more readable and also means less parsing overhead for the server
(probably negligible but still).
Last but not least, the reason I discovered this is the new code for
psycopg3 support. Psycopg3 makes use of server side param binding which
introduces a whole new class of issues. To be fair this is not postgres
alone but also informix and others. When actually using server side params
you run into funny things like this: Assume Django ends up generating a
query like this:
{{{
cursor.execute("""
SELECT left(name, %s) AS prefix, count(*) AS number
FROM people
GROUP BY left(name, %s)
ORDER BY left(name, %s)
""",
[2, 2, 2])
}}}
This works perfectly fine with client side merging (simply replace %s with
2), but breaks done once you use server side binding because the server
gets `$1 to $3` for the params and doesn't know that they are actually the
same and as such fails because it no longer knows if the group by is part
of the select clause or not. Mind you, this example is rather simplistic
but Django actually generates those like shown in Tim's example above.
To fix this we have two options:
* Switch to named parameters and introduce even more logic to always
write %(same_param)s in the previous example.
* Switch our usage of order by / group by to use aliases and column
numbers whenever possible.
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/0107018138556ccb-cc3aa485-66ed-4b4c-b1b2-d0d0e3fc85f3-000000%40eu-central-1.amazonses.com.