#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
     Reporter:  Florian Apolloner    |                    Owner:  nobody
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  dev
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Florian Apolloner):

 > What if we performed a subquery pushdown instead to make compilation
 more predictable so we can have the order by clause refer to the union
 columns by name?

 Maybe, in hindsight that would probably be have the way to implement
 unions (etc) in the first place. Maybe now is the time to switch to that
 :) What does a subquery mean in terms of performance -- I'd hope the
 database wouldn't care to much…

 > I don't see `OrderBy.as_sql` usage of template for compilation is an
 abuse at all; it's in charge of it's own compilation unit.

 That is fair, but then again it still requires help from the outside
 because we have to pull it into the select clause when used with distinct.
 Which then results in horrible queries like the one outlined by Tim in
 https://github.com/django/django/pull/15687#issuecomment-1133798092 (see
 "The old SQL") where we have the initial select value and then once with
 `IS NULL` and once without because we can't match it up to the initial
 select value that is aliased to `last_date` anymore. In practice part of
 this is caused by
 
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L466
 where we use a regex to parse the generated SQL from `OrderBy` and match
 it back to the select clause, which then obviously fails because the
 compilation of `OrderBy` generated two clauses. Imo this is all rather
 fragile…

 Also, and this is where #33767 comes back into play, we should try as hard
 as possible to reuse aliases where possible. Tim's old SQL example
 executed on PG nowadays (his is CockroachDB) looks like this:

 {{{
  SELECT DISTINCT "ordering_author"."id",
                 "ordering_author"."name",
                 "ordering_author"."editor_id",
                 (SELECT Max(U0."pub_date") AS "last_date"
                  FROM   "ordering_article" U0
                  WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                           AND Upper(U0."headline" :: text) LIKE Upper(
                               '%Article%') )
                  GROUP  BY U0."author_id") AS "last_date",
 FROM   "ordering_author"
 ORDER  BY (SELECT Max(U0."pub_date") AS "last_date"
            FROM   "ordering_article" U0
            WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                     AND Upper(U0."headline" :: text) LIKE
 Upper('%Article%') )
            GROUP  BY U0."author_id") DESC NULLS FIRST
 }}}

 when it could be

 {{{
 SELECT DISTINCT "ordering_author"."id",
                 "ordering_author"."name",
                 "ordering_author"."editor_id",
                 (SELECT Max(U0."pub_date") AS "last_date"
                  FROM   "ordering_article" U0
                  WHERE  ( U0."author_id" = ( "ordering_author"."id" )
                           AND Upper(U0."headline" :: text) LIKE Upper(
                               '%Article%') )
                  GROUP  BY U0."author_id") AS "last_date",
 FROM   "ordering_author"
 ORDER  BY "last_date" DESC NULLS FIRST
 }}}

 Reusing aliases and references seems like a good idea. It makes the query
 imo more readable and also means less parsing overhead for the server
 (probably negligible but still).

 Last but not least, the reason I discovered this is the new code for
 psycopg3 support. Psycopg3 makes use of server side param binding which
 introduces a whole new class of issues. To be fair this is not postgres
 alone but also informix and others. When actually using server side params
 you run into funny things like this: Assume Django ends up generating a
 query like this:
 {{{
 cursor.execute("""
     SELECT left(name, %s) AS prefix, count(*) AS number
     FROM people
     GROUP BY left(name, %s)
     ORDER BY left(name, %s)
     """,
     [2, 2, 2])
 }}}

 This works perfectly fine with client side merging (simply replace %s with
 2), but breaks done once you use server side binding because the server
 gets `$1 to $3` for the params and doesn't know that they are actually the
 same and as such fails because it no longer knows if the group by is part
 of the select clause or not. Mind you, this example is rather simplistic
 but Django actually generates those like shown in Tim's example above.

 To fix this we have two options:
  * Switch to named parameters and introduce even more logic to always
 write %(same_param)s in the previous example.
  * Switch our usage of order by / group by to use aliases and column
 numbers whenever possible.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018138556ccb-cc3aa485-66ed-4b4c-b1b2-d0d0e3fc85f3-000000%40eu-central-1.amazonses.com.

Reply via email to