#25705: Parameters are not adapted or quoted in Query.__str__
-------------------------------------+-------------------------------------
     Reporter:  Dmitry Dygalo        |                    Owner:  Alex
         Type:                       |                   Status:  assigned
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  dev
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  1                    |  Patch needs improvement:  1
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Alex):

 Replying to [comment:16 Simon Charette]:
 > > I wouldn't consider someone doing str(qs.query) to then pass in to
 raw() (which the docs already warn of SQL injections) a real use case that
 anyone would do.
 >
 > I don't agree on that point. From helping users through various forums
 in most cases the reason why they were bringing up the issue of improper
 quoting was when they tried executing the query as is. The duplicates of
 this ticket seem to support this position (see #17741 and #25092
 [https://stackoverflow.com/questions/31287184/django-orm-magic-bug-
 or-a-feature/31287932#31287932 SO]) and I don't think would be hard to
 find other examples.

 If the worry is `raw(str(qs.query))`, then not having the quoting is a
 security issue.
 In fact I can actually do an SQL injection abusing this lack of quoting.
 {{{
 qs = User.objects.filter(first_name='"test")--', is_staff=True).only('id')
 >>> <QuerySet []>
 list(User.objects.raw(str(qs.query)))
 >>> [<User: test>]
 list(User.objects.raw(str(qs.query)))
 >>> 'SELECT "auth_user"."id" FROM "auth_user" WHERE
 ("auth_user"."first_name" = "test")-- AND "auth_user"."is_staff")'
 }}}
 With proper quoting, as the one done when executing normally, the query is
 `SELECT "auth_user"."id" FROM "auth_user" WHERE ("auth_user"."first_name"
 = \'"test")--\' AND "auth_user"."is_staff")`
 So if `raw(str(qs.query))` is a risk, then quoting the parameters would
 fix it.

 > > Also my recommended choice was not trying to emulate parameter
 quoting, but to use the database library mogrify method in 3 of the 5
 supported backends.
 >
 > You second alternative mentioned
 >
 > > Use the mogrify function in the first three backends, and manually
 quote the parameters in the other two.
 >
 > I ''think'' that would qualify as a form of emulation or best-effort
 wouldn't it? What guarantees do we have that the quoting is appropriate
 for all datatypes that these backends support as parameters? Are we
 feeling confident, knowing that users will attempt to pipe `str(qs.query)`
 into `raw` and that `Query.__str__` always use `DEFAULT_DB_ALIAS` for
 compilation #25947 (even if the proper backend can often only be
 determined as execution time) that we are exposing the right tools to
 users and we can commit to them being safe for the years to come?

 And as I said just after that, that alternative was rejected in the past,
 would be too much effort and I was only mentioning it as the only way to
 fix it in every backend. I should have been clearer that I wasn't pushing
 for this option.

 > IMO the usage of `raw(str(qs.query))`, and the main motive for this
 ticket, is a symptom of a lack of documented way for safely building and
 executing the SQL and parameters from a `QuerySet` object which makes me
 believe the focus should be on documenting `queryset.qs.sql_with_params()`
 first instead.

 That idea was rejected [https://code.djangoproject.com/ticket/34636 14
 months ago], but it could be reconsidered in light of this discussion.
 Also, I didn't actually find `qs.query` being documented, mostly this
 [https://docs.djangoproject.com/en/5.0/ref/models/querysets/#pickling-
 querysets mention] about pickling querysets.

 > As for the printing and copy-pasting into a shell use case we've reached
 an agreement that the only way to see the query with the parameters binded
 correctly is after executing it. Knowing that, and the fact
 `Query.__str__` is still pretty legible even without the exact mogrifying
 that only leaves the copy-pasting into a shell use case. Should we make it
 easier to retrieve the exact SQL associated with a particular queryset
 instead? Something like `QuerySet.executed_query: str | None` that is only
 present on ''evaluated'' querysets.

 I would be happy with this option. An alternative would be an error if the
 queryset hasn't been evaluated instead of None. But I'm not that big of a
 fan of the error idea.

 > These appears like solutions that prevent misuse of `Query.__str__` and
 avoid maintaining correct and safe x-backend mogrifying solutions?

 However, my initial investigation was not deep enough and I found more
 reasons to discard the mogrify option:
 - In mysqlclient, mogrify was added in version
 [https://github.com/PyMySQL/mysqlclient/blob/main/HISTORY.rst#whats-new-
 in-220 2.2.0] and django supports
 [https://docs.djangoproject.com/en/5.0/ref/databases/#mysqlclient 1.4.3]
 onwards, so there would have to be an additional check and a difference in
 behavior.
 - In the MySQL Connector/Python driver I couldn't find a mogrify method.
 Not the recommended driver, but still supported.
 - In psicopg3, `mogrify` is only on the
 
[https://www.psycopg.org/psycopg3/docs/api/cursors.html#psycopg.ClientCursor.mogrify
 ClientCursor] class. While that's the default in Django, server side is
 also [https://docs.djangoproject.com/en/5.0/ref/databases/#server-side-
 parameters-binding supported]. Again, an additional check. Also, I'm not
 sure if this means that these usages of mogrify
 [https://code.djangoproject.com/ticket/25705?replyto=16#comment:12 I
 mentioned] in the postgres backend don't work with server side parameter
 binding.

 > Django is already using the postgres mogrify in its own compose_sql
 function in the
 
[https://github.com/django/django/blob/f302343380c77e1eb5dab3b64dd70895a95926ca/django/contrib/postgres/search.py#L317
 search backend],
 
[https://github.com/django/django/blob/f302343380c77e1eb5dab3b64dd70895a95926ca/django/db/backends/postgresql/schema.py#L46
 schema queries] and
 
[https://github.com/django/django/blob/f302343380c77e1eb5dab3b64dd70895a95926ca/django/db/backends/postgresql/base.py#L98
 ensuring the role of the connection]

 My proposal is to close this as a wontfix and work on the
 `QuerySet.executed_query` idea and maybe reconsider documenting
 `sql_with_params`
-- 
Ticket URL: <https://code.djangoproject.com/ticket/25705#comment:17>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070190b7c4320d-058c250f-9228-4666-abc0-2d5f2184b4b9-000000%40eu-central-1.amazonses.com.

Reply via email to