I've been working on the Kolo debugging tool and as part of that I've also
run into this issue. Generating unexpected queries when monitoring a django
project was a nasty surprise.

In Kolo's case I was also able to work around it with a monkeypatch, but
not needing this would be a nice performance win.

I also want to point to a similar case in Django where I think the default
behaviour is more helpful - that of forms. Calling `str` on a form can
trigger form validation, but calling `repr` avoids this. I'd be in favour
of moving the current `__repr__` implementation for `QuerySet` into
`__str__` and having an implementation of `__repr__` guaranteed not to
generate extra queries.

Regards,
Ian

On Fri, 15 Apr 2022 at 16:03, 'Kevin Weaver' via Django developers
(Contributions to Django itself) <django-developers@googlegroups.com> wrote:

> I know this thread is more than a year old, but I was just bitten by this
> as well. We use the Elastic APM Python agent
> <https://github.com/elastic/apm-agent-python> which, like Sentry, calls
> `repr()` on local variables that it sends as part of stack traces. We use 
> Django
> REST Framework <https://www.django-rest-framework.org/> and had a
> serializer that looked something like this (fake) example:
>
> ```
> class AccountSerializer(ModelSerializer):
>   owner = PrimaryKeyRelatedField(queryset=User.objects.all())
>   days_since_joined = serializers.SerializerMethodField()
>
>   def get_days_since_joined(self, obj):
>     try:
>       ...
>     except Exception:
>       logger.exception(...)
> ```
>
> What happened was:
>
>    1. We have several places in the application that return a paginated
>    list of, say, 50 accounts, using this serializer.
>    2. Someone introduced the log line to catch exceptions when
>    calculating `days_since_joined`, which is called for each of the 50
>    accounts returned on the page. Due to a bug, the exception was almost
>    always raised.
>    3. Each time the exception is raised, the Elastic APM Python agent
>    generates a stack trace and calls `repr()` on local variables in each stack
>    frame. One of those variables in the `get_days_since_joined` frame is the
>    `AccountSerializer` object `self`, which includes the `owner` property.
>    Calling `repr()` on the `owner` property evaluates the `User.objects.all()`
>    QuerySet, which fetches the first 21 User objects from the database.
>    4. We now had `SELECT ... FROM user LIMIT 21` being called in a loop
>    50 times each time someone tries to load a page of accounts, suddenly
>    consuming significantly more database CPU and slowing down the application
>    considerably until queries start timing out.
>
> To prevent this from happening again, we ended up applying something very
> similar to the suggestions above as a monkey patch:
>
> ```
> orig = QuerySet.__repr__
>
> def __repr__(self):
>   if settings.DEBUG or self._result_cache:
>     return orig(self)
>
>   return f"<{self.__class__.__name__} [not evaluated]>"
>
> QuerySet.__repr__ = __repr__
> ```
>
> If folks are willing to reconsider their positions on this issue, I am
> more than willing to do what is necessary to contribute this change
> upstream, so that others are not impacted by this in the future.
>
> Thank you,
>
> Kevin
>
> On Tuesday, March 23, 2021 at 9:02:39 AM UTC-4 Adam Johnson wrote:
>
>> Would James' suggestion of reusing the result cache in __repr__ have
>> solved your issue? I would like to see that first.
>>
>> I'm not against the DEBUG-only fetching but there hasn't been any
>> suggestion what to show instead that could be of use.
>>
>> One can also mitigate against bad queries particularly well by setting a
>> statement timeout at the database level (at least MySQL and PostgreSQL
>> support this), which most users will generally want at a scale where repr()
>> becomes a problem.
>>
>> On Tue, 23 Mar 2021 at 12:40, Joachim Jablon <ewjo...@gmail.com> wrote:
>>
>>> Just been bitten by that same bug (combination of Sentry, using a
>>> Queryset.as_manager() that creates an unfiltered queryset as a local
>>> variable in the stack, a create signal that errored provoking a stacktrace
>>> that contained the queryset, a table that is always filtered by a field,
>>> and sorted by another field and an index that behaves poorly without the
>>> aformentionned filter).
>>>
>>> So would a contribution that would avoid evaluating the uncached
>>> queryset on repr when DEBUG is False likely to be accepted ? If so, I'm
>>> ready to get my hands dirty :)
>>>
>>> Cheers !
>>>
>>> Le dimanche 7 mars 2021 à 00:54:56 UTC+1, mic...@turbosys.com.br a
>>> écrit :
>>>
>>>> I agree with Matt on that. Avoiding executing the queries on production
>>>> would be helpful!
>>>>
>>>> Let me share my case. I use django-rest-framework in a specific project
>>>> and DRF has a feature: a serializer has a fancier string representation
>>>> when printed. I was using a serializer with a queryset in a view that had
>>>> an exception further, which caused the serializer to be logged and thus,
>>>> the queryset to be executed.
>>>>
>>>> There are more details in this discussion:
>>>> https://github.com/encode/django-rest-framework/discussions/7782
>>>>
>>>> The short story is: I was logging this serializer passively and it was
>>>> causing the execution of a query to a whole table with millions of records,
>>>> without any sorting optimization, creating hard to debug performance
>>>> problems.
>>>>
>>>> I understand it is an unusual situation, but repeating Matt's words:
>>>> Django should be reliable in production for any combination of those
>>>> conditions.
>>>>
>>>> Thanks!
>>>>
>>>> On Sunday, October 20, 2019 at 4:47:25 PM UTC-3 Matt wrote:
>>>>
>>>>> Perhaps we ought to just keep the current behavior when DEBUG is True
>>>>> (it seems so obvious now, I can't believe it wasn't the first thing I
>>>>> suggested)? Django does lots of helpful things in DEBUG mode at the 
>>>>> expense
>>>>> of performance. I think this would be an innocuous change for most people.
>>>>>
>>>>> It is not the most important thing to remove behavior that most of
>>>>>> users use because we want to fix an edge case that was reported twice in
>>>>>> the last six years.
>>>>>>
>>>>>
>>>>> I don't consider any of those *individual *conditions to trigger the
>>>>> problem "off the beaten path" for a bigger production Django site. All of
>>>>> them *combined *is obviously extremely rare, but it will effect
>>>>> someone else eventually*. *It doesn't sit well with me to not fix the
>>>>> issue. Django should be reliable in production for any combination of 
>>>>> those
>>>>> conditions.
>>>>>
>>>>> On Wednesday, October 16, 2019 at 12:14:37 AM UTC-7, Mariusz Felisiak
>>>>> wrote:
>>>>>>
>>>>>> W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro
>>>>>> napisał:
>>>>>>>
>>>>>>> Yes, it's a complicated issue, but isn't the SQL query the ultimate
>>>>>>> representation of which methods are called or not?
>>>>>>>
>>>>>>> Having the query evaluated during debugging has been shown to be
>>>>>>> harmful in certain situations, isn't that the most important thing to 
>>>>>>> fix?
>>>>>>>
>>>>>>
>>>>>> Current behavior is extremely valuable IMO. It is not the most
>>>>>> important thing to remove behavior that most of users use because we want
>>>>>> to fix an edge case that was reported twice in the last six years. I 
>>>>>> agree
>>>>>> that we can clarify this in docs. SQL query is not a solutions because 
>>>>>> not
>>>>>> all of ORM users know SQL. Moreover the current `repr()` shows values 
>>>>>> from
>>>>>> DB that we'll miss showing only SQL. You can check in the Django
>>>>>> documentation how users use the current `repr()` e.g. in tutorials.
>>>>>>
>>>>>> Best,
>>>>>> Mariusz
>>>>>>
>>>>> --
>>>
>> You received this message because you are subscribed to the Google Groups
>>> "Django developers (Contributions to Django itself)" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-develop...@googlegroups.com.
>>>
>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/django-developers/050a8686-eaa5-47e7-a9ab-d8dd6fd14844n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/django-developers/050a8686-eaa5-47e7-a9ab-d8dd6fd14844n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>> --
>> Adam
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/00cabdaf-c9ef-4403-b43a-8d50eeffa42an%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/00cabdaf-c9ef-4403-b43a-8d50eeffa42an%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFv-zfK9VmXb4v-5-y_tG_-SPbRBda33GzxB1GX06dvRVS6jsA%40mail.gmail.com.
  • Re:... Michel Sabchuk
    • ... Joachim Jablon
      • ... 'Adam Johnson' via Django developers (Contributions to Django itself)
        • ... 'Kevin Weaver' via Django developers (Contributions to Django itself)
          • ... Ian Foote

Reply via email to