Re: Stop QuerySet repr from executing queries

'Kevin Weaver' via Django developers (Contributions to Django itself) Thu, 21 Apr 2022 12:45:41 -0700

I know this thread is more than a year old, but I was just bitten by this 
as well. We use the Elastic APM Python agent 
<https://github.com/elastic/apm-agent-python> which, like Sentry, calls 
`repr()` on local variables that it sends as part of stack traces. We use 
Django 
REST Framework <https://www.django-rest-framework.org/> and had a 
serializer that looked something like this (fake) example:


```
class AccountSerializer(ModelSerializer):
  owner = PrimaryKeyRelatedField(queryset=User.objects.all())
  days_since_joined = serializers.SerializerMethodField()

  def get_days_since_joined(self, obj):
    try:
      ...
    except Exception:
      logger.exception(...)
```

What happened was:

   1. We have several places in the application that return a paginated 
   list of, say, 50 accounts, using this serializer.
   2. Someone introduced the log line to catch exceptions when calculating 
   `days_since_joined`, which is called for each of the 50 accounts returned 
   on the page. Due to a bug, the exception was almost always raised.
   3. Each time the exception is raised, the Elastic APM Python agent 
   generates a stack trace and calls `repr()` on local variables in each stack 
   frame. One of those variables in the `get_days_since_joined` frame is the 
   `AccountSerializer` object `self`, which includes the `owner` property. 
   Calling `repr()` on the `owner` property evaluates the `User.objects.all()` 
   QuerySet, which fetches the first 21 User objects from the database.
   4. We now had `SELECT ... FROM user LIMIT 21` being called in a loop 50 
   times each time someone tries to load a page of accounts, suddenly 
   consuming significantly more database CPU and slowing down the application 
   considerably until queries start timing out.

To prevent this from happening again, we ended up applying something very 
similar to the suggestions above as a monkey patch:

```
orig = QuerySet.__repr__

def __repr__(self):
  if settings.DEBUG or self._result_cache:
    return orig(self)

  return f"<{self.__class__.__name__} [not evaluated]>"

QuerySet.__repr__ = __repr__
```

If folks are willing to reconsider their positions on this issue, I am more 
than willing to do what is necessary to contribute this change upstream, so 
that others are not impacted by this in the future.

Thank you,

Kevin

On Tuesday, March 23, 2021 at 9:02:39 AM UTC-4 Adam Johnson wrote:

> Would James' suggestion of reusing the result cache in __repr__ have 
> solved your issue? I would like to see that first.
>
> I'm not against the DEBUG-only fetching but there hasn't been any 
> suggestion what to show instead that could be of use.
>
> One can also mitigate against bad queries particularly well by setting a 
> statement timeout at the database level (at least MySQL and PostgreSQL 
> support this), which most users will generally want at a scale where repr() 
> becomes a problem.
>
> On Tue, 23 Mar 2021 at 12:40, Joachim Jablon <ewjo...@gmail.com> wrote:
>
>> Just been bitten by that same bug (combination of Sentry, using a 
>> Queryset.as_manager() that creates an unfiltered queryset as a local 
>> variable in the stack, a create signal that errored provoking a stacktrace 
>> that contained the queryset, a table that is always filtered by a field, 
>> and sorted by another field and an index that behaves poorly without the 
>> aformentionned filter).
>>
>> So would a contribution that would avoid evaluating the uncached queryset 
>> on repr when DEBUG is False likely to be accepted ? If so, I'm ready to get 
>> my hands dirty :)
>>
>> Cheers !
>>
>> Le dimanche 7 mars 2021 à 00:54:56 UTC+1, mic...@turbosys.com.br a 
>> écrit :
>>
>>> I agree with Matt on that. Avoiding executing the queries on production 
>>> would be helpful!
>>>
>>> Let me share my case. I use django-rest-framework in a specific project 
>>> and DRF has a feature: a serializer has a fancier string representation 
>>> when printed. I was using a serializer with a queryset in a view that had 
>>> an exception further, which caused the serializer to be logged and thus, 
>>> the queryset to be executed.
>>>
>>> There are more details in this discussion: 
>>> https://github.com/encode/django-rest-framework/discussions/7782
>>>
>>> The short story is: I was logging this serializer passively and it was 
>>> causing the execution of a query to a whole table with millions of records, 
>>> without any sorting optimization, creating hard to debug performance 
>>> problems.
>>>
>>> I understand it is an unusual situation, but repeating Matt's words: 
>>> Django should be reliable in production for any combination of those 
>>> conditions. 
>>>
>>> Thanks!
>>>
>>> On Sunday, October 20, 2019 at 4:47:25 PM UTC-3 Matt wrote:
>>>
>>>> Perhaps we ought to just keep the current behavior when DEBUG is True 
>>>> (it seems so obvious now, I can't believe it wasn't the first thing I 
>>>> suggested)? Django does lots of helpful things in DEBUG mode at the 
>>>> expense 
>>>> of performance. I think this would be an innocuous change for most people. 
>>>>
>>>> It is not the most important thing to remove behavior that most of 
>>>>> users use because we want to fix an edge case that was reported twice in 
>>>>> the last six years.
>>>>>
>>>>
>>>> I don't consider any of those *individual *conditions to trigger the 
>>>> problem "off the beaten path" for a bigger production Django site. All of 
>>>> them *combined *is obviously extremely rare, but it will effect 
>>>> someone else eventually*. *It doesn't sit well with me to not fix the 
>>>> issue. Django should be reliable in production for any combination of 
>>>> those 
>>>> conditions.
>>>>
>>>> On Wednesday, October 16, 2019 at 12:14:37 AM UTC-7, Mariusz Felisiak 
>>>> wrote:
>>>>>
>>>>> W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro 
>>>>> napisał:
>>>>>>
>>>>>> Yes, it's a complicated issue, but isn't the SQL query the ultimate 
>>>>>> representation of which methods are called or not?
>>>>>>
>>>>>> Having the query evaluated during debugging has been shown to be 
>>>>>> harmful in certain situations, isn't that the most important thing to 
>>>>>> fix?
>>>>>>
>>>>>
>>>>> Current behavior is extremely valuable IMO. It is not the most 
>>>>> important thing to remove behavior that most of users use because we want 
>>>>> to fix an edge case that was reported twice in the last six years. I 
>>>>> agree 
>>>>> that we can clarify this in docs. SQL query is not a solutions because 
>>>>> not 
>>>>> all of ORM users know SQL. Moreover the current `repr()` shows values 
>>>>> from 
>>>>> DB that we'll miss showing only SQL. You can check in the Django 
>>>>> documentation how users use the current `repr()` e.g. in tutorials.
>>>>>
>>>>> Best,
>>>>> Mariusz
>>>>>
>>>> -- 
>>
> You received this message because you are subscribed to the Google Groups 
>> "Django developers (Contributions to Django itself)" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to django-develop...@googlegroups.com.
>>
> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/django-developers/050a8686-eaa5-47e7-a9ab-d8dd6fd14844n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/django-developers/050a8686-eaa5-47e7-a9ab-d8dd6fd14844n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>
> -- 
> Adam
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/00cabdaf-c9ef-4403-b43a-8d50eeffa42an%40googlegroups.com.

Re: Stop QuerySet repr from executing queries

Reply via email to