Re: Optimization for get_search_results() in admin

2021-10-24 Thread Jacob Walls
 

Hi list,

I came to suggest a resolution for the above: the potential for impossible 
(or resource-gobbling) queries in admin changelist searches. (Ticket 
)

Several folks have proposed the same solution: AND’ing Q objects together 
rather than chaining filters. Last time we had a PR, the author was pointed 
here (hi OP!) in light of the subtle behavior change that would crop up in 
searches over multi-valued relationships. We would be moving from the 
second example query in "spanning multi-valued relationships 
"
 
to the first example query, which will return fewer rows.

I see three +1s above in this thread. I write to add:

1 — I think this is an implementation detail. The admin changelist docs do 
not specify which of the two documented patterns for filtering multivalued 
relationships is used.

2 — Intuition is ambiguous here. I’m sure there could be convincing 
examples for each query pattern. That leads me to say it’s better for the 
problematic behavior to be opt-in. If folks really want the old behavior 
they can customize and then impose some limits to prevent excessive joins.

New PR  with tests. Came here 
to summarize the state of play before opening a substantially similar PR to 
OP’s. Thoughts, input, desires all welcomed.

Cheers,

Jacob


PS — there is another accepted ticket 
 for the same root cause 
framed as a request to just limit the number of terms in admin changelist 
searches. I think we could wontfix it if we instead address the root cause.

On Monday, March 18, 2019 at 9:25:29 AM UTC-4 Adam Johnson wrote:

> Hi,
>
> I've also seen this behaviour before at YPlan, we sometimes got the "mysql 
> can't join more than 61 tables" error or had the database slow to a crawl. 
> Some of the admin classes required custom get_search_results functions to 
> fix this.
>
> I have only skimmed the old tickets and PR's but I think that a breaking 
> change to subtly changing the search behaviour is worth it to prevent this 
> footgun of doom is a good compromise.
>
> Adam
>
>
> On Tue, 5 Mar 2019 at 19:35, Joel M  wrote:
>
>> I don't know whether anyone will see this, as it's my first time posting 
>> here, but I have to second this.
>>
>> This problem brought down our site yesterday when an unsuspecting admin 
>> user tried a 3-word search in an admin that had several search fields 
>> involving joins. The result was that the number of joins was multiplied by 
>> the number of search terms. The database machine literally ran out of space 
>> making the temporary table (it had been about 10% full -- this query 
>> occupied many times the size of the entire database), and crashed 
>> everything. It works fine with just 1 word. As a workaround, we blocked 
>> nonessential users from being able to make that search and will be removing 
>> less-important fields from the search_fields in a future code update. But 
>> it took awhile to figure out what the problem was, and this could happen in 
>> any admin with enough related search fields or search terms. Eventually, I 
>> came up with the same fix petros suggests in their PR.
>>
>> I know there may be users who expect the previous query behavior, but 
>> this is a gotcha that can hide for awhile before somebody accidentally 
>> breaks everything.
>>
>> More related tickets: https://code.djangoproject.com/ticket/16063 (with 
>> a similar solution), https://code.djangoproject.com/ticket/27864 
>> (mitigate by limiting number of search terms)
>>
>> On Friday, September 23, 2016 at 1:49:14 PM UTC-4, petros.m...@gmail.com 
>> wrote:
>>>
>>> Hello,
>>>
>>> I would like to open a discussion on the change I have proposed with 
>>> pull request #7277 .
>>>
>>> To bring the discussion here, the problem with current implementation of 
>>> get_search_results() is that, when search fields include fields through 
>>> reverse relationships, it produces queries that the more words are used in 
>>> the search term the more inefficient they are. This inefficiency comes from 
>>> duplicate left joins with same tables which are in that case introduced by 
>>> Django ORM. There have been relevant reports by others before, which you 
>>> can see in tickets #16603 
>>>  and #25789 . However, 
>>> judging by the fact that for five years there has not been a solution for 
>>> this, it seems that it is not an easy fix.
>>>
>>> That said, this inefficiency can easily get the system down, as users in 
>>> the admin can use a few words in the search term, either deliberately or by 
>>> mistake, e.g. by accidentally copying & pasting a whole sentence or 
>>> paragraph. In my case, with just 3 words in the search term

Re: Optimization for get_search_results() in admin

2021-10-24 Thread 'Adam Johnson' via Django developers (Contributions to Django itself)
I remain +1 to the change

I can see how we could claim this is "an implementation detail". But I
think we should also be sympathetic to the idea that this change may break
many workflows

If folks really want the old behavior they can customize and then impose
> some limits to prevent excessive joins.
>

Perhaps we should have a ModelAdmin flag available to activate the old
behaviour, that could be immediately deprecated. This would help anyone
transitioning whose workflow is immediately broken. An alternative would be
to provide a code snippet in the documentation to implement the old
behaviour. But however it's done, it would be good to not leave upgrading
projects "high and dry" when they discover the implication of the changed
search results.

On Sun, 24 Oct 2021 at 18:57, Jacob Walls  wrote:

> Hi list,
>
> I came to suggest a resolution for the above: the potential for impossible
> (or resource-gobbling) queries in admin changelist searches. (Ticket
> )
>
> Several folks have proposed the same solution: AND’ing Q objects together
> rather than chaining filters. Last time we had a PR, the author was pointed
> here (hi OP!) in light of the subtle behavior change that would crop up in
> searches over multi-valued relationships. We would be moving from the
> second example query in "spanning multi-valued relationships
> "
> to the first example query, which will return fewer rows.
>
> I see three +1s above in this thread. I write to add:
>
> 1 — I think this is an implementation detail. The admin changelist docs do
> not specify which of the two documented patterns for filtering multivalued
> relationships is used.
>
> 2 — Intuition is ambiguous here. I’m sure there could be convincing
> examples for each query pattern. That leads me to say it’s better for the
> problematic behavior to be opt-in. If folks really want the old behavior
> they can customize and then impose some limits to prevent excessive joins.
>
> New PR  with tests. Came
> here to summarize the state of play before opening a substantially similar
> PR to OP’s. Thoughts, input, desires all welcomed.
>
> Cheers,
>
> Jacob
>
>
> PS — there is another accepted ticket
>  for the same root cause
> framed as a request to just limit the number of terms in admin changelist
> searches. I think we could wontfix it if we instead address the root cause.
>
> On Monday, March 18, 2019 at 9:25:29 AM UTC-4 Adam Johnson wrote:
>
>> Hi,
>>
>> I've also seen this behaviour before at YPlan, we sometimes got the
>> "mysql can't join more than 61 tables" error or had the database slow to a
>> crawl. Some of the admin classes required custom get_search_results
>> functions to fix this.
>>
>> I have only skimmed the old tickets and PR's but I think that a breaking
>> change to subtly changing the search behaviour is worth it to prevent this
>> footgun of doom is a good compromise.
>>
>> Adam
>>
>>
>> On Tue, 5 Mar 2019 at 19:35, Joel M  wrote:
>>
>>> I don't know whether anyone will see this, as it's my first time posting
>>> here, but I have to second this.
>>>
>>> This problem brought down our site yesterday when an unsuspecting admin
>>> user tried a 3-word search in an admin that had several search fields
>>> involving joins. The result was that the number of joins was multiplied by
>>> the number of search terms. The database machine literally ran out of space
>>> making the temporary table (it had been about 10% full -- this query
>>> occupied many times the size of the entire database), and crashed
>>> everything. It works fine with just 1 word. As a workaround, we blocked
>>> nonessential users from being able to make that search and will be removing
>>> less-important fields from the search_fields in a future code update. But
>>> it took awhile to figure out what the problem was, and this could happen in
>>> any admin with enough related search fields or search terms. Eventually, I
>>> came up with the same fix petros suggests in their PR.
>>>
>>> I know there may be users who expect the previous query behavior, but
>>> this is a gotcha that can hide for awhile before somebody accidentally
>>> breaks everything.
>>>
>>> More related tickets: https://code.djangoproject.com/ticket/16063 (with
>>> a similar solution), https://code.djangoproject.com/ticket/27864
>>> (mitigate by limiting number of search terms)
>>>
>>> On Friday, September 23, 2016 at 1:49:14 PM UTC-4, petros.m...@gmail.com
>>> wrote:

 Hello,

 I would like to open a discussion on the change I have proposed with
 pull request #7277 .

 To bring the discussion here, the problem with current implementation
 of get_search_results() is that, when search fields include fields through
 rever