date:20210303

Re: About the ORM icontains operator's disadvantage on PostgreSQL performance and query results.

2021-03-03 Thread Mesut Öncel

Does anyone have a different opinion or will this topic be closed?

Mesut Öncel , 1 Mar 2021 Pzt, 13:27 tarihinde şunu
yazdı:

> Thank you for your detailed explanation. You are right, they have tried to
> shape the database suitable for this structure for a long time, but
> removing the expression indexes will not cause a crisis. People and
> products using the database created by Django will already have to create a
> standard index and then an expression index for "UPPER". I have presented
> the gin index as an example, but indexes can be produced for full text
> search in different structures. But regardless, an expression index must be
> added for the "UPPER" function. Creating an expression gin index in
> addition to a standard gin index can be costly and even cause performance
> problems in large databases. Of course, it is also necessary to consider
> its reindex. As a result, my main expectation is not to ignore the current
> structure of PostgreSQL. Or if you are using "icontains", be sure to
> include an expression string because we should say we prefer not to use
> "ilike". :)
>
> Hannes Ljungberg , 1 Mar 2021 Pzt, 11:31
> tarihinde şunu yazdı:
>
>> I kind of agree that using `UPPER` instead of `ILIKE` for `__icontains`
>> on PostgreSQL isn’t optimal. But it is quite easy to create a functional
>> trigram GIN-index which use `UPPER` to allow these lookups to use an index.
>> This will be even easier in Django 3.2 where you can create functional
>> indexes in your model definitions.
>>
>> The recommended way to handle case insensitive searches in PostgreSQL is
>> by using functional indexes (
>> https://www.postgresql.org/docs/current/indexes-expressional.html).  But
>> with trigram indexes this isn’t needed and they are required to get index
>> scans for leading wildcard searches. Note that trigram GIN indexes do have
>> drawbacks, for example they don’t support the `=`-operator and will not
>> support search strings with less than 3 characters.
>>
>> I’ve been in a similar position as you and to solve this I created a
>> custom lookup which used `ILIKE` and used that instead of `__icontains`.
>>
>> It might make sense to change the `__icontains`  and `__iendswith`
>> lookups to use `ILIKE` instead of `UPPER` but I’m not really sure that it’s
>> justified to break the indexed queries people already have in place. That
>> is where people have the GIN trigram `UPPER` index opposed from the regular
>> GIN trigram index. The performance issue you’re describing doesn’t really
>> change if we use `ILIKE` over `UPPER`. You need to install an index anyway.
>>
>> Regarding your issue with Turkish characters I think that it works
>> because `ILIKE` internally uses some form of `LOWER` and `LOWER('İstanbul')
>> = LOWER('istanbul')` would’ve worked in your case. As James wrote this
>> behaviour depends on your configured locale. I think the one way to do
>> these kind of searches without changing the locale is to use `tsvector` and
>> `tsquery` with a Turkish configuration, you can even make them unaccented
>> to allow matching both “Istanbul” and “İstanbul” with the search string
>> “istanbul”.
>>
>>
>> måndag 1 mars 2021 kl. 07:06:59 UTC+1 skrev mesuto...@gmail.com:
>>
>>> Hi James,
>>> Thanks for your explanations. However, I wanted to explain the
>>> disadvantage of using "UPPER like"  instead of "ilike" in icontains,
>>> istartswith and iendswith. The performance problem should not be ignored.
>>>
>>> James Bennett , 1 Mar 2021 Pzt, 04:05 tarihinde
>>> şunu yazdı:
>>>
 On Sun, Feb 28, 2021 at 2:39 AM Tom Forbes  wrote:
 >
 > Thank you for the clarification! I think the biggest argument for
 this change is the fact that uppercasing Unicode can cause incorrect
 results to be returned.
 >
 > Given that we now have much better support for custom index types,
 perhaps we should change this? We need a custom expression index anyway, so
 it might not be a huge ask to say “now you should use a gin index”?

 It's worth pointing out that case mapping and transformation in
 Unicode is difficult and complex. I wrote up an intro to the problem a
 while back:

 https://www.b-list.org/weblog/2018/nov/26/case/

 One thing that's important to note is that there is no generic
 one-size-fits-all-languages option that Django can just do by default
 and get the right results. For example, a case mapping that does the
 right thing for Turkish will do the wrong thing for (to pick a random
 example) French, and vice-versa. Unicode itself provides a basic "hope
 for the best" set of default case mappings that do the right thing for
 many cased scripts, but also is clear in saying that you may need to
 use a locale-specific mapping to get what you really want.

 Postgres has the ability to configure locale, and when configured it
 does the "right thing" -- for example, when the locale is tr_TR or
 another Turkish loc

Help with ticket #28426

2021-03-03 Thread Bence Gáspár

Hi,

I would like to work on this ticket
https://code.djangoproject.com/ticket/28426. I am fairly new in the
community and to Django development as well but this ticket seems not so
Django specific and I am confident I can implement it. I have done some
research already and found that we need to implement our own prompting for
username and password. I would like to ask a couple of questions about this
ticket.

urllib.request.urlretrieve() itself can be deprecated according to docs so
I think we should move from using this function to using
urllib.request.urlopen(). As I see this should not be that hard and I would
like to do this in a separate PR if you agree.

My question is based around basic auth. I would like to write some unit
tests for it but I am not sure how to. I am not familiar with using
LiveServer tests. Does it have the ability to provide basic auth interface?

Am I right that for handling basic auth we should first try the download
without credentials, then if we get 401 we should ask the user for username
and password for the server and try with the given credentials in basic
auth?

For the credential prompts I would like to lift this function to our code:
https://docs.python.org/3/library/urllib.request.html#urllib.request.FancyURLopener.prompt_user_passwd
it is part of the FancyURLopener that is deprecated, but this function
seams clean and good code.

I hope that I found the right forum to ask my questions.

 Thanks
Bence

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CALgZer5ofD54mBhSLrRkG-zPL5KtQCw7OgM9OHP1LfNS1OpvjQ%40mail.gmail.com.

Re: About the ORM icontains operator's disadvantage on PostgreSQL performance and query results.

Help with ticket #28426

2 matches

Site Navigation

Mail list logo

Footer information