On Wed, 2026-03-25 at 07:40 -0700, Mark Dilger wrote:
> pg_trgm appears to be lossy, with recheck logic.  I would think you
> just need to make it give answers which at least include everything
> that a regex would match, and then allow recheck to prune that down. 
> My concern is having pg_trgm give less than all the answers, so that
> after recheck you get fewer results than a seqscan would have
> returned.  Would switching to casefold be strictly broader than
> regex?

I think the precise question would be: "are there any two characters
that lowercase to the same character but do not casefold to the same
character?".

I don't have a counterexample, so perhaps using casefold would still be
fine.
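
One way to look for such a counterexample is a brute-force scan over code points. The following is only a minimal sketch: it uses Python's built-in str.lower() and str.casefold(), which follow whatever Unicode data the Python build ships and are not the PostgreSQL builtin, ICU, or libc implementations, so any result is a hint rather than an answer. The function name find_counterexamples is hypothetical.

```python
from collections import defaultdict


def find_counterexamples(codepoints, lower=str.lower, fold=str.casefold):
    """Return groups of characters that lowercase to the same string
    but do not all casefold to the same string.

    Assumption: `lower` and `fold` default to Python's own Unicode
    mappings, which may differ from the provider PostgreSQL uses.
    """
    # Bucket every character by its lowercase form.
    groups = defaultdict(list)
    for cp in codepoints:
        ch = chr(cp)
        groups[lower(ch)].append(ch)

    # A bucket is a counterexample if its members casefold differently.
    bad = []
    for lowered, members in groups.items():
        if len({fold(ch) for ch in members}) > 1:
            bad.append((lowered, members))
    return bad


if __name__ == "__main__":
    # Scan the whole Unicode code space and print anything suspicious.
    for lowered, members in find_counterexamples(range(0x110000)):
        print(repr(lowered), [f"U+{ord(c):04X}" for c in members])
```

Passing the mappings in as parameters makes it possible to rerun the same check against tables exported from another provider, if those are available.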

Thoughts? Should we enhance regexes to consider more than two case
variants first, or should we proceed with some of these patches (and/or
a similar change to pg_trgm)?

> Sorry if this misses something discussed upthread.  I'm clearly
> assuming here that you don't mind that such a change necessitates a
> REINDEX. 

That's a concern. It may depend on how big the impact would be -- for
libc I don't think it would matter because lowercasing and casefolding
are the same thing.

Regards,
        Jeff Davis


