Re: Use CASEFOLD() internally rather than LOWER()

Mark Dilger Wed, 25 Mar 2026 17:01:56 -0700

On Wed, Mar 25, 2026 at 2:02 PM Jeff Davis wrote:


> I think the precise question would be: "are there any two characters
> that lowercase to the same character but do not casefold to the same
> character?".
>

I don't know.  I set up a test iterating over all character pairs in all
locales, and it found none on my system.  Other searching suggests that the
Turkish and Azerbaijani locales do have this characteristic: I (U+0049)
lowercases to ı (U+0131) but case folds to i (U+0069), while ı (U+0131)
both lowercases and case folds to ı (U+0131).  So I and ı lowercase to the
same character but case fold to different ones.  I have not confirmed that
empirically, though.
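The pair check described above can be sketched in Python, under two stated
assumptions: Python's str.lower() and str.casefold() apply Unicode's
locale-independent default mappings, not any PostgreSQL collation provider,
and lower_tr() is a hypothetical hand-written approximation of Turkish /
Azerbaijani lowercasing, not a real library API.  The equally hypothetical
lower_but_not_fold_collisions() groups characters by their lowercase form
and reports pairs within a group whose case foldings differ, which is the
property asked about.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Iterable


def lower_tr(s: str) -> str:
    """Hypothetical approximation of Turkish/Azerbaijani lowercasing.

    I (U+0049) -> dotless ı (U+0131), İ (U+0130) -> i (U+0069); every other
    character falls back to Python's locale-independent str.lower().
    """
    return "".join(
        "\u0131" if c == "I" else "i" if c == "\u0130" else c.lower()
        for c in s
    )


def lower_but_not_fold_collisions(
    chars: Iterable[str],
    lower: Callable[[str], str],
    fold: Callable[[str], str],
) -> list[tuple[str, str]]:
    """Return pairs of characters that lowercase to the same string
    but case fold to different strings."""
    by_lower: dict[str, list[str]] = defaultdict(list)
    for c in chars:
        by_lower[lower(c)].append(c)

    pairs = []
    for group in by_lower.values():
        # Only characters sharing a lowercase form can form a counterexample.
        for a, b in combinations(group, 2):
            if fold(a) != fold(b):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    candidates = ["I", "\u0131", "i", "\u0130"]
    # Turkish-style lowercasing with default (non-Turkic) case folding.
    print(lower_but_not_fold_collisions(candidates, lower_tr, str.casefold))
    # -> [('I', 'ı'), ('i', 'İ')]
    # Default lowercasing and default case folding.
    print(lower_but_not_fold_collisions(candidates, str.lower, str.casefold))
    # -> []
```

With the Turkish-style lowercasing, both I/ı and İ/i come back as
collisions; with plain str.lower() the same candidates yield none.  Passing
(chr(cp) for cp in range(0x110000)) as chars scans every code point, though
the result then reflects only Python's default Unicode mappings.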


> I don't have a counterexample, so perhaps using casefold would still be
> fine.
>
> Thoughts? Should we enhance regexes to consider more than two case
> variants first, or should we proceed with some of these patches (and/or
> a similar change to pg_trgm)?
>

I don't want to take a strong position either way.  I'm still wrapping my
head around the various implications of the proposed changes, and don't
feel I have a complete picture yet.

-- 

*Mark Dilger*
