On Wed, Mar 25, 2026 at 2:02 PM Jeff Davis <[email protected]> wrote:
> I think the precise question would be: "are there any two characters
> that lowercase to the same character but do not casefold to the same
> character?".
>
> I don't know.

I'll set up a test to iterate across all locales and all character pairs... no, I didn't find any on my system.

Further searching suggests that the Turkish and Azerbaijani locales do have this characteristic: I (U+0049) lowercases to ı (U+0131) but case folds to i (U+0069), while ı (U+0131) both lowercases and case folds to itself. So I and ı lowercase to the same character but do not casefold to the same character. I have not confirmed that empirically, though.

> I don't have a counterexample, so perhaps using casefold would still be
> fine.
>
> Thoughts? Should we enhance regexes to consider more than two case
> variants first, or should we proceed with some of these patches (and/or
> a similar change to pg_trgm)?

I don't want to take a strong position either way. I'm still wrapping my head around the various implications of the proposed changes, and I don't feel I have a complete picture yet.

--
*Mark Dilger*
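For illustration, here is a minimal sketch of the pair check discussed above, using hand-coded mappings rather than real locale data. `tr_lower` is a hypothetical stand-in for Turkish/Azerbaijani lowercasing that overrides only I (U+0049) and otherwise falls back to Python's locale-independent `str.lower()`; `str.casefold()` stands in for default (non-Turkic) Unicode case folding. `mismatched_pairs` is likewise a hypothetical helper, not anything in PostgreSQL.

```python
from itertools import combinations


def tr_lower(ch: str) -> str:
    # Hypothetical: models Turkish/Azerbaijani lowercasing only for I (U+0049),
    # which lowercases to dotless i (U+0131); everything else uses str.lower().
    return "\u0131" if ch == "I" else ch.lower()


def mismatched_pairs(chars, lower, fold):
    """Return pairs that lowercase to the same string but casefold differently."""
    return [
        (a, b)
        for a, b in combinations(chars, 2)
        if lower(a) == lower(b) and fold(a) != fold(b)
    ]


# Only the three characters from the discussion, not a full character sweep.
candidates = ["I", "i", "\u0131"]

# str.casefold() applies default Unicode folding: I -> i, and U+0131 -> itself.
print(mismatched_pairs(candidates, tr_lower, str.casefold))  # prints [('I', 'ı')]
```

Passing `str.lower` instead of `tr_lower` yields an empty list for the same three candidates, since default lowercasing and default folding both map I to i. A real test would need each locale's actual lowercasing and folding functions and the full set of characters.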
