On Tue, Mar 3, 2026 at 1:01 PM Jeff Davis <[email protected]> wrote:
> On Sat, 2026-02-28 at 14:27 +0100, Daniel Verite wrote:
> > I tried 0001 with a non-UTF8 database and got quickly stuck:
>
> Attached new versions. I moved the encoding check into the SQL-callable
> casefold() function, and other callers use str_casefold(). That
> slightly simplifies what happens in ILIKE, also.
>
> I removed the citext changes. citext has somewhat of a legacy status, I
> think, so I'm not sure it makes sense to try to modernize or change it.
> Also, some SQL-language functions in citext use LOWER(), so the changes
> aren't enough: we'd need to make the SQL CASEFOLD function callable in
> other encodings, and also run a citext upgrade script to change the
> definitions.
>
> Note that these changes affect the result of some expressions (e.g.
> ILIKE), so could theoretically make an expression index or predicate
> index inconsistent.

Thanks for the patches!

After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still
uses str_tolower() for trigram extraction (trgm_op.c:352 and :948). With
builtin collations, these produce different results.
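
To make the mismatch concrete, here is a rough sketch in Python. It is an
analogy only, not the PostgreSQL code: Python's str.casefold() and
str.lower() stand in for str_casefold() and str_tolower(), and trigrams()
is a hypothetical helper that only loosely imitates pg_trgm's padding for
a single word. Which characters actually differ in PostgreSQL depends on
the collation in use.

```python
def trigrams(word):
    # Hypothetical helper: pad with two leading spaces and one trailing
    # space, then take every 3-character window (single word only).
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


stored = "STRASSE"   # value in the table
pattern = "straße"   # ILIKE pattern without wildcards

# Matching step, modelled on case folding: both sides fold to "strasse".
seq_match = stored.casefold() == pattern.casefold()

# Index step, modelled on lowercasing: "ß" is already lowercase, so the
# pattern keeps it and yields trigrams the stored value never produced.
stored_trgms = trigrams(stored.lower())
pattern_trgms = trigrams(pattern.lower())
missing = pattern_trgms - stored_trgms

# An index lookup that requires every pattern trigram would skip the row.
index_match = not missing
```

In this sketch seq_match is true while index_match is false, which is the
shape of the disagreement: the row qualifies under case folding, but the
trigrams extracted via lowercasing do not cover the pattern.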
WIP-v3-0001-Demonstrate-inconsistency-in-gin-index-vs-seq-sca.patch-WIP
Description: Binary data
