Re: pgsql: Perform provider-specific initialization in new functions.

Jeff Davis Mon, 13 Apr 2026 14:08:33 -0700

On Sat, 2026-03-07 at 16:36 +0500, Andrey Borodin wrote:
> >  1. Your fix addresses it, and would also add some safety against
> > other edge cases we haven't caught yet. The only time it would take
> > effect is for very early initialization
> 
> Maybe let's sprinkle with asserts like "I'm walsender"? Or at least
> "I'm not a user backend"?


That seems like excessive coupling that's hard to explain.

> > , but there is nonzero risk of
> > inconsistency because the same value would get a different hash
> > before and after CheckMyDatabase().
> 
> Sounds scary, actually. I heard of several corruptions that started
> with bogus cache entries.

Yeah, I'd prefer not to take this approach.

> >  2. We could hardcode texthashfast() to use C_COLLATION_OID. That
> > wouldn't match the column collation, but it would avoid the crash,
> > and might technically still be fine: the default collation is always
> > deterministic, and all deterministic collations have the same
> > equality semantics as "C". Even if the proper hashtext() is used
> > somewhere else, then it uses "C" hashing semantics for all
> > deterministic collations. The problem here is that we'd like to
> > allow the default collation to be nondeterministic in the future
> > (Peter has mentioned this a few times), so relying on this
> > assumption is fragile. 

Attached is a patch for option 2. I think it's the least-invasive patch
to apply to master now, because it doesn't change any assumptions.

The assumption that "any deterministic collation will do" is unchanged;
the patch just chooses C_COLLATION_OID rather than DEFAULT_COLLATION_OID.
That has two benefits:

1. Fixes your issue, because C_COLLATION_OID is always available.
2. Faster than a default collation based on libc or ICU.

Note that you may still have other problems trying to do interesting
things before CheckMyDatabase(), so I'm not necessarily endorsing that,
but this patch seems good regardless.

I don't see a reason to backport this, but if someone else does then I
could be convinced.

Thoughts?

Regards,
        Jeff Davis

From 193a85477460fb1c46b60d39a78ac9d849f58c72 Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Mon, 13 Apr 2026 12:09:40 -0700
Subject: [PATCH v1] catcache.c: always use C_COLLATION_OID.

Previously, texthashfast/texteqfast used DEFAULT_COLLATION_OID. As the
comments stated, that was arbitrary anyway -- if the collation
actually mattered, it should use the column's actual collation. (In
the catalog, some text columns are the default collation and some are
"C".)

When any deterministic collation will do, it's best to consistently
use the simplest and fastest one, so this commit chooses
C_COLLATION_OID.

The original report was about using the catalog cache in early code
paths before CheckMyDatabase(), such as GUC processing in the
walsender. Using C_COLLATION_OID solves that problem as well, because
it's always available.

Reported-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
 src/backend/utils/cache/catcache.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 87ed5506460..a8e7bf649d2 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -205,6 +205,10 @@ nameeqfast(Datum a, Datum b)
 	char	   *ca = NameStr(*DatumGetName(a));
 	char	   *cb = NameStr(*DatumGetName(b));
 
+	/*
+	 * Catalogs only use deterministic collations, so ignore column collation
+	 * and use fast path.
+	 */
 	return strncmp(ca, cb, NAMEDATALEN) == 0;
 }
 
@@ -213,6 +217,10 @@ namehashfast(Datum datum)
 {
 	char	   *key = NameStr(*DatumGetName(datum));
 
+	/*
+	 * Catalogs only use deterministic collations, so ignore column collation
+	 * and use fast path.
+	 */
 	return hash_bytes((unsigned char *) key, strlen(key));
 }
 
@@ -244,17 +252,20 @@ static bool
 texteqfast(Datum a, Datum b)
 {
 	/*
-	 * The use of DEFAULT_COLLATION_OID is fairly arbitrary here.  We just
-	 * want to take the fast "deterministic" path in texteq().
+	 * Catalogs only use deterministic collations, so ignore column collation
+	 * and use "C" locale for efficiency.
 	 */
-	return DatumGetBool(DirectFunctionCall2Coll(texteq, DEFAULT_COLLATION_OID, a, b));
+	return DatumGetBool(DirectFunctionCall2Coll(texteq, C_COLLATION_OID, a, b));
 }
 
 static uint32
 texthashfast(Datum datum)
 {
-	/* analogously here as in texteqfast() */
-	return DatumGetInt32(DirectFunctionCall1Coll(hashtext, DEFAULT_COLLATION_OID, datum));
+	/*
+	 * Catalogs only use deterministic collations, so ignore column collation
+	 * and use "C" locale for efficiency.
+	 */
+	return DatumGetInt32(DirectFunctionCall1Coll(hashtext, C_COLLATION_OID, datum));
 }
 
 static bool
-- 
2.43.0
