Consider collation when proving subquery uniqueness

rel_is_distinct_for()'s RTE_SUBQUERY branch passed only the equality
operator from each join clause to query_is_distinct_for(), discarding
the operator's input collation.  query_is_distinct_for() then verified
opfamily compatibility but never checked collations, so a DISTINCT /
GROUP BY / set-op operating under one collation was trusted to prove
uniqueness for a comparison performed under an unrelated collation.
As with the recent fix in relation_has_unique_index_for(), this is
unsound for nondeterministic collations and yields wrong query results
in any optimization that consumes the proof.
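
The hazard can be illustrated with a toy model.  This is not PostgreSQL
code: every name below is hypothetical, byte-wise comparison stands in for
a deterministic collation, and ASCII case-insensitive comparison stands in
for a nondeterministic one.  The rows are distinct under the first
equality, yet one outer value matches two of them under the second.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*eq_fn) (const char *, const char *);

/* Toy stand-in for a deterministic collation: byte-wise equality. */
static int
eq_bytewise(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/* Toy stand-in for a nondeterministic collation: ASCII case-insensitive. */
static int
eq_nocase(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
	}
	return *a == *b;			/* equal only if both strings ended together */
}

/* Count the inner rows that match one outer value under the given equality. */
static size_t
count_matches(const char *outer, const char *const *inner, size_t n, eq_fn eq)
{
	size_t		matches = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (eq(outer, inner[i]))
			matches++;
	}
	return matches;
}

/* Output of a hypothetical "SELECT DISTINCT x" evaluated byte-wise. */
static const char *const distinct_rows[] = {"abc", "ABC", "xyz"};

/* Self-check: distinct under one equality, not unique under the other. */
void
check_collation_mismatch(void)
{
	assert(count_matches("abc", distinct_rows, 3, eq_bytewise) == 1);
	assert(count_matches("abc", distinct_rows, 3, eq_nocase) == 2);
}
```

A planner that assumed at most one match per outer row here would be
relying on a guarantee the DISTINCT never gave for the join's comparison.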

Fix by carrying each clause's operator input collation into
query_is_distinct_for() and validating it at every check-site against
the subquery target expression's collation.

Back-patch to all supported branches.  query_is_distinct_for() is
declared in an installed header, so on stable branches the existing
two-list signature is retained as a thin wrapper that forwards to a
new collation-aware entry point; external callers continue to receive
the historical collation-blind answer.
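
The compatibility arrangement might look roughly like the sketch below.
It is a deliberately simplified, single-column model rather than the
committed code: the type and function names are invented for illustration,
plain operator equality stands in for the opfamily compatibility test, and
an exact collation match stands in for whatever rule the real check
applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* toy stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/* Hypothetical, simplified view of one subquery output column. */
typedef struct ToyColumn
{
	Oid			eq_op;			/* equality operator used by DISTINCT etc. */
	Oid			collation;		/* collation that operation ran under */
} ToyColumn;

/*
 * Hypothetical collation-aware entry point.  The proof holds only if the
 * operator matches and, when the caller supplies a collation, that
 * collation matches the one the column was made distinct under.
 */
bool
toy_is_distinct_for_ext(const ToyColumn *col, Oid clause_op,
						Oid clause_collation)
{
	if (col->eq_op != clause_op)
		return false;
	if (clause_collation != InvalidOid &&
		col->collation != clause_collation)
		return false;
	return true;
}

/*
 * Historical signature kept for external callers.  It supplies no
 * collation, so it gives the same collation-blind answer as before.
 */
bool
toy_is_distinct_for(const ToyColumn *col, Oid clause_op)
{
	return toy_is_distinct_for_ext(col, clause_op, InvalidOid);
}
```

Keeping the old symbol as a forwarding wrapper leaves the exported
signature unchanged for extensions, while in-core callers can move to the
collation-aware variant.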

Author: Richard Guo <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8zoa5q4coi8zuvedsbk...@mail.gmail.com
Backpatch-through: 14

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/574581b50ac9c63dd9e4abebb731a3b67e5b50f6

Modified Files
--------------
src/backend/optimizer/plan/analyzejoins.c      | 121 ++++++++++-------
src/include/nodes/pathnodes.h                  |  14 ++
src/include/optimizer/planmain.h               |   2 +-
src/test/regress/expected/collate.icu.utf8.out | 181 +++++++++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql      |  58 ++++++++
src/tools/pgindent/typedefs.list               |   1 +
6 files changed, 324 insertions(+), 53 deletions(-)
