Dietmar Schindler wrote:
> On
> https://www.gnu.org/software/gnulib/manual/html_node/Collating-Elements-vs_002e-Characters.html
> there are two inaccuracies:
>
> 1. "For example, the German collates as the collating element 's' followed by
> another collating element 's'." - Here the example character is simply
> missing; it should perhaps read:
> "For example, the German 'ß' (small sharp s) collates as the collating
> element 's' followed by another collating element 's'."
>
> 2. "For example, the Spanish 'll' collates after 'l' and before 'm'." - This
> was true until April 1994; see https://en.wikipedia.org/wiki/Ll#Spanish.
Thanks for the reports. Fixed by the two attached patches.

Regarding the second one: instead of Spanish 'll', possible examples are (according to the collation rules in glibc):
  - Czech ch (see https://en.wikipedia.org/wiki/Ch_(digraph)#Czech )
  - Welsh ch dd ff ng ll ph rh th
  - Albanian dh gj ll nj rr sh th xh zh
  - Uzbek g' o' sh ch
  - Filipino ng

Bruno
From 25ce2d6b5fad4726ea1fc4d9bdf492505de0086e Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Wed, 22 Sep 2021 23:07:24 +0200
Subject: [PATCH 1/2] doc: Don't assume that the output format is TeX-based or info.

Reported by Dietmar Schindler in
<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
* doc/regex.texi (Collating Elements vs. Characters): Assume a texinfo
version that groks UTF-8 encoded ISO-8859-1 characters.
---
 ChangeLog      | 8 ++++++++
 doc/regex.texi | 8 +-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a64141820..6cdba97f8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2021-09-22  Bruno Haible  <br...@clisp.org>
+
+	doc: Don't assume that the output format is TeX-based or info.
+	Reported by Dietmar Schindler in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
+	* doc/regex.texi (Collating Elements vs. Characters): Assume a texinfo
+	version that groks UTF-8 encoded ISO-8859-1 characters.
+
 2021-09-21  Paul Eggert  <egg...@cs.ucla.edu>
 
 	regex: sync with glibc
diff --git a/doc/regex.texi b/doc/regex.texi
index 91d6bb7b4..19a12cfa3 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -385,13 +385,7 @@ as a unit of collation.''
 
 This generalizes the notion of a character in
 two ways.  First, a single character can map into two or more collating
-elements.  For example, the German
-@tex
-``\ss''
-@end tex
-@ifinfo
-``es-zet''
-@end ifinfo
+elements.  For example, the German ``ß''
 collates as the collating element @samp{s} followed by another collating
 element @samp{s}.  Second, two or more characters can map into one
 collating element.  For example, the Spanish @samp{ll} collates after
-- 
2.25.1
From 3148eb10eda7b771a08692b6165c8c5541172417 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Wed, 22 Sep 2021 23:19:22 +0200
Subject: [PATCH 2/2] doc: Fix outdated statement about Spanish collation.

Reported by Dietmar Schindler in
<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
* doc/regex.texi (Collating Elements vs. Characters): Choose another
example of a digraph with special collation.
---
 ChangeLog      | 6 ++++++
 doc/regex.texi | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6cdba97f8..1c7390625 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2021-09-22  Bruno Haible  <br...@clisp.org>
 
+	doc: Fix outdated statement about Spanish collation.
+	Reported by Dietmar Schindler in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
+	* doc/regex.texi (Collating Elements vs. Characters): Choose another
+	example of a digraph with special collation.
+
 	doc: Don't assume that the output format is TeX-based or info.
 	Reported by Dietmar Schindler in
 	<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
diff --git a/doc/regex.texi b/doc/regex.texi
index 19a12cfa3..c8a691ebc 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -388,8 +388,8 @@ two ways.  First, a single character can map into two or more collating
 elements.  For example, the German ``ß''
 collates as the collating element @samp{s} followed by another collating
 element @samp{s}.  Second, two or more characters can map into one
-collating element.  For example, the Spanish @samp{ll} collates after
-@samp{l} and before @samp{m}.
+collating element.  For example, the Czech @samp{ch} collates after
+@samp{h} and before @samp{i}.
 
 Since POSIX's ``collating element'' preserves the essential idea of
 a ``character,'' we use the latter, more familiar, term in this document.
-- 
2.25.1
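The two cases in the patched paragraph (one character expanding into two collating elements, and two characters forming one collating element) can be illustrated with a toy sort key. This is a minimal Python sketch under stated assumptions, not how glibc or gnulib implement collation: real locales such as glibc's cs_CZ and de_DE use multi-level weight tables. The names ALPHABET, WEIGHT, EXPANSIONS, collating_elements and sort_key are hypothetical. The sketch treats the Czech digraph "ch" as a single collating element sorted between "h" and "i", and expands the German "ß" into the two collating elements "s" "s".

```python
import string

# Hypothetical toy alphabet: the 26 lowercase ASCII letters, with the
# digraph "ch" inserted as one collating element right after "h" (as in Czech).
ALPHABET = list(string.ascii_lowercase)
ALPHABET.insert(ALPHABET.index("h") + 1, "ch")
WEIGHT = {elem: i for i, elem in enumerate(ALPHABET)}

# Hypothetical expansion table: one character maps to two collating elements.
EXPANSIONS = {"ß": ["s", "s"]}


def collating_elements(word: str) -> list[str]:
    """Split a lowercase word into collating elements (toy model only)."""
    elems = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            # Two characters -> one collating element.
            elems.append("ch")
            i += 2
        elif word[i] in EXPANSIONS:
            # One character -> several collating elements.
            elems.extend(EXPANSIONS[word[i]])
            i += 1
        else:
            elems.append(word[i])
            i += 1
    return elems


def sort_key(word: str) -> list[int]:
    """Map each collating element to its weight; only a-z, "ch", "ß" are known."""
    return [WEIGHT[e] for e in collating_elements(word)]


words = ["chata", "hora", "ihned", "cesta"]
print(sorted(words))                # ['cesta', 'chata', 'hora', 'ihned']
print(sorted(words, key=sort_key))  # ['cesta', 'hora', 'chata', 'ihned']
```

Plain code-point order puts "chata" before "hora"; the toy key puts it after "hora" and before "ihned", which is what "collates after h and before i" means. In this single-level model "straße" and "strasse" get identical keys; real locales typically distinguish them at a secondary or tertiary level.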