Dietmar Schindler wrote:
> On 
> https://www.gnu.org/software/gnulib/manual/html_node/Collating-Elements-vs_002e-Characters.html
>  there are two inaccuracies:
> 
> 1. "For example, the German collates as the collating element 's' followed by 
> another collating element 's'." - Here the example character is simply 
> missing; it should perhaps read:
>    "For example, the German 'ß' (small sharp s) collates as the collating 
> element 's' followed by another collating element 's'."
> 
> 2. "For example, the Spanish 'll' collates after 'l' and before 'm'." - This 
> was true until April 1994; see https://en.wikipedia.org/wiki/Ll#Spanish.

Thanks for the reports. Fixed through the two attached patches.

Regarding the second one: instead of Spanish 'll', possible examples of
digraphs with special collation (according to the collation rules in glibc) are:
- Czech    ch (see https://en.wikipedia.org/wiki/Ch_(digraph)#Czech )
- Welsh    ch dd ff ng ll ph rh th
- Albanian dh gj ll nj rr sh th xh zh
- Uzbek    g' o' sh ch
- Filipino ng
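
To illustrate what "collates as one element" means for the Czech case, here is
a minimal sketch. It is a toy model, not glibc's implementation: the alphabet
is an assumed simplification (ASCII lowercase letters with 'ch' inserted after
'h', diacritics ignored), and the helper names `collating_elements` and
`sort_key` are hypothetical.

```python
import string

# Assumed, simplified alphabet: a..z with the digraph "ch" placed after "h".
# Real Czech collation (and glibc's cs_CZ locale data) has more rules.
ORDER = list(string.ascii_lowercase)
ORDER.insert(ORDER.index("h") + 1, "ch")
RANK = {element: position for position, element in enumerate(ORDER)}


def collating_elements(word):
    """Split a lowercase ASCII word into collating elements, treating 'ch' as one unit."""
    elements = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            elements.append("ch")  # two characters, one collating element
            i += 2
        else:
            elements.append(word[i])
            i += 1
    return elements


def sort_key(word):
    """Map a word to the ranks of its collating elements."""
    return [RANK[element] for element in collating_elements(word)]


words = ["chata", "hrad", "cena", "igelit"]
by_codepoint = sorted(words)
by_elements = sorted(words, key=sort_key)
print(by_codepoint)
print(by_elements)
```

With plain character ordering, "chata" sorts right after "cena"; with 'ch'
treated as a single element it moves between "hrad" and "igelit", which matches
"after 'h' and before 'i'".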

Bruno

From 25ce2d6b5fad4726ea1fc4d9bdf492505de0086e Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Wed, 22 Sep 2021 23:07:24 +0200
Subject: [PATCH 1/2] doc: Don't assume that the output format is TeX-based or
 info.

Reported by Dietmar Schindler in
<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.

* doc/regex.texi (Collating Elements vs. Characters): Assume a texinfo
version that groks UTF-8 encoded ISO-8859-1 characters.
---
 ChangeLog      | 8 ++++++++
 doc/regex.texi | 8 +-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a64141820..6cdba97f8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2021-09-22  Bruno Haible  <br...@clisp.org>
+
+	doc: Don't assume that the output format is TeX-based or info.
+	Reported by Dietmar Schindler in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
+	* doc/regex.texi (Collating Elements vs. Characters): Assume a texinfo
+	version that groks UTF-8 encoded ISO-8859-1 characters.
+
 2021-09-21  Paul Eggert  <egg...@cs.ucla.edu>
 
 	regex: sync with glibc
diff --git a/doc/regex.texi b/doc/regex.texi
index 91d6bb7b4..19a12cfa3 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -385,13 +385,7 @@ as a unit of collation.''
 
 This generalizes the notion of a character in
 two ways.  First, a single character can map into two or more collating
-elements.  For example, the German
-@tex
-``\ss''
-@end tex
-@ifinfo
-``es-zet''
-@end ifinfo
+elements.  For example, the German ``ß''
 collates as the collating element @samp{s} followed by another collating
 element @samp{s}.  Second, two or more characters can map into one
 collating element.  For example, the Spanish @samp{ll} collates after
-- 
2.25.1

From 3148eb10eda7b771a08692b6165c8c5541172417 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Wed, 22 Sep 2021 23:19:22 +0200
Subject: [PATCH 2/2] doc: Fix outdated statement about Spanish collation.

Reported by Dietmar Schindler in
<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.

* doc/regex.texi (Collating Elements vs. Characters): Choose another
example of a digraph with special collation.
---
 ChangeLog      | 6 ++++++
 doc/regex.texi | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6cdba97f8..1c7390625 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2021-09-22  Bruno Haible  <br...@clisp.org>
 
+	doc: Fix outdated statement about Spanish collation.
+	Reported by Dietmar Schindler in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
+	* doc/regex.texi (Collating Elements vs. Characters): Choose another
+	example of a digraph with special collation.
+
 	doc: Don't assume that the output format is TeX-based or info.
 	Reported by Dietmar Schindler in
 	<https://lists.gnu.org/archive/html/bug-gnulib/2021-09/msg00095.html>.
diff --git a/doc/regex.texi b/doc/regex.texi
index 19a12cfa3..c8a691ebc 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -388,8 +388,8 @@ two ways.  First, a single character can map into two or more collating
 elements.  For example, the German ``ß''
 collates as the collating element @samp{s} followed by another collating
 element @samp{s}.  Second, two or more characters can map into one
-collating element.  For example, the Spanish @samp{ll} collates after
-@samp{l} and before @samp{m}.
+collating element.  For example, the Czech @samp{ch} collates after
+@samp{h} and before @samp{i}.
 
 Since POSIX's ``collating element'' preserves the essential idea of
 a ``character,'' we use the latter, more familiar, term in this document.
-- 
2.25.1
