Re: index sorting in texi2any in C issue with spaces

Eli Zaretskii Sun, 04 Feb 2024 02:55:52 -0800

> Date: Sun, 4 Feb 2024 11:42:52 +0100
> From: pertu...@free.fr
> Cc: Gavin Smith <gavinsmith0...@gmail.com>, bug-texinfo@gnu.org
> 
> On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > I think en_US.utf-8 is (or at least can be by default) a combination
> > of @documentlanguage and @documentencoding.
> 
> I try to make the index collation as independent as possible of
> @documentencoding and output encoding.  Here the utf-8 is meant to
> provide a sorting 'independent' of the encoding.


Why is that a good idea?  Presumably, a manual whose language is
provided by @documentlanguage is indeed written in that language, and
so the collation should be according to that language?  Or what am I
missing?

If we want a collation that uses only codepoints, disregarding the
collation weights defined by Unicode TR10 (the Unicode Collation
Algorithm), we could try en_US.utf-8, but then, as Gavin says, with
the glibc collation functions you get more than you asked for, because
the weights are not ignored.  So we need to use something else in the
C variant of the collation code, AFAIU.

> Regarding the language, for now the aim was to have something as
> similar as possible to the Perl output, which is obtained without a
> locale.  The choice of en_US was motivated by that aim.  I looked at
> the /usr/lib/locale/*/LC_COLLATE files on my Debian GNU/Linux system
> and there was no "en.utf-8", which would have been my first choice,
> so I used "en_US.utf-8".

I don't know enough about what Perl does in the module you are using.
What exactly does "obtained without a locale" mean?  A collation order
that only considers the Unicode codepoints of the characters?  Or does
it mean something else?  If it only considers the codepoints, then
collation in C using glibc functions will NOT produce the same order,
even under en_US.utf-8, AFAIU.
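
To make the distinction concrete, here is a minimal sketch of the two
orderings in C.  It is not the texi2any code: the function names
(sort_by_codepoint, sort_by_locale) are made up for illustration, and
it assumes the index entries are valid UTF-8 strings.  For such strings
plain strcmp already gives codepoint order regardless of the locale,
whereas strcoll applies whatever weights the LC_COLLATE locale defines.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte order.  For valid UTF-8 this is the
   same as Unicode codepoint order, and it does not depend on the
   locale. */
static int cmp_codepoint(const void *a, const void *b)
{
  return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort comparator: order defined by the current LC_COLLATE locale,
   including any collation weights that locale specifies. */
static int cmp_locale(const void *a, const void *b)
{
  return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort entries by codepoint only. */
void sort_by_codepoint(const char **entries, size_t n)
{
  assert(entries != NULL || n == 0);
  qsort(entries, n, sizeof entries[0], cmp_codepoint);
}

/* Hypothetical helper: sort entries with the collation rules of
   locale_name (for example "en_US.utf-8").  Returns 0 and leaves the
   entries untouched if that locale is not installed, 1 otherwise.
   Assumption: the program otherwise runs with LC_COLLATE set to "C",
   so that is what gets restored afterwards. */
int sort_by_locale(const char **entries, size_t n, const char *locale_name)
{
  assert(entries != NULL || n == 0);
  if (setlocale(LC_COLLATE, locale_name) == NULL)
    return 0;
  qsort(entries, n, sizeof entries[0], cmp_locale);
  setlocale(LC_COLLATE, "C");
  return 1;
}
```

Calling both helpers on the same entries, for instance "a b", "ab" and
"a-b", and comparing the results shows whether the locale's weights for
spaces and punctuation change the order on a given system.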
