Re: index sorting in texi2any in C issue with spaces

2024-02-15 Thread Patrice Dumas
On Wed, Feb 14, 2024 at 08:41:49PM +, Gavin Smith wrote:
> Maybe a bug in the Swedish locale (from glibc?) if it is not using
> correct collation rules for that language?

Actually, it was a bug in texi2any, it works as expected now.

-- 
Pat

Re: index sorting in texi2any in C issue with spaces

2024-02-14 Thread Gavin Smith
On Tue, Feb 13, 2024 at 05:37:03PM +0100, Patrice Dumas wrote: > On Mon, Feb 05, 2024 at 06:14:16PM +, Gavin Smith wrote: > > On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote: > > > removed at any time. Also calling it something else, like > > > XS_STRXFRM_COLLATION_LOCALE. > > >

Re: index sorting in texi2any in C issue with spaces

2024-02-14 Thread Patrice Dumas
On Tue, Feb 13, 2024 at 05:37:03PM +0100, Patrice Dumas wrote: > As a side note, there is a test in optional tests > other/index_collation_test_collation_locale_sv that tests > XS_STRXFRM_COLLATION_LOCALE with sv_SE.utf8, which seems to exist on > my debian, as /usr/lib/locale/sv_SE.utf8/LC_COLLATE

Re: index sorting in texi2any in C issue with spaces

2024-02-13 Thread Patrice Dumas
On Mon, Feb 05, 2024 at 06:14:16PM +, Gavin Smith wrote: > On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote: > > removed at any time. Also calling it something else, like > > XS_STRXFRM_COLLATION_LOCALE. > > That's fine by me, either accessing it with an obscure testing variable >

Re: index sorting in texi2any in C issue with spaces

2024-02-06 Thread Patrice Dumas
On Mon, Feb 05, 2024 at 06:14:16PM +, Gavin Smith wrote: > > > There wouldn't be any harm in implementing it as an option. We'd have to > > > decide if it went via strxfrm_l, Unicode::Collate::Locale, or configurable > > > for either. > > > > I think that we should decide it now in order to have

Re: index sorting in texi2any in C issue with spaces

2024-02-05 Thread Gavin Smith
On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote: > > > However, if there is a > > > possibility to get variable elements set to "non-ignorable" in C, > > > possibly by using a hardcoded locale of en_US, it will not be possible to > > > get automatically both the correct and more rapid o

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Patrice Dumas
On Sun, Feb 04, 2024 at 03:42:22PM +, Gavin Smith wrote: > > COLLATION_LANGUAGE would be an argument to use > for Unicode::Collate::Locale to get language-specific tailoring, which > in language-independent terms means to use the UCA with tailoring, with > variable collation elements treated a

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Patrice Dumas
On Sun, Feb 04, 2024 at 08:38:28PM +, Gavin Smith wrote: > > > > strcmp is always used as a transformation on the string is done with > > strxfrm_l for the collation in C. If USE_UNICODE_COLLATION=0 the string > > is not transformed, which amounts to using strcmp on the original > > string.
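
The point made in the message above (keys produced by strxfrm_l are compared with strcmp, and with USE_UNICODE_COLLATION=0 the untransformed string is compared directly) can be sketched roughly as follows. This is only an illustration: struct index_entry, by_key and sort_entries are hypothetical names, not texi2any's, and passing (locale_t)0 stands in for "no transformation".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: the original text plus its sort key. */
struct index_entry {
    const char *text;
    char *key;      /* malloc'd; owned by the caller after sorting */
};

/* Keys are always compared bytewise, whatever produced them. */
static int by_key(const void *a, const void *b)
{
    const struct index_entry *ea = a, *eb = b;
    return strcmp(ea->key, eb->key);
}

/* Build a key for every entry, then sort by key.
   loc == (locale_t)0 means "no transformation": the key is a plain copy,
   so the sort amounts to strcmp on the original strings.
   Returns 0 on success, -1 if memory runs out. */
static int sort_entries(struct index_entry *e, size_t n, locale_t loc)
{
    for (size_t i = 0; i < n; i++) {
        int transform = (loc != (locale_t)0);
        size_t len = transform ? strxfrm_l(NULL, e[i].text, 0, loc)
                               : strlen(e[i].text);
        e[i].key = malloc(len + 1);
        if (e[i].key == NULL) {
            while (i > 0)           /* release keys built so far */
                free(e[--i].key);
            return -1;
        }
        if (transform)
            strxfrm_l(e[i].key, e[i].text, len + 1, loc);
        else
            memcpy(e[i].key, e[i].text, len + 1);
    }
    assert(n == 0 || e[0].key != NULL);
    qsort(e, n, sizeof *e, by_key);
    return 0;
}
```

With no locale passed, "B" sorts before "a" because the comparison is on byte values; a collation locale would normally change that.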

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Sun, Feb 04, 2024 at 08:38:45PM +0100, Patrice Dumas wrote: > Thanks. This is very confusing to me, then, as it is not told that way > in perllocale, especially the section: > https://perldoc.perl.org/perllocale#Category-LC_COLLATE%3A-Collation%3A-Text-Comparisons-and-Sorting > There is more i

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Sun, Feb 04, 2024 at 08:38:45PM +0100, Patrice Dumas wrote: > >offer much more powerful solutions to collation issues. > > > > - from "man perlop".) > > Thanks. This is very confusing to me, then, as it is not told that way > in perllocale, especially the section: > https://perldoc.p

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Werner LEMBERG
>> (Note that "cmp" is documented not to work with "use locale" for UTF-8
>> strings: [...]
>
> Thanks. This is very confusing to me, then, as it is not told that way
> in perllocale, especially the section: [...]

Perhaps Bruno Haible can help?

    Werner

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Patrice Dumas
On Sun, Feb 04, 2024 at 03:42:22PM +, Gavin Smith wrote: > On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote: > > On Thu, Feb 01, 2024 at 10:16:07PM +, Gavin Smith wrote: > > > An alternative is not to have such a variable but just to have an option > > > to collate according to

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Eli Zaretskii
> From: Gavin Smith > Date: Sun, 4 Feb 2024 15:58:28 + > Cc: pertu...@free.fr, bug-texinfo@gnu.org > > On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote: > > > An alternative is not to have such a variable but just to have an option > > > to collate according to the user's locale.

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote: > > An alternative is not to have such a variable but just to have an option > > to collate according to the user's locale. Then the user would run e.g. > > "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8 >

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote: > On Thu, Feb 01, 2024 at 10:16:07PM +, Gavin Smith wrote: > > An alternative is not to have such a variable but just to have an option > > to collate according to the user's locale. Then the user would run e.g. > > "LC_COLLATE=ll_

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Werner LEMBERG
>> An alternative is not to have such a variable but just to have an >> option to collate according to the user's locale. Then the user >> would run e.g. "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use >> collation from the ll_LL.UTF-8 locale. They would have to have the >> locale installed that
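
For illustration, the alternative quoted above (collating according to the user's locale, selected with something like "LC_COLLATE=ll_LL.UTF-8 texi2any ...") corresponds in C to adopting LC_COLLATE from the environment and comparing with strcoll. This is a minimal sketch with hypothetical function names; it is not how texi2any is implemented.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Adopt the collation locale named by the environment (LC_ALL, LC_COLLATE,
   LANG). Returns the locale name now in effect, or NULL if the requested
   locale is not installed, in which case the previous locale stays active. */
static const char *use_environment_collation(void)
{
    return setlocale(LC_COLLATE, "");
}

/* Compare two strings under the process-wide LC_COLLATE locale. */
static int compare_in_user_locale(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b);
}
```

Unlike the newlocale and strxfrm_l route, this changes process-wide state and depends on the requested locale being installed on the user's system, which is the concern raised in the quoted text.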

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Patrice Dumas
On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote: > Here is my updated thinking on the possibilities > > 1) lexicographic sorting on unicode strings (corresponds to > USE_UNICODE_COLLATION=0 currently) > 2) unicode default sorting obtained by Unicode::

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread pertusus
On Sun, Feb 04, 2024 at 12:07:17PM +0100, Andreas Schwab wrote: > On Feb 04 2024, Eli Zaretskii wrote: > > > If we want collation which uses only codepoints, disregarding any > > collation weights defined by the Unicode TR10, we could use > > en_US.utf-8, but then, as Gavin says, using glibc colla

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread pertusus
On Sun, Feb 04, 2024 at 12:55:36PM +0200, Eli Zaretskii wrote: > > Date: Sun, 4 Feb 2024 11:42:52 +0100 > > From: pertu...@free.fr > > Cc: Gavin Smith , bug-texinfo@gnu.org > > > > On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote: > > > I think en_US.utf-8 is (or at least can be by de

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Patrice Dumas
On Thu, Feb 01, 2024 at 10:16:07PM +, Gavin Smith wrote: > An alternative is not to have such a variable but just to have an option > to collate according to the user's locale. Then the user would run e.g. > "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8 > locale.

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Andreas Schwab
On Feb 04 2024, Eli Zaretskii wrote: > If we want collation which uses only codepoints, disregarding any > collation weights defined by the Unicode TR10, we could use > en_US.utf-8, but then, as Gavin says, using glibc collation function > you get more than you asked, because weights are not ignor

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Eli Zaretskii
> Date: Sun, 4 Feb 2024 11:42:52 +0100 > From: pertu...@free.fr > Cc: Gavin Smith , bug-texinfo@gnu.org > > On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote: > > I think en_US.utf-8 is (or at least can be by default) a combination > > of @documentlanguage and @documentencoding. > > I

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread pertusus
On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote: > > From: Gavin Smith > > Date: Thu, 1 Feb 2024 22:16:07 + > > Cc: Patrice Dumas , bug-texinfo@gnu.org > > > > On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote: > > > > Date: Wed, 31 Jan 2024 23:11:02 +0100 > > > > Fr

Re: index sorting in texi2any in C issue with spaces

2024-02-01 Thread Eli Zaretskii
> From: Gavin Smith > Date: Thu, 1 Feb 2024 22:16:07 + > Cc: Patrice Dumas , bug-texinfo@gnu.org > > On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote: > > > Date: Wed, 31 Jan 2024 23:11:02 +0100 > > > From: Patrice Dumas > > > > > > > Moreover, en_US.utf-8 will use collation ap

Re: index sorting in texi2any in C issue with spaces

2024-02-01 Thread Gavin Smith
On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote: > > Date: Wed, 31 Jan 2024 23:11:02 +0100 > > From: Patrice Dumas > > > > > Moreover, en_US.utf-8 will use collation appropriate for (US) English. > > > There may be language-specific "tailoring" for other languages (e.g. > > > Swedis

Re: index sorting in texi2any in C issue with spaces

2024-02-01 Thread Patrice Dumas
On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote: > > Date: Wed, 31 Jan 2024 23:11:02 +0100 > > From: Patrice Dumas > > > > > Moreover, en_US.utf-8 will use collation appropriate for (US) English. > > > There may be language-specific "tailoring" for other languages (e.g. > > > Swedis

Re: index sorting in texi2any in C issue with spaces

2024-01-31 Thread Eli Zaretskii
> Date: Wed, 31 Jan 2024 23:11:02 +0100 > From: Patrice Dumas > > > Moreover, en_US.utf-8 will use collation appropriate for (US) English. > > There may be language-specific "tailoring" for other languages (e.g. > > Swedish) that the user may wish to use instead. Hence, it may be > > a good idea

Re: index sorting in texi2any in C issue with spaces

2024-01-31 Thread Eli Zaretskii
> From: Gavin Smith > Date: Wed, 31 Jan 2024 20:10:56 + > > It seems like a pretty obscure interface. It is barely > documented - newlocale is in the Linux Man Pages but not the > glibc manual, and strxfrm_l was only in the Posix standard > (https://pubs.opengroup.org/onlinepubs/9699919799/f

Re: index sorting in texi2any in C issue with spaces

2024-01-31 Thread Patrice Dumas
On Wed, Jan 31, 2024 at 08:10:56PM +, Gavin Smith wrote: > On Wed, Jan 31, 2024 at 10:15:08AM +0100, Patrice Dumas wrote: > > Hello, > > > > I implemented index sorting in C with XS interface in texi2any. > > When unicode collation is wanted, based on my understanding of > > Eli suggestions, a

Re: index sorting in texi2any in C issue with spaces

2024-01-31 Thread Gavin Smith
On Wed, Jan 31, 2024 at 10:15:08AM +0100, Patrice Dumas wrote: > Hello, > > I implemented index sorting in C with XS interface in texi2any. > When unicode collation is wanted, based on my understanding of > Eli suggestions, a collation locale is set to "en_US.utf-8", by > newlocale (LC_COLLATE_M

index sorting in texi2any in C issue with spaces

2024-01-31 Thread Patrice Dumas
Hello, I implemented index sorting in C with an XS interface in texi2any. When unicode collation is wanted, based on my understanding of Eli's suggestions, a collation locale is set to "en_US.utf-8", by newlocale (LC_COLLATE_MASK, "en_US.utf-8", 0) and then strxfrm_l is used (which should be the same
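
The approach described in this message can be sketched as below, assuming a POSIX.1-2008 system such as glibc. The helper names (open_collation_locale, make_sort_key, compare_entries) are hypothetical and the fallback to the "C" locale is an assumption added so that the sketch still runs where en_US.utf-8 is not installed; only newlocale, strxfrm_l and the "en_US.utf-8" locale name come from the message itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Open a locale object used for collation only. Falls back to the "C"
   locale when `name` is not installed (an assumption of this sketch). */
static locale_t open_collation_locale(const char *name)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        loc = newlocale(LC_COLLATE_MASK, "C", (locale_t)0);
    return loc;
}

/* Return a malloc'd sort key for s, or NULL if memory runs out.
   Two keys from the same locale can be compared with strcmp. */
static char *make_sort_key(const char *s, locale_t loc)
{
    assert(s != NULL);
    /* With a zero size, strxfrm_l only reports the length needed. */
    size_t len = strxfrm_l(NULL, s, 0, loc);
    char *key = malloc(len + 1);
    if (key != NULL)
        strxfrm_l(key, s, len + 1, loc);
    return key;
}

/* Compare two index entries through their transformed keys; falls back
   to comparing the original strings if a key could not be allocated. */
static int compare_entries(const char *a, const char *b, locale_t loc)
{
    char *ka = make_sort_key(a, loc);
    char *kb = make_sort_key(b, loc);
    int result = (ka != NULL && kb != NULL) ? strcmp(ka, kb) : strcmp(a, b);
    free(ka);
    free(kb);
    return result;
}
```

A caller would obtain the locale once with open_collation_locale("en_US.utf-8"), compare or pre-compute keys for all entries, and release it with freelocale when done. How spaces and punctuation are weighted in the resulting order depends on the locale's collation data.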