On Wed, Feb 14, 2024 at 08:41:49PM +, Gavin Smith wrote:
> Maybe a bug in the Swedish locale (from glibc?) if it is not using
> correct collation rules for that language?
Actually, it was a bug in texi2any, it works as expected now.
--
Pat
On Tue, Feb 13, 2024 at 05:37:03PM +0100, Patrice Dumas wrote:
> On Mon, Feb 05, 2024 at 06:14:16PM +, Gavin Smith wrote:
> > On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote:
> > > removed at any time. Also calling it something else, like
> > > XS_STRXFRM_COLLATION_LOCALE.
> >
>
On Tue, Feb 13, 2024 at 05:37:03PM +0100, Patrice Dumas wrote:
> As a side note, there is a test in optional tests
> other/index_collation_test_collation_locale_sv that tests
> XS_STRXFRM_COLLATION_LOCALE with sv_SE.utf8, which seems to exist on
> my debian, as /usr/lib/locale/sv_SE.utf8/LC_COLLATE
On Mon, Feb 05, 2024 at 06:14:16PM +, Gavin Smith wrote:
> On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote:
> > removed at any time. Also calling it something else, like
> > XS_STRXFRM_COLLATION_LOCALE.
>
> That's fine by me, either accessing it with an obscure testing variable
>
On Mon, Feb 05, 2024 at 06:14:16PM +, Gavin Smith wrote:
> > > There wouldn't be any harm in implementing it as an option. We'd have to
> > > decide if it went via strxfrm_l, Unicode::Collate::Locale, or configurable
> > > for either.
> >
> > I think that we should decide it now in order to have
On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote:
> > > However, if there is a
> > > possibility to get variable elements set to "non-ignorable" in C,
> > > possibly by using a hardcoded locale of en_US, it will not be possible to
> > > get automatically both the correct and more rapid o
On Sun, Feb 04, 2024 at 03:42:22PM +, Gavin Smith wrote:
>
> COLLATION_LANGUAGE would be an argument to use
> for Unicode::Collate::Locale to get language-specific tailoring, which
> in language-independent terms means to use the UCA with tailoring, with
> variable collation elements treated a
On Sun, Feb 04, 2024 at 08:38:28PM +, Gavin Smith wrote:
> >
> > strcmp is always used, as a transformation of the string is done with
> > strxfrm_l for the collation in C. If USE_UNICODE_COLLATION=0 the string
> > is not transformed, which amounts to using strcmp on the original
> > string.
On Sun, Feb 04, 2024 at 08:38:45PM +0100, Patrice Dumas wrote:
> Thanks. This is very confusing to me, then, as it is not described that way
> in perllocale, especially the section:
> https://perldoc.perl.org/perllocale#Category-LC_COLLATE%3A-Collation%3A-Text-Comparisons-and-Sorting
> There is more i
On Sun, Feb 04, 2024 at 08:38:45PM +0100, Patrice Dumas wrote:
> >offer much more powerful solutions to collation issues.
> >
> > - from "man perlop".)
>
> Thanks. This is very confusing to me, then, as it is not described that way
> in perllocale, especially the section:
> https://perldoc.p
>> (Note that "cmp" is documented not to work with "use locale" for UTF-8
>> strings: [...]
>
> Thanks. This is very confusing to me, then, as it is not described that way
> in perllocale, especially the section: [...]
Perhaps Bruno Haible can help?
Werner
On Sun, Feb 04, 2024 at 03:42:22PM +, Gavin Smith wrote:
> On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote:
> > On Thu, Feb 01, 2024 at 10:16:07PM +, Gavin Smith wrote:
> > > An alternative is not to have such a variable but just to have an option
> > > to collate according to
> From: Gavin Smith
> Date: Sun, 4 Feb 2024 15:58:28 +
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
>
> On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > > An alternative is not to have such a variable but just to have an option
> > > to collate according to the user's locale.
On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > An alternative is not to have such a variable but just to have an option
> > to collate according to the user's locale. Then the user would run e.g.
> > "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
>
On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote:
> On Thu, Feb 01, 2024 at 10:16:07PM +, Gavin Smith wrote:
> > An alternative is not to have such a variable but just to have an option
> > to collate according to the user's locale. Then the user would run e.g.
> > "LC_COLLATE=ll_
>> An alternative is not to have such a variable but just to have an
>> option to collate according to the user's locale. Then the user
>> would run e.g. "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use
>> collation from the ll_LL.UTF-8 locale. They would have to have the
>> locale installed that
On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote:
> Here is my updated thinking on the possibilities
>
> 1) lexicographic sorting on unicode strings (corresponds to
> USE_UNICODE_COLLATION=0 currently)
> 2) unicode default sorting obtained by Unicode::
On Sun, Feb 04, 2024 at 12:07:17PM +0100, Andreas Schwab wrote:
> On Feb 04 2024, Eli Zaretskii wrote:
>
> > If we want collation which uses only codepoints, disregarding any
> > collation weights defined by the Unicode TR10, we could use
> > en_US.utf-8, but then, as Gavin says, using glibc colla
On Sun, Feb 04, 2024 at 12:55:36PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 4 Feb 2024 11:42:52 +0100
> > From: pertu...@free.fr
> > Cc: Gavin Smith , bug-texinfo@gnu.org
> >
> > On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > > I think en_US.utf-8 is (or at least can be by de
On Thu, Feb 01, 2024 at 10:16:07PM +, Gavin Smith wrote:
> An alternative is not to have such a variable but just to have an option
> to collate according to the user's locale. Then the user would run e.g.
> "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> locale.
On Feb 04 2024, Eli Zaretskii wrote:
> If we want collation which uses only codepoints, disregarding any
> collation weights defined by the Unicode TR10, we could use
> en_US.utf-8, but then, as Gavin says, using glibc collation function
> you get more than you asked, because weights are not ignor
> Date: Sun, 4 Feb 2024 11:42:52 +0100
> From: pertu...@free.fr
> Cc: Gavin Smith , bug-texinfo@gnu.org
>
> On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > I think en_US.utf-8 is (or at least can be by default) a combination
> > of @documentlanguage and @documentencoding.
>
> I
On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > From: Gavin Smith
> > Date: Thu, 1 Feb 2024 22:16:07 +
> > Cc: Patrice Dumas , bug-texinfo@gnu.org
> >
> > On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote:
> > > > Date: Wed, 31 Jan 2024 23:11:02 +0100
> > > > Fr
> From: Gavin Smith
> Date: Thu, 1 Feb 2024 22:16:07 +
> Cc: Patrice Dumas , bug-texinfo@gnu.org
>
> On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote:
> > > Date: Wed, 31 Jan 2024 23:11:02 +0100
> > > From: Patrice Dumas
> > >
> > > > Moreover, en_US.utf-8 will use collation ap
On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote:
> > Date: Wed, 31 Jan 2024 23:11:02 +0100
> > From: Patrice Dumas
> >
> > > Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> > > There may be language-specific "tailoring" for other languages (e.g.
> > > Swedis
> Date: Wed, 31 Jan 2024 23:11:02 +0100
> From: Patrice Dumas
>
> > Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> > There may be language-specific "tailoring" for other languages (e.g.
> > Swedish) that the user may wish to use instead. Hence, it may be
> > a good idea
> From: Gavin Smith
> Date: Wed, 31 Jan 2024 20:10:56 +
>
> It seems like a pretty obscure interface. It is barely
> documented - newlocale is in the Linux Man Pages but not the
> glibc manual, and strxfrm_l was only in the Posix standard
> (https://pubs.opengroup.org/onlinepubs/9699919799/f
On Wed, Jan 31, 2024 at 08:10:56PM +, Gavin Smith wrote:
> On Wed, Jan 31, 2024 at 10:15:08AM +0100, Patrice Dumas wrote:
> > Hello,
> >
> > I implemented index sorting in C with XS interface in texi2any.
> > When unicode collation is wanted, based on my understanding of
> > Eli's suggestions, a
On Wed, Jan 31, 2024 at 10:15:08AM +0100, Patrice Dumas wrote:
> Hello,
>
> I implemented index sorting in C with XS interface in texi2any.
> When unicode collation is wanted, based on my understanding of
> Eli's suggestions, a collation locale is set to "en_US.utf-8", by
> newlocale (LC_COLLATE_M
Hello,
I implemented index sorting in C with XS interface in texi2any.
When unicode collation is wanted, based on my understanding of
Eli's suggestions, a collation locale is set to "en_US.utf-8", by
newlocale (LC_COLLATE_MASK, "en_US.utf-8", 0)
and then strxfrm_l is used (which should be the same