Re: index sorting in texi2any in C issue with spaces

Eli Zaretskii Thu, 01 Feb 2024 22:57:26 -0800

> From: Gavin Smith <gavinsmith0...@gmail.com>
> Date: Thu, 1 Feb 2024 22:16:07 +0000
> Cc: Patrice Dumas <pertu...@free.fr>, bug-texinfo@gnu.org
> 
> On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote:
> > > Date: Wed, 31 Jan 2024 23:11:02 +0100
> > > From: Patrice Dumas <pertu...@free.fr>
> > > 
> > > > Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> > > > There may be language-specific "tailoring" for other languages (e.g.
> > > > Swedish) that the user may wish to use instead.  Hence, it may be
> > > > a good idea to allow use of a user-specified locale for collation
> > > > through the C code.
> > > 
> > > That would not be difficult to implement as a customization variable.
> > > What about COLLATION_LANGUAGE?
> > 
> > What would be the possible values of this variable, and in what format
> > will those values be specified?
> 
> I imagine it would be a locale name for passing to newlocale and thence
> to strxfrm_l.  What Patrice implemented hardcoded the name "en_US.utf-8",
> but this would be a possible value.
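A minimal sketch of the newlocale/strxfrm_l approach, assuming a POSIX.1-2008 system. The locale object covers LC_COLLATE only, and each index entry gets a precomputed sort key, so qsort can compare keys with plain strcmp. The names struct index_entry, make_sort_key and sort_index are hypothetical, not texi2any's actual code.

```c
/* Sketch only: sort index entries with strxfrm_l sort keys.
   Assumes POSIX.1-2008 (newlocale, strxfrm_l, freelocale). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: the printed text plus its sort key. */
struct index_entry
{
  const char *text;
  char *key;            /* malloc'd by sort_index; caller frees */
};

/* Return a malloc'd sort key for S under collation locale LOC, or NULL
   on allocation failure.  strcmp on two keys orders them the same way
   strcoll_l would order the original strings. */
static char *
make_sort_key (const char *s, locale_t loc)
{
  size_t len = strxfrm_l (NULL, s, 0, loc);  /* length, excluding NUL */
  char *key = malloc (len + 1);
  if (key)
    strxfrm_l (key, s, len + 1, loc);
  return key;
}

static int
compare_entries (const void *pa, const void *pb)
{
  const struct index_entry *a = pa;
  const struct index_entry *b = pb;
  return strcmp (a->key, b->key);
}

/* Sort N entries by the collation of LOC.  Returns 0 on success, -1 if
   a key could not be allocated.  Keys are left in ENTRIES either way;
   free them with free (), which accepts the NULL ones. */
static int
sort_index (struct index_entry *entries, size_t n, locale_t loc)
{
  assert (loc != (locale_t) 0);
  for (size_t i = 0; i < n; i++)
    entries[i].key = NULL;
  for (size_t i = 0; i < n; i++)
    if (!(entries[i].key = make_sort_key (entries[i].text, loc)))
      return -1;
  qsort (entries, n, sizeof *entries, compare_entries);
  return 0;
}
```

To use it, a caller would create the locale with newlocale (LC_COLLATE_MASK, "en_US.utf-8", (locale_t) 0) and treat (locale_t) 0 as "locale not installed". Afterwards it frees each key and releases the locale with freelocale. Precomputing the keys avoids re-transforming each string on every comparison.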


I think en_US.utf-8 is (or at least can be by default) a combination
of @documentlanguage and @documentencoding.
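If the collation locale were derived from the document, the name could be assembled from the values of the two commands. The helper below is hypothetical (collation_locale_name is not an existing texi2any function). It assumes @documentlanguage holds a value like "sv_SE" and falls back to en_US and UTF-8 when either value is unset. A bare language code such as "sv" would yield "sv.UTF-8", which usually does not name an installed locale, so a real implementation would still need a fallback.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a locale name "LANG.ENCODING" from the values
   of @documentlanguage (e.g. "sv_SE") and @documentencoding (e.g. "UTF-8").
   Unset values default to "en_US" and "UTF-8" (an assumption matching the
   en_US.utf-8 default discussed in this thread).
   Returns 0 on success, -1 if BUF is too small. */
static int
collation_locale_name (char *buf, size_t size,
                       const char *doc_lang, const char *doc_enc)
{
  assert (buf != NULL && size > 0);
  if (doc_lang == NULL || doc_lang[0] == '\0')
    doc_lang = "en_US";
  if (doc_enc == NULL || doc_enc[0] == '\0')
    doc_enc = "UTF-8";
  int n = snprintf (buf, size, "%s.%s", doc_lang, doc_enc);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```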

> (If there are locale names on MS-Windows that are different, it would
> be fine to support them the same way, only the invocation of texi2any
> would vary to use a different locale name.)

Yes, we will need to come up with something like that.  (And yes, the
names of locales on Windows are different, and can also take several
different formats.  For example, the equivalent of en_US can be either
"English_United States" or "en-US" [with a dash, not underscore], and
there's also a numerical locale ID -- e.g. 0x409 for en_US.)
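For Windows, one possible mapping is a simple string rewrite from the POSIX form to the dash-separated form. This is a hypothetical sketch. It only handles names of the form ll_CC[.codeset][@modifier], ignores the "English_United States" style and numeric locale IDs such as 0x409, and does not check that Windows actually recognizes the resulting name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: turn a POSIX-style locale name such as "en_US.utf-8"
   into the Windows "en-US" form by dropping any ".codeset" or "@modifier"
   suffix and replacing '_' with '-'.  Returns 0 on success, -1 if OUT
   (SIZE bytes) is too small. */
static int
posix_to_windows_locale_name (char *out, size_t size, const char *posix)
{
  assert (out != NULL && size > 0 && posix != NULL);
  size_t i = 0;
  for (; posix[i] != '\0' && posix[i] != '.' && posix[i] != '@'; i++)
    {
      if (i + 1 >= size)   /* no room for this char and the final NUL */
        return -1;
      out[i] = (posix[i] == '_') ? '-' : posix[i];
    }
  out[i] = '\0';
  return 0;
}
```

The result (e.g. "en-US") is the kind of name expected by Windows APIs that take a locale name, such as CompareStringEx. Converting it to a wide string and calling such an API is left out here.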

> An alternative is not to have such a variable but just to have an option
> to collate according to the user's locale.  Then the user would run e.g.
> "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> locale.  They would have to have the locale installed that was appropriate
> for whichever manual they were processing (assuming the "variable weighting"
> option is appropriate.)
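The "collate according to the user's locale" option amounts to taking LC_COLLATE from the environment and sorting with strcoll. Below is a minimal sketch with hypothetical function names. Because setlocale changes process-wide state, a real implementation might prefer newlocale with an empty name, which also reads the environment, so the rest of the program's locale state is left alone.

```c
/* Sketch only.  Assumes a POSIX system. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of index terms (const char *), using the
   collation of the current LC_COLLATE category. */
static int
compare_terms (const void *pa, const void *pb)
{
  const char *const *a = pa;
  const char *const *b = pb;
  return strcoll (*a, *b);
}

/* Hypothetical "collate in the user's locale" mode: set LC_COLLATE from the
   environment (LC_ALL, then LC_COLLATE, then LANG) and sort with strcoll.
   Returns -1 without sorting if the named locale is not installed. */
static int
sort_terms_in_user_locale (const char **terms, size_t n)
{
  assert (terms != NULL || n == 0);
  if (setlocale (LC_COLLATE, "") == NULL)
    return -1;
  qsort (terms, n, sizeof *terms, compare_terms);
  return 0;
}
```

With this mode, an invocation like "LC_COLLATE=ll_LL.UTF-8 texi2any ..." would select the collation, provided LC_ALL is not also set, since LC_ALL takes precedence over LC_COLLATE.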

What would be the default then, though?  AFAIR, we decided by default
to use en_US.utf-8 for collation, with the purpose of making the
sorting locale-independent by default, so that Info manuals produced
with the default settings are identical regardless of the user's
locale.
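The default can be kept while still allowing both alternatives: an explicit setting wins, the user's locale is used only on request, and otherwise en_US.utf-8 is used so the output does not depend on the environment. This is a hypothetical sketch. COLLATION_LANGUAGE is only the variable name proposed earlier in the thread, and choose_collation_locale and open_chosen_collation are illustrative names. The sketch relies on newlocale treating an empty name as "take the value from the environment".

```c
/* Sketch only.  Assumes POSIX.1-2008 (newlocale, freelocale). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical policy: COLLATION_LANGUAGE is the value of the proposed
   customization variable (NULL or "" if unset); USE_USER_LOCALE is the
   proposed "collate in the user's locale" option. */
static const char *
choose_collation_locale (const char *collation_language, int use_user_locale)
{
  if (collation_language != NULL && collation_language[0] != '\0')
    return collation_language;   /* explicit setting wins */
  if (use_user_locale)
    return "";                   /* "" = take it from the environment */
  return "en_US.utf-8";          /* locale-independent default */
}

/* Open NAME for collation only.  If it is not installed, fall back to "C"
   and set *FELL_BACK so the caller can warn that the index order may
   differ from the usual default output. */
static locale_t
open_chosen_collation (const char *name, int *fell_back)
{
  assert (name != NULL && fell_back != NULL);
  locale_t loc = newlocale (LC_COLLATE_MASK, name, (locale_t) 0);
  *fell_back = (loc == (locale_t) 0);
  if (*fell_back)
    loc = newlocale (LC_COLLATE_MASK, "C", (locale_t) 0);
  return loc;
}
```

Falling back to "C" rather than failing keeps texi2any usable on systems without en_US.utf-8 installed. Reporting the fallback lets the caller warn that the index order may then differ from the usual default.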

> It is probably not justified to provide an interface to the flags of
> CompareStringW on MS-Windows if we can't provide the same functionality
> with strcoll/strxfrm/strxfrm_l.

Agreed.  I mentioned that only for completeness, and as an
illustration of the fact that the APIs for controlling this stuff are
extremely platform-dependent, although the underlying ideas and
algorithms are the same.

> It seems not very important to provide more of these collation options
> for indices as it is not something users are complaining about.

Right.
