> From: Gavin Smith <gavinsmith0...@gmail.com>
> Date: Thu, 1 Feb 2024 22:16:07 +0000
> Cc: Patrice Dumas <pertu...@free.fr>, bug-texinfo@gnu.org
>
> On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote:
> > > Date: Wed, 31 Jan 2024 23:11:02 +0100
> > > From: Patrice Dumas <pertu...@free.fr>
> > >
> > > > Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> > > > There may be language-specific "tailoring" for other languages (e.g.
> > > > Swedish) that the user may wish to use instead.  Hence, it may be
> > > > a good idea to allow use of a user-specified locale for collation
> > > > through the C code.
> > >
> > > That would not be difficult to implement as a customization variable.
> > > What about COLLATION_LANGUAGE?
> >
> > What would be the possible values of this variable, and in what format
> > will those values be specified?
>
> I imagine it would be a locale name for passing to newlocale and thence
> to strxfrm_l.  What Patrice implemented hardcoded the name "en_US.utf-8"
> but this would be a possible value.

I think en_US.utf-8 is (or at least can be by default) a combination
of @documentlanguage and @documentencoding.

> (If there are locale names on MS-Windows that are different, it would
> be fine to support them the same way, only the invocation of texi2any
> would vary to use a different locale name.)

Yes, we will need to come up with something like that.  (And yes, the
names of locales on Windows are different, and can also take several
different formats.  For example, the equivalent of en_US can be either
"English_United States" or "en-US" [with a dash, not underscore], and
there's also a numerical locale ID -- e.g. 0x409 for en_US.)

> An alternative is not to have such a variable but just to have an option
> to collate according to the user's locale.  Then the user would run e.g.
> "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> locale.  They would have to have the locale installed that was appropriate
> for whichever manual they were processing (assuming the "variable weighting"
> option is appropriate.)

What would be the default then, though?  AFAIR, we decided by default
to use en_US.utf-8 for collation, with the purpose of making the
sorting locale-independent by default, so that Info manuals produced
with the default settings are identical regardless of the user's
locale.

> It is probably not justified to provide an interface to the flags of
> CompareStringW on MS-Windows if we can't provide the same functionality
> with strcoll/strxfrm/strxfrm_l.

Agreed.  I mentioned that only for completeness, and as an
illustration of the fact that the APIs for controlling this stuff are
extremely platform-dependent, although the underlying ideas and
algorithms are the same.

> It seems not very important to provide more of these collation options
> for indices as it is not something users are complaining about.

Right.
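
To make the discussion concrete, here is a rough sketch of what
"a locale name passed to newlocale and thence to strxfrm_l" could look
like on a POSIX.1-2008 system.  It is not the code in texi2any: the
names make_locale_name, sort_with_locale and struct keyed are made up
for illustration, the fallback to the "C" locale when the requested
locale is not installed is my assumption, and Windows locale names are
not handled at all.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one index entry paired with its collation key. */
struct keyed {
    const char *text;
    char *key;
};

static int cmp_keyed(const void *a, const void *b)
{
    const struct keyed *ka = a, *kb = b;
    return strcmp(ka->key, kb->key);
}

/* Hypothetical: build "LANG.ENCODING" (e.g. "en_US.utf-8") in BUF.
   Returns 0 on success, -1 if BUF is too small. */
static int make_locale_name(const char *lang, const char *enc,
                            char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s.%s", lang, enc);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}

/* Hypothetical: sort STRINGS in place by the collation of LOCALE_NAME.
   Returns 1 if that locale was used, 0 if we fell back to "C"
   (an assumption, not necessarily what texi2any does), -1 on error. */
static int sort_with_locale(const char **strings, size_t n,
                            const char *locale_name)
{
    assert(strings != NULL || n == 0);

    int status = 1;
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0) {
        status = 0;
        loc = newlocale(LC_COLLATE_MASK, "C", (locale_t) 0);
        if (loc == (locale_t) 0)
            return -1;
    }

    /* calloc zeroes the key pointers, so cleanup can free them all. */
    struct keyed *items = calloc(n ? n : 1, sizeof *items);
    if (items == NULL) {
        freelocale(loc);
        return -1;
    }

    for (size_t i = 0; i < n; i++) {
        /* First call only measures the transformed length. */
        size_t len = strxfrm_l(NULL, strings[i], 0, loc);
        items[i].text = strings[i];
        items[i].key = malloc(len + 1);
        if (items[i].key == NULL) {
            status = -1;
            break;
        }
        strxfrm_l(items[i].key, strings[i], len + 1, loc);
    }

    if (status >= 0) {
        /* Transformed keys compare correctly with plain strcmp. */
        qsort(items, n, sizeof *items, cmp_keyed);
        for (size_t i = 0; i < n; i++)
            strings[i] = items[i].text;
    }

    for (size_t i = 0; i < n; i++)
        free(items[i].key);
    free(items);
    freelocale(loc);
    return status;
}
```

With something like this, the default would be
sort_with_locale (entries, n, "en_US.utf-8"), and a user-specified
value (or one built by make_locale_name from @documentlanguage and
@documentencoding) would simply replace the last argument.  The return
value lets the caller warn when the requested locale is not installed.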