On 2021-07-03 at 04:50, G. Branden Robinson wrote:
> It seems that the EU has standardized on "no additional
> inter-sentence space" in its typography, so our Czech, German,
> French, Italian, and Swedish localization files all say .ss 12 0

I've always wondered about this. Does anyone know to what extent
"additional inter-sentence space" was used in Europe before this?
I'm personally thinking primarily of Swedish, but a closer look at
this for any of the European languages would be interesting.

On 2021-07-03 at 16:44, G. Branden Robinson wrote:
>> - The LANG variable is considered a legacy feature, and advertising
>> legacy features is usually not a good idea. Advertising a
>> more modern syntax like "LC_ALL=it_IT.UTF-8 groff" exacerbates
>> the previous problem, making the user wonder whether the "_IT"
>> part matters and what effect it might have, and whether ".UTF-8"
>> is the right choice and if so, whether ".UTF-8" here is sufficient
>> to assure correct processing of the character encoding in the
>> file - which it likely isn't. The user might also wonder which
>> effect, if any, the LC_TIME and LC_NUMERIC features contained
>> in LC_ALL might have, and if those effects, if any, are beneficial
>> or detrimental, and whether it might be better to set one of the
>> other LC_* variables instead, and if so, which one. It's not
>> readily apparent which of the variables to set because none of
>> them are designed for the purpose.
>
> These are all fair points and I will chew on them, and would like
> to solicit the views of others on this as well.
>
> The LANG point is the weakest; I highlighted it in my mail only
> because it was shorter and easier to type--laziness again. I am
> aware of the prescribed precedence of the POSIX locale-related
> environment variables.

I'd like to chip in my general agreement with Ingo on this point.
To me, -mfr feels less opaque, less surprising and less fragile than
LC_ALL/LC_CTYPE=fr_FR.UTF-8.

LC_CTYPE=fr_FR.UTF-8 also seems, as Ingo says, to imply that groff
will treat input as UTF-8. That's what Heirloom troff does. On the
one hand, this lends some credibility to the idea of using the LC_
variables for this purpose.
On the other hand, this makes groff ignoring the UTF-8 part of
LC_CTYPE all the more surprising. If groff should continue to use
LC_CTYPE to determine input language, should it not also use it to
determine input encoding?
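
To make the question concrete, here is a rough sketch in Python of what "using the same variable for both" could look like. It is purely illustrative and is not how groff or Heirloom troff is implemented; the function names effective_ctype and split_locale are made up for this mail. It only assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and the usual language[_territory][.codeset][@modifier] shape of locale names.

```python
import re

# POSIX locale names have the shape language[_territory][.codeset][@modifier].
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE (hypothetical helper).

    LC_ALL overrides LC_CTYPE, which overrides LANG; empty values count
    as unset. Falls back to "C" when none of them is set.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def split_locale(name):
    """Split a locale name into its parts; absent parts are None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        return {"language": None, "territory": None,
                "codeset": None, "modifier": None}
    return match.groupdict()


if __name__ == "__main__":
    env = {"LANG": "sv_SE.ISO-8859-1", "LC_CTYPE": "fr_FR.UTF-8"}
    parts = split_locale(effective_ctype(env))
    # A formatter could take the language from parts["language"] and the
    # input encoding from parts["codeset"], or deliberately ignore the latter.
    print(parts["language"], parts["codeset"])
```

The point of the sketch is only that the language and the codeset arrive in the same string, so a program that reads one and silently drops the other is making a choice the user cannot see.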