Hi Branden,

G. Branden Robinson wrote on Sat, Jul 03, 2021 at 12:50:07PM +1000:

[ autodetection ]

> Important to note here--it doesn't.  groff doesn't detect this--it has
> to be told.

Which is a good thing.  Even when a document contains only a single
language, detecting it automatically may not be reliable.  Even if a
document contains mostly text in one language, that doesn't imply the
author designed it for use with that language's macro set.  Relying on
a specific macro set is a choice by the author of a document and has
to be treated as such.

> I revamped groff input localization a few months ago.  It occurred to me
> that the mechanism groff had innovated for this purpose (specify options
> like -mfr for French) was duplicative of an existing and much more
> widely understood infrastructure for tackling such issues: locale(7).

That's not duplication at all but a totally different topic which has
almost nothing to do with what we are talking about.

The locale(7) system is a system for users to specify user
preferences, for example which character set and encoding they want
to use *when interacting with programs* and which language they want
programs to use when displaying messages and when parsing user input.

That is not at all related to which macro set a document author
decided to use for a document that the user wishes to process.

For example, i almost always work with an en_US.UTF-8 locale, with
some exceptions for low-level work where i use the POSIX locale
instead.  But that doesn't mean that i never want to process French
or German documents.

Yes, setting a fake locale when calling a program is possible, so a
*workaround* does exist, even though it certainly feels awkward.

Besides, this is a bad trap.  Why should any user expect that whatever
locale they may have set according to their personal preferences
silently cripples formatting of documents they process, and that they
have to go the extra mile of modifying the locale in the environment
of their formatting commands?

> I have anticipated, but not yet heard, a protest

The reason you didn't is trivial: i missed your change...  :-(

> along the lines that just because a (for instance) French document
> is being typeset, the user might not want to change their locale
> to begin with "fr".

You have this argument backwards.  I don't think "let's allow users
to be lazy" is a good argument.  Instead, my point would be that you
are abusing the locale system for the wrong purpose.

> C. Instead of saying something like "groff -mit", we can use a standard
> environment variable to assert the locale.  For groff's purposes,
> simply "LANG=it" will suffice.

How is "LANG=it groff" better than "groff -mit"?
It is neither shorter nor clearer.

I can easily tell you how it is worse.

 - There is a risk that it inadvertently creeps in from the user's
   environment even if the user never intended to set it.

 - The roff ecosystem is famous for using pipelines, and making sure
   that in a pipeline, the right programs run with the right
   environment variables can be tricky and error-prone, whereas
   setting command line options on programs in a pipeline is easy
   and reliable.

 - There is a risk that the environment variables have undesirable
   and unintended side effects on some programs in the pipeline
   because not all programs run in a roff pipeline must necessarily
   be programs distributed with the respective core roff package.

 - The LC_ variables are unreasonably powerful for this purpose
   because they have never been designed for it.  The only decision
   needed here is whether to run a macro package, and which one,
   whereas the LC_ variables carry much more information.  Accepting
   and parsing irrelevant information and requiring needlessly
   complicated syntax both cause complexity, which in general
   increases the risk of both user confusion and program misbehaviour
   and bugs.

 - The LANG variable is considered a legacy feature, and advertising
   legacy features is usually not a good idea.

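A minimal sketch of the pipeline point above, assuming a POSIX shell.
DEMO_LOCALE is a hypothetical variable used in place of LANG or LC_ALL
so that the demonstration has no side effects on the tools it runs, and
the "sh -c" stages merely stand in for real pipeline stages such as a
preprocessor and groff:

```shell
#!/bin/sh
# DEMO_LOCALE is a hypothetical stand-in for LANG or LC_ALL.
unset DEMO_LOCALE

# A prefix assignment applies only to the single command it precedes:
# the first pipeline stage sees it, the second one does not.
first_only=$(DEMO_LOCALE=it sh -c 'echo "stage1=${DEMO_LOCALE:-unset}"' |
    sh -c 'cat; echo "stage2=${DEMO_LOCALE:-unset}"')
echo "$first_only"

# Reaching every stage requires exporting the variable, which then also
# reaches every other program started in that subshell.
all_stages=$( (export DEMO_LOCALE=it
    sh -c 'echo "stage1=${DEMO_LOCALE:-unset}"' |
    sh -c 'cat; echo "stage2=${DEMO_LOCALE:-unset}"') )
echo "$all_stages"
```

By contrast, an option such as -mit is passed to exactly the one
program in the pipeline that needs it and to no other.
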
Advertising a more modern syntax like "LC_ALL=it_IT.UTF-8 groff"
exacerbates the complexity problem noted above, making the user
wonder whether the "_IT" part matters and what effect it might have,
and whether ".UTF-8" is the right choice and if so, whether ".UTF-8"
here is sufficient to assure correct processing of the character
encoding in the file - which it likely isn't.  The user might also
wonder which effect, if any, the LC_TIME and LC_NUMERIC features
contained in LC_ALL might have, and if those effects, if any, are
beneficial or detrimental, and whether it might be better to set one
of the other LC_* variables instead, and if so, which one.  It's not
readily apparent which of the variables to set because none of them
are designed for the purpose.

This is not an outright request for a revert, but an invitation to
reconsider whether this is really a useful and desirable change.

Yours,
  Ingo