On Tue Nov 5, 2024 at 8:15 PM CET, onf wrote:
> as the title says. If I use UTF-8 via preconv and request
> .hy 2
> .hpf hyphen.cs
> will that work, given that the file is using the Latin-2 encoding
> for characters with diacritics? If not, what changes need to be done?
I did a little bit of testing. The hyphenation patterns work correctly
with UTF-8, but ONLY if Latin2 is loaded, like so:

.mso latin2.tmac

and hyphenation codes must be specified, like so for Latin2 Czech
(<xx> stands for the raw Latin2 byte with that hexadecimal value,
e.g. <e1> is á and <c1> is Á):

.hcode <e1> <e1> <c1> <e1>
.hcode <e8> <e8> <c8> <e8>
.hcode <ef> <ef> <cf> <ef>
.hcode <e9> <e9> <c9> <e9>
.hcode <ec> <ec> <cc> <ec>
.hcode <ed> <ed> <cd> <ed>
.hcode <f2> <f2> <d2> <f2>
.hcode <f3> <f3> <d3> <f3>
.hcode <f8> <f8> <d8> <f8>
.hcode <b9> <b9> <a9> <b9>
.hcode <bb> <bb> <ab> <bb>
.hcode <fa> <fa> <da> <fa>
.hcode <f9> <f9> <d9> <f9>
.hcode <fd> <fd> <dd> <fd>
.hcode <be> <be> <ae> <be>

Without loading latin2.tmac, it doesn't hyphenate correctly.

Given that latin2.tmac specifies a bunch of translations which convert
Latin2 bytes into the respective characters, e.g.:

.trin \[char248]\[r ah]

(byte 248, i.e. 0xf8, is ř in Latin2), my guess is that these
translations cause the Latin2 bytes in hyphen.cs to be converted to the
same characters that the UTF-8 input is converted to, so that in the end
both input methods result in the same glyph. [Pardon my inadequate
terminology.]

Latin1 characters keep working even with Latin2 loaded, as long as they
are specified in UTF-8.

My conclusion is that, given the intricacies of all this, loading the
appropriate localization file is THE way to set up hyphenation
correctly.

I feel that splitting the hyphenation part of the localization files
off (into hycs.tmac etc.) would be beneficial, in that one could load
the hyphenation settings for a given language without all the
localization strings. Groff's documentation of hyphenation could then
simply mention

.mso hycs.tmac

before going into the technical details (.hy, .hla, .hpf, ...), which
ordinary users won't need to deal with.

~ onf
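The .hcode lines in the message follow a simple pattern: each lowercase
Latin2 letter gets itself as its hyphenation code, and its uppercase
counterpart is mapped to the same lowercase code. The short Python
sketch below is a hypothetical helper, not part of groff. It
regenerates those lines from Python's iso8859_2 codec, printing the same
<xx> hex notation, which may be a quick way to double-check the byte
values or extend the list to another Latin2 language. The letter list
and function names are assumptions made for illustration.

```python
"""Regenerate the Latin2 .hcode lines for Czech (hypothetical helper).

Not part of groff: it only reproduces the byte values, written as
<xx> hex placeholders like in the message.
"""

# Lowercase Czech letters with diacritics, in the order used in the message.
CZECH_LOWER = "áčďéěíňóřšťúůýž"


def latin2_byte(ch: str) -> int:
    """Return the single ISO 8859-2 (Latin2) byte that encodes ch."""
    # Raises UnicodeEncodeError if ch has no Latin2 representation.
    return ch.encode("iso8859_2")[0]


def hcode_line(lower: str) -> str:
    """Build '.hcode <lc> <lc> <uc> <lc>': both cases get the lowercase code."""
    lc = latin2_byte(lower)
    uc = latin2_byte(lower.upper())
    return f".hcode <{lc:02x}> <{lc:02x}> <{uc:02x}> <{lc:02x}>"


def hcode_lines() -> list[str]:
    """Return one .hcode line per letter in CZECH_LOWER."""
    return [hcode_line(ch) for ch in CZECH_LOWER]


def char_escape(ch: str) -> str:
    """Return the \\[charNNN] form (decimal byte value) as used in latin2.tmac."""
    return f"\\[char{latin2_byte(ch)}]"


if __name__ == "__main__":
    for line in hcode_lines():
        print(line)
    # 'ř' is byte 0xf8 == 248, matching the .trin example above.
    print(char_escape("ř"))
```

Running it prints the fifteen .hcode lines followed by \[char248]. Using
the codec rather than a hand-written table avoids typos in the hex
values; the uppercase letter is derived with str.upper(), which assumes
every letter in the list has an uppercase form in Latin2 (true for the
Czech set above).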