Hi Carsten,

> Now the error message is "hyphenation code must be ordinary
> character". So I understand that the only correct file encoding for
> .hcode with umlauts is latin1 (ISO 8859-1)? Or is there any chance to
> use 7-bit input like \[uXXXX]?
>
> $ printf ".hcode ä ä"|preconv -e utf-8|troff
>
> Prints error "hyphenation code must be ordinary character"
No, it looks like you're right. `info groff' says:

    -- Request: .hcode c1 code1 [c2 code2 ...]
         Set the hyphenation code of character C1 to CODE1, that of C2
         to CODE2, etc. A hyphenation code must be a single input
         character (not a special character) other than a digit or a
         space.

         To make hyphenation work, hyphenation codes must be set up.
         At start-up, groff only assigns hyphenation codes to the
         letters `a'-`z' (mapped to themselves) and to the letters
         `A'-`Z' (mapped to `a'-`z'); all other hyphenation codes are
         set to zero. Normally, hyphenation patterns contain only
         lowercase letters which should be applied regardless of case.
         In other words, the words `FOO' and `Foo' should be hyphenated
         exactly the same way as the word `foo' is hyphenated, and this
         is what `hcode' is good for. Words which contain other letters
         won't be hyphenated properly if the corresponding hyphenation
         patterns actually do contain them. For example, the following
         `hcode' requests are necessary to assign hyphenation codes to
         the letters `ÄäÖöÜüß' (this is needed for German):

              .hcode ä ä  Ä ä
              .hcode ö ö  Ö ö
              .hcode ü ü  Ü ü
              .hcode ß ß

         Without those assignments, groff treats German words like
         `Kindergärten' (the plural form of `kindergarten') as two
         substrings `kinderg' and `rten' because the hyphenation code
         of the umlaut a is zero by default. There is a German
         hyphenation pattern which covers `kinder', so groff finds the
         hyphenation `kin-der'. The other two hyphenation points
         (`kin-der-gär-ten') are missed.

         This request is ignored if it has no parameter.

So it isn't happy with the \[] escapes that preconv is producing.

    $ echo .hcode ä ä | preconv -e utf-8
    .lf 1 -
    .hcode \[u00E4] \[u00E4]
    $

Werner, is it a preconv bug that it doesn't produce ISO-8859-1 (latin1)
output where possible, e.g. ä rather than \[u00E4], given that's groff's
default input encoding? As it stands, preconv can't be used for .hcode.

One could post-process preconv's output, provided \[u00..] never occurs
in the input meaning anything other than a byte of that value.
    $ echo .hcode ä ä |
    > preconv -e utf-8 |
    > perl -pe 's/\\\[u00([\dABCDEF]{2})]/chr hex $1/ge' |
    > recode iso-8859-1..dump
    UCS2   Mne   Description

    002E   .     full stop
    006C   l     latin small letter l
    0066   f     latin small letter f
    0020   SP    space
    0031   1     digit one
    0020   SP    space
    002D   -     hyphen-minus
    000A   LF    line feed (lf)
    002E   .     full stop
    0068   h     latin small letter h
    0063   c     latin small letter c
    006F   o     latin small letter o
    0064   d     latin small letter d
    0065   e     latin small letter e
    0020   SP    space
    00E4   a:    latin small letter a with diaeresis
    0020   SP    space
    00E4   a:    latin small letter a with diaeresis
    000A   LF    line feed (lf)
    $

Cheers,
Ralph.
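On the question of 7-bit input: a minimal sketch, assuming a POSIX shell and a document that is itself latin1 so preconv isn't involved at all. The idea is to write the German .hcode requests from the info excerpt as raw ISO 8859-1 bytes while keeping the script itself pure ASCII, because printf turns each \ooo octal escape in its format string into a single byte. The file name hcode-de.tr is made up for illustration.

```shell
#!/bin/sh
# Sketch: write the German .hcode requests from the groff manual as
# ISO 8859-1 (latin1) bytes, using only 7-bit ASCII in this script.
# printf turns each \ooo octal escape in its format into one byte:
#   \344 = a-umlaut 0xE4    \304 = A-umlaut 0xC4
#   \366 = o-umlaut 0xF6    \326 = O-umlaut 0xD6
#   \374 = u-umlaut 0xFC    \334 = U-umlaut 0xDC
#   \337 = sharp s  0xDF
# hcode-de.tr is a hypothetical file name.
out=hcode-de.tr
{
    printf '.hcode \344 \344 \304 \344\n'
    printf '.hcode \366 \366 \326 \366\n'
    printf '.hcode \374 \374 \334 \374\n'
    printf '.hcode \337 \337\n'
} > "$out"

# Dump the first request in hex: the umlauts should appear as the
# single bytes e4 and c4, not as \[u00E4] escapes.
head -n 1 "$out" | od -An -tx1
```

Read ahead of a latin1 document, e.g. `groff hcode-de.tr mydoc.tr` (mydoc.tr is a placeholder), each .hcode argument is then an ordinary input character, which is what the request requires.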