Hello Werner,
> it's probably something which should be done later.
Well, I'm on it now because I feel more comfortable dealing with the CJK
width and other smaller problems once the logic of decomposition and
combined characters is done right.
> Currently, groff only recognizes a very limited set of
> ligatures (fi, ff, etc.)
Ah, good example! This is already a precedent where groff combines adjacent
input nodes. When the user doesn't want the ligature, he can use \& as
a separator between the two, right? So I imagine that no one will
object if troff combines
x\[u0302]\[u0301]
into
\[u0078_0302_0301]
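The merging rule can be sketched roughly like this (a hypothetical Python
illustration of the idea, not groff's actual C++ code; the function name
combine_glyph_names is made up):

```python
import unicodedata

def combine_glyph_names(names):
    """Merge adjacent \\[uXXXX] glyph names into groff's composite form.

    A name whose code point has a nonzero canonical combining class
    (per UnicodeData) is appended, underscore-separated, to the
    preceding base glyph's name.
    """
    out = []
    for name in names:
        is_simple = name.startswith("u") and "_" not in name
        cp = int(name[1:], 16) if is_simple else None
        if cp is not None and unicodedata.combining(chr(cp)) and out:
            # Combining mark: fold it into the previous glyph name.
            out[-1] += "_" + name[1:]
        else:
            out.append(name)
    return out

# ["u0078", "u0302", "u0301"] -> ["u0078_0302_0301"]
```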
> > In the first case I would put the composition into troff.
>
> OK. In other words, it won't be handled yet.
>
> > In the second case into preconv (i.e. preconv would translate
> > <U+0078><U+0302><U+0301> to \[x u0302 u0301] but would leave alone
> > x\[u0302]\[u0301]).
>
> This would be perfect.
Hmm? Why do you qualify the second approach as "perfect", when the
other one is more in line with the mechanics of how ligatures work?
I just wish to know which of the two is preferable.
> If I understand you correctly, your approach
> will be table-driven, that is, a combining character following a base
> character will automatically be converted to the \[xxx yyy ...] form,
> right?
Yes, sure. The input stream of Unicode characters already tells us, through
the UnicodeData table, which characters are combining and decorate the
preceding base character.
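For the preconv-side variant, the table-driven scan over the raw Unicode
stream could look like this (an illustrative Python sketch only; the real
preconv is C++, and the function name to_groff_escapes is invented):

```python
import unicodedata

def to_groff_escapes(text):
    """Turn a Unicode string into groff \\[uXXXX...] escapes.

    unicodedata.combining() reports the canonical combining class
    from the UnicodeData table; any character with a nonzero class
    decorates the preceding base character, so its code point is
    appended to that base's glyph name.
    """
    parts = []
    for ch in text:
        cp = "%04X" % ord(ch)
        if unicodedata.combining(ch) and parts:
            parts[-1] += "_" + cp   # attach mark to the base glyph
        else:
            parts.append(cp)        # new base character
    return "".join("\\[u%s]" % p for p in parts)

# "x\u0302\u0301" -> "\[u0078_0302_0301]"
```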
Bruno
_______________________________________________
Groff mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/groff