Re: [Groff] unicode support - where to compose?

Werner LEMBERG Wed, 22 Feb 2006 22:43:08 -0800

> When an input file contains the character <U+1EBF>, preconv
> transforms it to \[u1EBF], and troff transforms it to a single glyph
> u0065_0302_0301.  Fine.
> 
> But when an input file contains the characters
> <U+0078><U+0302><U+0301>, preconv transforms it to
> x\[u0302]\[u0301], and troff produces three distinct glyphs x,
> u0302, u0301.  This is wrong.
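
For illustration, the decomposition described above can be reproduced
with Python's standard unicodedata module.  This is only a sketch of
the Unicode side of the problem, not of what preconv or troff do
internally; the helper name decomposed_glyph_name is made up for this
example.

```python
import unicodedata

def decomposed_glyph_name(ch):
    """Build a uXXXX_YYYY_... name from the canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "u" + "_".join("%04X" % ord(c) for c in decomposed)

# U+1EBF decomposes canonically to U+0065 U+0302 U+0301.
print(decomposed_glyph_name("\u1ebf"))

# Unicode has no precomposed form of x + U+0302 + U+0301, so NFC
# composition leaves the three code points as they are.
print(len(unicodedata.normalize("NFC", "x\u0302\u0301")))
```

The second call shows why the sequence reaches troff as three separate
input characters: normalization alone cannot turn it into one.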


Hmm.

> But should the composition be handled within preconv or within
> troff?  In other words, what should happen if the input file
> contains
> 
>                x\[u0302]\[u0301]  ?

groff doesn't do anything special yet, and I must admit that I haven't
thought about this problem.  Or rather, I've delayed it :-)

> Is groff allowed to combine these three input nodes into a single
> one?

Not yet.

> Or is there some principle in the groff input language that would
> force groff to consider these as three different units?

There is no such limit in the input language; the limit is in the GNU
troff engine itself.  Currently, groff recognizes only a very limited
set of ligatures (fi, ff, etc.), and that set can't be extended
dynamically.  This has to be fixed, but it's probably something that
should be done later.

> In the first case I would put the composition into troff.

OK.  In other words, it won't be handled yet.

> In the second case into preconv (i.e. preconv would translate
> <U+0078><U+0302><U+0301> to \[x u0302 u0301] but would leave alone
> x\[u0302]\[u0301]).

This would be perfect.  If I understand you correctly, your approach
will be table-driven; that is, a combining character following a base
character will automatically be converted to the \[xxx yyy ...] form,
right?
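
A minimal sketch of such a conversion, assuming Python's unicodedata
combining-class data serves as the table.  The function names
to_groff and glyph_name are hypothetical, and this is not preconv's
actual implementation; it only shows the grouping rule discussed
here.

```python
import unicodedata

def glyph_name(ch):
    """Return ch itself for ASCII, else a uXXXX name (hypothetical helper)."""
    return ch if ord(ch) < 0x80 else "u%04X" % ord(ch)

def to_groff(text):
    """Group each base character with the combining characters after it."""
    out = []
    cluster = []  # current base character plus trailing combining characters

    def flush():
        if not cluster:
            return
        if len(cluster) == 1:
            ch = cluster[0]
            # Plain ASCII passes through; anything else becomes \[uXXXX].
            out.append(ch if ord(ch) < 0x80 else "\\[%s]" % glyph_name(ch))
        else:
            # Base plus combining characters: emit \[base comb1 comb2 ...].
            out.append("\\[%s]" % " ".join(glyph_name(c) for c in cluster))
        cluster.clear()

    for ch in text:
        # A nonzero combining class marks a combining character; attach it
        # to the pending base unless that base is whitespace.
        if unicodedata.combining(ch) and cluster and not cluster[0].isspace():
            cluster.append(ch)
        else:
            flush()
            cluster.append(ch)
    flush()
    return "".join(out)

print(to_groff("x\u0302\u0301"))        # \[x u0302 u0301]
print(to_groff("\u1ebf"))               # \[u1EBF]
print(to_groff("x\\[u0302]\\[u0301]"))  # x\[u0302]\[u0301]
```

Input that already uses escapes is pure ASCII, so it passes through
unchanged, which matches the "leave alone" behaviour proposed in the
quoted text.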


    Werner


_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff
