Bug#684276: debiandoc2text: incorrect text wrap with UTF-8 multi-byte chars

Osamu Aoki Thu, 09 Aug 2012 05:24:21 -0700

Hi,

On Thu, Aug 09, 2012 at 03:28:14PM +0400, Sergey Alyoshin wrote:
> 2012/8/9 Osamu Aoki <os...@debian.org>:
> >> Do You think it would be hard to fix text wrap to count multi-byte chars?
> >
> > Generically for UTF-8 multibyte chars --> impossible.
> 
> Why, not worth the efforts?


Most languages written in UTF-8 do not follow simple rules the way Russian
and English do.  For example, Asian languages rarely use spaces to separate
words.  This is only my thought, based on my limited knowledge.
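
To illustrate the gap between byte count and on-screen width, here is a
minimal Python sketch.  It is not debiandoc2text code, and the helper names
display_width and wrap_by_columns are hypothetical.  It assumes a terminal
where East Asian wide characters occupy two columns, and it only breaks
lines at spaces, so it does nothing for languages written without spaces.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns occupied by text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # East Asian Wide (W) and Fullwidth (F) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def wrap_by_columns(text, columns):
    """Greedy wrap at spaces, measuring display columns instead of bytes."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and display_width(candidate) > columns:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


sample = "Привет мир"
byte_length = len(sample.encode("utf-8"))  # each Cyrillic letter is 2 bytes
column_length = display_width(sample)      # each Cyrillic letter is 1 column
```

A wrapper that counts bytes treats the sample as almost twice as long as it
appears on screen, which would explain lines being broken too early.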

If you think it is worth it, please go ahead and implement such a thing.  I am
open to any useful patches.

> > Just for Russian                      --> maybe possible
> >                                           It may be easier to do this
> >                                           in KOI8-R
> 
> I can convert utf-8 .sgml to koi8-r with iconv, then use
> 'debiandoc2text -l ru-ru.koi8-r' on it and convert resulted .txt
> to desired utf-8. Only one problem: utf-8 chars (e.g. ä and á in names) will
> be transliterated.

Transliterated?  You mean changed?  That may happen.  Anyway, I do not
know much Russian.  So if you want this fixed, you need to come up
with a solution.  As long as it does not break other languages, it is
likely to be accepted.
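
The character loss in the KOI8-R round trip can be checked before
converting.  The following is a small Python sketch using Python's built-in
"koi8_r" codec rather than iconv.  The function names are hypothetical, and
iconv's exact substitution behaviour may differ from what is shown here.

```python
def survives_koi8r(text):
    """Return True if text comes back unchanged from a KOI8-R round trip."""
    try:
        return text.encode("koi8_r").decode("koi8_r") == text
    except UnicodeEncodeError:
        return False  # at least one character has no KOI8-R code point


def unrepresentable(text):
    """List the distinct characters of text that KOI8-R cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("koi8_r")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing
```

Cyrillic and ASCII text passes this check, while accented Latin letters
such as ä and á are reported as missing, so they have to be transliterated
or replaced during the conversion.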

But I really think you should consider improving the DocBook XML support
for Russian.  This debiandoc* tool chain is deprecated since upstream is
dead.  I am merely the Debian package maintainer.

Regards,

Osamu


