Hi,

This is a known problem with this program.

I know this because I use this for Japanese :-)

The workaround is:
 Generate single-page HTML output in UTF-8.
 Convert it to plain text using w3m, which is UTF-8 aware.

That works for Japanese and Chinese.
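
The conversion step could be scripted roughly as below.  This is only a
sketch: the helper names and the file name "manual.html" are made up for
illustration, and it assumes a w3m build with multilingual support so
that the -I and -O charset options are accepted.

```python
import subprocess


def w3m_dump_command(html_path, cols=79):
    """Build a w3m command line that renders HTML to plain text.

    html_path is a hypothetical single-page UTF-8 HTML file.
    """
    return [
        "w3m",
        "-dump",             # write the rendered page to stdout
        "-T", "text/html",   # treat the input as HTML
        "-I", "UTF-8",       # input charset
        "-O", "UTF-8",       # output charset
        "-cols", str(cols),  # wrap width in terminal columns
        html_path,
    ]


def html_to_text(html_path, cols=79):
    """Run w3m and return the rendered text (requires w3m to be installed)."""
    result = subprocess.run(
        w3m_dump_command(html_path, cols), check=True, capture_output=True
    )
    return result.stdout.decode("utf-8")


command = w3m_dump_command("manual.html")
```

Calling html_to_text("manual.html") would then return the wrapped text,
provided w3m is on the PATH.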

On Wed, Aug 08, 2012 at 03:39:55PM +0400, Sergey Alyoshin wrote:
> Package: debiandoc-sgml
> Version: 1.2.17
> 
> Text wrap is incorrect for multi-byte chars, if there is only several such
> chars in paragraph it is not very noticeable:
>                                                        79 chars wrap border 
> =>|
>      Dieses Dokument kann in jeder Form frei weiterverteilt und/oder          
> |
>      modifiziert werden, solange Änderungen klar dokumentiert werden.         
> |
>      Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä                            
> |
>      Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä                            
> |
>      Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä                                            
> |
> 
> But if nearly all chars (except spaces, and punctuation) are multy-byte,
> then is is very noticeable:
>                                                        79 chars wrap border 
> =>|
>      Этот документ является свободно                                          
> |
>      распространяемым и может быть                                            
> |
>      изменён, если ваши изменения явно                                        
> |
>      описаны.                                                                 
> |

If you check the internal code of this program, it assumes an 8-bit
encoding such as Latin-1 and also assumes that words are separated by
spaces, as in English.  Its assumption is 1 byte = 1 character.
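
To illustrate the effect of that assumption, here is a small sketch.  It
is not the program's actual code; wrap_by_bytes is a hypothetical helper
that measures words in UTF-8 bytes, compared against Python's
textwrap.wrap, which counts characters.

```python
import textwrap


def wrap_by_bytes(text, width, encoding="utf-8"):
    """Greedy word wrap that measures each word in encoded bytes."""
    lines, current, used = [], [], 0
    for word in text.split():
        size = len(word.encode(encoding))
        # One extra byte for the joining space if the line has words already.
        needed = size + (1 if current else 0)
        if current and used + needed > width:
            lines.append(" ".join(current))
            current, used = [word], size
        else:
            current.append(word)
            used += needed
    if current:
        lines.append(" ".join(current))
    return lines


sample = "Этот документ является свободно распространяемым и может быть изменён"
by_bytes = wrap_by_bytes(sample, 40)   # each Cyrillic letter counts as 2
by_chars = textwrap.wrap(sample, 40)   # each Cyrillic letter counts as 1

for line in by_bytes:
    print(len(line), line)
```

With mostly 2-byte characters, the byte-counting wrap breaks lines at
about half the intended width, which matches the shape of the Russian
sample in the report.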

Because w3m is Japanese-centric, it may treat Russian characters as
double width.  That may be the cause of what you are reporting.  (Old
Japanese systems used to display Russian characters at double width....)
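
The double-width possibility can be illustrated with the Unicode East
Asian Width property, where Cyrillic letters are classed as "A"
(ambiguous).  The sketch below is an assumption about how a CJK-oriented
tool might count columns, not w3m's real implementation; display_width
and its ambiguous_wide flag are hypothetical.

```python
import unicodedata


def display_width(text, ambiguous_wide=False):
    """Estimate terminal columns used by text.

    ambiguous_wide=True mimics a CJK-oriented setup that renders
    East Asian "ambiguous" characters at double width.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F") or (ambiguous_wide and eaw == "A"):
            width += 2
        else:
            width += 1
    return width


word = "документ"
narrow = display_width(word)                       # Western-style count
wide = display_width(word, ambiguous_wide=True)    # CJK-style count
print(narrow, wide)
```

If a converter counts ambiguous characters as wide, Russian text wraps
at roughly half the column limit even though every character was
decoded correctly.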

I think you may get a different result with another HTML to plain text
conversion tool.

Osamu

