B> I think it is perfectly reasonable to treat all characters the same in
B> this regard. You might desire some different behavior e.g. if you edit
B> western text with variable width fonts, so 'l' takes much less space
B> than 'W', but such behavior is a feature request, don't you think?

No no no!
All Chinese characters are two ASCII characters wide upon display.
All ASCII   characters are one ASCII character  wide upon display.
That is all I am asking for.  Probably related to:
wcswidth (3) - determine columns needed for a fixed-size wide character string

The challenge: download e.g., http://jidanni.org/me/index.html .
Note how it fits nicely in your 80 column wide UTF-8 capable editor.
Now dare to run tidy -utf8 on it. Disaster. Many lines now go way off the
edge of the editor. That's why I invented dantidy, below.

B> I see a feature request that ... -raw

Forget about -raw. The problem occurs with just -utf8. LC_ALL=...
doesn't help either. Anyway, please just count each Chinese character
as two ASCII characters when determining whether you have reached the
wrap column.
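
To make the request concrete, here is a rough sketch in Python of the
counting rule I mean. It is not tidy's code and not a patch. The names
display_width and fits are made up for illustration, and using
unicodedata.east_asian_width as a stand-in for wcswidth(3) is my
assumption about how one could approximate it portably.

```python
import unicodedata

def display_width(text):
    """Columns needed on a fixed-width display.

    Wide (W) and fullwidth (F) characters count as 2 columns,
    combining marks as 0, everything else as 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

def fits(line, wrap_column=80):
    """True if the line does not run past the wrap column."""
    return display_width(line) <= wrap_column

print(display_width("tidy"))   # 4
print(display_width("中文"))   # 4
print(fits("中" * 40))         # True: exactly 80 columns
print(fits("中" * 41))         # False: 82 columns
```

A wrapper that compares display_width(line) against the wrap column,
rather than the number of characters, would break 41 Chinese characters
onto a new line instead of letting them run to column 82.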

An _unrelated_ issue is how many bytes a Chinese character uses in a
file on your disk. In UTF-8 it uses 3. In big5 it uses 2. Don't get
fooled by that!
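
A quick way to see that the byte count and the column count are
separate things (again only an illustrative Python sketch; the
variable names are mine):

```python
import unicodedata

ch = "中"

utf8_bytes = len(ch.encode("utf-8"))  # bytes on disk in UTF-8
big5_bytes = len(ch.encode("big5"))   # bytes on disk in Big5
is_wide = unicodedata.east_asian_width(ch) == "W"  # 2 columns on screen

print(utf8_bytes)  # 3
print(big5_bytes)  # 2
print(is_wide)     # True
```

The on-screen width is 2 columns in both cases, whatever the encoding
of the file.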

$ cat dantidy #what I am forced to use
#!/bin/sh -e
perl -C -pwe 's/[^[:ascii:]]/sprintf "\\x{%04x}",ord $&/ge'|#my uni2ascii(1)
tidy --gnu-emacs yes --indent-spaces 1 --indent auto -utf8 \
--tidy-mark no --wrap-attributes yes --wrap-script-literals yes -quiet|
perl -C -pwe 's/\\x\{([[:xdigit:]]{4,6})\}/chr hex $1/eg' #my ascii2uni(1)

