tidy -utf8 -raw -q will change this (bytes shown quoted-printable encoded):
< =E8=A9=B1=E5=B1=80=E7=A2=BC=E6=A0=B8=E9=85=8D=E7=
> =E8=A9=B1=E5=B1=80=E7=A2=BC=E6 =B8=E9=85=8D=E7=8F=
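I.e., it changes the byte 0xA0 to a space (SPC). Here 0xA0 is the middle
byte of the three-byte UTF-8 sequence E6 A0 B8, so the character is
corrupted. And it probably does this just in special parts of certain
HTML files, just to be stealthy :-(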
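
To make the damage visible, here is a minimal Python sketch (not anything
tidy does internally, just a check on the bytes in the diff above). It
assumes the two diff lines are cut at a character boundary so that each
decodes on its own; the names before, after and first_difference are
made up for the illustration.

```python
import quopri

# The start of the two diff lines above, cut after the fifth character
# so that no UTF-8 sequence is left incomplete (an assumption made for
# this illustration; the original lines continue past this point).
before = quopri.decodestring(b"=E8=A9=B1=E5=B1=80=E7=A2=BC=E6=A0=B8=E9=85=8D")
after = quopri.decodestring(b"=E8=A9=B1=E5=B1=80=E7=A2=BC=E6 =B8=E9=85=8D")


def first_difference(a, b):
    """Return the index of the first differing byte, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    i = first_difference(before, after)
    print("first differing byte at offset", i)
    print("before: 0x%02X  after: 0x%02X" % (before[i], after[i]))

    # The untouched bytes are valid UTF-8.
    print(before.decode("utf-8"))

    # The modified bytes are not: a lead byte is now followed by a space.
    try:
        after.decode("utf-8")
    except UnicodeDecodeError as err:
        print("after tidy:", err)
```

Running a strict decode like this over tidy's output is one way to spot
files it has quietly broken.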

And tidy -utf8 miscalculates how wide Chinese Unicode characters are.
They are three bytes wide in UTF-8, but that doesn't matter.
What matters is that they are two columns wide on the screen.
perl -C -nwe '$l=$_; s/[^[:ascii:]]/12/g; print $l if length>80'
finds lines with Chinese Unicode characters that would wrap: it counts
each non-ASCII character as two columns and prints the original line if
the result is longer than 80.
tidy probably thinks that Chinese Unicode characters are like some
other Unicode characters, just as wide as ASCII on the screen. Or maybe
it thinks in big5 terms, where two bytes take up two ASCII widths on the
screen, and assumes something similar holds for UTF-8.
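
For comparison, a rough Python sketch of the same width check. It is a
bit stricter than the perl one-liner: instead of treating every
non-ASCII character as two columns, it asks the Unicode database for the
East Asian Width property and counts only Wide and Fullwidth characters
as two. The function names and the 80-column default are assumptions
for the example, and real terminals may disagree on some characters.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns text occupies.

    East Asian Wide ("W") and Fullwidth ("F") characters count as two
    columns, combining marks as zero, everything else as one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def lines_that_would_wrap(lines, limit=80):
    """Return the lines whose display width exceeds limit columns."""
    return [ln for ln in lines if display_width(ln.rstrip("\n")) > limit]


if __name__ == "__main__":
    import sys

    # Usage: python widthcheck.py < page.html
    for line in lines_that_would_wrap(sys.stdin):
        sys.stdout.write(line)
```

A line of 41 Chinese characters is only 41 characters long, but it takes
82 columns, so it is reported.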

Anyway, it seems I must keep tidy out of my site upgrade from big5
to utf-8, and out of maintaining my utf-8 site afterward, unless somebody
buys me a 9999 character wide terminal.

