Package: wget
Version: 1.10.2+1.11.beta1-1
Followup-For: Bug #411290

It turns out that wget url-encodes bytes 128-159 (which are control characters in some 8-bit encodings). This is wrong because:

1. They are not control characters in other 8-bit encodings.
2. In UTF-8 it makes no sense: these bytes occur as continuation bytes of multi-byte characters, so escaping them on their own produces invalid UTF-8 sequences.
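To make point 2 concrete, here is a minimal sketch of the effect. It is not wget's code: `escape_c1_bytes` and `is_utf8` are hypothetical helpers written only for illustration, and the validity check is deliberately simplified (it checks lead and continuation byte structure only, not overlong forms or surrogates).

```c
#include <stddef.h>

/* Hypothetical helper, NOT wget's implementation: percent-escape only
   bytes 128-159 and copy every other byte through unchanged.
   `out` must have room for 3 * n + 1 bytes. Returns the output length. */
static size_t escape_c1_bytes(const unsigned char *in, size_t n, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c >= 128 && c <= 159) {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
    return o;
}

/* Simplified structural UTF-8 check (assumption: good enough for this
   demonstration). Returns 1 if every lead byte is followed by the
   expected number of continuation bytes (10xxxxxx), 0 otherwise. */
static int is_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i++];
        size_t follow;

        if (c < 0x80)
            follow = 0;                 /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            follow = 1;                 /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            follow = 2;                 /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            follow = 3;                 /* 4-byte sequence */
        else
            return 0;                   /* stray continuation or bad lead */

        for (; follow > 0; follow--) {
            if (i >= n || (s[i] & 0xC0) != 0x80)
                return 0;               /* sequence cut short */
            i++;
        }
    }
    return 1;
}
```

For example, the Cyrillic letter "З" is the two bytes D0 97 in UTF-8. Passing them through `escape_c1_bytes` leaves D0 untouched and turns 97 into the three characters "%97", so the result starts with a lead byte that is no longer followed by a continuation byte, and `is_utf8` rejects it.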

--- src/url.c.old 2007-11-18 09:09:51.000000000 +0400
+++ src/url.c 2007-11-18 09:26:59.000000000 +0400
@@ -1261,8 +1261,8 @@
    0,  0,  0,  0,  0,  0,  0,  0,   /* p   q   r   s   t   u   v   w   */
    0,  0,  0,  0,  W,  0,  0,  C,   /* x   y   z   {   |   }   ~   DEL */
 
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 128-143 */
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 144-159 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,

P.S. As newer wget does not show the bug for non-existent URLs, another URL to reproduce the bug is:
http://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0