Package: wget
Version: 1.10.2+1.11.beta1-1
Followup-For: Bug #411290

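It turns out that wget URL-encodes bytes 128-159 (0x80-0x9F), which are
control characters in some 8-bit encodings. This is wrong because:
1. They are not control characters in other 8-bit encodings.
2. In UTF-8 it makes no sense: bytes 0x80-0xBF are continuation bytes of
   multibyte characters, so escaping the 0x80-0x9F subset splits characters
   and generates invalid UTF-8 sequences.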
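To illustrate point 2, here is a minimal sketch of a UTF-8 well-formedness check. `is_valid_utf8` is a hypothetical helper, not part of wget, and it is deliberately simplified (it checks lead and continuation bytes only, not overlong forms). Take `З` (U+0417), encoded as 0xD0 0x97: its continuation byte 0x97 lies in 128-159, so escaping it leaves the raw lead byte 0xD0 followed by `%97`, which is no longer valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the NUL-terminated string s is
   well-formed UTF-8. Simplified on purpose: it checks lead bytes and
   continuation bytes (10xxxxxx) only, not overlong forms or surrogates. */
bool is_valid_utf8(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        size_t extra;                  /* continuation bytes expected */
        if (*p < 0x80)
            extra = 0;                 /* ASCII */
        else if ((*p & 0xE0) == 0xC0)
            extra = 1;                 /* 110xxxxx: 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0)
            extra = 2;                 /* 1110xxxx: 3-byte sequence */
        else if ((*p & 0xF8) == 0xF0)
            extra = 3;                 /* 11110xxx: 4-byte sequence */
        else
            return false;              /* stray continuation or bad lead byte */
        p++;
        for (size_t i = 0; i < extra; i++, p++) {
            /* A terminating NUL also fails this test, so we never read past it. */
            if ((*p & 0xC0) != 0x80)
                return false;
        }
    }
    return true;
}
```

With this check, `"\xD0\x97"` (the character as sent) is valid, while `"\xD0%97"` (the same character after escaping only 0x97) is rejected, because `%` (0x25) is not a continuation byte.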

--- src/url.c.old       2007-11-18 09:09:51.000000000 +0400
+++ src/url.c   2007-11-18 09:26:59.000000000 +0400
@@ -1261,8 +1261,8 @@
   0,  0,  0,  0,   0,  0,  0,  0,   /* p   q   r   s    t   u   v   w   */
   0,  0,  0,  0,   W,  0,  0,  C,   /* x   y   z   {    |   }   ~   DEL */
 
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 128-143 */
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 144-159 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 128-143 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 144-159 */
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
 
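A minimal sketch of what the patch changes, assuming (as the hunk suggests) that the table is consulted byte by byte and that every byte carrying the `C` mark is written out as `%XX`. `build_table`, `escape_flagged` and the `CTRL` flag are hypothetical stand-ins, not wget's actual code; the real table also carries other marks, such as the `W` visible in the context lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag standing in for the C ("control") mark in the hunk. */
enum { CTRL = 1 };

/* Fill a 256-entry table: bytes 0-31 and DEL (127) are always flagged;
   bytes 128-159 are flagged only when flag_128_159 is true, i.e. the
   behaviour before the patch (true) versus after it (false). */
void build_table(unsigned char table[256], bool flag_128_159)
{
    memset(table, 0, 256);
    for (int c = 0; c < 32; c++)
        table[c] = CTRL;
    table[127] = CTRL;
    if (flag_128_159)
        for (int c = 128; c < 160; c++)
            table[c] = CTRL;
}

/* Copy src into dst, writing each flagged byte as %XX.
   dst must have room for 3 * strlen(src) + 1 bytes. */
void escape_flagged(const unsigned char table[256], const char *src,
                    char *dst, size_t dst_size)
{
    assert(dst_size >= 3 * strlen(src) + 1);
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        if (table[*p] & CTRL) {
            snprintf(dst, 4, "%%%02X", (unsigned)*p);  /* e.g. 0x97 -> "%97" */
            dst += 3;
        } else {
            *dst++ = (char)*p;
        }
    }
    *dst = '\0';
}
```

With the pre-patch table, `"\xD0\x97"` (`З`) comes out as the invalid byte string `"\xD0%97"`; with the patched table it is copied unchanged, while a real ASCII control such as a tab still becomes `%09`.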


P.S. Since newer wget no longer shows the bug for non-existent URLs, here is
another URL that reproduces it:
http://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0
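As a quick check on why this URL triggers the bug, the sketch below percent-decodes its path (it decodes to the UTF-8 title `Заглавная_страница`) and counts the bytes in the 128-159 range. `percent_decode` and `count_128_159` are hypothetical helpers written for illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of a single hex digit; the caller has already checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decode %XX escapes from src into dst and return the
   number of bytes written. Malformed escapes are copied through verbatim.
   dst must have room for strlen(src) bytes (decoding never grows the data). */
size_t percent_decode(const char *src, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    for (size_t i = 0; src[i] != '\0'; ) {
        if (src[i] == '%' && isxdigit((unsigned char)src[i + 1])
            && isxdigit((unsigned char)src[i + 2])) {
            dst[n++] = (unsigned char)(hex_value(src[i + 1]) * 16
                                       + hex_value(src[i + 2]));
            i += 3;
        } else {
            dst[n++] = (unsigned char)src[i++];
        }
    }
    return n;
}

/* Hypothetical helper: count bytes in the 128-159 (0x80-0x9F) range,
   i.e. the bytes the unpatched table marks with C. */
size_t count_128_159(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80 && buf[i] <= 0x9F)
            count++;
    return count;
}
```

For the path above, the decoded title is 35 bytes long, and 6 of those bytes (0x97, 0x8F, 0x81, 0x82, 0x80, 0x86) fall in 128-159. Each is a UTF-8 continuation byte that the unpatched table would escape.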


