tags 418058 - unreproducible
retitle 418058 iconv: half-smart on ascii compatible code conversion (shift-jis)
thanks
Let me update with better data, since the original report was contaminated with other bugs such as the one in groff. (Thanks to Aurelien Jarno for checking them.)

Bug: The \ and ~ characters (ASCII 92 and 126) are not handled correctly by iconv under SHIFT-JIS (SJIS). I tested the conversion by iconv itself over the printable 7-bit characters with the attached script; the result is in diff.txt. The conversion error also occurs with GB for # and ~ (ASCII 35 and 126). Please ask Chinese-speaking people about the GB situation, but I think this is quite likely a bug too. NB: As for Japanese, I remember EUC-JP used to have a similar problem.

Rationale: iconv should not try smart guessing for the 7-bit section of traditional encodings that were ASCII compatible. They should be the same as ASCII in that 7-bit section. Popular C/perl/shell/... programs originally written in shift-jis should not break if iconv is used to convert them to UTF-8.

Details: For shift-jis, iconv tries to map character 0x5c to the UTF-8 YEN mark. The mapping to the UTF-8 YEN mark should be done from the yen mark code in the 16-bit (full-width character) section, and not from this 7-bit one, which is 0x5c. This is very bad for any program.

Another issue is 0x7e '~'. This is translated to the upper bar (overline). Although some old Japanese PCs (pre-IBM-compatible NEC 98 and Sharp MZ machines, which used to run IBM-incompatible MS-DOS, I think) had an upper-bar-shaped font and keyboard key for ~, converting this ~ in data to the UTF-8 upper bar breaks URL data stored on shift-jis machines. These cosmetic differences were just font differences. The code points should not be moved for them.

Osamu
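
For illustration only, the kind of check described above (decode every printable 7-bit byte and list the ones that do not come back as the same ASCII character) could be sketched as follows. This is not the attached script, whose contents are not shown here. The helper names `ascii_mismatches` and `reported_sjis_decode` are hypothetical, and `reported_sjis_decode` only simulates the mapping described in this report rather than calling iconv.

```python
def ascii_mismatches(decode):
    """Return {byte value: decoded text} for every printable 7-bit byte
    (0x20..0x7E) that `decode` does not map to the same ASCII character."""
    mismatches = {}
    for code in range(0x20, 0x7F):
        text = decode(bytes([code]))
        if text != chr(code):
            mismatches[code] = text
    return mismatches


def reported_sjis_decode(data):
    """Hypothetical stand-in for the behaviour described in this report:
    0x5c becomes U+00A5 (yen sign), 0x7e becomes U+203E (overline),
    and every other 7-bit byte is passed through unchanged."""
    table = {0x5C: "\u00a5", 0x7E: "\u203e"}
    return "".join(table.get(byte, chr(byte)) for byte in data)


if __name__ == "__main__":
    decoders = [
        ("plain ASCII", lambda data: data.decode("ascii")),
        ("simulated SJIS mapping", reported_sjis_decode),
    ]
    for name, decode in decoders:
        print(name)
        for code, text in sorted(ascii_mismatches(decode).items()):
            print(f"  0x{code:02x} {chr(code)!r} -> U+{ord(text):04X}")
```

To check a real converter, the `decode` argument would be replaced by a function that feeds the bytes through that converter; the results then depend on the converter and its version.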
ascii.tar.gz
Description: Binary data