On Tue, Mar 15, 2005 at 10:44:28AM +0000, Ross Paterson wrote:
> On Mon, Mar 14, 2005 at 07:38:09PM -0600, John Goerzen wrote:
> > I've got some gzip (and Ian Lynagh's Inflate) code that breaks under
> > the new hugs with:
> >
> > <handle>: IO.getContents: protocol error (invalid character encoding)
> >
> > What is going on, and how can I fix it?
>
> A Haskell 98 Handle is a character stream, and doesn't support binary
> I/O. This would have bitten you sooner or later on systems that do CRLF
> conversion, but Hugs is now much stricter, because character streams now
> use the encoding determined by the current locale (for the C locale, that
> means ASCII only).

Do you have a list of functions which behave differently in the new
release to how they did in the previous release? (I'm not interested in
changes that will affect only whether something compiles, not how it
behaves given it compiles both before and after).

Simons, Malcolm, are there any such functions in the new ghc/nhc98?

Also, are you all agreed that the hugs interpretation of the report is
correct, and thus ghc at least is buggy in this respect? (I'm afraid I
haven't been able to test nhc98 yet).

Finally, the hugs behaviour seems a little odd to me. The below shows 4
cases where iconv complains when asked to convert utf8 to utf8, but hugs
only gives an error in one of them. In the others it just truncates the
input. Is this really correct? It also seems to behave the same for me
regardless of whether I export LC_CTYPE to en_GB.UTF-8 or C.

Thanks
Ian

printf "\x00\x7F" > inp1
printf "\x00\x80" > inp2
printf "\x00\xC4" > inp3
printf "\xFF\xFF" > inp4
printf "\xb1\x41\x00\x03\x65\x6d\x70\x74\x79\x00\x03\x00\x00\x00\x00\x00" > inp5
echo 'main = do xs <- getContents; print xs' > run.hs
for i in `seq 1 5`; do runhugs run.hs < inp$i; done
for i in `seq 1 5`; do runghc6 run.hs < inp$i; done
for i in `seq 1 5`; do echo $i; iconv -f utf8 -t utf8 < inp$i; done

which gives me the following output:

$ for i in `seq 1 5`; do runhugs run.hs < inp$i; done
"\NUL\DEL"
"\NUL"
"\NUL"
""
"
Program error: <stdin>: IO.getContents: protocol error (invalid character encoding)
$ for i in `seq 1 5`; do runghc6 run.hs < inp$i; done
"\NUL\DEL"
"\NUL\128"
"\NUL\196"
"\255\255"
"\177A\NUL\ETXempty\NUL\ETX\NUL\NUL\NUL\NUL\NUL"
$ for i in `seq 1 5`; do echo $i; iconv -f utf8 -t utf8 < inp$i; done
1
2
iconv: illegal input sequence at position 1
3
iconv: incomplete character or shift sequence at end of buffer
4
iconv: illegal input sequence at position 0
5
iconv: illegal input sequence at position 0
$
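
As a rough cross-check of the "ASCII only" point for the C locale, something like the sketch below reports whether a file contains any byte outside 7-bit ASCII. The is_ascii helper and the scratch directory are made up for illustration and are not part of the test script above; the inputs are recreated with octal escapes because \x escapes in printf are not portable across shells.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original test script: succeeds only
# if every byte of the file is 7-bit ASCII (value <= 127).
is_ascii() {
    # od dumps each byte as an unsigned decimal; awk flags any value above 127.
    od -An -v -tu1 "$1" |
        awk 'BEGIN { bad = 0 }
             { for (i = 1; i <= NF; i++) if ($i > 127) bad = 1 }
             END { exit bad }'
}

# Recreate two of the inputs in a scratch directory
# (\177 is 0x7F, \200 is 0x80).
tmp=$(mktemp -d)
printf '\000\177' > "$tmp/inp1"
printf '\000\200' > "$tmp/inp2"

for f in inp1 inp2; do
    if is_ascii "$tmp/$f"; then
        echo "$f: ASCII only"
    else
        echo "$f: contains a byte >= 0x80"
    fi
done
rm -rf "$tmp"
```

Of the five inputs, inp1 is the only one made up entirely of 7-bit ASCII bytes, so it is the only one that an ASCII-only decoder would be expected to accept in full.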