On Wed, Apr 30, 2014 at 12:11:22AM +0200, Axel Beckert wrote:
> Control: tag -1 - moreinfo
>
> Hi Julian,
>
> thanks for the prompt feedback!

And yours! :-)

> Julian Gilbey wrote:
> > I've tried this in an xterm (xfce4-terminal) and in a console window
> > (tty1), both with my default locale (en_GB.UTF-8) and in the C locale,
> > and the same happens with all of these combinations. I'm not sure
> > how I would determine the character set I'm using, though.
>
> Sometimes the terminal emulator lets you set this. I used an uxterm
> and now also tried xfce4-terminal from Wheezy (and then ssh'ed into
> the Sid machine for testing), which both use UTF-8 as character set by
> default.
>
> Your file, at least how it arrived by mail here, contains an
> ISO-Latin-1 character, which shows as circled question mark on an
> UTF-8 using terminal if you just do a "cat a.html". (Can you confirm
> that for your terminals?)

Ah, so that is presumably why you didn't see the same as me: the file
was garbled in transit. I'm attaching a gzipped version; hopefully
this will reach you intact. It should be UTF-8 encoded.

And maybe this is what links is then doing: it is interpreting the two
bytes of the UTF-8 encoded character separately, as if each were a
character in its own right. (In the context in which I was originally
using it, the file was a MIME attachment, and the MIME headers
specified the UTF-8 encoding.)

So if links can handle UTF-8 encoded files, it would be very useful to
also have a command-line flag to specify the encoding.

   Julian
a.html.gz
Description: Binary data
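
For illustration only, here is a minimal Python sketch of the two effects discussed above. It makes no claim about how links works internally. The sample character "é" is an assumption, since the actual character in a.html is not identified in this thread. The first half shows a two-byte UTF-8 character turning into two characters when each byte is read as ISO-Latin-1; the second half shows a lone ISO-Latin-1 byte turning into the replacement character when read as UTF-8, which would match the circled question mark seen on a UTF-8 terminal.

```python
# Hypothetical sample character; the real one in a.html is not known here.
text = "é"

# Encoded as UTF-8, this character occupies two bytes: 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")

# Reading those bytes as ISO-Latin-1 maps each byte to its own character,
# so one character on disk becomes two on screen.
misread = utf8_bytes.decode("latin-1")

# The opposite direction: the same character stored as a single
# ISO-Latin-1 byte (0xE9) is not valid UTF-8 on its own, so a UTF-8
# decoder substitutes U+FFFD, the replacement character.
latin1_bytes = text.encode("latin-1")
replaced = latin1_bytes.decode("utf-8", errors="replace")


def code_points(s):
    """Return the code points of s as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]


if __name__ == "__main__":
    print("UTF-8 bytes:        ", utf8_bytes.hex(" "))
    print("read as Latin-1:    ", code_points(misread))
    print("Latin-1 bytes:      ", latin1_bytes.hex(" "))
    print("read as UTF-8:      ", code_points(replaced))
```

If the reader of the file is told the right encoding (as the MIME headers did in the original context), neither mismatch occurs, which is why a way to specify the encoding explicitly would help.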