A user of the Debian package of pymsnt reported the following bug in
pymsnt:

        [2007-08-03 18:15:00] INFO :: [EMAIL PROTECTED] ::  :: contactPersonalChanged :: msn.msnw.NotificationClient :: {'personal': 'In qualche caso, la distinzione tra un ammasso globulare ed uno galattico pu\xf2 non risultare del tutto immediata:', 'self': 'instance', 'userHandle': '[EMAIL PROTECTED]'}
        ...
          File "/usr/share/pymsnt/src/legacy/msn/msnw.py", line 447, in contactPersonalChanged
            self.factory.msncon.contactStatusChanged(userHandle)
          File "/usr/share/pymsnt/src/legacy/glue.py", line 486, in contactStatusChanged
            status = msnContact.personal.decode("utf-8")
          File "encodings/utf_8.py", line 16, in decode

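Note that the contact's status message contains a byte with the value
0xf2. In UTF-8, 0xf2 can only start a four-byte sequence and must be
followed by continuation bytes; here it is followed by a space, so the
string is not valid UTF-8. If the string is instead interpreted as
latin1 (or a variant like windows-1252), the byte corresponds to the ò
character.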
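
To illustrate, here is a small sketch that decodes a fragment of the
status message from the log. The variable names are mine and are not
taken from pymsnt:

```python
# A fragment of the status message from the log, as raw bytes.
raw = b"pu\xf2 non"

# Strict UTF-8 decoding rejects the lone 0xf2 byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin1 and windows-1252 both map 0xf2 to U+00F2 (o with grave accent).
as_latin1 = raw.decode("latin-1")
as_cp1252 = raw.decode("windows-1252")
```

Here utf8_ok ends up False, while both of the other decodes succeed and
give the same text.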

I did some testing myself: a contact of mine, who uses version 8.1 of
the official MSN client on Windows, put an ò character in his status
message, and this did not trigger the exception for me. Perhaps the bug
reporter's contact was using a different version of the client, one
that sends latin1 instead.

The bug submitter suggested changing line 486 of glue.py to read:

        status = msnContact.personal.decode ('utf-8', 'replace')

Which seems reasonable. Another option is to try something like:

        status = None
        for e in ('utf-8', 'windows-1252'):
                try:
                        status = msnContact.personal.decode(e)
                        break  # stop at the first encoding that works
                except UnicodeDecodeError:
                        continue
        if status is None:
                # every candidate failed, so decode lossily
                status = msnContact.personal.decode('utf-8', 'replace')

i.e., try a sequence of known-possible encodings in order, stop at the
first one that decodes cleanly, and fall back to utf-8 in replace mode
if they all fail.
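
The same idea can be sketched as a standalone function, which is easier
to test outside the transport. The name decode_status and its
parameters are hypothetical, not part of pymsnt:

```python
def decode_status(raw, encodings=("utf-8", "windows-1252")):
    """Decode raw bytes using the first candidate encoding that succeeds.

    Hypothetical helper, not part of pymsnt. Falls back to UTF-8 with
    replacement characters if every candidate encoding fails.
    """
    for encoding in encodings:
        try:
            # Return as soon as one encoding decodes without error.
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: substitute U+FFFD for undecodable bytes.
    return raw.decode("utf-8", "replace")
```

The order matters: utf-8 has to come first, because windows-1252
accepts nearly any byte sequence and would otherwise turn valid UTF-8
into garbage.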

Further details may be found at <http://bugs.debian.org/435853>. Please
keep [EMAIL PROTECTED] CC'd in replies so that the
messages go back to the Debian bug tracking system.

-- 
Sam Morris
http://robots.org.uk/

PGP key id 1024D/5EA01078
3412 EA18 1277 354B 991B  C869 B219 7FDB 5EA0 1078
