[issue17505] email.header.Header.__unicode__ does not decode header

Hrvoje Nikšić Thu, 21 Mar 2013 00:46:59 -0700

New submission from Hrvoje Nikšić:

The __unicode__ method is documented to "return the header as a Unicode 
string". For this to be useful, I would expect it to decode a string such as 
"=?gb2312?b?1eLKx9bQzsSy4srUo6E=?=" into a Unicode string that can be displayed 
to the user, in this case u'\u8fd9\u662f\u4e2d\u6587\u6d4b\u8bd5\uff01'.


However, unicode(header) returns the not so useful 
u"=?gb2312?b?1eLKx9bQzsSy4srUo6E=?=". Looking at the code of __unicode__, it 
appears that the code does attempt to decode the header into Unicode, but this 
fails for Headers initialized from a single MIME-quoted string, as is done by 
the message parser. In other words, __unicode__ is failing to call 
decode_header.

Here is a minimal example demonstrating the problem:

>>> import email, email.header
>>> msg = email.message_from_string(
...     'Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\nfoo\n')
>>> unicode(msg['subject'])
u'=?gb2312?b?1eLKx9bQzsSy4srUo6E=?='

Expected output of the last line:
u'\u8fd9\u662f\u4e2d\u6587\u6d4b\u8bd5\uff01'

To get the fully decoded Unicode string, one must use something like:
>>> u''.join(unicode(s, c) for s, c in
...          email.header.decode_header(msg['subject']))

which is unintuitive and hard to teach to new users of the email package. (And 
looking at the source of __unicode__ it's not even obvious that it's correct — 
it appears that a space must be added before us-ascii-coded chunks. The joining 
is non-trivial.)
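
One way to avoid hand-joining the chunks is to pass the output of decode_header to email.header.make_header and let the resulting Header object do the joining. The following is a minimal sketch, written in Python 3 syntax so that it runs on current interpreters; the helper name decode_header_value is hypothetical and is not part of the email package.

```python
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Return a raw header value as text, decoding MIME encoded-words.

    Hypothetical helper for illustration only. decode_header splits
    the value into (chunk, charset) pairs, and make_header rebuilds a
    Header from those pairs, so the joining is left to the library.
    """
    return str(make_header(decode_header(raw)))


decoded = decode_header_value('=?gb2312?b?1eLKx9bQzsSy4srUo6E=?=')
print(ascii(decoded))
```

On Python 2.7 the same idea would be spelled unicode(make_header(decode_header(raw))).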

The same problem occurs in Python 3.3 with str(msg['subject']).
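
For Python 3, the sketch below reproduces the behaviour with the parser's default (compat32) policy and, for comparison, parses the same message with email.policy.default. It assumes an interpreter in which the email.policy framework is available (it was introduced provisionally in Python 3.3); the variable names are illustrative only.

```python
import email
from email import policy

RAW = 'Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\nfoo\n'

# Default (compat32) policy: the header value is returned still encoded.
legacy_msg = email.message_from_string(RAW)
legacy_subject = str(legacy_msg['subject'])

# email.policy.default: header objects carry the decoded text.
modern_msg = email.message_from_string(RAW, policy=policy.default)
modern_subject = str(modern_msg['subject'])

print(ascii(legacy_subject))
print(ascii(modern_subject))
```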

----------
components: email
messages: 184856
nosy: barry, hniksic, r.david.murray
priority: normal
severity: normal
status: open
title: email.header.Header.__unicode__ does not decode header
versions: Python 2.7

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue17505>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
