Roy H. Han wrote:
> On Wed, Feb 25, 2009 at 8:39 AM, <[email protected]> wrote:
[Top-posting corrected]
>> John Machin <[email protected]> wrote:
>>> On Feb 25, 11:07 am, "Roy H. Han" <[email protected]>
>>> wrote:
>>>> Dear python-list,
>>>>
>>>> I'm having some trouble decoding an email header using the standard
>>>> imaplib.IMAP4 class and email.message_from_string method.
>>>>
>>>> In particular, email.message_from_string() does not seem to properly
>>>> decode unicode characters in the subject.
>>>>
>>>> How do I decode unicode characters in the subject?
>>> You don't. You can't. You decode str objects into unicode objects. You
>>> encode unicode objects into str objects. If your input is not a str
>>> object, you have a problem.
>> I can't speak for the OP, but I had a similar (and possibly
>> identical-in-intent) question. Suppose you have a Subject line that
>> looks like this:
>>
>> Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
>>  =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
>>
>> How do you get the email module to decode that into unicode? The same
>> question applies to the other header lines, and the answer is it isn't
>> easy, and I had to read and reread the docs and experiment for a while
>> to figure it out. I understand there's going to be a sprint on the
>> email module at pycon, maybe some of this will get improved then.
>>
>> Here's the final version of my test program. The third to last line is
>> one I thought ought to work given that Header has a __unicode__ method.
>> The final line is the one that did work (note the kludge to turn None
>> into 'ascii'...IMO 'ascii' is what decode_header _should_ be returning,
>> and this code shows why!)
>>
>> -------------------------------------------------------------------
>> from email import message_from_string
>> from email.header import Header, decode_header
>>
>> x = message_from_string("""\
>> To: test
>> Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
>>  =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
>>
>> this is a test.
>> """)
>>
>> print x
>> print "--------------------"
>> for key, header in x.items():
>>     print key, 'type', type(header)
>>     print key+":", unicode(Header(header)).decode('utf-8')
>>     print key+":", decode_header(header)
>>     print key+":", ''.join([s.decode(t or 'ascii') for (s, t) in
>>         decode_header(header)]).encode('utf-8')
>> -------------------------------------------------------------------
>>
>>
>> From nobody Wed Feb 25 08:35:29 2009
>> To: test
>> Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
>>  =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
>>
>> this is a test.
>>
>> --------------------
>> To type <type 'str'>
>> To: test
>> To: [('test', None)]
>> To: test
>> Subject type <type 'str'>
>> Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
>>  =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
>> Subject: [("'u' Obselete type", None), ("-- it is identical to 'd'. (7)",
>> 'iso-8859-1')]
>> Subject: 'u' Obselete type-- it is identical to 'd'. (7)
>>
>>
> Thanks for writing back, RDM and John Machin. Tomorrow I'll try the
> code you suggested, RDM. It looks quite helpful and I'll report the
> results.
>
> In the meantime, John asked for more data. The sender's email client
> is Microsoft Outlook 11. The recipient email client is Lotus Notes.
>
>
>
> Actual Subject
> =?us-ascii?Q?Inteum_C/SR_User_Tip:__Quick_Access_to_Recently_Opened_Inteu?=\r\n\t=?us-ascii?Q?m_C/SR_Records?=
>
> Expected Subject
> Inteum C/SR User Tip: Quick Access to Recently Opened Inteum C/SR Records
>
> X-Mailer
> Microsoft Office Outlook 11
>
> X-MimeOLE
> Produced By Microsoft MimeOLE V6.00.2900.5579
>
>>> from email.header import decode_header
>>> print decode_header("=?us-ascii?Q?Inteum_C/SR_User_Tip:__Quick_Access_to_Recently_Opened_Inteu?=\r\n\t=?us-ascii?Q?m_C/SR_Records?=")
[('Inteum C/SR User Tip: Quick Access to Recently Opened Inteum C/SR Records', 'us-ascii')]
>>>
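
The session above is Python 2. On Python 3, decode_header may hand back
bytes chunks for encoded headers and a plain str for headers with no
encoded words, so the join-with-fallback idea from RDM's last line could
be sketched roughly as follows. The helper name header_to_text and its
default_charset parameter are made up for illustration; they are not
part of the email package.

```python
from email.header import decode_header


def header_to_text(raw, default_charset="ascii"):
    """Join the chunks from decode_header() into a single str.

    Hypothetical helper: chunks with no declared charset fall back to
    default_charset, mirroring the "t or 'ascii'" kludge in the thread.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded words (and text next to them) arrive as bytes.
            parts.append(chunk.decode(charset or default_charset, errors="replace"))
        else:
            # A header with no encoded words comes back as str already.
            parts.append(chunk)
    return "".join(parts)


# The folded Outlook subject quoted earlier in the thread.
subject = ("=?us-ascii?Q?Inteum_C/SR_User_Tip:__Quick_Access_to_Recently_Opened_Inteu?="
           "\r\n\t=?us-ascii?Q?m_C/SR_Records?=")
print(header_to_text(subject))
```

The standard library's email.header.make_header() can also rebuild a
Header object from the decode_header() result, which may be a tidier
route if you don't need control over the fallback charset.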

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
