[issue38232] empty local-part in addr_spec displayed incorrectly
New submission from Andrei Troie:

Given an (RFC-legal) email address whose local part is a quoted empty string (e.g. 'Nobody <""@example.org>'), the 'addr_spec' property drops the quoted empty string from its result: in this case, addr_spec returns '@example.org' instead of '""@example.org'.

--
components: email
files: example_parser.py
messages: 352852
nosy: andreitroiebbc, barry, r.david.murray
priority: normal
severity: normal
status: open
title: empty local-part in addr_spec displayed incorrectly
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8
Added file: https://bugs.python.org/file48617/example_parser.py

___ Python tracker <https://bugs.python.org/issue38232> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
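A minimal reproduction sketch, assuming the Address is built directly with the documented email.headerregistry.Address constructor instead of being parsed from a header (the attached example_parser.py is not reproduced here). According to the report, affected versions print '@example.org' here, where '""@example.org' would be expected.

```python
from email.headerregistry import Address

# Build the address from the report: display name "Nobody", an empty local
# part, and the domain example.org, passed as keyword arguments.
addr = Address(display_name='Nobody', username='', domain='example.org')

print(repr(addr.username))  # the local part itself is the empty string
print(addr.addr_spec)       # affected versions omit the quoted empty string
```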
[issue38232] empty local-part in addr_spec displayed incorrectly
Change by Andrei Troie:

--
versions: +Python 3.9
[issue38232] empty local-part in addr_spec displayed incorrectly
Andrei Troie added the comment:

As far as I understand it, this is due to the following code in email.headerregistry.Address.addr_spec (in 3.8 and below):

    if len(nameset) > len(nameset-parser.DOT_ATOM_ENDS):
        lp = parser.quote_string(self.username)

or, in the current version on master:

    lp = self.username
    if not parser.DOT_ATOM_ENDS.isdisjoint(lp):
        lp = parser.quote_string(lp)

Neither test handles the empty string. An empty username contains no characters, so it is always disjoint from DOT_ATOM_ENDS (and in the older code, an empty nameset minus DOT_ATOM_ENDS is still empty, so the length comparison is false). As a result, the empty local part is never quoted.

--
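A minimal sketch of one way the check could also cover an empty local part. It assumes local, hypothetical stand-ins for DOT_ATOM_ENDS and quote_string (approximations written here, not imported from email._header_value_parser), and format_addr_spec is a hypothetical helper, not the library's property.

```python
# Hypothetical stand-in: characters that cannot appear in an unquoted
# dot-atom local part (RFC 5322 specials plus whitespace, minus '.').
SPECIALS = set('()<>@,:;."[]\\')
WSP = set(' \t')
DOT_ATOM_ENDS = (SPECIALS | WSP) - {'.'}


def quote_string(value: str) -> str:
    """Wrap value in double quotes, backslash-escaping '\\' and '"'."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


def format_addr_spec(username: str, domain: str) -> str:
    """Hypothetical addr_spec formatter that also quotes an empty local part."""
    lp = username
    # isdisjoint() alone misses the empty string: it has no characters that
    # could overlap DOT_ATOM_ENDS, so it needs an explicit emptiness test.
    if not lp or not DOT_ATOM_ENDS.isdisjoint(lp):
        lp = quote_string(lp)
    return lp + '@' + domain
```

With the extra `not lp` test, format_addr_spec('', 'example.org') yields '""@example.org', while ordinary dot-atoms such as 'first.last' are left unquoted.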
[issue38332] invalid content-transfer-encoding in encoded-word causes KeyError
New submission from Andrei Troie:

The following causes a KeyError in email.message.get():

    import email
    import email.policy

    text = "Subject: =?us-ascii?X?somevalue?="
    eml = email.message_from_string(text, policy=email.policy.default)
    eml.get('Subject')

This happens because the code in _encoded_words.py assumes that the content-transfer-encoding of an encoded-word is always 'q' or 'b' (after lowercasing):

https://github.com/python/cpython/blob/aca8c406ada3bb547765b262bed3ac0cc6be8dd3/Lib/email/_encoded_words.py#L178

I realise this is probably a silly edge case, and I haven't (yet) encountered anything like it in the wild, but it does seem contrary to the spirit of the email library to raise an exception like this that can propagate all the way up to the caller of email.message.get().

--
components: email
messages: 353624
nosy: aft90, barry, r.david.murray
priority: normal
severity: normal
status: open
title: invalid content-transfer-encoding in encoded-word causes KeyError
type: crash
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___ Python tracker <https://bugs.python.org/issue38332> ___
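A minimal sketch, assuming a dict-based dispatch on the lowercased CTE letter (the names _cte_decoders, decode_ew and header_text are hypothetical and do not mirror the library's internals). It shows how an unguarded lookup turns an unknown CTE into a KeyError, and how re-raising it as ValueError lets a caller that already tolerates other malformed encoded-words handle this case the same way.

```python
import base64
import quopri

# Hypothetical dispatch table: lowercased CTE letter -> bytes decoder.
_cte_decoders = {
    'q': lambda s: quopri.decodestring(s.encode('ascii'), header=True),
    'b': lambda s: base64.b64decode(s),
}


def decode_ew(ew: str) -> str:
    """Decode '=?charset?cte?text?='; raise ValueError for malformed input."""
    # Unpacking raises ValueError unless there are exactly five '?'-fields.
    _, charset, cte, cte_string, _ = ew.split('?')
    try:
        decoder = _cte_decoders[cte.lower()]
    except KeyError:
        # Convert the KeyError so callers only have to handle ValueError.
        raise ValueError(f'unknown CTE {cte!r} in encoded-word') from None
    return decoder(cte_string).decode(charset)


def header_text(ew: str) -> str:
    """Hypothetical lenient caller: leave malformed encoded-words undecoded."""
    try:
        return decode_ew(ew)
    except ValueError:
        return ew
```

Here header_text('=?us-ascii?X?somevalue?=') returns its input unchanged instead of letting a KeyError escape.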
[issue38332] invalid content-transfer-encoding in encoded-word causes KeyError
Change by Andrei Troie:

--
keywords: +patch
pull_requests: +16094
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16503
[issue38332] invalid content-transfer-encoding in encoded-word causes KeyError
Andrei Troie added the comment:

I agree with you that, according to the RFC, the CTE can of course only be "B" or "Q". My point is that, with the header in my example, you get a KeyError that propagates all the way up to the caller of email.message.get(), which I believe is incorrect.

Consider an encoded-word that is syntactically incorrect in a different way, for instance one missing the terminating '?=':

    '=?UTF-8?Q?somevalue'

Currently, this case causes _encoded_words.py to raise a ValueError on this line, because splitting on '?' yields only four fields and the unpacking expects five:

    _, charset, cte, cte_string, _ = ew.split('?')

That ValueError is then caught by _header_value_parser.get_encoded_word() and handled appropriately. To me, an invalid CTE is the same kind of problem: I agree that an exception should be raised, I just don't think it should propagate all the way back to the caller of email.message.get().

Separately, I agree that _encoded_words.decode() should perhaps raise more specific exceptions than ValueError and KeyError, but that is a separate change. I can fix that too if you prefer.

--
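A minimal sketch of what "more specific exceptions" could look like, assuming a small hypothetical hierarchy (EncodedWordError, MalformedEncodedWordError, UnknownCTEError and split_encoded_word are illustrative names, not part of the email package). Subclassing ValueError is a design choice that keeps any existing `except ValueError` handler working while letting new code catch the narrower types.

```python
from typing import Tuple


class EncodedWordError(ValueError):
    """Hypothetical base class for malformed RFC 2047 encoded-words."""


class MalformedEncodedWordError(EncodedWordError):
    """The '=?charset?cte?text?=' framing is wrong (e.g. missing '?=')."""


class UnknownCTEError(EncodedWordError):
    """The CTE is neither 'B' nor 'Q'."""


def split_encoded_word(ew: str) -> Tuple[str, str, str]:
    """Return (charset, lowercased cte, text) or raise an EncodedWordError."""
    parts = ew.split('?')
    # A well-formed encoded-word splits into exactly '=', charset, cte, text, '='.
    if len(parts) != 5 or parts[0] != '=' or parts[4] != '=':
        raise MalformedEncodedWordError(f'malformed encoded-word: {ew!r}')
    _, charset, cte, text, _ = parts
    if cte.lower() not in ('b', 'q'):
        raise UnknownCTEError(f'unknown CTE {cte!r} in encoded-word')
    return charset, cte.lower(), text
```

A parser could then catch EncodedWordError, record a defect, and keep the raw text, so neither of the two malformed examples above would escape to the caller.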