[issue38232] empty local-part in addr_spec displayed incorrectly

2019-09-20 Thread Andrei Troie


New submission from Andrei Troie :

Given an (RFC-legal) email address whose local part is a quoted empty string 
(e.g. 'Nobody <""@example.org>'), the 'addr_spec' property drops the quotes: 
in this case it returns '@example.org' instead of '""@example.org'.
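A minimal reproduction sketch. It assumes that constructing email.headerregistry.Address directly with an empty username is equivalent to parsing the header above; the attached example_parser.py is not reproduced here:

```python
from email.headerregistry import Address

# Build the address from its parts instead of parsing 'Nobody <""@example.org>'.
# Assumption: an empty username is what the quoted-string "" decodes to.
addr = Address(display_name="Nobody", username="", domain="example.org")

print(repr(addr.username))  # ''
# Affected versions print '@example.org'; the RFC 5322-correct form is '""@example.org'.
print(addr.addr_spec)
```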

--
components: email
files: example_parser.py
messages: 352852
nosy: andreitroiebbc, barry, r.david.murray
priority: normal
severity: normal
status: open
title: empty local-part in addr_spec displayed incorrectly
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8
Added file: https://bugs.python.org/file48617/example_parser.py

___
Python tracker 
<https://bugs.python.org/issue38232>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38232] empty local-part in addr_spec displayed incorrectly

2019-09-20 Thread Andrei Troie


Change by Andrei Troie :


--
versions: +Python 3.9




[issue38232] empty local-part in addr_spec displayed incorrectly

2019-09-23 Thread Andrei Troie


Andrei Troie  added the comment:

As far as I understand it, this is due to the following code in 
email.headerregistry.Address.addr_spec (in 3.8 and below):

nameset = set(self.username)
if len(nameset) > len(nameset-parser.DOT_ATOM_ENDS):
    lp = parser.quote_string(self.username)
else:
    lp = self.username

or, in the current version on master:

lp = self.username
if not parser.DOT_ATOM_ENDS.isdisjoint(lp):
    lp = parser.quote_string(lp)

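Neither test works for the empty string. An empty username contains no 
characters, so removing DOT_ATOM_ENDS from its (empty) character set leaves the 
length unchanged, and the empty string is trivially disjoint from any set; 
either way, it is never quoted.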
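One possible fix, sketched as a standalone helper under the assumption that an empty username should always be quoted: test for emptiness explicitly before the isdisjoint() check. The helper name quoted_local_part is hypothetical; DOT_ATOM_ENDS and quote_string are the private email._header_value_parser names the code above already refers to as parser.*:

```python
from email import _header_value_parser as parser


def quoted_local_part(username):
    """Hypothetical helper: return username as it should appear in an addr_spec."""
    # An empty local-part is only valid as the quoted-string "", but the empty
    # string is disjoint from every set, so isdisjoint() alone never quotes it.
    # Check for emptiness first.
    if not username or not parser.DOT_ATOM_ENDS.isdisjoint(username):
        return parser.quote_string(username)
    return username


print(quoted_local_part(""))        # ""
print(quoted_local_part("nobody"))  # nobody
print(quoted_local_part("a b"))     # "a b"
```

Putting the emptiness check first keeps the existing behaviour for every non-empty username unchanged.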

--




[issue38332] invalid content-transfer-encoding in encoded-word causes KeyError

2019-09-30 Thread Andrei Troie


New submission from Andrei Troie :

The following code raises a KeyError from the message's get() method:

import email
import email.policy

text = "Subject: =?us-ascii?X?somevalue?="
eml = email.message_from_string(text, policy=email.policy.default)
eml.get('Subject')

The cause is that the code in _encoded_words.py assumes the 
content-transfer-encoding (CTE) of an encoded-word is always 'q' or 'b' (after 
lowercasing):
https://github.com/python/cpython/blob/aca8c406ada3bb547765b262bed3ac0cc6be8dd3/Lib/email/_encoded_words.py#L178

I realise it's probably a silly edge case, and I haven't (yet) encountered 
anything like it in the wild, but raising an exception that can propagate all 
the way up to the caller of get() seems contrary to the spirit of the email 
library, which normally records malformed input as defects rather than raising.

--
components: email
messages: 353624
nosy: aft90, barry, r.david.murray
priority: normal
severity: normal
status: open
title: invalid content-transfer-encoding in encoded-word causes KeyError
type: crash
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue38332>
___



[issue38332] invalid content-transfer-encoding in encoded-word causes KeyError

2019-10-01 Thread Andrei Troie


Change by Andrei Troie :


--
keywords: +patch
pull_requests: +16094
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/16503




[issue38332] invalid content-transfer-encoding in encoded-word causes KeyError

2019-10-01 Thread Andrei Troie


Andrei Troie  added the comment:

I agree with you that, according to the RFC, the CTE can of course only be "B" 
or "Q". My point is that, in my example, parsing a header that violates this 
raises a KeyError that propagates all the way up to the caller of the message's 
get() method, which I believe is incorrect.

Consider an encoded-word that is syntactically incorrect in a different way, 
for instance one missing the terminating '?=':

'=?UTF-8?Q?somevalue'

Currently, this case causes _encoded_words.py to raise a ValueError on this 
line:

_, charset, cte, cte_string, _ = ew.split('?')

That ValueError is then caught by _header_value_parser.get_encoded_word() and 
handled appropriately.

To me, this is the same kind of error. I agree that an exception should be 
raised; I just don't think it should propagate all the way back to the caller 
of the message's get() method.
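To illustrate the argument, here is a deliberately simplified, hypothetical RFC 2047 decoder. It is not the email package's actual code (it ignores RFC 2231 language tags and defect tracking). It converts the failed dictionary lookup for an unknown CTE into the same ValueError that a structurally broken encoded-word produces, so a single except ValueError in the caller covers both cases:

```python
import base64
import binascii

# Decoders for the two CTEs RFC 2047 defines, keyed by the lowercased token.
_CTE_DECODERS = {
    "q": lambda data: binascii.a2b_qp(data, header=True),  # '_' decodes to a space
    "b": base64.b64decode,
}


def decode_encoded_word(ew):
    """Hypothetical: decode '=?charset?cte?text?=' or raise ValueError."""
    try:
        _, charset, cte, text, _ = ew.split("?")
    except ValueError:
        # Wrong number of '?'-separated fields, e.g. a missing '?=' terminator.
        raise ValueError(f"malformed encoded-word: {ew!r}") from None
    try:
        decoder = _CTE_DECODERS[cte.lower()]
    except KeyError:
        # Report an unknown CTE like any other syntax error instead of leaking KeyError.
        raise ValueError(f"unknown CTE {cte!r} in {ew!r}") from None
    return decoder(text.encode("ascii")).decode(charset)


for word in ["=?utf-8?q?caf=C3=A9?=", "=?us-ascii?X?somevalue?=", "=?UTF-8?Q?somevalue"]:
    try:
        print("decoded:", decode_encoded_word(word))
    except ValueError as exc:
        print("rejected:", exc)
```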

On a separate note, I agree with you that _encoded_words.decode() should 
perhaps raise more specific exceptions than ValueError and KeyError, but that 
is a different change. I can make it if you prefer.
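A sketch of what more specific exceptions might look like. It assumes they subclass ValueError, so existing except ValueError handlers (such as the one in get_encoded_word()) keep working unchanged. All class and function names here are hypothetical:

```python
from typing import Tuple


class EncodedWordError(ValueError):
    """Hypothetical base class for syntactically invalid encoded-words."""


class MalformedEncodedWordError(EncodedWordError):
    """The '=?charset?cte?text?=' structure itself is broken."""


class UnknownCTEError(EncodedWordError):
    """The CTE token is something other than 'B' or 'Q'."""


def split_encoded_word(ew: str) -> Tuple[str, str, str]:
    """Hypothetical: return (charset, lowercased cte, encoded text)."""
    parts = ew.split("?")
    if len(parts) != 5 or parts[0] != "=" or parts[4] != "=":
        raise MalformedEncodedWordError(f"malformed encoded-word: {ew!r}")
    _, charset, cte, text, _ = parts
    if cte.lower() not in ("b", "q"):
        raise UnknownCTEError(f"unknown CTE {cte!r} in {ew!r}")
    return charset, cte.lower(), text


try:
    split_encoded_word("=?us-ascii?X?somevalue?=")
except ValueError as exc:  # a plain ValueError handler still catches the subclass
    print(type(exc).__name__, exc)
```

Callers that care can catch UnknownCTEError specifically, while older code that only knows about ValueError is unaffected.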

--
