from:"Alessandro Vesely"

[issue30011] HTMLParser class is not thread safe

2017-04-07 Thread Alessandro Vesely


New submission from Alessandro Vesely:

SYMPTOM:
When used in a multithreaded program, instances of a class derived from 
HTMLParser may convert an entity or leave it alone, in an apparently random 
fashion.

CAUSE:
The class has a static attribute, entitydefs, which, on first use, is 
initialized from None to a dictionary of entity definitions.  Initialization is 
not atomic.  Therefore, instances in concurrent threads assume that 
initialization is complete and catch a KeyError if the entity at hand hasn't 
been set yet.  In that case, the entity is left alone as if it were invalid.
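The race described above can be sketched as follows. This is a minimal illustration written for Python 3 (where the table lives in html.entities), not the library's actual code; the class names RacyTable and SafeTable are hypothetical. The first class publishes the dictionary before it is filled, the second fills a local dictionary and publishes it with one assignment, which in CPython should leave other threads seeing either None or a complete table.

```python
from html.entities import name2codepoint


class RacyTable:
    """Hypothetical class: publishes the dict before it is filled."""
    entitydefs = None

    @classmethod
    def lookup(cls, name):
        if cls.entitydefs is None:
            # The class attribute is visible to other threads from here on,
            # although it only contains 'apos' at this point.
            table = cls.entitydefs = {'apos': "'"}
            for key, codepoint in name2codepoint.items():
                table[key] = chr(codepoint)
        # A concurrent caller may reach this line while the loop above is
        # still running in another thread, and miss a valid entity.
        return cls.entitydefs.get(name, '&' + name + ';')


class SafeTable:
    """Hypothetical class: fills a local dict, then publishes it."""
    entitydefs = None

    @classmethod
    def lookup(cls, name):
        table = cls.entitydefs
        if table is None:
            table = {'apos': "'"}
            for key, codepoint in name2codepoint.items():
                table[key] = chr(codepoint)
            # Single rebinding: several threads may each build a table,
            # but none of them ever reads a partially filled one.
            cls.entitydefs = table
        return table.get(name, '&' + name + ';')
```

Single-threaded, both classes return the same results; the difference only shows up when several threads perform the first lookup at the same time.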

WORKAROUND:
from HTMLParser import HTMLParser  # Python 2.7 module name

class Dummy(HTMLParser):
    """this class is defined here so that we can initialize its base
    class"""
    def __init__(self):
        HTMLParser.__init__(self)

# Initialize HTMLParser by loading htmlentitydefs
dummy = Dummy()
dummy.feed('')
del dummy, Dummy

--
components: Library (Lib)
messages: 291256
nosy: ale2017
priority: normal
severity: normal
status: open
title: HTMLParser class is not thread safe
type: behavior
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue30011>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue30011] HTMLParser class is not thread safe

2017-04-14 Thread Alessandro Vesely

Alessandro Vesely added the comment:

On Fri 14/Apr/2017 19:44:29 +0200 Serhiy Storchaka wrote:
> 
> Changes by Serhiy Storchaka:
> 
> 
> --
> pull_requests: +1272

Thank you for your fix, Serhiy.  It makes the class behave consistently.
However, busy processes are going to concurrently build multiple
temporary entitydefs objects before one of them wins, which is probably
worse than the eager initialization that such lazy initialization tries
to avoid in the first place.  Doesn't that design deserve a comment in
the code, at least?

Greetings
Ale


[issue30011] HTMLParser class is not thread safe

2017-04-15 Thread Alessandro Vesely


Alessandro Vesely added the comment:

Serhiy's analysis is correct.  If anything more than a comment is going
to make its way to the code, I'd suggest moving dictionary building to
its own function, so that it can be called either on first use --like
now-- or before threading if the user is concerned.
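A minimal sketch of that suggestion, assuming Python 3 names (html.entities) and a hypothetical module-level function build_entitydefs(); none of this is the actual library code.  The lock is my own addition and is optional: it only prevents the duplicate builds, which are harmless anyway.

```python
import threading
from html.entities import name2codepoint

_entitydefs = None                    # hypothetical module-level cache
_entitydefs_lock = threading.Lock()   # optional, avoids duplicate builds


def build_entitydefs():
    """Build the entity table once and return it.

    Can be called lazily on first use, or eagerly from the main thread
    before any worker thread is started.
    """
    global _entitydefs
    if _entitydefs is None:
        with _entitydefs_lock:
            if _entitydefs is None:   # re-check while holding the lock
                table = {'apos': "'"}
                for key, codepoint in name2codepoint.items():
                    table[key] = chr(codepoint)
                _entitydefs = table   # publish only the complete table
    return _entitydefs


def unescape_entity(name):
    """Return the character for a named entity, or leave it alone."""
    return build_entitydefs().get(name, '&' + name + ';')
```

A program that is concerned about the first-use cost would call build_entitydefs() once before starting its threads; everyone else gets the same table built on the first lookup.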

I agree there is nothing wrong with multiple builds.  My point is just
about a minor, bearable inefficiency.  It can be neglected.  Its most
annoying case is probably with test suites, which are more likely to
start a bunch of new threads all at once.

Greetings
Ale


[issue29462] RFC822-comments in email header fields can fool, e.g., get_filename()

2017-02-06 Thread Alessandro Vesely


New submission from Alessandro Vesely:

Comments are allowed almost everywhere in an email message, and should be 
eliminated before attributing any meaning to a field.  In the words of RFC5322, 
any CRLF that appears in FWS is semantically "invisible".

In particular, some note that comments can be used to deceive an email filter.  
For example, like so:

Content-Disposition: attachment;
 filename=''attached%2E";
 filename*1*="%62";
 filename*2=(fool filters)at

(I don't know which, if any, email clients would execute that batch...)

Anyway, removing comments is needed for any structured header field.  One is 
usually interested in the unfolded, de-commented value.  It is difficult to do 
correctly, because of nesting and quoting possibilities.
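To illustrate the nesting and quoting cases, here is a minimal sketch of a comment stripper for an already unfolded header value.  The function name strip_comments is hypothetical, and this is not the de_comment() function attached to this issue; it assumes that comments may nest, that parentheses inside a quoted-string are literal, and that a backslash escapes the next character inside comments and quoted-strings.

```python
def strip_comments(value):
    """Remove parenthesized comments from an unfolded header value."""
    out = []
    depth = 0           # current comment nesting level
    in_quotes = False   # inside a quoted-string (outside any comment)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == '\\' and i + 1 < len(value) and (in_quotes or depth):
            # Quoted-pair: keep it inside a quoted-string, drop it
            # (together with the escaped character) inside a comment.
            if not depth:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth:
            # Inside a comment: only track nesting, emit nothing.
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
        elif in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == '(':
            depth = 1
        else:
            out.append(ch)
            if ch == '"':
                in_quotes = True
        i += 1
    return ''.join(out)


print(strip_comments('attachment; filename*2=(fool filters)at'))
print(strip_comments('inline; filename="report (final).txt"'))
```

The first call drops the comment and joins the surrounding text; the second leaves the value untouched because the parentheses are inside a quoted-string.  Whether comments are allowed at all in a given field is a separate question that this sketch does not address.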

This issue seems to be ignored, except for address lists (there is a 
getcomment() member in AddrlistClass).  Why?

--
components: email
messages: 287119
nosy: ale2017, barry, r.david.murray
priority: normal
severity: normal
status: open
title: RFC822-comments in email header fields can fool, e.g., get_filename()
type: behavior
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue29462>
___

[issue29462] RFC822-comments in email header fields can fool, e.g., get_filename()

2017-02-06 Thread Alessandro Vesely


Alessandro Vesely added the comment:

I didn't find CFWS in RFC 2231 either.  In addition, RFC 2045 (Introduction)
says that Content-Disposition (where filename is defined) cannot include
comments.  However, Content-Type can include RFC 822 comments, so the filename
should be de-commented in case it is inferred from the name parameter there.

I'm rather new to Python, and sticking to version 2 because of the packages I 
work with.  I see Python3's email has a much more robust design.  Does this 
mean Python2 cannot get fixed?

I attach a de_comment() function, copied from the one I mentioned this morning.
The rest of the file shows its intended use.  (Oops, it removes comments even
from where they are not supposed to be allowed ;-)
Having that kind of functionality in email.utils would make it easier to read
Messages, no?

--
Added file: http://bugs.python.org/file46551/attachments.py


[issue29462] RFC822-comments in email header fields can fool, e.g., get_filename()

2017-02-07 Thread Alessandro Vesely


Alessandro Vesely added the comment:

We can close this, then.  Let's hope migration to Python3 isn't going to last 
forever...

Thank you for your cooperation

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

