[issue670664] HTMLParser.py - more robust SCRIPT tag parsing
Yotam Medini added the comment:

HTMLParser.py fails when inside ... it can be fooled by JavaScript with less-than '<' conditional expressions. In the attached example:

$ tar tvzf lt-in-script-example.tgz | cut -c24-
   796 2010-09-30 16:52 h2t.py
 23678 2010-09-30 16:39 t.html

here's what happens:

$ python h2t.py t.html /tmp/t.txt
HTMLParser: /home/yotam/src/wog/HTMLParser.bug/HTMLParser.py
Traceback (most recent call last):
  File "h2t.py", line 31, in <module>
    text = html2text(f_html.read())
  File "h2t.py", line 23, in html2text
    te = TextExtractor(html)
  File "h2t.py", line 15, in __init__
    self.feed(html)
  File "/home/yotam/src/wog/HTMLParser.bug/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/home/yotam/src/wog/HTMLParser.bug/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/home/yotam/src/wog/HTMLParser.bug/HTMLParser.py", line 229, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/home/yotam/src/wog/HTMLParser.bug/HTMLParser.py", line 304, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/home/yotam/src/wog/HTMLParser.bug/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 396, column 332

I have a suggested patch, HTMLParser.diff, fixing this problem, soon to be attached.

-- yotam

--
nosy: +yotam
Added file: http://bugs.python.org/file19072/lt-in-script-example.tgz

___
Python tracker <http://bugs.python.org/issue670664>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
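For readers who want to reproduce the scenario, here is a minimal sketch of a text extractor fed a script body that contains a bare '<'. It is not the attached h2t.py, whose contents are not shown in this thread: the names TextExtractor and html2text are borrowed from the traceback, but the bodies are assumptions, and the sketch targets Python 3's html.parser rather than the Python 2 HTMLParser module discussed in the report. On current Python 3 releases the content of script and style elements is passed through as raw data, so the comparison is not expected to be parsed as a start tag.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies.

    Hypothetical reconstruction; the original h2t.py is not shown.
    """

    SKIPPED = ("script", "style")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []       # text fragments seen so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script/style content also arrives here, possibly in several
        # pieces, so filter on the current nesting state.
        if not self._skip_depth:
            self.chunks.append(data)


def html2text(html):
    """Return the visible text of `html` with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

A call such as html2text('<p>before</p><script>if (a < b) { f(); }</script><p>after</p>') exercises the case from the report: a less-than comparison inside a script element.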
[issue670664] HTMLParser.py - more robust SCRIPT tag parsing
Yotam Medini added the comment:

The attached suggested patch fixes the problems shown in msg117762.

--
Added file: http://bugs.python.org/file19073/HTMLParser.diff
[issue670664] HTMLParser.py - more robust SCRIPT tag parsing
Changes by Yotam Medini:

Added file: http://bugs.python.org/file20231/endtag-space.html
[issue670664] HTMLParser.py - more robust SCRIPT tag parsing
Changes by Yotam Medini:

Added file: http://bugs.python.org/file20232/dollar-extra.html
[issue670664] HTMLParser.py - more robust SCRIPT tag parsing
Yotam Medini added the comment:

Suggested fix for the attached cases:
  lt-in-script-example.tgz
  endtag-space.html
  dollar-extra.html

--
Added file: http://bugs.python.org/file20233/ltscr-endtag-dollarext.diff
[issue43107] RotatingFileHandler with multi processes creates too small backup files
New submission from Yotam Medini:

When multiple processes log through RotatingFileHandler and reach the rotation (roll-over) point, they continue writing to separate files. Two problems:

1. Eventually some backup files are left with sizes much smaller than the maxBytes configuration.
2. Interleaved events are not logged together, but are separated by process.

--
components: Library (Lib)
messages: 386165
nosy: yotam
priority: normal
severity: normal
status: open
title: RotatingFileHandler with multi processes creates too small backup files
type: enhancement
versions: Python 3.10

___
Python tracker <https://bugs.python.org/issue43107>
___
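One commonly used way to avoid both problems, not proposed in the report itself, is to let a single process own the RotatingFileHandler and have every other process send its records through a queue. The sketch below uses logging.handlers.QueueHandler and QueueListener from the standard library; the helper names (start_listener, make_worker_logger, run_demo) and the size limits are assumptions chosen for illustration only.

```python
import logging
import logging.handlers
import multiprocessing
import os


def start_listener(queue, log_path, max_bytes=10_000, backup_count=5):
    """Start the single writer: only this handler touches the log files."""
    file_handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()
    return listener, file_handler


def make_worker_logger(queue, name):
    """Return a logger that only enqueues records and never opens a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(queue))
    return logger


def _worker(queue, n_events):
    # Runs in a child process (hypothetical workload).
    logger = make_worker_logger(queue, "worker.%d" % os.getpid())
    for i in range(n_events):
        logger.info("pid=%d event=%d", os.getpid(), i)


def run_demo(log_path, n_workers=3, n_events=100):
    """Spawn workers that all log through one rotating file."""
    queue = multiprocessing.Queue()
    listener, file_handler = start_listener(queue, log_path)
    procs = [
        multiprocessing.Process(target=_worker, args=(queue, n_events))
        for _ in range(n_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()        # drains the queue, then stops the listener thread
    file_handler.close()
```

To try it, call run_demo("app.log") under an `if __name__ == "__main__":` guard. Because roll-over decisions are made in one place, backups should fill up to roughly maxBytes, and events from different processes end up interleaved in the same file sequence.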