New submission from Evan <[email protected]>:
Relevant standard library module:
C:\Users\User\AppData\Local\Programs\Python\Python38\lib\_markupbase.py
The issue: after parsing over 900 SEC filings with beautifulsoup4, I get this
user warning:
UserWarning: unknown status keyword 'ERF' in marked section
warnings.warn(msg)
Followed by a traceback
  ....
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py", line 325, in __init__
    self._feed()
  ....
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38\lib\_markupbase.py", line 160, in parse_marked_section
    if not match:
UnboundLocalError: local variable 'match' referenced before assignment
It's probably due to malformed input from one of the docs.
144 lines into the _markupbase module we have:
    # Internal -- parse a marked section
    # Override this to handle MS-word extension syntax <![if word]>content<![endif]>
    def parse_marked_section(self, i, report=1):
        rawdata= self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectName, j = self._scan_name( i+3, i )
        if j < 0:
            return j
        if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
            # look for standard ]]> ending
            match= _markedsectionclose.search(rawdata, i+3)
        elif sectName in {"if", "else", "endif"}:
            # look for MS Office ]> ending
            match= _msmarkedsectionclose.search(rawdata, i+3)
        else:
            self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.unknown_decl(rawdata[i+3: j])
        return match.end(0)
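When the keyword is unknown and self.error() returns instead of raising (here
it only issues the warning above), `match` is never assigned. `match` should be
set to None in the fall-through else branch, right before `if not match`.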
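
A minimal sketch of the failure and the proposed fix, reduced to a standalone function so it runs without beautifulsoup4. The names `close_buggy`, `close_fixed`, and `error` are hypothetical, the two regexes are stand-ins modeled on the ones in _markupbase, and it assumes `error()` only warns and returns, as the warning above suggests:

```python
import re
import warnings

# Stand-ins modeled on the module-level patterns in _markupbase (assumption).
_markedsectionclose = re.compile(r']\s*]\s*>')
_msmarkedsectionclose = re.compile(r']\s*>')


def error(msg):
    # Assumption: the parser's error() warns and returns instead of raising.
    warnings.warn(msg)


def close_buggy(rawdata, sect_name):
    """Same control flow as the reported code: no assignment in the else branch."""
    if sect_name in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = _markedsectionclose.search(rawdata, 3)
    elif sect_name in {"if", "else", "endif"}:
        match = _msmarkedsectionclose.search(rawdata, 3)
    else:
        error('unknown status keyword %r in marked section' % sect_name)
    if not match:  # UnboundLocalError when the else branch was taken
        return -1
    return match.end(0)


def close_fixed(rawdata, sect_name):
    """Proposed fix: give match a value on every path."""
    if sect_name in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = _markedsectionclose.search(rawdata, 3)
    elif sect_name in {"if", "else", "endif"}:
        match = _msmarkedsectionclose.search(rawdata, 3)
    else:
        error('unknown status keyword %r in marked section' % sect_name)
        match = None
    if not match:
        return -1
    return match.end(0)
```

With this sketch, `close_buggy("<![ERF", "ERF")` warns and then raises UnboundLocalError, while `close_fixed("<![ERF", "ERF")` warns and returns -1, so the caller can treat the section as incomplete.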
----------
components: Library (Lib)
messages: 363234
nosy: SanJacintoJoe
priority: normal
severity: normal
status: open
title: Bug in html parsing module triggered by malformed input
type: compile error
versions: Python 3.8
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue39833>
_______________________________________