Serhiy Storchaka added the comment:
Yes, there is a bug. When decoding_fgets() encounters non-UTF-8 bytes, it
fails and frees the input buffer in error_ret(). But since tok->cur !=
tok->inp, the next call to tok_nextc() reads freed memory:
    if (tok->cur != tok->inp) {
        return Py_CHARMASK(*tok->cur++); /* Fast path */
    }
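
For illustration, here is a minimal, compilable sketch of that pattern. This
is not the CPython code; the struct and function names are hypothetical
stand-ins for the tok_state fields and functions mentioned above:

    /* sketch.c -- the dangling-pointer pattern described above:
       the error path frees the buffer but leaves cur/inp pointing
       into it, so the fast path reads freed memory. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct tok_sketch {
        char *buf, *cur, *inp;
    };

    /* Stand-in for error_ret(): frees the buffer but leaves the
       other pointers dangling into the freed block. */
    static void error_ret_sketch(struct tok_sketch *tok)
    {
        free(tok->buf);
        tok->buf = NULL;
        /* BUG: tok->cur and tok->inp still point into freed memory. */
    }

    /* Stand-in for the tok_nextc() fast path quoted above. */
    static int tok_nextc_sketch(struct tok_sketch *tok)
    {
        if (tok->cur != tok->inp)
            return (unsigned char)*tok->cur++;  /* use-after-free */
        return EOF;
    }

    int main(void)
    {
        struct tok_sketch tok;
        tok.buf = malloc(4);
        strcpy(tok.buf, "abc");
        tok.cur = tok.buf;
        tok.inp = tok.buf + 3;

        error_ret_sketch(&tok);                 /* decoding "failed" */
        printf("%d\n", tok_nextc_sketch(&tok)); /* undefined behavior */
        return 0;
    }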
If Python does not crash here, a new buffer is allocated and assigned to
tok->buf, then PyTokenizer_Get returns an error and parsetok() calculates the
position of the error:

    err_ret->offset = (int)(tok->cur - tok->buf);

but tok->cur still points inside the old, freed buffer, so the offset becomes
an absurdly large integer. err_input() then tries to decode the part of the
string before the error with the "replace" error handler, but since the
position was calculated wrongly, it reads out of allocated memory.
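
A hedged sketch of why that offset is garbage (again hypothetical names, not
the CPython code): subtracting a pointer into one freed allocation from a
pointer to a different allocation has no defined meaning.

    /* offset_sketch.c */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *old_buf = malloc(64);
        char *cur = old_buf + 10;     /* like tok->cur after the error */
        free(old_buf);

        char *new_buf = malloc(64);   /* like the newly assigned tok->buf */

        /* Undefined behavior: the pointers belong to different objects.
           In practice the result is an arbitrary, often huge, integer --
           exactly the bad err_ret->offset described above. */
        printf("bogus offset: %ld\n", (long)(cur - new_buf));

        free(new_buf);
        return 0;
    }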
The proposed patch fixes the issue. It sets tok->done and the buffer pointers
in case of a decoding error, so they are now in a consistent state. It also
removes some duplicated or dead code.
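
To make the idea concrete, here is a sketch of the consistent state the fix
establishes, using the same hypothetical stand-ins as above (see the attached
patch for the real change; E_DECODE_SKETCH mirrors E_DECODE from
Include/errcode.h):

    /* fix_sketch.c -- on a decoding error, free the buffer AND reset
       the pointers and error flag together, so tok->cur == tok->inp
       and the fast path can never dereference a stale pointer. */
    #include <stdio.h>
    #include <stdlib.h>

    enum { E_DECODE_SKETCH = 22 };

    struct tok_sketch {
        char *buf, *cur, *inp;
        int done;
    };

    static void error_ret_fixed(struct tok_sketch *tok)
    {
        free(tok->buf);
        tok->buf = tok->cur = tok->inp = NULL; /* cur == inp */
        tok->done = E_DECODE_SKETCH;           /* error recorded */
    }

    static int tok_nextc_sketch(struct tok_sketch *tok)
    {
        if (tok->cur != tok->inp)              /* never true after fix */
            return (unsigned char)*tok->cur++;
        return EOF;
    }

    int main(void)
    {
        struct tok_sketch tok = { NULL, NULL, NULL, 0 };
        tok.buf = malloc(4);
        tok.cur = tok.buf;
        tok.inp = tok.buf + 3;

        error_ret_fixed(&tok);
        printf("%d (done=%d)\n", tok_nextc_sketch(&tok), tok.done);
        /* EOF is returned, and an offset computed from
           tok->cur - tok->buf would be 0, not a wild value. */
        return 0;
    }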
----------
stage: -> patch review
Added file: http://bugs.python.org/file40965/issue25388.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25388>
_______________________________________