[issue39113] PyUnicode_AsUTF8AndSize Sometimes Segfaults With Incomplete Surrogate Pair
New submission from william.ayd:

With the attached extension module, if I run the following in the REPL:

>>> import libtest
>>>
>>> libtest.error_if_not_utf8("foo")
'foo'
>>> libtest.error_if_not_utf8("\ud83d")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 0: surrogates not allowed
>>> libtest.error_if_not_utf8("foo")
'foo'

Things seem OK. But the next invocation of

>>> libtest.error_if_not_utf8("\ud83d")

then causes a segfault. Note that the order of the input seems important; simply repeating the call with the invalid surrogate doesn't cause the segfault.

--
files: testmodule.c
messages: 358755
nosy: william.ayd
priority: normal
severity: normal
status: open
title: PyUnicode_AsUTF8AndSize Sometimes Segfaults With Incomplete Surrogate Pair
Added file: https://bugs.python.org/file48798/testmodule.c

___
Python tracker <https://bugs.python.org/issue39113>
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
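The UnicodeEncodeError in the session above can be reproduced without any extension module. The following is a minimal pure-Python sketch, not the attached testmodule.c; the helper name try_encode_utf8 is hypothetical. It only shows that a lone high surrogate is rejected by the strict UTF-8 encoder, which is the error the C call reports; it does not reproduce the segfault.

```python
# A lone high surrogate: the first half of a surrogate pair with no second half.
LONE_HIGH_SURROGATE = "\ud83d"


def try_encode_utf8(text):
    """Return the UTF-8 bytes, or the encoder's reason string if strict encoding fails.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.reason


print(try_encode_utf8("foo"))
print(try_encode_utf8(LONE_HIGH_SURROGATE))

# The "surrogatepass" error handler encodes the surrogate anyway.
print(LONE_HIGH_SURROGATE.encode("utf-8", errors="surrogatepass"))
```

In C, the equivalent failure is signalled by a NULL return with an exception set, so the caller has to check for NULL before using the result.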
[issue39113] PyUnicode_AsUTF8AndSize Sometimes Segfaults With Incomplete Surrogate Pair
william.ayd added the comment:

Hmm my mistake - thanks!
[issue35041] urllib.parse.quote safe Parameter Not Optional
New submission from william.ayd:

The safe parameter in urllib.parse.quote is documented as optional. However, the following will raise TypeError: 'NoneType' object is not iterable:

urllib.parse.quote("/", safe=None)

whereas explicitly providing an iterable will allow the function to succeed:

urllib.parse.quote("/", safe=[])

--
messages: 328229
nosy: william.ayd
priority: normal
severity: normal
status: open
title: urllib.parse.quote safe Parameter Not Optional
versions: Python 3.7

___
Python tracker <https://bugs.python.org/issue35041>
___
[issue35041] urllib.parse.quote safe Parameter Not Optional
william.ayd added the comment:

Semantics aside, is it still the intended behavior that these calls should work:

urllib.parse.quote("/", safe='')

AND

urllib.parse.quote("/", safe=[])

But that this should raise?

urllib.parse.quote("/", safe=None)

IMO that seems counterintuitive.
[issue35041] urllib.parse.quote safe Parameter Not Optional
william.ayd added the comment:

Hmm, well, I still personally feel that the implementation is off more so than the documentation. Specifically, I think it is confusing that it accepts an empty iterable but not one containing elements. This is fine:

urllib.parse.quote("/", safe=[])

Though this isn't:

urllib.parse.quote("/", safe=['/'])

Even though the following two calls are fine (though with different return values, as expected):

urllib.parse.quote("/", safe='')
urllib.parse.quote("/", safe='/')

It might go against the spirit of duck typing, but I find it very nuanced that empty iterables are allowed while a non-empty one must be a string. Would it not make more sense to raise if a non-string type is passed?
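The calls discussed in this thread can be compared side by side with a short script. This is only a sketch: try_quote is a hypothetical helper, and the outcome for non-string values of safe depends on the CPython version in use, so no output is asserted for those cases here.

```python
from urllib.parse import quote


def try_quote(safe):
    """Call quote("/", safe=safe) and report either the result or the TypeError.

    Hypothetical helper for illustration only.
    """
    try:
        return ("ok", quote("/", safe=safe))
    except TypeError as exc:
        return ("TypeError", str(exc))


# The two string cases, the two list cases, and None from the original report.
for safe in ("", "/", [], ["/"], None):
    print(repr(safe), "->", try_quote(safe))
```

With string arguments the behavior is unambiguous: safe='' percent-encodes the slash and safe='/' leaves it alone.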
[issue35041] urllib.parse.quote safe Parameter Not Optional
william.ayd added the comment:

What if we instead just raised for anything that isn't a str or bytes object? The docstring for quote suggests that it should only accept str or bytes objects for safe, though it doesn't enforce that:

https://github.com/python/cpython/blob/121eb1694cab14df857ba6abe9839654cada15cf/Lib/urllib/parse.py#L791

It seems like it would be easier to enforce that than to try to accept any arbitrary iterable.
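One way to picture the proposal is a thin wrapper around the existing function. This is a sketch of the idea under discussion, not a change that exists in urllib.parse; the name quote_strict and the wording of its error message are assumptions made for illustration.

```python
from urllib.parse import quote


def quote_strict(string, safe="/", encoding=None, errors=None):
    """Like urllib.parse.quote, but reject any safe value that is not str or bytes.

    Hypothetical wrapper; urllib.parse does not provide this function.
    """
    if not isinstance(safe, (str, bytes)):
        raise TypeError(
            f"safe must be str or bytes, not {type(safe).__name__}"
        )
    return quote(string, safe=safe, encoding=encoding, errors=errors)


print(quote_strict("/", safe=""))    # slash is percent-encoded
print(quote_strict("/", safe=b"/"))  # slash is left as-is
```

With this check, safe=[] and safe=None fail the same way as safe=['/'], so the empty and non-empty iterable cases are no longer treated differently.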