[issue39113] PyUnicode_AsUTF8AndSize Sometimes Segfaults With Incomplete Surrogate Pair

2019-12-20 Thread william.ayd


New submission from william.ayd :

With the attached extension module, if I run the following in the REPL:

>>> import libtest
>>>
>>> libtest.error_if_not_utf8("foo")
'foo'
>>> libtest.error_if_not_utf8("\ud83d")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 0: surrogates not allowed
>>> libtest.error_if_not_utf8("foo")
'foo'

Things seem OK. But the next invocation of

>>> libtest.error_if_not_utf8("\ud83d")

then causes a segfault. Note that the order of the calls seems to matter;
simply repeating the call with the invalid surrogate doesn't cause the segfault.
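
For comparison, the same call sequence can be reproduced at the Python level
without the extension module. This is only a sketch: it assumes that
error_if_not_utf8 in testmodule.c does little more than pass its argument to
PyUnicode_AsUTF8AndSize, and the helper name encodes_as_utf8 is hypothetical
(it is not part of the attachment).

```python
def encodes_as_utf8(text):
    """Return True if `text` can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        # A lone surrogate such as "\ud83d" is rejected by the strict codec.
        return False
    return True


# Same order of inputs as in the REPL session above.
results = [encodes_as_utf8(s) for s in ("foo", "\ud83d", "foo", "\ud83d")]
print(results)
```

In pure Python every call either succeeds or raises cleanly, regardless of
order, which suggests the order dependence comes from the C side.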

--
files: testmodule.c
messages: 358755
nosy: william.ayd
priority: normal
severity: normal
status: open
title: PyUnicode_AsUTF8AndSize Sometimes Segfaults With Incomplete Surrogate Pair
Added file: https://bugs.python.org/file48798/testmodule.c

___
Python tracker 
<https://bugs.python.org/issue39113>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39113] PyUnicode_AsUTF8AndSize Sometimes Segfaults With Incomplete Surrogate Pair

2019-12-20 Thread william.ayd


william.ayd  added the comment:

Hmm my mistake - thanks!

--




[issue35041] urllib.parse.quote safe Parameter Not Optional

2018-10-21 Thread william.ayd


New submission from william.ayd :

The safe parameter in urllib.parse.quote is documented as optional. However, 
the following will raise TypeError: 'NoneType' object is not iterable:

urllib.parse.quote("/", safe=None)

whereas explicitly providing an iterable will allow the function to succeed:

urllib.parse.quote("/", safe=[])
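
A minimal reproduction of the two calls, as a sketch: the helper try_quote is
hypothetical and only exists to capture either the return value or the name of
the exception type, so both outcomes can be compared side by side.

```python
import urllib.parse


def try_quote(safe):
    """Call quote("/", safe=safe); return the result or the exception type name."""
    try:
        return urllib.parse.quote("/", safe=safe)
    except Exception as exc:
        return type(exc).__name__


none_outcome = try_quote(None)      # safe=None, as in the first call above
empty_list_outcome = try_quote([])  # safe=[], as in the second call above
print(none_outcome, empty_list_outcome)
```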

--
messages: 328229
nosy: william.ayd
priority: normal
severity: normal
status: open
title: urllib.parse.quote safe Parameter Not Optional
versions: Python 3.7

___
Python tracker 
<https://bugs.python.org/issue35041>
___



[issue35041] urllib.parse.quote safe Parameter Not Optional

2018-10-21 Thread william.ayd


william.ayd  added the comment:

Semantics aside, is it still the intended behavior that these calls should work:

urllib.parse.quote("/", safe='')

and

urllib.parse.quote("/", safe=[])

But that this should raise?

urllib.parse.quote("/", safe=None)

IMO that seems counterintuitive.

--




[issue35041] urllib.parse.quote safe Parameter Not Optional

2018-10-21 Thread william.ayd


william.ayd  added the comment:

Hmm, well, I still personally feel that the implementation is off, more so
than the documentation. Specifically, I think it is confusing that it accepts
an empty iterable but not one containing elements.

This is fine:

urllib.parse.quote("/", safe=[])

Though this isn't:

urllib.parse.quote("/", safe=['/'])

Even though the following two calls are fine (though with different return 
values as expected):

urllib.parse.quote("/", safe='')
urllib.parse.quote("/", safe='/')

It might go against the spirit of duck typing, but I find it very subtle that
empty iterables are allowed while a non-empty value must be a string. Would it
not make more sense to raise if a non-string type is passed?
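
The four calls above can be compared in one pass. This is a sketch, not part
of urllib: the cases and outcomes names are illustrative, and exceptions are
recorded by type name only rather than by message.

```python
import urllib.parse

# Label -> value passed as `safe`, mirroring the four calls above.
cases = {"[]": [], "['/']": ["/"], "''": "", "'/'": "/"}

outcomes = {}
for label, safe in cases.items():
    try:
        outcomes[label] = urllib.parse.quote("/", safe=safe)
    except TypeError:
        outcomes[label] = "TypeError"

for label, outcome in outcomes.items():
    print(f"safe={label}: {outcome}")
```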

--




[issue35041] urllib.parse.quote safe Parameter Not Optional

2018-10-21 Thread william.ayd


william.ayd  added the comment:

What if we instead just raised for anything that isn't a str or bytes object?
The docstring for quote suggests that it should only accept str or bytes
objects for safe, though it doesn't enforce that:

https://github.com/python/cpython/blob/121eb1694cab14df857ba6abe9839654cada15cf/Lib/urllib/parse.py#L791

Seems like it would be easier to enforce that rather than trying to accept any 
arbitrary iterable.
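
As a rough sketch of what that enforcement could look like, written as a
wrapper rather than as a patch to urllib: the name strict_quote and the error
message are hypothetical, and the check simply rejects any safe value that is
not str or bytes before delegating to urllib.parse.quote.

```python
import urllib.parse


def strict_quote(string, safe="/", encoding=None, errors=None):
    """Hypothetical wrapper: reject any `safe` that is not str or bytes."""
    if not isinstance(safe, (str, bytes)):
        raise TypeError(
            f"safe must be str or bytes, not {type(safe).__name__}"
        )
    return urllib.parse.quote(string, safe=safe, encoding=encoding, errors=errors)


print(strict_quote("/", safe=""))
print(strict_quote("/", safe="/"))
```

With this check, safe=None, safe=[] and safe=['/'] would all fail the same
way, instead of depending on whether the iterable happens to be empty.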

--
