[issue32431] Two bytes objects of zero length don't compare equal
New submission from Jonathan Underwood:

With the current logic of the function bytes_compare_eq in Objects/bytesobject.c, a zero-length bytes object created in an extension module like this:

    val = PyBytes_FromStringAndSize(NULL, 20);
    Py_SIZE(val) = 0;

won't compare equal to b'' because the memory is not initialized, so the first byte of each object's buffer won't be equal. Nonetheless, the Python interpreter does return b'' for print(repr(val)), so this behaviour is very confusing.

To get the correct behaviour, one would have to initialize the memory:

    val = PyBytes_FromStringAndSize(NULL, 20);
    c = PyBytes_AS_STRING(val);
    c[0] = '\0';
    Py_SIZE(val) = 0;

However, in my opinion it would be more sensible to fix the logic in bytes_compare_eq. That function should return true for two zero-length bytes objects, irrespective of the memory contents.

--
components: Interpreter Core
messages: 309086
nosy: jonathanunderwood
priority: normal
severity: normal
status: open
title: Two bytes objects of zero length don't compare equal
type: behavior
versions: Python 3.6

___
Python tracker <https://bugs.python.org/issue32431>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
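The comparison behaviour described in this report can be modelled without the CPython headers. The following is only a sketch: model_bytes, model_make and both compare functions are hypothetical stand-ins, not CPython code, and the first-byte shortcut is an assumption taken from the report's description of bytes_compare_eq. The 'x' fill stands in for uninitialized memory so that the outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bytes object: a logical size plus a buffer. */
typedef struct {
    size_t size;     /* plays the role of Py_SIZE(obj) */
    char data[32];   /* plays the role of the object's byte buffer */
} model_bytes;

/* Build an object whose whole buffer holds `fill` and whose logical
   size is `size`. A non-zero fill imitates uninitialized memory. */
static model_bytes model_make(char fill, size_t size)
{
    model_bytes v;
    assert(size <= sizeof v.data);
    memset(v.data, fill, sizeof v.data);
    v.size = size;
    return v;
}

/* Comparison as the report describes it (assumed shape): sizes first,
   then a first-byte shortcut, then memcmp over `size` bytes. */
static int model_compare_eq_first_byte(const model_bytes *a,
                                       const model_bytes *b)
{
    if (a->size != b->size)
        return 0;
    if (a->data[0] != b->data[0])   /* inspects data[0] even if size == 0 */
        return 0;
    return memcmp(a->data, b->data, a->size) == 0;
}

/* Proposed behaviour: two zero-length objects are equal regardless of
   what their buffers happen to contain. */
static int model_compare_eq_fixed(const model_bytes *a,
                                  const model_bytes *b)
{
    if (a->size != b->size)
        return 0;
    if (a->size == 0)
        return 1;
    return memcmp(a->data, b->data, a->size) == 0;
}
```

In this model, model_make('x', 0) and model_make('\0', 0) are both zero-length, yet only the second comparison function treats them as equal.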
Jonathan Underwood added the comment:

https://github.com/python/cpython/pull/5021
Change by Jonathan Underwood:

--
keywords: +patch
pull_requests: +4911
stage: -> patch review
Jonathan Underwood added the comment:

Py_SIZE is actually precisely specified and documented[1] as it stands; I don't think a change there is needed. The usage I outlined is in line with that documentation, and with many other uses of that macro in the cpython sources.

The documentation issues are, at least:

1. There is no documentation specifying that bytes objects should be null terminated.
2. Nothing in the documentation of PyBytes_FromStringAndSize[2] specifies that passing 0 as the size results in a singleton being returned. This is undocumented behaviour, and it would seem fragile to rely on it.

But there are further inconsistencies: the documentation for PyBytes_AsString()[3] says that it returns a buffer which is one byte longer than the length of the object *in order to store a terminating null*, which implies that the object need not itself have a terminating null. I could go on with other examples, but this is very poorly defined behaviour.

Question: are bytes objects defined to be null terminated, or not? If they are not defined to be null terminated, the fix I propose is correct even if it doesn't solve the other 100 bugs lurking in the code.

[Aside: even if bytes objects are in fact defined to be null terminated, I think the proposed change amounts to an optimization in any case.]

[1] https://docs.python.org/3/c-api/structures.html#c.Py_SIZE
[2] https://docs.python.org/3/c-api/bytes.html?highlight=pybytes_fromstringandsize#c.PyBytes_FromStringAndSize
[3] https://docs.python.org/3/c-api/bytes.html?highlight=pybytes_asstring#c.PyBytes_AsString
Jonathan Underwood added the comment:

Actually, the commentary at the top of bytesobject.c for PyBytes_FromStringAndSize says:

"... If `str' is NULL then PyBytes_FromStringAndSize() will allocate `size+1' bytes (setting the last byte to the null terminating character)... "

So perhaps that's as close to gospel as it gets: this does imply that bytes objects are expected to be null terminated. Why the PyBytes_AsString documentation then describes an extra null terminator is a bit of a mystery.

Perhaps what's needed is some documentation clarifications:

1/ State early on that bytes objects are always expected to be null terminated.
2/ As such, the string pointer returned by PyBytes_AsString will point to a null terminated string. I think the current docs could be misinterpreted to suggest that _AsString *adds* an extra byte for the null, which it doesn't.
3/ Document that using Py_SIZE to reduce the length of a bytes object is dangerous, because the null terminator will be lost, and subsequent behaviour is undefined.
4/ Document that the preferred way to resize is to use PyBytes_FromStringAndSize with a new size.
5/ Indicate clearly that _PyBytes_Resize is not a public interface and that its use is discouraged in favour of PyBytes_FromStringAndSize.
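Point 3/ above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not CPython code: model_buf and its helper functions are hypothetical names, the allocation follows the quoted commentary (size+1 bytes, last byte set to the null terminator), and the 'x' fill stands in for uninitialized contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a bytes object with a heap buffer. */
typedef struct {
    size_t size;   /* logical length, the role played by Py_SIZE */
    char *data;    /* size + 1 bytes are allocated at creation */
} model_buf;

/* Allocate size + 1 bytes and set the last one to '\0', as the quoted
   commentary describes. The payload is filled with 'x' to imitate
   uninitialized memory deterministically. */
static model_buf model_new(size_t size)
{
    model_buf b;
    b.data = malloc(size + 1);
    assert(b.data != NULL);
    memset(b.data, 'x', size);
    b.data[size] = '\0';
    b.size = size;
    return b;
}

/* Shrink by rewriting only the size field: the terminator stays at the
   old end of the buffer, so data[size] is no longer '\0'. */
static void model_shrink_size_only(model_buf *b, size_t new_size)
{
    assert(new_size <= b->size);
    b->size = new_size;
}

/* Shrink and restore the invariant data[size] == '\0'. */
static void model_shrink_terminated(model_buf *b, size_t new_size)
{
    assert(new_size <= b->size);
    b->size = new_size;
    b->data[new_size] = '\0';
}

/* Check the null-termination invariant. */
static int model_is_terminated(const model_buf *b)
{
    return b->data[b->size] == '\0';
}

static void model_free(model_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

In this model, shrinking a 20-byte buffer to length 0 with model_shrink_size_only leaves data[0] holding payload rather than a terminator, which is the situation the original report ran into; model_shrink_terminated corresponds to the manual workaround of writing '\0' before changing the size.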