[Python-Dev] Problems with hex-conversion functions
Hello everyone.

I see several problems with the two hex-conversion function pairs that Python offers:

1. binascii.hexlify and binascii.unhexlify
2. bytes.fromhex and bytes.hex

Problem #1: bytes.hex is not implemented, although it was specified in PEP 358. This means there is no symmetrical function to accompany bytes.fromhex.

Problem #2: Both pairs perform the same function, although The Zen of Python suggests that "There should be one-- and preferably only one --obvious way to do it." I do not understand why PEP 358 specified the bytes function pair even though it mentioned the binascii pair...

Problem #3: bytes.fromhex accepts spaces in the input string, whereas binascii.unhexlify does not. I see no good reason for these two functions to have different features.

Problem #4: binascii.unhexlify accepts both input types, strings and bytes, whereas bytes.fromhex raises an exception when given a bytes parameter. Again, there is no reason for these functions to differ.

Problem #5: binascii.hexlify returns a bytes object, although ideally converting to hex should always return a string and converting from hex should always return bytes. IMO a bytes output from hexlify makes no sense, since the output is a representation of other bytes.
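Problems #3 to #5 can be checked with a short script. This is only a sketch, run against whatever Python 3 is at hand; the helper name accepts() is made up for illustration, and the results for problem #4 depend on the Python version, so they are printed rather than assumed.

```python
import binascii

raw = b"\xde\xad\xbe\xef"


def accepts(func, arg):
    """Illustrative helper: True if func(arg) succeeds, False on TypeError/ValueError."""
    try:
        func(arg)
    except (TypeError, ValueError):  # binascii.Error is a subclass of ValueError
        return False
    return True


# Problem #3: spaces in the input string.
spaces_fromhex = accepts(bytes.fromhex, "de ad be ef")
spaces_unhexlify = accepts(binascii.unhexlify, "de ad be ef")
print("fromhex accepts spaces:  ", spaces_fromhex)
print("unhexlify accepts spaces:", spaces_unhexlify)

# Problem #4: str versus bytes input. Version-dependent, so only reported.
functions = {"binascii.unhexlify": binascii.unhexlify, "bytes.fromhex": bytes.fromhex}
for label, func in functions.items():
    for arg in ("deadbeef", b"deadbeef"):
        print(label, "accepts", type(arg).__name__, "->", accepts(func, arg))

# Problem #5: hexlify returns bytes, so getting text needs an extra decode step.
hexlified = binascii.hexlify(raw)
as_text = hexlified.decode("ascii")
print(type(hexlified).__name__, hexlified, "->", as_text)
```

The last line shows the extra decode step needed to get a string out of hexlify, which is what problem #5 argues the function should return directly.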
Returning a string is also the behavior that PEP 358 suggests for bytes.hex.

Problems #4 and #5 call for a decision about the input and output of the functions being discussed:

Option A: Strict input and output
    unhexlify (and bytes.fromhex) may only receive strings and may only return bytes
    hexlify (and bytes.hex) may only receive bytes and may only return strings

Option B: Robust input and strict output
    unhexlify (and bytes.fromhex) may receive bytes and strings and may only return bytes
    hexlify (and bytes.hex) may receive bytes or strings and may only return strings

Of course we may also consider a third option, which would allow the return type of all functions to be robust (perhaps specified by a keyword argument), but as I wrote in the description of problem #5, I see no sense in that.

Note that PEP 3137 describes "... the more strict definitions of encoding and decoding in Python 3000: encoding always takes a Unicode string and returns a bytes sequence, and decoding always takes a bytes sequence and returns a Unicode string.", which suggests option A.

To repeat problems #4 and #5, the current behavior does not match any option:

* The return type of binascii.hexlify should be string, and this is not the current behavior.

As for the input:

* Option A is not the current behavior because binascii.unhexlify may receive both input types.
* Option B is not the current behavior because bytes.fromhex does not allow bytes as input.

To fix these issues, three changes should be applied:

1. Deprecate bytes.fromhex. This fixes the following problems:
   #4 (go with option B and remove the function that does not allow bytes input)
   #2 (the binascii functions will be the only way to "do it")
   #1 (bytes.hex should not be implemented)
2. In order to keep the functionality that bytes.fromhex has over unhexlify, the latter function should be able to handle spaces in its input (fix #3)
3. binascii.hexlify should return string as its return type (fix #5)

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
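The behavior that changes 2 and 3 in the message above ask for could be sketched as a pair of wrappers around the existing binascii functions. This is only an illustration, not a patch: the names unhexlify_robust and hexlify_str are hypothetical, and str input to the hexlify side (allowed by option B) is left out because the message does not say how a string would be turned into bytes first.

```python
import binascii


def unhexlify_robust(data):
    """Hypothetical helper: accept str or bytes-like hex text, return bytes.

    ASCII whitespace anywhere in the input is dropped before decoding.
    """
    if isinstance(data, str):
        # Non-ASCII text raises UnicodeEncodeError, a ValueError subclass.
        data = data.encode("ascii")
    # bytes.split() with no argument splits on runs of ASCII whitespace.
    compact = b"".join(bytes(data).split())
    return binascii.unhexlify(compact)


def hexlify_str(data):
    """Hypothetical helper: accept bytes-like input, always return str."""
    return binascii.hexlify(data).decode("ascii")


print(unhexlify_robust("de ad be ef"))
print(unhexlify_robust(b"dead beef"))
print(hexlify_str(b"\xde\xad\xbe\xef"))
```

Note that this sketch drops whitespace everywhere, including between the two digits of a single byte, which may be more lenient than bytes.fromhex.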
Re: [Python-Dev] Problems with hex-conversion functions
Sorry for the late reply. I would really like to see this fixed.

>> Or we [...] deprecate binascii.(un)hexlify().
...
>> binascii is the legacy approach here, so if anything was to go, those functions would be it
...

I'm not entirely convinced binascii is the legacy approach. What makes this module "legacy"? On the contrary, I'm pretty sure modularity is better than sticking all the functionality in the core.

As was written in this issue: http://psf.upfronthosting.co.za/roundup/tracker/issue3532

"If you wanted to produce base-85 (say), then you can extend the functionality of bytes by providing a function that does that, whereas you can't extend the existing bytes type."

This example shows that "hex" is actually getting special treatment by having builtin methods associated with the bytes type. Why don't we add ".base64" methods? Or even ".zlib"? After all, these options were present in Python 2.x using the "encode" method of string. In my opinion, having modules to deal with these types of conversions is better, and this is why I suggested sticking to binascii.

In any case, seeing as both this discussion and the one linked above were abandoned, I would like to hear what needs to be done to actually fix these issues. If no one else is willing to do it (that would be a little disappointing), I think I have the skills to learn and fix the code itself, but I don't have the time and I am unfamiliar with the process of submitting patches and getting them approved. For example, who gets to decide on the correct approach? Is there a better place to discuss this?

Thanks for the responses.

--
Arnon

On Sun, Sep 6, 2009 at 5:51 AM, Nick Coghlan wrote:
> Brett Cannon wrote:
> >> To fix these issues, three changes should be applied:
> >> 1. Deprecate bytes.fromhex. This fixes the following problems:
> >>    #4 (go with option B and remove the function that does not allow bytes input)
> >>    #2 (the binascii functions will be the only way to "do it")
> >>    #1 (bytes.hex should not be implemented)
> >> 2. In order to keep the functionality that bytes.fromhex has over unhexlify,
> >>    the latter function should be able to handle spaces in its input (fix #3)
> >> 3. binascii.hexlify should return string as its return type (fix #5)
> >
> > Or we fix bytes.fromhex(), add bytes.hex() and deprecate binascii.(un)hexlify().
>
> binascii is the legacy approach here, so if anything was to go, those
> functions would be it. I'm not sure getting rid of them is worth the
> hassle though (especially in 2.x).
>
> Regarding bytes.hex(), it may be better to modify the builtin hex()
> function to accept bytes as an input type.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
> ---
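The modularity argument in the reply above can be illustrated by treating every conversion, hex included, as a pair of module-level functions. This is a rough sketch: CODECS and round_trip are illustrative names, not part of any library, and base64.b85encode/b85decode only exist in the standard library from Python 3.4 on, so they were not available when this thread was written.

```python
import base64
import binascii
import zlib

payload = b"hex is not special"

# Each conversion is an (encode, decode) pair of plain module-level functions.
# Adding another conversion means adding an entry here; the bytes type itself
# does not need to change.
CODECS = {
    "hex": (binascii.hexlify, binascii.unhexlify),
    "base64": (base64.b64encode, base64.b64decode),
    "base85": (base64.b85encode, base64.b85decode),
    "zlib": (zlib.compress, zlib.decompress),
}


def round_trip(name, data):
    """Encode data with the named conversion, then decode it again."""
    encode, decode = CODECS[name]
    return decode(encode(data))


for name in CODECS:
    print(name, round_trip(name, payload) == payload)
```

All four conversions here take bytes and return bytes, which is the existing hexlify behavior that problem #5 in the original message objects to for the hex case.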