[issue22621] Please make it possible to make the output of hash() equal between 32 and 64 bit architectures
New submission from josch:

I recently realized that the output of the following differs between 32-bit and 64-bit architectures:

    PYTHONHASHSEED=0 python3 -c 'print(hash("a"))'

In my case, I'm running some test cases that call a Python module which creates graphs several hundred megabytes in size, among other things. The fastest way to make sure that the output I get matches what I expect is to call the md5sum or sha256sum shell tools on the output and compare the results with the expected values. Unfortunately, some libraries I use rely on the order of items in Python dictionaries for their output. Yes, they should not do that, but they also don't care and thus don't fix the problem.

My initial solution was to use PYTHONHASHSEED=0, which helped, but I have now found out that it only produces the same hash within the set of 32-bit architectures and within the set of 64-bit architectures, respectively. See the command above, which behaves differently depending on the integer size of the architecture.

So what I'd like CPython to have is yet another workaround like PYTHONHASHSEED which allows me to temporarily influence the inner workings of the hash() function such that it behaves the same on 32-bit and 64-bit architectures. Maybe something like PYTHONHASH32BIT or similar? If I understand the CPython hash function correctly, this environment variable would just bitmask the result of the function with 0x or cast it to int32_t to achieve the same output across architectures. Would this be possible?

My only alternatives seem to be either to maintain patched versions of all modules I use that wrongly rely on dictionary ordering, or to go to great lengths to parse the (more or less) random output they produce into a sorted intermediary format. The latter seems like a bad idea because the files are several hundred megabytes in size: it would take very long and add complexity compared to just running md5sum or sha256sum on them to check whether my test cases succeed.

--
messages: 229219
nosy: josch
priority: normal
severity: normal
status: open
title: Please make it possible to make the output of hash() equal between 32 and 64 bit architectures
versions: Python 3.5

___
Python tracker <http://bugs.python.org/issue22621>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
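The "sorted intermediary format" alternative mentioned in the report can be sketched as follows. This is only an illustration under stated assumptions: the data is assumed to be JSON-serializable, and `canonical_digest` is a hypothetical helper name, not part of CPython or of any library named in the report. It does not implement the requested PYTHONHASH32BIT behaviour; it only shows one way to get a checksum that does not depend on dictionary iteration order.

```python
import hashlib
import json


def canonical_digest(mapping):
    """Return a SHA-256 hex digest that ignores dict iteration order.

    Hypothetical helper: assumes `mapping` is JSON-serializable.
    """
    # sort_keys=True fixes the key order at every nesting level;
    # explicit separators remove whitespace variation.
    payload = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = {"x": 1, "y": {"b": 2, "a": 3}}
    b = {"y": {"a": 3, "b": 2}, "x": 1}
    # Same content, different insertion order, same digest.
    print(canonical_digest(a) == canonical_digest(b))
```

The cost the reporter is worried about is real: this serializes the whole structure before hashing, so for outputs of several hundred megabytes it is slower than running sha256sum on a file directly.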
[issue22621] Please make it possible to make the output of hash() equal between 32 and 64 bit architectures
josch added the comment:

Thank you for your quick reply.

Yes, as I wrote above, there are ways around it by creating a stable in-memory representation and comparing that to a stable in-memory representation of the expected output. Since both inputs are several hundred megabytes in size, this would be CPU intensive but doable. I would just have liked to avoid treating this output in a special way, because I also compare other files and it is easiest to md5sum all of the files in one fell swoop.

I started using PYTHONHASHSEED to get stable output for a certain platform/version combination. When I uploaded my package to Debian and it was built on 13 different architectures, I noticed the discrepancy when the same version but different platforms are involved.

From my perspective it would be nice to just be able to set PYTHONHASH32BIT (or whatever) and call it a day. But of course it is your choice whether you would allow such a "hack" or not. Would your decision be more favorable if you received a patch implementing this feature?
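The behaviour described above can be checked on any one machine with a small sketch like the following. It runs a child interpreter with PYTHONHASHSEED set and reports both the hash of a string and the hash width from `sys.hash_info.width`. The helper name `hash_in_child` is hypothetical. On a single platform the value should repeat between runs; comparing the printed value and width across a 32-bit and a 64-bit build is what would reveal the discrepancy the reporter saw.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed="0"):
    """Return (hash(text), hash width in bits) from a fresh interpreter.

    Hypothetical helper: the child runs with PYTHONHASHSEED=seed.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    code = "import sys; print(hash(sys.argv[1]), sys.hash_info.width)"
    out = subprocess.check_output([sys.executable, "-c", code, text], env=env)
    value, width = out.decode().split()
    return int(value), int(width)


if __name__ == "__main__":
    first = hash_in_child("a")
    second = hash_in_child("a")
    print("hash, width:", first)
    print("stable on this platform:", first == second)
```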
[issue24605] segmentation fault at asciilib_split_char.lto_priv
New submission from josch:

Hi,

sometimes (not reliably; one has to run it a few times) I get a segmentation fault when running the following networkx-based Python code on large input graphs:

https://gitlab.mister-muffin.de/debian-bootstrap/botch/blob/master/tools/graph-difference.py

I'm running Debian unstable with the python3.4 package, version 3.4.3-7, on architecture amd64. The core dump is 1 GB in size, so I'm just attaching the traceback from gdb. The string "hscolour:amd64 (= 1.20.3-2)" that you see in the traceback is one of the vertex attributes in the input graph. What else do you need to debug the problem?

#0 asciilib_split_char.lto_priv () at ../Objects/stringlib/split.h:126
#1 0x0058e65a in asciilib_split (maxcount=, sep_len=1, sep=0x7f1b3088dfb0 ",", str_len=27, str=0x7f1b230abfb0 "hscolour:amd64 (= 1.20.3-2)", str_obj='hscolour:amd64 (= 1.20.3-2)') at ../Objects/stringlib/split.h:158
#2 split (maxcount=, substring=',', self='hscolour:amd64 (= 1.20.3-2)') at ../Objects/unicodeobject.c:10099
#3 unicode_split.lto_priv () at ../Objects/unicodeobject.c:12639
#4 0x0050d8fe in call_function (oparg=, pp_stack=0x7ffdf1a8ed80) at ../Python/ceval.c:4237
#5 PyEval_EvalFrameEx () at ../Python/ceval.c:2838
#6 0x005ab095 in PyEval_EvalCodeEx () at ../Python/ceval.c:3588
#7 0x0051163d in fast_function (nk=, na=, n=, pp_stack=0x7ffdf1a8ef60, func=) at ../Python/ceval.c:4344
#8 call_function (oparg=, pp_stack=0x7ffdf1a8ef60) at ../Python/ceval.c:4262
#9 PyEval_EvalFrameEx () at ../Python/ceval.c:2838
#10 0x005ab095 in PyEval_EvalCodeEx () at ../Python/ceval.c:3588
#11 0x0051163d in fast_function (nk=, na=, n=, pp_stack=0x7ffdf1a8f140, func=) at ../Python/ceval.c:4344
#12 call_function (oparg=, pp_stack=0x7ffdf1a8f140) at ../Python/ceval.c:4262
#13 PyEval_EvalFrameEx () at ../Python/ceval.c:2838
#14 0x005ab095 in PyEval_EvalCodeEx () at ../Python/ceval.c:3588
#15 0x0051163d in fast_function (nk=, na=, n=, pp_stack=0x7ffdf1a8f320, func=) at ../Python/ceval.c:4344
#16 call_function (oparg=, pp_stack=0x7ffdf1a8f320) at ../Python/ceval.c:4262
#17 PyEval_EvalFrameEx () at ../Python/ceval.c:2838
#18 0x005ab095 in PyEval_EvalCodeEx () at ../Python/ceval.c:3588
#19 0x005e16a5 in PyEval_EvalCode ( locals={'__package__': None, '__doc__': None, '__spec__': None, 'sys': , 'graph_difference': , '__file__': './tools/graph-difference.py', '__builtins__': , 'parser': , _registries={'action': {'append': , 'store_true': , 'store_false': , 'help': , 'count': , 'append_const': , 'store': , None: , 'store_const': , 'version': , 'parsers': }, 'type': {None: }}, _group_actions=[<_StoreAction(d...(truncated), globals={'__package__': None, '__doc__': None, '__spec__': None, 'sys': , 'graph_difference': , '__file__': './tools/graph-difference.py', '__builtins__': , 'parser': , _registries={'action': {'append': , 'store_true': , 'store_false': , 'help': , 'count': , 'append_const': , 'store': , None: , 'store_const': , 'version': , 'parsers': }, 'type': {None: }}, _group_actions=[<_StoreAction(d...(truncated), co=) at ../Python/ceval.c:775
#20 run_mod () at ../Python/pythonrun.c:2180
#21 0x005e176a in PyRun_FileExFlags () at ../Python/pythonrun.c:2133
#22 0x005e237a in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:1606
#23 0x005fdb60 in run_file (p_cf=, filename=, fp=) at ../Modules/main.c:319
#24 Py_Main () at ../Modules/main.c:751
#25 0x004c234f in main () at ../Modules/python.c:69
#26 0x7f1b30972b45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#27 0x005ba765 in _start ()

--
messages: 246559
nosy: josch
priority: normal
severity: normal
status: open
title: segmentation fault at asciilib_split_char.lto_priv
versions: Python 3.4

___
Python tracker <http://bugs.python.org/issue24605>
___
[issue24605] segmentation fault at asciilib_split_char.lto_priv
josch added the comment:

I do not see any module implemented in C among the imports. Is there a way to find out where the segmentation fault came from?
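One general way to approach the question above is the standard-library `faulthandler` module (available since Python 3.3), which can dump the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is a minimal illustration, not a diagnosis of this particular crash. The helper names `enable_crash_log` and `current_traceback_text` and the log file name are hypothetical.

```python
import faulthandler
import tempfile


def enable_crash_log(log_path):
    """Write Python tracebacks to log_path if the process crashes.

    Hypothetical helper. The returned file object must stay open for as
    long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


def current_traceback_text():
    """Return the traceback of the calling thread as text (no crash needed)."""
    with tempfile.TemporaryFile() as handle:
        faulthandler.dump_traceback(file=handle, all_threads=False)
        handle.seek(0)
        return handle.read().decode("utf-8", "replace")


if __name__ == "__main__":
    log = enable_crash_log("crash.log")  # hypothetical file name
    print(current_traceback_text())
```

The same handler can also be switched on without editing the script, by starting the interpreter with `python3 -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable. The resulting traceback shows which Python line was executing at the time of the crash, which complements the C-level frames that gdb reports.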