Jim Jewett, 08.01.2012 23:33:
> Stefan Behnel wrote:
>> Admittedly, this may require some adaptation for the PEP393 unicode memory
>> layout in order to produce identical hashes for all three representations
>> if they represent the same content.
>
> They SHOULD NOT represent the same content; comparing two strings
> currently requires converting them to canonical form, which means the
> smallest format (of those three) that works.
> [...]
> That said, I don't think smallest-format is actually enforced with
> anything stronger than comments (such as in unicodeobject.h struct
> PyASCIIObject) and asserts (mostly calling
> _PyUnicode_CheckConsistency).

That's what I meant. AFAIR, the PEP393 discussions at some point brought up
the suspicion that third party code may end up generating Unicode strings
that do not comply with that "invariant". So internal code shouldn't
strictly rely on it when it deals with user provided data. One example is
the "unequal kinds" optimisation in equality comparison, which, if I'm not
mistaken, wasn't implemented, due to exactly this reasoning. The same
applies to hashing then.

Stefan
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
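
To illustrate the hashing point, here is a minimal sketch in pure Python. It is not CPython's actual hash implementation: FNV-1a merely stands in for the real algorithm, and the helper names (fnv1a, code_points, hash_by_code_point) are made up for this example. It stores the same text in 1-, 2- and 4-byte-per-character buffers, roughly mimicking the three PEP393 kinds, and shows that hashing the raw buffer depends on the storage width, while hashing code point by code point does not.

```python
FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV-1a offset basis
FNV_PRIME = 0x100000001B3        # 64-bit FNV-1a prime
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a(data: bytes) -> int:
    """Plain 64-bit FNV-1a over a byte string (stand-in hash, an assumption)."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def code_points(buf: bytes, width: int):
    """Yield the code points stored in a little-endian buffer of fixed width."""
    for i in range(0, len(buf), width):
        yield int.from_bytes(buf[i:i + width], "little")


def hash_by_code_point(buf: bytes, width: int) -> int:
    """Widen every code point to 4 bytes before hashing, so the result
    does not depend on how compactly the string happens to be stored."""
    widened = b"".join(cp.to_bytes(4, "little") for cp in code_points(buf, width))
    return fnv1a(widened)


text = "abc"
# The same content in three storage widths (1, 2 and 4 bytes per character).
buffers = {
    1: text.encode("latin-1"),
    2: text.encode("utf-16-le"),  # equals UCS2 for BMP-only text
    4: text.encode("utf-32-le"),
}

raw_hashes = {width: fnv1a(buf) for width, buf in buffers.items()}
normalized_hashes = {
    width: hash_by_code_point(buf, width) for width, buf in buffers.items()
}

print("raw buffer hashes: ", raw_hashes)
print("code point hashes: ", normalized_hashes)
```

Widening every character is the simplest way to make the result independent of the representation; a real implementation would more likely avoid the copy, but the sketch shows why a hash over the raw buffer cannot be trusted if a non-canonical string ever shows up.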