Nathaniel Smith <n...@pobox.com> wrote:
>On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn
><d.s.seljeb...@astro.uio.no> wrote:
>> Do you really think it complicates the spec? SHA-1 is pretty standard,
>> and Python ships with hashlib (the hashing part isn't performance
>> critical).
>>
>> I prefer hashing to string-interning as it can still be done
>> compile-time etc. 160 bits isn't worse than the second-to-best strcmp
>> case of a 256-bit function entry.
>
>If you're *so* set on compile-time calculation, one could also
>accommodate these within the intern framework pretty easily. Any
>PyString/PyBytes * will be aligned, which means the low bit will not
>be set, which means there are at least 2**31 bit-patterns that will
>never be used by a run-time interned string. So we could write down a
>lookup table in the spec that assigns arbitrary, well-known numbers to
>every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have
>15 standard types, then you can assign such an id to every 0, 1, 2, 3,
>4, 5, and 6 argument function with space left over.
>
>And this could all be abstracted away inside the intern() function.
>The only thing is that if you wanted to look at the characters in the
>interned string, you'd have to call a disintern() function instead of
>just following the pointer.
>
>I still think all this stuff would be complexity for its own sake,
>though.
>
>> Shortening the hash to 120 bits (truncation) we could have a spec
>> like this:
>>
>> - Short signature: [64 bit encoded signature. 64 bit funcptr]
>> - Long signature: [64 bit hash, 64 bit pointer to full signature,
>>                    8 bit guard byte, 56 bits remaining hash,
>>                    64 bit funcptr]
>
>This is a fixed length encoding, so why does it need a guard byte?

No, there are two cases, one 128-bit and one 256-bit.

>
>BTW, the guard byte design in the last version of the CEP looks buggy
>to me -- there's no guarantee that a valid pointer might not contain
>the guard byte by accident. A solution would be to move the

In the CEP text some posts ago?

I am pretty sure I made sure that pointers would never be looked at --
you are supposed to scan in 128-bit jumps and will never look at the
beginning of a pointer. Read it again and see if you can make a
counterexample... That is the reason the above works, and why I split
the hash in two segments.

>to-be-continued byte (or bit) to the first word. This would also mean
>that if you're looking for a one-word signature via switch(), you
>won't hit signatures which have your signature as a prefix. In the

You need 0-termination to be part of the signature (and if the 0 spills
over, you spill over). I should have said that, good catch.

Dag

>variable-length encoding with the lookup rule you suggested you'd also
>want a second bit to mark the actual beginning of each structure, so
>you don't get hits on the middle of structures.
>
>> Anyway: Looks like it's about time to do some benchmarks. I'll try to
>> get around to it next week.
>
> Agreed :-).
>
>- N
>_______________________________________________
>cython-devel mailing list
>cython-devel@python.org
>http://mail.python.org/mailman/listinfo/cython-devel
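
For reference, the 256-bit long-signature entry quoted above (64-bit hash, 64-bit pointer to the full signature, 8-bit guard byte, 56 bits of remaining hash, 64-bit funcptr) could be built roughly as follows with hashlib. This is only a sketch under assumptions the thread does not settle: the guard value `GUARD`, the little-endian word order, putting the guard byte in the top byte of the third word, and the helper names `truncated_hash` and `pack_long_entry` are all hypothetical and not taken from the CEP.

```python
import hashlib
import struct

# Hypothetical guard-byte value; the thread does not fix one.
GUARD = 0x01


def truncated_hash(signature):
    """Split the first 120 bits of SHA-1(signature) into (64 bits, 56 bits)."""
    digest = hashlib.sha1(signature).digest()    # 20 bytes = 160 bits
    head = int.from_bytes(digest[:8], "big")     # first 64 bits
    tail = int.from_bytes(digest[8:15], "big")   # next 56 bits
    return head, tail


def pack_long_entry(signature, sig_ptr, funcptr):
    """Pack one long-signature entry as four 64-bit words (256 bits).

    Assumed word layout (not specified in the thread):
      word 0: first 64 bits of the hash
      word 1: pointer to the full signature string
      word 2: guard byte in the top 8 bits, remaining 56 hash bits below
      word 3: function pointer
    """
    head, tail = truncated_hash(signature)
    guard_and_tail = (GUARD << 56) | tail
    return struct.pack("<4Q", head, sig_ptr, guard_and_tail, funcptr)


# The pointer values here are placeholders, not real addresses.
entry = pack_long_entry(b"dd->d", 0x1000, 0x2000)
print(len(entry) * 8, "bits")
```

With this layout a scanner stepping in 128-bit jumps lands only on word 0 or word 2 of a long entry, that is, on hash bits or on the guard byte plus hash bits, and never on the start of either pointer, which is the property argued for above.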