[Cython] Resurrecting __dict__ for extension types
As per a couple of discussions online
(http://mail.python.org/pipermail/cython-devel/2011-February/000122.html),
it looks like Cython was at one point pretty close to being able to
automatically generate a __dict__ for extension types, like CPython
classes have. Right now I have to hand-code a function that does exactly
what __dict__ should do, and with classes that have tens of attributes
that is a real pain. A generated __dict__ would be especially useful for
enabling deepcopy and pickling more easily.

While on the pickling theme: it seems it really ought to be pretty
straightforward to automatically pickle extension types. Don't you
already have all the necessary information at compile time? This was on
the wish list at one point, if I am not mistaken, and would be very
useful to me and lots of other people.

I'm finally loving coding in Cython and am finally making sense of how
best to use extension types.

Regards,
Ian
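To make the pain point concrete: the hand-written workaround being
described looks something like the sketch below, in plain Python syntax
standing in for a cdef class with public attributes (the class and
attribute names are made up for illustration).

    class Particle:                      # stands in for: cdef class Particle
        def __init__(self, x=0.0, y=0.0, mass=1.0):
            self.x, self.y, self.mass = x, y, mass

        # Hand-coded stand-in for the missing __dict__: every attribute
        # must be listed explicitly and kept in sync with the class.
        def _asdict(self):
            return {'x': self.x, 'y': self.y, 'mass': self.mass}

        # Pickle/deepcopy support built on top of the manual dict.
        def __reduce__(self):
            return (_rebuild_particle, (self._asdict(),))

    def _rebuild_particle(state):
        obj = Particle()
        for name, value in state.items():
            setattr(obj, name, value)
        return obj

With tens of attributes, _asdict() has to enumerate every field by hand
and silently goes stale whenever an attribute is added -- exactly the
bookkeeping a generated __dict__ (or automatic pickling) would
eliminate.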
Re: [Cython] Hash-based vtables
mark florisson, 05.06.2012 22:33:
> It doesn't even necessarily have to be about running user code, a user
> could craft data input which causes such a situation. For instance,
> let's say we have a just-in-time specializer which specializes a
> function for the runtime input types, and the types depend on the user
> input. For instance, if we write a web application, we can post arrays
> described by a custom dtype, which draws pictures in some weird way
> for us. We can get it to specialize pretty much any array type, so
> that gives us a good opportunity to find collisions.

Yes, and the bad thing is that a very high probability of having no
collisions, even in combination with the need for a huge amount of
brute-force work to find one, is not enough. An attacker (or otherwise
interested user) may just be lucky, and given how low in the
application stack this will be used, such a bit of luck may have
massive consequences.

Stefan
Re: [Cython] Hash-based vtables
Stefan Behnel wrote:
> mark florisson, 05.06.2012 22:33:
>> [...] We can get it to specialize pretty much any array type, so
>> that gives us a good opportunity to find collisions.
>
> Yes, and the bad thing is that a very high probability of having no
> collisions, even in combination with the need for a huge amount of
> brute-force work to find one, is not enough. An attacker (or otherwise
> interested user) may just be lucky, and given how low in the
> application stack this will be used, such a bit of luck may have
> massive consequences.

Following that line of argument, I guess you keep your money in a
mattress, then? Our modern world is built around the assumption that
people don't get *that* lucky.

(I agree, though, that 64 bits is not enough for the security use case!
I'm just saying that 160 or 256 bits would be.)

Dag

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [Cython] Hash-based vtables
On 6 June 2012 10:11, Dag Sverre Seljebotn wrote:
> Stefan Behnel wrote:
>> [...] An attacker (or otherwise interested user) may just be lucky,
>> and given how low in the application stack this will be used, such a
>> bit of luck may have massive consequences.
>
> Following that line of argument, I guess you keep your money in a
> mattress, then? Our modern world is built around the assumption that
> people don't get *that* lucky.
>
> (I agree, though, that 64 bits is not enough for the security use
> case! I'm just saying that 160 or 256 bits would be.)
>
> Dag

I think we're arguing different things. You agree about the security
problem, but Stefan was still emphasizing his old point.
Re: [Cython] Hash-based vtables
On 06/06/2012 11:11 AM, Dag Sverre Seljebotn wrote:
> Following that line of argument, I guess you keep your money in a
> mattress, then? Our modern world is built around the assumption that
> people don't get *that* lucky.
>
> (I agree, though, that 64 bits is not enough for the security use
> case! I'm just saying that 160 or 256 bits would be.)

(And just to be clear, my current stance is in favour of using
interning for the ID comparison, in the other head of this thread. I
just couldn't resist Stefan's bait.)

Dag
Re: [Cython] Hash-based vtables
On 06/05/2012 12:30 AM, Robert Bradshaw wrote:
> I just found http://cmph.sourceforge.net/ which looks quite
> interesting. Though the resulting hash functions are supposedly cheap,
> I have the feeling that branching is considered cheap in this context.

Actually, this lead was *very* promising. I believe the very first
reference I actually read through and didn't eliminate after the
abstract, and it totally swept away our home-grown solutions!

"Hash & Displace" by Pagh (1999) is actually very simple, easy to
understand, and fast both for generation and (the branch-free) lookup:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf

The idea is:

 - Find a hash `g(x)` to partition the keys into `b` groups (the paper
   requires b > 2n, though I think in practice you can often get away
   with less).

 - Find a hash `f(x)` such that f is 1:1 within each group (which is
   easily achieved since each group has only a few elements).

 - For each group, from largest to smallest: find a displacement
   `d[group]` so that `f(x) ^ d[group]` doesn't cause collisions.

It requires extra storage for the displacement table. However, I think
8 bits per element might suffice even for vtables of 512 or 1024 in
size. Even with 16 bits it's rather negligible compared to the
minimum-128-bit entries of the table.

I benchmarked these hash functions:

    displace1: ((h >> r1) ^ d[h & 63]) & m1
    displace2: ((h >> r1) ^ d[h & m2]) & m1
    displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1

Only the third one is truly in the spirit of the algorithm, but I think
the first two should work well too (and when h is known at compile
time, looking up d[h & 63] is no harder than looking up r1 or m1).

My computer is acting up and all my numbers today are slower than the
earlier ones (yes, I disabled turbo-mode in the BIOS a year ago, and
yes, I've pinned the CPU speed). But here are today's numbers, compiled
with -DIMHASH:

    direct:     min=5.37e-09  mean=5.39e-09  std=1.96e-11  val=24.00
    index:      min=6.45e-09  mean=6.46e-09  std=1.15e-11  val=18.00
    twoshift:   min=6.99e-09  mean=7.00e-09  std=1.35e-11  val=18.00
    threeshift: min=7.53e-09  mean=7.54e-09  std=1.63e-11  val=18.00
    displace1:  min=6.99e-09  mean=7.00e-09  std=1.66e-11  val=18.00
    displace2:  min=6.99e-09  mean=7.02e-09  std=2.77e-11  val=18.00
    displace3:  min=7.52e-09  mean=7.54e-09  std=1.19e-11  val=18.00

I did a dirty prototype of the table-finder as well and it works:

https://github.com/dagss/hashvtable/blob/master/pagh99.py

Dag
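To make the three steps concrete, here is a small Python sketch of the
table construction. It is in the spirit of the pagh99.py prototype
linked above rather than a copy of it; the particular choices of g (low
bits select the group) and f (shifted bits select the slot), and the
retry loop over shifts, are assumptions of this sketch.

    def find_table(prehashes, b, n):
        """Try to pack the 64-bit `prehashes` into a table with n slots
        (100% density allowed) using b displacement bins.  b and n must
        be powers of two.  Returns (r, d) on success, None on failure."""
        for r in range(64):                  # retry with another f on failure
            groups = {}
            for h in prehashes:
                groups.setdefault(h & (b - 1), []).append((h >> r) & (n - 1))
            # f(x) = (x >> r) & (n - 1) must be 1:1 within every group
            if any(len(set(fs)) != len(fs) for fs in groups.values()):
                continue
            d = [0] * b
            taken = set()
            # Place the largest groups first; they have the fewest options.
            for g, fs in sorted(groups.items(), key=lambda kv: -len(kv[1])):
                for disp in range(n):
                    slots = {f ^ disp for f in fs}
                    if not slots & taken:
                        d[g] = disp
                        taken |= slots
                        break
                else:
                    break                    # no displacement worked; next r
            else:
                return r, d                  # all groups placed
        return None

The matching lookup is then ((h >> r) ^ d[h & (b - 1)]) & (n - 1), i.e.
the displace2/displace3 family benchmarked above.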
Re: [Cython] Hash-based vtables
On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:
> [...]
> I benchmarked these hash functions:
>
>     displace1: ((h >> r1) ^ d[h & 63]) & m1
>     displace2: ((h >> r1) ^ d[h & m2]) & m1
>     displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1
> [...]
> I did a dirty prototype of the table-finder as well and it works:
>
> https://github.com/dagss/hashvtable/blob/master/pagh99.py

The paper obviously puts more effort into minimizing table size than
into a fast lookup. My hunch is that our choice should be

    ((h >> table.r) ^ table.d[h & m2]) & m1

with 8-bit entries in d (because even if you have 1024 methods, you'd
rather double the number of bins than get those 2 extra bits of
displacement options).

Then keep incrementing the size of d and the number of table slots (in
such an order that the total vtable size is minimized) until success.
In practice this should almost always just increase the size of d, and
keep the table size at the lowest 2**k that fits the slots (even for 64
methods or 128 methods :-)).

Essentially we avoid the shift in the argument to d[] by making d
larger.

Dag
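A sketch of that search order, assuming the find_table() helper from
the construction sketch earlier, 128-bit table entries and 8-bit
displacement entries; the candidate lists are illustrative, not tuned.

    def build_vtable(prehashes):
        """Search table sizes in order of increasing total vtable size."""
        n0 = 1
        while n0 < len(prehashes):
            n0 *= 2                          # smallest power of two that fits
        candidates = []
        for n in (n0, 2 * n0):               # growing the table is a last resort
            for b in (64, 128, 256, 512, 1024):
                candidates.append((16 * n + b, n, b))   # total size in bytes
        for _, n, b in sorted(candidates):
            found = find_table(prehashes, b, n)
            if found is not None:
                r, d = found
                return n, b, r, d
        raise ValueError("no table found; grow the candidate lists")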
Re: [Cython] Hash-based vtables
On Tue, Jun 5, 2012 at 2:41 PM, Dag Sverre Seljebotn wrote:
> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote:
>> Me and Robert had a long discussion on the NumFOCUS list about this
>> already, but I figured it was better to continue it and provide more
>> in-depth benchmark results here.
>>
>> It's basically a new idea of how to provide a vtable based on perfect
>> hashing, which should be a lot simpler to implement than what I first
>> imagined.
>>
>> I'll write down some context first; if you're familiar with this,
>> skip ahead a bit.
>>
>> This means that you can do fast dispatches *without* the messy
>> business of binding vtable slots at compile time. To be concrete,
>> this might e.g. take the form
>>
>>     def f(obj):
>>         obj.method(3.4)  # try to find a vtable with "void method(double)" in it
>>
>> or, a more typed approach,
>>
>>     # File A
>>     cdef class MyImpl:
>>         def double method(double x): return x * x
>>
>>     # File B
>>     # Here we never know about MyImpl, hence "duck-typed"
>>     @cython.interface
>>     class MyIntf:
>>         def double method(double x): pass
>>
>>     def f(MyIntf obj):
>>         # obj *can* be a MyImpl instance, or whatever else that
>>         # supports that interface
>>         obj.method(3.4)
>>
>> Now, the idea to implement this is:
>>
>> a) Both caller and callee pre-hash the name/argument string
>> "mymethod:iidd" to 64 bits of hash data (probably the lower 64 bits
>> of md5).
>>
>> b) Callee (MyImpl) generates a vtable of its methods by *perfect*
>> hashing. What you do is define a final hash fh as a function of the
>> pre-hash ph, for instance
>>
>>     fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & vtable.m
>>
>> (Me and Robert are benchmarking different functions to use here.) By
>> playing with r1, r2, r3 you have 64**3 choices of hash function, and
>> will be able to pick a combination which gives *no* (or very few)
>> collisions.
>>
>> c) Caller then combines the pre-hash generated at compile time with
>> r1, r2, r3, m stored in the vtable header, in order to find the
>> final location in the hash table.
>>
>> The exciting thing is that in benchmarks the performance penalty is
>> actually very slight over a C++-style vtable. (Of course you can
>> cache a proper vtable, but the fact that you get so close without
>> caring about caching means that this can be done much faster.)

One advantage of caching a vtable is that one can possibly put in
adapters for non-exact matches. It also opens up the possibility of
putting in stubs that call def methods if they exist. This needs to be
fleshed out more (another CEP :), but could provide for a
backwards-compatible, easy first implementation.

> Back to my and Robert's discussion on benchmarks:
>
> I've uploaded benchmarks here:
>
> https://github.com/dagss/hashvtable/tree/master/dispatchbench
>
> I've changed the benchmark-taking to give more robust numbers (at
> least for me); you want to look at the 'min' column.
>
> I changed the benchmark a bit so that it benchmarks a *callsite*. So
> we don't pass 'h' on the stack, but either a) look it up in a global
> variable (the default), or b) make it a compile-time constant (an
> immediate in assembly) (compile with -DIMHASH).
>
> Similarly, the ID is either an "interned" global variable, or an
> immediate (-DIMID).
>
> The results are very different on my machine depending on this
> aspect. My conclusions:
>
>  - Three shifts with masking, two shifts with a "fallback slot"
>    (allowing for a single collision), three shifts alone, and two
>    shifts with two masks all allow constructing good vtables. In the
>    case of only two shifts, one colliding method gets the
>    twoshift+fback performance and the rest get the twoshift
>    performance.
>
>  - Performance is really more affected by whether hashes are
>    immediates or global variables than by the hash function. This is
>    in contrast to the interning-vs.-key benchmarks -- so I think that
>    if we looked up the vtable through PyTypeObject, rather than
>    getting the vtable directly, the loads of the global variables
>    could potentially be masked by that.
>
>  - My conclusion: Just use the lower bits of md5 *both* for the
>    hashing and the ID-ing (don't bother with any interning), and
>    compile the thing as a [...]
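For illustration, steps a)-c) above in Python; the VTable layout is a
stand-in for the real C header, and taking the first 8 digest bytes as
the "lower 64 bits" of md5 is one possible reading.

    import hashlib

    def prehash(signature):
        """Step a): e.g. prehash("mymethod:iidd"); lower 64 bits of md5."""
        return int.from_bytes(hashlib.md5(signature.encode()).digest()[:8],
                              'little')

    class VTable:
        """Step b): stand-in for the vtable header plus entry array."""
        def __init__(self, r1, r2, r3, m, entries):
            self.r1, self.r2, self.r3, self.m = r1, r2, r3, m
            self.entries = entries       # list of (prehash_id, funcptr)

    def lookup(vtable, ph):
        """Step c): combine the compile-time pre-hash with the header."""
        fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2)
              ^ (ph >> vtable.r3)) & vtable.m
        entry_id, func = vtable.entries[fh]
        if entry_id != ph:               # ID check catches hash collisions
            raise AttributeError("vtable has no such method")
        return func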
Re: [Cython] Hash-based vtables
On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn wrote:
> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:
>> [...]
>> I did a dirty prototype of the table-finder as well and it works:
>>
>> https://github.com/dagss/hashvtable/blob/master/pagh99.py
>
> The paper obviously puts more effort into minimizing table size than
> into a fast lookup. My hunch is that our choice should be
>
>     ((h >> table.r) ^ table.d[h & m2]) & m1
>
> [...]
> Essentially we avoid the shift in the argument to d[] by making d
> larger.

Nice. I'm surprised that the indirection on d doesn't cost us much;
hopefully its size wouldn't be a big issue either. What kinds of
densities were you achieving?

Going back to the idea of linear probing on a cache miss: this has the
advantage that one can write a brain-dead provider that sets m=0 and
simply lists the methods, instead of requiring a table optimizer.
(Most tools, of course, would do the table optimization.) It also lets
you get away with a "kind-of good" hash rather than requiring you to
search until you find a (larger?) perfect one.

- Robert
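A sketch of that fallback on top of a displace2-style header (fields r,
d, m1, m2 as in the quoted expression); the entries layout and the m=0
degenerate case follow the paragraph above, the rest is assumed.

    from types import SimpleNamespace

    def lookup_with_probing(vtable, ph):
        """Perfect-hash hit in the common case; linear probe on a miss."""
        slot = ((ph >> vtable.r) ^ vtable.d[ph & vtable.m2]) & vtable.m1
        entry_id, func = vtable.entries[slot]
        if entry_id == ph:
            return func                          # fast path
        for entry_id, func in vtable.entries:    # slow path: scan
            if entry_id == ph:
                return func
        raise AttributeError("method not found")

    # Brain-dead provider: m1 = m2 = 0 and a plain method list; every
    # lookup lands on slot 0 and falls through to the linear scan.
    brain_dead = SimpleNamespace(r=0, d=[0], m1=0, m2=0,
                                 entries=[(123, "func_a"), (456, "func_b")])
    assert lookup_with_probing(brain_dead, 456) == "func_b"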
Re: [Cython] Hash-based vtables
On 06/06/2012 11:16 PM, Robert Bradshaw wrote:
> Nice. I'm surprised that the indirection on d doesn't cost us much;

Well, table->d[const & const] compiles down to the same kind of code as
table->m1. I guess I'm surprised too that displace2 doesn't penalize.

> hopefully its size wouldn't be a big issue either. What kinds of
> densities were you achieving?

The algorithm is designed for 100% density in the table itself. (We can
lift that to compensate for a small space of possible hash functions, I
guess.)

I haven't done proper simulations yet, but I just tried |vtable|=128,
|d|=128 from the command line and I had 15 or so successes before the
first failure. That's with 100% density in the vtable itself! (And when
it fails, you increase |d| to get your success.)

The caveat is the space spent on d (it's small in comparison, but
that's why this isn't too good to be true).

A disadvantage might be that we no longer have the opportunity to make
the table size something other than a power of two (i.e. replace the
mask with "if (likely(slot < n))"). I think for that to work one would
need to replace the xor group with addition on Z_d.

> Going back to the idea of linear probing on a cache miss, this has
> the advantage that one can write a brain-dead provider that sets m=0
> and simply lists the methods, instead of requiring a table optimizer.
> (Most tools, of course, would do the table optimization.) It also
> lets you get away with a "kind-of good" hash rather than requiring
> you to search until you find a (larger?) perfect one.

Well, given that we can have 100% density, and generating the table is
ligh[...]
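For reference, the |vtable|=128, |d|=128 experiment can be redone along
these lines, assuming the find_table() sketch from earlier in the
thread and random 64-bit values standing in for real pre-hashes.

    import random

    def success_streak(n=128, b=128, trials=100):
        """Consecutive successes packing n random keys into n slots
        (100% density) with b displacement bins, up to the first failure."""
        streak = 0
        for _ in range(trials):
            keys = [random.getrandbits(64) for _ in range(n)]
            if find_table(keys, b, n) is None:
                break
            streak += 1
        return streak

    print(success_streak())   # the post above reports ~15 before a failure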
Re: [Cython] Hash-based vtables
Dag Sverre Seljebotn wrote:
> A disadvantage might be that we no longer have the opportunity to
> make the table size something other than a power of two (i.e. replace
> the mask with "if (likely(slot < n))"). I think for that to work one
> would need to replace the xor group with addition on Z_d.

Strike this p[...]
Re: [Cython] Hash-based vtables
On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn wrote:
> [...]
> A disadvantage might be that we no longer have the opportunity to
> make the table size something other than a power of two (i.e. replace
> the mask with "if (likely(slot < n))"). I think for that to work [...]