Re: [Cython] Hash-based vtables
On 06/10/2012 11:53 AM, Robert Bradshaw wrote:
> On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn wrote:
>> About signatures, a problem I see with following the C typing is that
>> the signature "ill" wouldn't hash the same as "iii" on 32-bit Windows
>> and "iqq" on 32-bit Linux, and so on. I think that would be really bad.
>
> This is why I suggested promotion for scalars (divide ints into
> <= sizeof(long) and sizeof(long) < x <= sizeof(long long)), checked at
> C compile time, though I guess you consider that evil. I don't consider
> not matching really bad, just kind of bad.

Right. At least a convention for promotion of scalars would be good anyway.

Even MSVC supports stdint.h these days; basing ourselves on the random behaviour of "long" seems a bit outdated to me. "ssize_t" would be better motivated, I feel.

Many linear algebra libraries use 32-bit matrix indices by default but can be swapped to 64-bit indices (this holds for many LAPACK implementations and most sparse linear algebra). So often there will be at least one typedef that is either 32 bits or 64 bits without the Cython compiler knowing which.

Promoting to a single type, "[u]int64", is the only choice that removes the possible combinatorial explosion when you have multiple external typedefs whose sizes you don't know (although I guess that's rather theoretical).

Anyway, runtime table generation is quite fast, see below.

>> "l" must be banished -- but then one might as well do "i4i8i8".
>> Designing a signature hash where you select between these at compile
>> time is perhaps doable, but it generates a lot of code and makes
>> everything complicated. It especially gets messy when you're trying to
>> pre-compute tables. I think we should just start off with hashing at
>> module load time, when sizes are known, and then work with heuristics
>> and/or build system integration to improve on that afterwards.
>
> Finding 10,000 optimal tables at runtime had better be really cheap
> then, for Sage's sake :).
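The promotion being discussed can be sketched in a few lines: rewrite each platform-dependent integer code in a signature into a size-explicit code at module load time, when sizes are actually known. This is a minimal Python sketch, not the real pyextensibletype encoding; the function name and codes are hypothetical, and `struct.calcsize` stands in for C `sizeof`:

```python
import struct

def normalize_sig(sig):
    # Hypothetical sketch: map platform-dependent integer codes in a
    # signature string to size-explicit ones ("i4", "i8"), using sizes
    # known only at load/compile time on the target platform.
    out = []
    for ch in sig:
        if ch in "hilq":                 # short, int, long, long long
            out.append("i%d" % struct.calcsize(ch))
        elif ch in "HILQ":               # unsigned variants
            out.append("u%d" % struct.calcsize(ch))
        else:                            # leave other codes untouched
            out.append(ch)
    return "".join(out)

# On 64-bit Linux "ill" becomes "i4i8i8"; on 32-bit Windows it becomes
# "i4i4i4", i.e. the same as "iii" -- exactly the mismatch the thread
# is worried about, made explicit instead of silent.
```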
The code is highly unpolished as I write this, but it works, so here are some preliminary benchmarks.

Assuming the 64-bit pre-hashes are already computed, hashing a 64-slot table varies between 5 and 10 us (microseconds), depending on the set of hashes.

Computing md5's with C code from ulib (not hashlib/OpenSSL) takes ~400 ns per hash, so ~26 us for the 64-slot table => it dominates!

The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table. Admittedly, that's with hand-written non-portable assembly for the crapwow64.

Assuming 10,000 64-slot tables, we're looking at something like 0.3-0.4 seconds for loading Sage using md5, or 0.1 seconds using crapwow64.

https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h
http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html

Dag

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
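The kind of search those 5-10 us cover can be illustrated with a toy perfect-hash build: bucket the 64-bit prehashes by their low bits, then search a per-bucket displacement until every key lands in a distinct free slot. This is only a stand-in for the real algorithm in perfecthash.h (the displacement scheme and names here are assumptions, not the actual one):

```python
import hashlib

def prehash(name):
    # 64-bit prehash from md5, as in the thread (first 8 bytes).
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")

def build_table(hashes, size=64):
    """Toy perfect-hash search: bucket by the low bits, then for each
    bucket (largest first) try (shift, xor) displacements until all of
    its keys land in distinct, unused slots."""
    mask = size - 1
    buckets = [[] for _ in range(size)]
    for h in hashes:
        buckets[h & mask].append(h)
    d = [(0, 0)] * size                 # one displacement per bucket
    used = [False] * size
    for b in sorted(range(size), key=lambda i: -len(buckets[i])):
        keys = buckets[b]
        if not keys:
            continue
        placed = False
        for shift in range(58):
            for x in range(size):
                slots = {((h >> shift) ^ x) & mask for h in keys}
                if len(slots) == len(keys) and not any(used[s] for s in slots):
                    d[b] = (shift, x)
                    for s in slots:
                        used[s] = True
                    placed = True
                    break
            if placed:
                break
        if not placed:
            raise ValueError("no perfect table found; fall back to a larger table")
    return d

def lookup_slot(h, d, size=64):
    # Two reads and a couple of arithmetic ops: this is the fast path.
    shift, x = d[h & (size - 1)]
    return ((h >> shift) ^ x) & (size - 1)
```

The lookup side is branch-free and cheap, which is the whole point: all the cost is paid once, at table-build time.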
Re: [Cython] Hash-based vtables
On 06/12/2012 01:01 PM, Dag Sverre Seljebotn wrote:
> [...]
> [...]

Look: A big advantage of the hash-vtables is that subclasses stay ABI-compatible with superclasses and don't need recompilation when superclasses add or remove methods.

=> Finding the hash table must happen at run-time in a lot of cases anyway, so I feel Robert's chase for compile-time table building is moot.

I feel this would also need to automatically trigger heap-allocated tables when the statically allocated one can't be used. Which is good to have in the very few cases where a perfect table can't be found, too.

One thing that makes me feel uneasy about the relatively unexplored crapwow64 is that we really don't want collisions in the 64-bit prehashes within a single table (which would raise an exception -- which I think is OK from a security perspective; you can always have a MemoryError at any point too, so programmers should not expose class creation to attackers without being able to deal with it failing).
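The collision risk being accepted here is easy to bound, assuming the hash behaves uniformly (which is the very property that is unverified for crapwow64 and well established for md5): by the birthday approximation, n keys drawn from 2^64 values collide with probability about n(n-1)/2 / 2^64. A quick check:

```python
def collision_bound(n, bits=64):
    # Birthday approximation: probability that any two of n uniform
    # `bits`-bit prehashes collide.
    return n * (n - 1) / 2 / 2.0 ** bits

per_table = collision_bound(64)       # one full 64-slot table
sage_wide = 10000 * per_table         # ~10,000 tables at Sage load

# per_table is ~1.1e-16 and sage_wide ~1.1e-12: raising an exception on
# a prehash collision really is a corner case on par with MemoryError.
```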
For the record, I found another md5 implementation that's a bit faster; the first one below is "sphlib" and the second is "ulib":

In [2]: %timeit extensibletype.extensibletype.md5bench2(10**3)
1000 loops, best of 3: 237 us per loop

In [3]: %timeit extensibletype.extensibletype.md5bench(10**3)
1000 loops, best of 3: 374 us per loop

http://www.saphir2.com/sphlib/

It's really only for extremely large projects like Sage that this can be noticed in any way.

Dag
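Those numbers (~237-374 us per 10^3 hashes, i.e. a few hundred ns per hash) are easy to sanity-check in shape from plain hashlib; absolute figures are of course machine- and implementation-dependent, and `md5bench` here is just a sketch mirroring the benchmark above, not the extensibletype one:

```python
import hashlib
import timeit

def md5bench(n):
    # Hash n short signature-like strings, as the In [2]/In [3]
    # benchmarks above do.
    for i in range(n):
        hashlib.md5(b"method_signature_%d" % i).digest()

# Average seconds for 1000 hashes; a few hundred ns per hash is what
# makes a 64-slot table md5-dominated (64 hashes >> the 5-10 us search).
per_1000 = timeit.timeit(lambda: md5bench(1000), number=10) / 10
```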
Re: [Cython] Hash-based vtables
On Tue, Jun 12, 2012 at 10:21 AM, Dag Sverre Seljebotn wrote:
> [...]
>
> => Finding the hash table must happen at run-time in a lot of cases
> anyway, so I feel Robert's chase for compile-time table building is
> moot.
>
> I feel this would also need to automatically trigger heap-allocated
> tables when the statically allocated one can't be used. Which is good
> to have in the very few cases where a perfect table can't be found too.

Finding the hash table at runtime should be supported, but the *vast* majority of method sets is known at compile time. 0.4 seconds is a huge overhead to just add to Sage (yes, it's an exception, but an important one), and though crapwow64 helps, I'd rather rely on a known, good standard hash.
I need to actually look at Sage to see what the impact would be.

Also, most tables would probably have 2 entries in them (e.g. a typed one and an all-object one).

long int will continue to be an important type as long as it's the default for int literals and Python's "fast" ints (whether in type or implementation), so we can't just move to stdint. I also don't like the form of the table (and whether certain signatures match) being platform-dependent: the less variance we have from one platform to the next, the better.

On an orthogonal note, sizeof(long)-sensitive tables need not be entirely at odds with compile-time table generation, as most functions will probably have 0 or 1 parameters of unknown size, so we could spit out 1 or 2 statically compiled tables and generate the rest on the fly. I still would rather have fixed Cython-compile-time tables, though.

> One thing that makes me feel uneasy about the relatively unexplored
> crapwow64 is that we really don't want collisions in the 64-bit
> prehashes within a single table [...]
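The combinatorics behind "spit out 1 or 2 statically compiled tables" can be made concrete: a signature with k parameters of unknown size (each either 4 or 8 bytes) has 2^k concrete forms, so with the usual 0 or 1 unknowns you precompile 1 or 2 tables and pick the right one at load time. A sketch, with a hypothetical list-of-codes representation:

```python
from itertools import product

def candidate_sigs(parts):
    """parts: fixed codes ('i4', 'f8') or tuples of alternatives
    (('i4', 'i8') for a typedef whose size is unknown until C compile
    time). Returns every concrete signature to precompile."""
    options = [p if isinstance(p, tuple) else (p,) for p in parts]
    return ["".join(combo) for combo in product(*options)]

# One unknown-size parameter -> only two tables to precompile:
candidate_sigs(["i4", ("i4", "i8"), "f8"])   # ['i4i4f8', 'i4i8f8']
```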
[Cython] "__pyx_dynamic_args" undeclared in fused types code
Hi,

after the merge of the "_fused_dispatch_rebased" branch, I get C compile errors in a simple fused types example:

"""
from cython cimport integral

# define a fused type for different containers
ctypedef fused container:
    list
    tuple
    object

# define a generic function using the above types
cpdef sum(container items, integral start = 0):
    cdef integral item, result
    result = start
    for item in items:
        result += item
    return result

def test():
    cdef int x = 1, y = 2
    # call [list,int] specialisation implicitly
    print( sum([1,2,3,4], x) )
    # calls [object,long] specialisation explicitly
    print( sum[object,long]([1,2,3,4], y) )
"""

The C compiler complains that "__pyx_dynamic_args" is undeclared - supposedly something should have been passed into the function but wasn't.

Mark, could you take a look?

Stefan
Re: [Cython] Hash-based vtables
On 06/12/2012 08:12 PM, Robert Bradshaw wrote:
> [...]
>
> Finding the hash table at runtime should be supported, but the *vast*
> majority of method sets is known at compile time. 0.4 seconds is a
> huge overhead to just add to Sage (yes, it's an exception, but an
> important one), and though crapwow64 helps I'd rather rely on a known,
> good standard hash.
>
> I need to actually look at Sage to see what the impact would be.
> Also, most tables would probably have 2 entries in them (e.g. a typed
> one and an all-object one).

Hopefully 0.4 was a severe overestimate once one actually looks at this. What's loaded at startup -- is it the pyx files in sage/devel/sage?

My count (just cloned from github.com/sagemath/sage):

$ find -name '*.pyx' -exec grep 'cdef class' {} \; | wc -l
641

And I doubt that *all* of that is loaded at Sage startup; you need to do some manual importing for at least some of those classes? So it's probably closer to 0.01-0.02 seconds than 0.4, even with md5.

About the *vast* majority of method sets being known: That may be the case for old code, but keep in mind that the situation might deteriorate. Once we have hash-based vtables, declaring methods of cdef classes in pxd files could become optional (and only be there to help callers, incl. subclasses, determine the signature). So any method that's only used in the superclass and is therefore not declared in the pxd file would consistently trigger a run-time build of the table of subclasses; the compile-time generated table would be useless then.

(OTOH, as duck-typing becomes the norm, more cdef classes
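The revised estimate follows directly from the earlier numbers: ~641 cdef classes, one 64-slot table each, md5 at ~400 ns per hash plus 5-10 us for the table search. A quick arithmetic check:

```python
classes = 641                    # `cdef class` count from the grep above
md5_per_table = 64 * 400e-9      # ~26 us of md5 pre-hashing per table
search_per_table = 10e-6         # upper end of the 5-10 us table search

total = classes * (md5_per_table + search_per_table)
# ~0.023 s even if every class were loaded at startup -- consistent
# with "closer to 0.01-0.02 seconds than 0.4".
```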
Re: [Cython] Hash-based vtables
On 06/12/2012 09:46 PM, Dag Sverre Seljebotn wrote:
> [...]