On Fri, Apr 13, 2012 at 5:48 AM, mark florisson <markflorisso...@gmail.com> wrote:
> On 13 April 2012 12:59, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>> On 04/13/2012 01:38 PM, Stefan Behnel wrote:
>>> Robert Bradshaw, 13.04.2012 12:17:
>>>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote:
>>>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote:
>>>>>> Have you given any thought as to what happens if __call__ is re-assigned for an object (or subclass of an object) supporting this interface? Or is this out of scope?
>>>>>
>>>>> Out of scope, I'd say. Though you can always write an object that detects if you assign to __call__...
>>>
>>> +1 for out of scope. This is a pure C-level feature.
>>>
>>>>>> Minor nit: I don't think should_dereference is worth branching on; if one wants to save the allocation one can still use a variable-sized type and point to oneself. Yes, that's an extra dereference, but the memory is already likely close and it greatly simplifies the logic. But I could be wrong here.
>>>>>
>>>>> Those minor nits are exactly what I seek; since Travis will have the first implementation in numba<->SciPy, I just want to make sure that what he does will work efficiently with Cython.
>>>>
>>>> +1
>>>>
>>>> I have to admit that building/invoking these var-arg-sized __nativecall__ records seems painful. Here's another suggestion:
>>>>
>>>>     struct {
>>>>         void* pointer;
>>>>         size_t signature;  // compressed binary representation, 95% coverage
>>
>> Once you start passing around functions that take memory view slices as arguments, that 95% estimate will be off, I think.
>>
> It kind of depends on which argument types and how many arguments you will allow, and whether or not collisions would be fine (which would imply ID comparison + strcmp()).

Interesting idea, though this has the drawback of doubling (at least) the overhead of the simple (important) case, as well as memory requirements/locality issues.

>>>>         char* long_signature;  // used if signature is not representable in a size_t, as indicated by signature = 0
>>>>     } record;
>>>>
>>>> These char* could optionally be allocated at the end of the record* for optimal locality. We could even dispense with the binary signature, but having that option allows us to avoid strcmp for stuff like d)d and ffi)f.
>>>
>>> Assuming we use literals and a const char* for the signature, the C compiler would cut down the number of signature strings automatically for us. And a pointer comparison is the same as a size_t comparison.
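To make the packed-size_t idea above concrete, here is one minimal sketch of a possible "compressed binary representation": the signature characters are packed into a size_t one byte at a time, and 0 is returned when the string does not fit, so the record would fall back to its long_signature. The function name pack_signature and the byte-per-character layout are invented purely for illustration; the real encoding, and therefore the actual coverage figure, is exactly what is being debated above.

    #include <limits.h>  /* CHAR_BIT */
    #include <stddef.h>  /* size_t */

    /* Pack a short signature string such as "dd)d" into a size_t,
     * one byte per character.  Returns 0 if the string is too long,
     * signalling that long_signature must be consulted instead. */
    static size_t pack_signature(const char *sig)
    {
        size_t packed = 0;
        size_t i;
        const size_t max_chars = sizeof(size_t);  /* 8 on a 64-bit build */

        for (i = 0; sig[i] != '\0'; ++i) {
            if (i == max_chars)
                return 0;  /* does not fit in a size_t */
            packed |= (size_t)(unsigned char)sig[i] << (i * CHAR_BIT);
        }
        return packed;  /* non-zero for any non-empty signature */
    }

With this particular (illustrative) packing there are no collisions for signatures that fit, so the fast path is a single integer comparison and strcmp() is only needed for long signatures; signatures involving memory view slices would likely be among those that overflow, which is the coverage concern raised above.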
>> I'll go one further: intern Python bytes objects. It's just a PyObject*, but it's *required* (or just strongly encouraged) to have gone through
>>
>>     sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig)
>>
>> Obviously in a PEP you'd have a C-API function for such interning (a completely standalone utility). Performance of the interning operation itself doesn't matter...
>>
>> Unless CPython has interning features itself, like in Java? Was that present back in the day and then ripped out?
>>
>> Requiring interning is somewhat less elegant in one way, but it makes a lot of other stuff much simpler.
>>
>> That gives us
>>
>>     struct {
>>         void *pointer;
>>         PyBytesObject *signature;
>>     } record;
>>
>> and then you allocate a NULL-terminated array of these for all the overloads.
>>
> Interesting. What I like about size_t is that it could define a deterministic ordering, which means specializations could be stored in a binary search tree in array form.

I think the number of specializations would have to be quite large (>10, maybe 100) before a binary search wins out over a simple scan, but if we stored a count rather than using a NULL-terminated array, the lookup function could take this into account. (The header will already have plenty of room if we're storing a version number and want the records to be properly aligned.) Requiring them to be sorted would also allow us to abort, on average, halfway through a scan. Of course, prioritizing the "likely" signatures first may be more of a win.

> Cython would precompute the size_t for the specialization it needs (and maybe account for promotions as well).

Exactly.

>>> That would only apply at a per-module level, though, so it would require an indirection for the signature IDs. But it would avoid a global registry.
>>>
>>> Another idea would be to set the signature ID field to 0 at the beginning and call a C-API function to let the current runtime assign an ID > 0, unique for the currently running application. Then every user would only have to parse the signature once to adapt to the respective ID and could otherwise branch based on it directly.
>>>
>>> For Cython, we could generate a static ID variable for each typed call that we found in the sources. When encountering a C signature on a callable, either a) the ID variable is still empty (the initial case); then we parse the signature to see if it matches the expected signature, and if it does, we assign the corresponding ID to the static ID variable and issue a direct call. Or b) the ID field is already set (the normal case); then we compare the signature IDs directly and issue a C call if they match. If the IDs do not match, we issue a normal Python call.
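As a concrete illustration of the call-site caching scheme described just above, the generated C for one typed call site might look roughly like the sketch below. Everything here is an assumption: get_native_record() is a made-up accessor standing in for however the callable ends up exposing its (ID, signature, function pointer) triple, and "dd)d" is taken to mean (double, double) -> double in the signature notation used earlier in the thread.

    #include <string.h>  /* strcmp */

    /* Hypothetical accessor: fetch the runtime-assigned ID, signature
     * string and C function pointer exposed by a callable; returns 0
     * if the callable does not support the native-call interface. */
    extern int get_native_record(void *callable, long *id,
                                 const char **sig, void **funcptr);

    /* Roughly what a compiler could emit for one call site expecting
     * the signature "dd)d".  *fell_back tells the caller to go through
     * a normal Python call instead. */
    static double call_dd_d(void *callable, double a, double b, int *fell_back)
    {
        static long expected_id = 0;  /* a) still empty before the first match */
        long id;
        const char *sig;
        void *funcptr;

        if (get_native_record(callable, &id, &sig, &funcptr)) {
            if (expected_id == 0 && strcmp(sig, "dd)d") == 0)
                expected_id = id;        /* parse/compare once, then cache */
            if (expected_id != 0 && id == expected_id) {
                /* b) normal case: one integer comparison, direct C call */
                *fell_back = 0;
                return ((double (*)(double, double))funcptr)(a, b);
            }
        }
        *fell_back = 1;  /* no native record, or the IDs do not match */
        return 0.0;
    }

The point of the sketch is only the control flow: the signature string is inspected at most once per call site, after which dispatch is a plain ID comparison.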
>>>>> Right... if we do some work to synchronize the types for Cython modules generated by the same version of Cython, we're left with 3-4 types for Cython, right? Then a couple for numba and one for f2py; so on the order of 10?
>>>>
>>>> No, I think each closure is its own type.
>>>
>>> And that even applies to fused functions, right? They'd have one closure for each type combination.
>>>
>>>>> An alternative is to do something funny in the type object to get across the offset-in-object information (abusing the docstring, or introducing our own flag which means that the type object has an additional non-standard field at the end).
>>>>
>>>> It's a hack, but the flag + non-standard field idea might just work...
>>>
>>> Plus, it wouldn't have to stay a non-standard field. If it's accepted into CPython 3.4, we could safely use it in all existing versions of CPython.
>>
>> Sounds good. Perhaps just find a single "extended" flag bit, then add a new flag field in our payload, in case we need to extend the type object yet again later and run out of unused flag bits (TBD: figure out how many unused flag bits there are).
>>
>> Dag
>>
> Maybe it would be a good idea if there was a third project that defined this functionality in header files which projects could include (or, in the case of Cython, inject directly into the generated C files). E.g. a function to check for the native interface, a function that, given an array of signature strings and function pointers, builds the ABI information (and computes the ID), and one that, given an ID and signature string, finds the right specialization. The project should also expose a simple type system for the types we care about, and be able to generate signature strings and IDs for signatures.
>
> An optimization for the common case would be to only look at the first entry in the ABI information directly and compare that for the non-overloaded case, and otherwise do a logarithmic lookup, with a final fallback to calling through the Python layer.

I think the ABI should be simple (and fully specified) enough to allow a trivial implementation, and we and others could ship our implementations as a tiny C library (or just a header file).

- Robert
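For illustration only, the shared header described in the last two messages might look something like the sketch below. Every name here (nativecall.h, nativecall_entry, nativecall_build, nativecall_find) is invented for the example, the ID computation is a stand-in rather than an agreed format, and the interface-detection helper is deliberately omitted because how the table gets attached to a callable (tp_flags bit, docstring hack, or new slot) is still open above.

    /* nativecall.h -- hypothetical single-header helper; all names are
     * placeholders for whatever the shared project would actually define. */
    #ifndef NATIVECALL_H
    #define NATIVECALL_H

    #include <stddef.h>
    #include <string.h>

    typedef struct {
        void       *pointer;    /* C function pointer for this specialization */
        size_t      id;         /* packed or runtime-assigned signature ID    */
        const char *signature;  /* full signature string, e.g. "dd)d"         */
    } nativecall_entry;

    typedef struct {
        size_t            count;    /* explicit count rather than a NULL
                                     * terminator, as discussed above         */
        nativecall_entry *entries;  /* sorting these by id would allow a
                                     * binary search for many overloads       */
    } nativecall_table;

    /* Stand-in ID computation: a real implementation would use the agreed
     * packing/registration scheme; because this one can collide, the
     * lookup below still confirms matches with strcmp(). */
    static size_t nativecall_id(const char *sig)
    {
        size_t h = 5381;
        while (*sig)
            h = h * 33u + (unsigned char)*sig++;
        return h ? h : 1;  /* reserve 0 for "no ID" */
    }

    /* Fill a caller-provided entry array from parallel arrays of signature
     * strings and function pointers, computing the IDs (sorting omitted
     * in this sketch). */
    static void nativecall_build(nativecall_table *t, nativecall_entry *storage,
                                 const char *const *sigs, void *const *ptrs,
                                 size_t n)
    {
        size_t i;
        for (i = 0; i < n; ++i) {
            storage[i].pointer   = ptrs[i];
            storage[i].id        = nativecall_id(sigs[i]);
            storage[i].signature = sigs[i];
        }
        t->count   = n;
        t->entries = storage;
    }

    /* Linear scan; the first iteration already covers the common,
     * non-overloaded case, and a sorted table could switch to a binary
     * search for large overload sets.  Returns NULL to make the caller
     * fall back to a normal Python-level call. */
    static void *nativecall_find(const nativecall_table *t,
                                 size_t id, const char *sig)
    {
        size_t i;
        for (i = 0; i < t->count; ++i)
            if (t->entries[i].id == id &&
                strcmp(t->entries[i].signature, sig) == 0)
                return t->entries[i].pointer;
        return NULL;
    }

    #endif /* NATIVECALL_H */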