On Mon, Apr 30, 2012 at 11:19 AM, mark florisson <markflorisso...@gmail.com> wrote:
> On 30 April 2012 14:49, Wes McKinney <wesmck...@gmail.com> wrote:
>> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson <markflorisso...@gmail.com> wrote:
>>> On 29 April 2012 08:42, Nathaniel Smith <n...@pobox.com> wrote:
>>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson <markflorisso...@gmail.com> wrote:
>>>>> On 28 April 2012 22:04, Nathaniel Smith <n...@pobox.com> wrote:
>>>>>> Was chatting with Wes today about the usual problem many of us have
>>>>>> encountered with needing to use some sort of templating system to
>>>>>> generate code handling multiple types, operations, etc., and a wacky
>>>>>> idea occurred to me. So I thought I'd throw it out here.
>>>>>>
>>>>>> What if we added a simple macro facility to Cython that worked at the
>>>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.)
>>>>>> Basically some way to write arbitrary Python code into a .pyx file
>>>>>> that gets executed at compile time and can transform the AST, plus
>>>>>> some nice convenience APIs for simple transformations.
>>>>>>
>>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we
>>>>>> could have something like:
>>>>>>
>>>>>> @@  # alone on a line, starts a block of Python code
>>>>>> from Cython.MacroUtil import replace_ctype
>>>>>> def expand_types(placeholder, typelist):
>>>>>>     def my_decorator(function_name, ast):
>>>>>>         functions = {}
>>>>>>         for typename in typelist:
>>>>>>             new_name = "%s_%s" % (function_name, typename)
>>>>>>             functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>>>>         return functions
>>>>>>     return my_decorator
>>>>>> @@  # this token sequence cannot occur in Python, so it's a safe end-marker
>>>>>>
>>>>>> # Compile-time function decorator
>>>>>> # Results in two cdef functions named sum_double and sum_int
>>>>>> @@expand_types("T", ["double", "int"])
>>>>>> cdef T sum(np.ndarray[T] arr):
>>>>>>     cdef T start = 0
>>>>>>     for i in range(arr.size):
>>>>>>         start += arr[i]
>>>>>>     return start
>>>>>>
>>>>>> I don't know if this is a good idea, but it seems like it'd be very
>>>>>> easy to do on the Cython side, fairly clean, and dramatically less
>>>>>> horrible than all the ad-hoc templating stuff people do now.
>>>>>> Presumably there'd be strict limits on how much backwards
>>>>>> compatibility we'd be willing to guarantee for code that went poking
>>>>>> around in the AST by hand, but a small handful of functions like my
>>>>>> notional "replace_ctype" would go a long way, and wouldn't impose much
>>>>>> of a compatibility burden.
>>>>>>
>>>>>> -- Nathaniel
>>>>>> _______________________________________________
>>>>>> cython-devel mailing list
>>>>>> cython-devel@python.org
>>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>>
>>>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ?
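As an aside, the kind of AST search/replace that the notional replace_ctype gestures at can be prototyped today with Python's stdlib ast module. This is only a hedged sketch of the idea, not Cython API: expand_types, the T placeholder, and the sample source below are all hypothetical.

```python
import ast

class ReplaceName(ast.NodeTransformer):
    """Rewrite every occurrence of one identifier to another -- a
    pure-Python analogue of the notional replace_ctype()."""
    def __init__(self, placeholder, replacement):
        self.placeholder = placeholder
        self.replacement = replacement

    def visit_Name(self, node):
        if node.id == self.placeholder:
            return ast.copy_location(
                ast.Name(id=self.replacement, ctx=node.ctx), node)
        return node

def expand_types(src, placeholder, typelist):
    """Return {typename: specialized source} for each requested type."""
    out = {}
    for typename in typelist:
        tree = ast.parse(src)  # fresh tree per specialization
        tree = ReplaceName(placeholder, typename).visit(tree)
        ast.fix_missing_locations(tree)
        out[typename] = ast.unparse(tree)  # ast.unparse: Python 3.9+
    return out

src = (
    "def total(arr):\n"
    "    start = T(0)\n"
    "    for x in arr:\n"
    "        start += x\n"
    "    return start\n"
)
specialized = expand_types(src, "T", ["float", "int"])
print(specialized["int"])
```

The real proposal would operate on Cython's own AST, of course; the point is that a NodeTransformer-style search/replace is a few dozen lines, not a research project.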
>>>>>
>>>>> In general I would like better meta-programming support, maybe even
>>>>> allow defining new operators (although I'm not sure any of it is very
>>>>> pythonic), but for templates I think fused types should be used, or
>>>>> improved when they fall short. Maybe a plugin system could also help
>>>>> people.
>>>>
>>>> I hadn't seen that, no -- thanks for the link.
>>>>
>>>> I have to say that the examples in that link, though, give me the
>>>> impression of a cool solution looking for a problem. I've never wished
>>>> I could symbolically differentiate Python expressions at compile time,
>>>> or create a mutant Python+SQL hybrid language. Actually I guess I've
>>>> only missed define-syntax once in maybe 10 years of hacking in
>>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it
>>>> will peek at the caller's syntax tree to automagically label the axes
>>>> as "x" and "log(y)", and that can't be done in Python. But that's not
>>>> exactly a convincing argument for a macro system.
>>>>
>>>> But generating optimized code is Cython's whole selling point, and
>>>> people really are doing klugey tricks with string-based preprocessors
>>>> just to generate multiple copies of loops in Cython and C.
>>>>
>>>> Also, fused types are great, but: (1) IIUC you can't actually do
>>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and
>>>
>>> What? Yes you can do that.
>>
>> I haven't been able to get ndarray[fused_t] to work as we've discussed
>> off-list. In your own words: "Unfortunately, the automatic buffer
>> dispatch didn't make it into 0.16, so you need to manually
>> specialize". I'm a bit hamstrung by other users needing to be able to
>> compile pandas using the latest released Cython.
>
> Well, as I said, it does work, but you need to tell Cython which type
> you meant. If you don't want to do that, you have to use this branch:
> https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased
> This never made it in since we had no consensus on whether to allow
> the compiler to bootstrap itself and because of possible immaturity of
> the branch.
>
> So what doesn't work is automatic dispatch for Python functions (def
> functions and the object version of a cpdef function). They don't
> automatically select the right specialization for buffer arguments.
> Anything else should work; otherwise it's a bug.
>
> Note also that figuring out which specialization to call dynamically
> (i.e. not from Cython space at compile time, but from Python space at
> runtime) has non-trivial overhead on top of just argument unpacking.
> But you can't say "doesn't work" without giving a concrete example of
> what doesn't work besides automatic dispatch, and how it fails.
>
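What "manually specialize" amounts to can be illustrated in plain Python: one concrete function per element type, plus a dispatch table consulted at runtime. This is a hedged sketch (the function names are invented here), and the per-call table lookup is exactly the kind of dispatch overhead described above.

```python
def sum_int(arr):
    # Specialization for integer buffers.
    total = 0
    for x in arr:
        total += x
    return total

def sum_float(arr):
    # Specialization for floating-point buffers.
    total = 0.0
    for x in arr:
        total += x
    return total

# Dispatch table keyed on the element type.
_SPECIALIZATIONS = {int: sum_int, float: sum_float}

def sum_dispatch(arr):
    # Runtime dispatch: the lookup plus the indirect call is overhead
    # that compile-time specialization would avoid entirely.
    return _SPECIALIZATIONS[type(arr[0])](arr)

print(sum_dispatch([1, 2, 3]))    # dispatches to sum_int -> 6
print(sum_dispatch([1.5, 2.5]))   # dispatches to sum_float -> 4.0
```

In Cython the dispatch would key on the buffer's dtype rather than the Python type of an element, but the shape of the problem is the same.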
Sorry, I meant automatic dispatch re "doesn't work", and I want to
reiterate how much I appreciate the work you're doing. To give some
context, my code is riddled with stuff like this:

lib.inner_join_indexer_float64
lib.inner_join_indexer_int32
lib.inner_join_indexer_int64
lib.inner_join_indexer_object

where the only difference between these functions is the type of the
buffer in the two arrays passed in. I have a template string for these
functions that looks like this:

inner_join_template = """@cython.wraparound(False)
@cython.boundscheck(False)
def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left,
                                ndarray[%(c_type)s] right):
    '''
...

I would _love_ to replace this with fused types.

In any case, lately I've been sort of yearning for the kinds of things
you can do with an APL variant like J. Here's a groupby in J:

   labels
1 1 2 2 2 3 1
   data
3 4 5.5 6 7.5 _2 8.3
   labels </. data
┌───────┬─────────┬──┐
│3 4 8.3│5.5 6 7.5│_2│
└───────┴─────────┴──┘

Here < is box and /. is categorize. Replacing the box operator < with
+/ (sum), I get the group sums:

   labels +/ /. data
15.3 19 _2

Have 2-dimensional data?

   data
 0  1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31 32 33 34
35 36 37 38 39 40 41
42 43 44 45 46 47 48
   labels </."1 data
┌────────┬────────┬──┐
│0 1 6   │2 3 4   │5 │
├────────┼────────┼──┤
│7 8 13  │9 10 11 │12│
├────────┼────────┼──┤
│14 15 20│16 17 18│19│
├────────┼────────┼──┤
│21 22 27│23 24 25│26│
├────────┼────────┼──┤
│28 29 34│30 31 32│33│
├────────┼────────┼──┤
│35 36 41│37 38 39│40│
├────────┼────────┼──┤
│42 43 48│44 45 46│47│
└────────┴────────┴──┘
   labels +//."1 data
  7   9  5
 28  30 12
 49  51 19
 70  72 26
 91  93 33
112 114 40
133 135 47

However, J and other APLs are interpreted. If you generate C or
JIT-compile, I think you can do really well performance-wise and have
very expressive code for writing data algorithms without all this
boilerplate.
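For readers who don't speak J, the session above maps directly onto ordinary Python, which makes the boilerplate contrast concrete. A rough sketch with invented names: categorize plays the role of /., and passing a different reduction in plays the role of swapping < for +/.

```python
from functools import reduce

def categorize(labels, data):
    """Analogue of J's /. : group data by label, with groups ordered
    by first appearance of each label (matching J's output order)."""
    groups = {}
    for lab, x in zip(labels, data):
        groups.setdefault(lab, []).append(x)
    return list(groups.values())

def group_reduce(labels, data, op):
    """Analogue of J's `labels op/ /. data`: reduce each group with op."""
    return [reduce(op, g) for g in categorize(labels, data)]

labels = [1, 1, 2, 2, 2, 3, 1]
data = [3, 4, 5.5, 6, 7.5, -2, 8.3]

print(categorize(labels, data))   # [[3, 4, 8.3], [5.5, 6, 7.5], [-2]]
print(group_reduce(labels, data, lambda a, b: a + b))
```

The interpreted-Python version is slow for the same reason interpreted J is; the thread's point is that a compiler with this level of expressiveness could emit the tight specialized loops by itself.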
- Wes

>>>> (2) to handle Wes's original example on his blog (duplicating a bunch
>>>> of code between a "sum" path and a "product" path), you'd actually
>>>> need something like "fused operators", which aren't even on the
>>>> horizon. So it seems unlikely that fused types will grow to cover all
>>>> these cases in the near future.
>>>
>>> Although it doesn't handle contiguity or dimensional differences,
>>> currently the efficient fused operator is a function pointer. Wouldn't
>>> passing in a float64_t (*reducer)(float64_t, float64_t) work in this
>>> case (in the face of multiple types, you can have fused parameters in
>>> the function pointer as well)?
>>
>> I have to think that using function pointers everywhere is going to
>> lose out to "inlined" C. Maybe gcc is smart enough to optimize
>> observed code paths. In other words, you don't want
>>
>> for (i = 0; i < nlabels; i++) {
>>     lab = labels[i];
>>     result[lab] = reducer(sumx[i], data[i]);
>> }
>>
>> when you can have
>>
>> for (i = 0; i < nlabels; i++) {
>>     lab = labels[i];
>>     result[lab] = sumx[i] + data[i];
>> }
>>
>> I guess I should start writing some C code and actually measuring the
>> performance gap, as I might be completely off-base here; what you want
>> eventually is to look at the array data graph and "rewrite" it to
>> better leverage data parallelism (parallelize the pieces that you can)
>> and cache efficiency.
>>
>> The bigger problem with all this is that I want to avoid an
>> accumulation of ad hoc solutions.
>
> It's probably a good idea to check the overhead first, but I agree it
> will likely be non-trivial. In that sense, I would like something
> Julia-like, such as
>
> def func(runtime_arguments, etc, $compile_time_expr):
>     ...
>     use $compile_time_expr(sumx[i], data[i])
>     # or maybe cython.ceval(compile_time_expr, {'op1': ..., 'op2': ...})
>
> Maybe such functions should be restricted to Cython space, though.
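The trade-off under discussion is easy to state in Python terms: a generic loop takes the reducer as a callable and pays an indirect call per element, where the hand-written variant fixes the operation in the loop body. A hedged sketch with hypothetical names:

```python
import operator

def group_aggregate(labels, data, reducer, init):
    # Generic loop: the operation is an indirect call per element,
    # analogous to calling through float64_t (*reducer)(...) in C.
    result = {}
    for lab, x in zip(labels, data):
        result[lab] = reducer(result.get(lab, init), x)
    return result

def group_sum_inlined(labels, data):
    # "Inlined" variant: the + is fixed in the loop body, which is the
    # form a compiler can optimize most aggressively.
    result = {}
    for lab, x in zip(labels, data):
        result[lab] = result.get(lab, 0) + x
    return result

labels = [0, 1, 0, 1]
data = [2, 3, 4, 5]

print(group_aggregate(labels, data, operator.add, 0))  # {0: 6, 1: 8}
print(group_aggregate(labels, data, operator.mul, 1))  # {0: 8, 1: 15}
print(group_sum_inlined(labels, data))                 # {0: 6, 1: 8}
```

Whether the C-level gap is as large as feared is exactly the measurement Wes proposes; modern compilers can sometimes devirtualize or inline a function pointer that is constant at the call site, but that is not guaranteed.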
> Fused types weren't designed to handle this in any way; they are only
> there to support different types, not operators. If you had objects
> you could give them all different methods, so this is really rather a
> special case.
>
>>>
>>> I agree with Dag that Julia has nice metaprogramming support; maybe
>>> functions could take arbitrary compile-time expressions as extra
>>> arguments.
>>>
>>>> Of course some experimentation would be needed to find the right
>>>> syntax and convenience functions for this feature too, so maybe I'm
>>>> just being over-optimistic and it would also turn out to be very
>>>> complicated :-). But it seems like some simple AST search/replace
>>>> functions would get you a long way.
>>>>
>>>> - N
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel