Re: [Cython] Control flow graph

2011-02-14 Thread Dag Sverre Seljebotn

On 02/14/2011 08:40 AM, Vitja Makarov wrote:

Hi!

In order to implement the "reaching definitions" algorithm, I'm now
working on a control-flow (or data-flow) graph.

Here is funny picture made with graphviz ;)

http://piccy.info/view3/1099337/ca29d7054d09bd0503cefa25f5f49420/1200/


Cool! This will be useful for so much more.

Dag Sverre


Re: [Cython] Control flow graph

2011-02-15 Thread Dag Sverre Seljebotn

On 02/15/2011 08:21 AM, Robert Bradshaw wrote:

On Mon, Feb 14, 2011 at 9:49 PM, Vitja Makarov wrote:

2011/2/15 Robert Bradshaw:

On Sun, Feb 13, 2011 at 11:40 PM, Vitja Makarov wrote:

Hi!

In order to implement the "reaching definitions" algorithm, I'm now
working on a control-flow (or data-flow) graph.

Here is funny picture made with graphviz ;)

http://piccy.info/view3/1099337/ca29d7054d09bd0503cefa25f5f49420/1200/

Cool. Any plans on handling exceptions?


Sure, but I don't have much time for this :(

A linear block inside a try...except body should be split at
assignments, and each sub-block should point to the exception-handling
entry point.

Would every possible failing sub-expression have to point to the
exception handling point(s)? I suppose it depends on whether you'll be
handling more than assignment tracking.


As a result, I want to have a set of possible assignments for each
NameNode position.

So handling of uninitialized variables, unused variables, and unused
results should be easy; later it may help to implement local variable
deletion. And I guess that could help type inference.

Yep.


I'm thinking that transforming NameNode + various assignment/lookup
nodes into "opcode nodes" such as SetLocalNode, GetLocalNode,
SetGlobalNode, GetAttributeNode and so on would be a natural step to
make such code cleaner. Then (after an isolated transform doing this)
the logic would just need to act on individual nodes, not combinations
of nodes.


Just an idea.
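
For reference, the classic fixed-point iteration behind "reaching
definitions" looks roughly like the sketch below (illustrative only,
not Cython's actual implementation; the block attributes are assumed):

    def reaching_definitions(blocks):
        # blocks: basic blocks with .gen (definitions made here),
        # .kill (definitions overwritten here) and .preds (predecessor
        # blocks); .gen and .kill are sets of assignments.
        for b in blocks:
            b.inp = set()
            b.out = set(b.gen)
        changed = True
        while changed:
            changed = False
            for b in blocks:
                # a definition reaches this block if it reaches the
                # end of any predecessor
                b.inp = set()
                for p in b.preds:
                    b.inp |= p.out
                new_out = b.gen | (b.inp - b.kill)
                if new_out != b.out:
                    b.out = new_out
                    changed = True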

Dag Sverre


Re: [Cython] python 2.7/3.x and numpy-dev (Dag, I need a quick comment)

2011-02-18 Thread Dag Sverre Seljebotn

On 02/17/2011 07:11 PM, Lisandro Dalcin wrote:

I'm working on a patch to get old, recent, and dev NumPy working in
2.7/3.x. So far I've had success, but I still have two failures like the
one pasted below.

Dag, could you elaborate a bit on the purpose of
__Pyx_BufFmt_CheckString()? Is it just a validity check for PEP 3118
format strings? Do you expect the failure below to be hard to fix?


Yes, it compares the string with the RTTI that is saved for the type 
that is expected by the code.


I remember there was something about the '=' specifier that meant it was 
not completely trivial to fix, but still, it's just about doing it. The 
code in question is a bit convoluted; the reason I did that is because I 
wanted to allow things to match if the binary layouts matched even if 
the 'struct structure' wasn't the same...


As for user code, a quick hack around this is 'cast=True' in the buffer
spec.
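
For illustration, a hedged sketch of that workaround (the dtype and
function here are just examples, not from the failing test):

    cimport numpy as np

    def first_value(np.ndarray[np.int32_t, cast=True] arr):
        # cast=True relaxes the strict format-string match and
        # reinterprets the underlying buffer (itemsizes must still agree)
        return arr[0]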


Please file a bug...

Dag Sverre


Just in case, the format string that triggers the failure is:

>>> memoryview(np.zeros((1,), dtype=np.dtype('b,i', align=False))).format
'T{b:f0:=i:f1:}'


==
FAIL: numpy_test ()
Doctest: numpy_test
--
Traceback (most recent call last):
   File "/usr/local/python/3.2/lib/python3.2/doctest.py", line 2113, in runTest
 raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for numpy_test
   File "/u/dalcinl/Devel/Cython/cython/BUILD/run/c/numpy_test.cpython-32dm.so",
line 1, in numpy_test

--
File "/u/dalcinl/Devel/Cython/cython/BUILD/run/c/numpy_test.cpython-32dm.so",
line 155, in numpy_test
Failed example:
 print(test_packed_align(np.zeros((1,), dtype=np.dtype('b,i', 
align=False
Exception raised:
 Traceback (most recent call last):
   File "/usr/local/python/3.2/lib/python3.2/doctest.py", line 1248, in 
__run
 compileflags, 1), test.globs)
   File "", line 1, in
 print(test_packed_align(np.zeros((1,), dtype=np.dtype('b,i',
align=False
   File "numpy_test.pyx", line 404, in numpy_test.test_packed_align
(numpy_test.c:6367)
 ValueError: Buffer packing mode currently only allowed at
beginning of format string (this is a defect)






[Cython] Multiple modules in one compilation unit

2011-03-02 Thread Dag Sverre Seljebotn
This has been raised earlier, but I don't think there was as
demonstrative a use-case then as the one I have now.


Fwrap is supposed to be able to wrap Fortran "modules", which are
essentially a namespace mechanism. It makes sense to convert the
namespace to Python by creating one Cython pyx file per Fortran module.


However, if you are wrapping a Fortran library this way, you suddenly
have lots of opportunities to mess up the build:


 - If you build the Fortran code as a static library (rather
common...), then each pyx file will have its own copy. This will link
successfully but likely have a rather poor effect.


 - If you link each Cython file with only the corresponding Fortran
file, things won't work (you are likely to get missing symbols from
cross-module calls on the Fortran side).


Yes, linking each Cython file to the same shared Fortran library should 
work. Still, this seems dangerous.


Options:
 a) Simply make sure to link with shared versions of Fortran libraries 
("this is a documentation problem")


 b) Use some other namespace mechanism in the same pyx (classes with 
static methods...)


 c) Somehow provide more than one module in the same compilation unit. 
Again, this requires the build to work correctly, but seems less 
dangerous, and also has the advantage of *allowing* static linking of 
the Fortran library, if one wants to.


But is something like this possible? Could one implement

cython -o assembly.c file1.pyx file2.pyx file3.pyx

...where assembly.so would contain three Python modules? (initfile1, 
initfile2, and so on...)


Dag Sverre


Re: [Cython] Multiple modules in one compilation unit

2011-03-02 Thread Dag Sverre Seljebotn

On 03/02/2011 11:48 AM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 02.03.2011 11:20:

c) Somehow provide more than one module in the same compilation unit.
Again, this requires the build to work correctly, but seems less 
dangerous,

and also has the advantage of *allowing* static linking of the Fortran
library, if one wants to.

But is something like this possible? Could one implement

cython -o assembly.c file1.pyx file2.pyx file3.pyx

...where assembly.so would contain three Python modules? (initfile1,
initfile2, and so on...)


Can't currently work because the three modules would define the same 
static C names. This could be fixed as part of the PEP 3121 
implementation:


http://trac.cython.org/cython_trac/ticket/173

Or it could be worked around by overriding the prefixes in Naming.py 
(which sounds ugly).


Generally speaking, I'm -1 on this idea. I don't see a real use case, 
and you're saying yourself that this isn't required to make your 
Fortran use case work either.


But assuming work is spent on Cython, it *could* work? I.e., there's no
fundamental problem with the import mechanisms; does Python assume one
module per .so or similar? Or would one have to write a custom importer?






- If you build the Fortran code as a static library (rather common...),
then each pyx file will have its own copy. This will link successfully
but likely have a rather poor effect.


So? lxml has two main modules, and if you build it statically against 
libxml2/libxslt, you end up with two static versions of that library 
in the two modules. Takes up some additional space, but otherwise 
works beautifully.


Problem is that Fortran code often has...interesting...programming
practices. Global variables abound, and are often initialised in one
module and used in another. Imagine:


settings_mod.set_alpha(0.34)
print compute_mod.get_alpha_squared()

This behaves quite differently with two static versions rather than one...

Dag Sverre


Re: [Cython] Multiple modules in one compilation unit

2011-03-02 Thread Dag Sverre Seljebotn

On 03/02/2011 04:11 PM, Lisandro Dalcin wrote:

On 2 March 2011 08:35, Stefan Behnel  wrote:

Dag Sverre Seljebotn, 02.03.2011 11:54:

Problem is that Fortran code often has...interesting...programming
practices. Global variables abound, and are often initialised between
modules. Imagine:

settings_mod.set_alpha(0.34)
print compute_mod.get_alpha_squared()

This behaves quite differently with two static versions rather than one...

Then I'd suggest always linking dynamically.


And where are you going to put your Fortran shared libraries? Dynamic
linking details vary wildly across platforms... I very much understand
Dag's use case and concerns, and I do think that some research into all
this is worth it.


I'm not sure if there's much more to research at the moment -- Stefan 
says it is possible, and that's what I wanted to know at this stage. If 
I want it, I obviously need to implement it myself. (And if such a patch 
implements PEP 3121 and there's a demonstrated need for it with some 
users, I really can't see it getting rejected just out of it being in 
"poor taste").


I.e., I'm going to make Fwrap spit out multiple pyx files and worry
about this later. If multiple .pyx in one .so were fundamentally
impossible, I might have gone another route with Fwrap. That was all.


Thanks, Stefan!

Dag Sverre


Re: [Cython] Multiple modules in one compilation unit

2011-03-02 Thread Dag Sverre Seljebotn

On 03/02/2011 05:01 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 02.03.2011 16:37:

On 03/02/2011 04:11 PM, Lisandro Dalcin wrote:

On 2 March 2011 08:35, Stefan Behnel wrote:

Dag Sverre Seljebotn, 02.03.2011 11:54:

Problem is that Fortran code often has...interesting...programming
practices. Global variables abound, and are often initialised between
modules. Imagine:

settings_mod.set_alpha(0.34)
print compute_mod.get_alpha_squared()

This behaves quite differently with two static versions rather 
than one...

Then I'd suggest always linking dynamically.


And where are you going to put your Fortran shared libraries? Dynamic
linking details vary wildly across platforms... I very much understand
Dag's use case and concerns, and I do think that some research into all
this is worth it.


I'm not sure if there's much more to research at the moment -- Stefan
says it is possible, and that's what I wanted to know at this stage. If
I want it, I obviously need to implement it myself. (And if such a patch
implements PEP 3121 and there's a demonstrated need for it with some
users, I really can't see it getting rejected just out of it being in
"poor taste").


I.e., I'm going to make Fwrap spit out multiple pyx files and worry
about this later. If multiple .pyx in one .so were fundamentally
impossible, I might have gone another route with Fwrap. That was all.


The feature I could imagine becoming part of Cython is "compiling 
packages". I.e. you'd call "cython" on a package and it would output a 
directory with a single __init__.so that contains the modules compiled 
from all .pyx/.py files in that package. Importing the package would 
then trigger an import of that __init__.so, which in turn will execute 
code in its init__init__() function to register the other modules.
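
As a rough illustration (plain Python standing in for the generated C,
and all names hypothetical), the registration step could conceptually
amount to something like:

    import sys

    def register_submodules(package_name, modules):
        # 'modules' maps submodule names to module objects that the
        # compiled __init__ has already created and initialised.
        for name, mod in modules.items():
            sys.modules[package_name + "." + name] = mod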


Would that work for you?


Yes, that sounds like exactly what I'm after.

Dag Sverre


Re: [Cython] OpenMP support

2011-03-08 Thread Dag Sverre Seljebotn

On 03/08/2011 11:34 AM, mark florisson wrote:

I'd like to implement OpenMP support for Cython. Looking at


Great news! It looks like this will be a topic at the coming workshop,
with Francesc coming as well (but nothing wrong with getting started
before then).


(And please speak up if you are in the right group for a GSoC.)


http://wiki.cython.org/enhancements/openmp I agree a close
1:1 mapping would be nice. It would probably make sense to start with
support for 'nogil' sections because GIL-holding
sections would be hard to deal with considering all the 'goto's that
are generated (because you may not leave the
OpenMP block). See this thread
http://comments.gmane.org/gmane.comp.python.cython.devel/8695 for a
previous discussion.

Looking also at http://wiki.cython.org/enhancements/parallel , for the
'parallel for' support I think the best syntax would
be to introduce a fake 'cython.openmp' module as Dag suggested. I
propose to start with the following syntax:

    from cython (c)import openmp

    with nogil:
        with openmp.parallel('sections'):
            with openmp.section():


I've changed my opinion a little bit since that thread; I'm now rather 
undecided.


Pros:

 - Easy to get results fast (nothing wrong about Sturla's approach, but 
realistically it *will* take longer to make it work)
 - OpenMP already has significant mindshare. Saying "Cython supports 
OpenMP" will sound soothing to scientific Fortran and C programmers, 
even if it is technically inferior to other solutions we could find.


Cons:

 - Like Sturla says, closures map this better in the Cython language
(having "private=('a', 'b')" as a syntax for referring to the local
variables a and b is rather ugly)

 - The goto restriction that makes exception handling harder

I think that long-term we must find some middle ground that sits between
just having the GIL and not having it. E.g., find some way of raising an
exception from a nogil block, and also allow more simple code to "get
more data to work on" within the context of a single thread. So what
you're saying about gotos makes me lean against OpenMP.


But, perfect is the enemy of good. I certainly won't vote against having 
this implemented -- because *anything* is better than nothing, and 
nothing prevents us from building something better in addition to or on 
top of an initial OpenMP implementation.


BTW, I found this framework interesting ... it cites some other
frameworks as well (Cilk, OpenMP, Intel Threading Building Blocks),
which it beats in a specific situation.


http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about

I'm NOT saying we should "support FastFlow instead". What I'm thinking 
is that perhaps the place to start is to ask ourselves: What are the 
current obstacles to using parallelization frameworks written for C or 
C++ with Cython? How would one use Cython with these, and do it in a 
practical way (i.e. not just C code in a nogil block without even 
exception handling).


I.e. what I'd like to tackle first (and perhaps on the workshop) is how 
to even make it a somewhat pleasant experience to use pthreads or Python 
threading. Set signal handlers for SIGTERM and friends, implement some 
form of "delayed" exception creation so that we first pop the stack, 
acquire the GIL, and *then* construct the exception object...
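
Today that pattern has to be written by hand; a hedged, minimal sketch
of the manual version (all names illustrative):

    cdef int work(double x) nogil:
        # Plain C error signalling: no Python objects while the GIL is
        # released; failure is reported through the return code.
        if x < 0:
            return -1
        return 0

    def run(double x):
        cdef int status
        with nogil:
            status = work(x)
        # Only here, with the GIL held again, is the actual exception
        # object constructed.
        if status < 0:
            raise ValueError("negative input")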


But this is perhaps orthogonal to an OpenMP effort.

Dag Sverre


Re: [Cython] OpenMP support

2011-03-11 Thread Dag Sverre Seljebotn

On 03/11/2011 08:20 AM, Stefan Behnel wrote:

Robert Bradshaw, 11.03.2011 01:46:
On Tue, Mar 8, 2011 at 11:16 AM, Francesc Alted wrote:

On Tuesday 08 March 2011 18:50:15, Stefan Behnel wrote:

mark florisson, 08.03.2011 18:00:

What I meant was that the
wrapper returned by the decorator would have to call the closure
for every iteration, which introduces function call overhead.

[...]

I guess we just have to establish what we want to do: do we
want to support code with Python objects (and exceptions etc), or
just C code written in Cython?


I like the approach that Sturla mentioned: using closures to
implement worker threads. I think that's very pythonic. You could do
something like this, for example:

    def worker():
        for item in queue:
            with nogil:
                do_stuff(item)

    queue.extend(work_items)
    start_threads(worker, count)

Note that the queue is only needed to tell the thread what to work
on. A lot of things can be shared over the closure. So the queue may
not even be required in many cases.


I like this approach too.  I suppose that you will need to annotate the
items so that they are not Python objects, no?  Something like:

    def worker():
        cdef int item  # tell that item is not a Python object!
        for item in queue:
            with nogil:
                do_stuff(item)

    queue.extend(work_items)
    start_threads(worker, count)


On a slightly higher level, are we just trying to use OpenMP from
Cython, or are we trying to build it into the language? If the former,
it may make sense to stick closer than one might otherwise be tempted
in terms of API to the underlying C to leverage the existing
documentation. A library with a more Pythonic interface could perhaps
be written on top of that. Alternatively, if we're building it into
Cython itself, I'd say it might be worth modeling it after the
multiprocessing module (though I understand it would be implemented
with threads), which I think is a decent enough model for managing
embarrassingly parallel operations.
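
As a rough illustration of that model, the stdlib already ships a
thread-backed pool with the multiprocessing interface (shown only as a
model here, not as a proposed Cython API):

    from multiprocessing.pool import ThreadPool

    def square(x):
        return x * x

    pool = ThreadPool(4)  # multiprocessing-style API, backed by threads
    print(pool.map(square, range(10)))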


+1



The above code is similar to that,
though I'd prefer the for loop to be implicit rather than part of the
worker method (or at least passed as an argument).


It provides a simple way to write per-thread initialisation code, 
though. And it's likely easier to make looping fast than to speed up 
the call into a closure. However, eventually, both ways will need to 
be supported anyway.




If we went this route,
what are the advantages of using OpenMP over, say, pthreads in the
background? (And could the latter be done with just a library + some
fancy GIL specifications?)


In the above example, basically everything is explicit and nothing 
more than a simplified threading setup is needed. Even the 
implementation of "start_threads()" could be done in a couple of lines 
of Python code, including the collection of results and errors. If 
someone thinks we need more than that, I'd like to see a couple of 
concrete use cases and code examples first.
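
For the record, a hedged sketch of such a helper ('start_threads' is the
hypothetical name from the example above, not an existing API; result
and error collection are omitted here):

    import threading

    def start_threads(worker, count):
        # Spawn 'count' threads all running the same closure, then
        # wait for them to finish.
        threads = [threading.Thread(target=worker) for _ in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()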




One thing that's nice about OpenMP as
implemented in C is that the serial code looks almost exactly like the
parallel code; the code at http://wiki.cython.org/enhancements/openmp
has this property too.


Writing it with a closure isn't really that much different. You can 
put the inner function right where it would normally get executed and 
add a bit of calling/load distributing code below it. Not that bad IMO.


It may be worth providing some ready-to-use decorators to do the load 
balancing, but I don't really like the idea of having a decorator 
magically invoke the function in-place that it decorates.




Also, I like the idea of being able to hold the GIL by the invoking
thread and having the "sharing" threads do the appropriate locking
among themselves when needed if possible, e.g. for exception raising.


I like the explicit "with nogil" block in my example above. It makes 
it easy to use normal Python setup code, to synchronise based on the 
GIL if desired (e.g. to use a normal Python queue for communication), 
and it's simple enough not to get in the way.


I'm supporting Robert here. Basically, I'm +1 to anything that can make
me pretend the GIL doesn't exist, even if it comes with a 2x performance
hit: Because that will make me write parallel code (which I can't be
bothered to do in Cython currently), and I have 4 cores on the laptop I
use for debugging, so I'd still get a 2x speedup.


Perhaps the long-term solution is something like an "autogil" mode,
where Cython automatically releases the GIL on blocks where it can
(such as a typed for-loop), and acquires it back when needed (an
exception-raising if-block within said for-loop). And when doing
multi-threading, GIL-requiring calls are dispatched to a master
GIL-holding thread (which would not be a worker thread, i.e. on 4 cores
you'd have 4 workers + 1 GIL-holding support thread). So the advice for
speeding up code is simply "make sure your code is all typed", just
like before, but people can follow that advice without even having to
learn about the GIL.

Re: [Cython] OpenMP support

2011-03-11 Thread Dag Sverre Seljebotn

On 03/11/2011 12:37 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 11.03.2011 08:56:

Basically, I'm +1 to anything that can make me pretend the GIL doesn't
exist, even if it comes with a 2x performance hit: Because that will
make me write parallel code (which I can't be bothered to do in Cython
currently), and I have 4 cores on the laptop I use for debugging, so
I'd still get a 2x speedup.

Perhaps the long-term solution is something like an "autogil" mode,
where Cython automatically releases the GIL on blocks where it can
(such as a typed for-loop), and acquires it back when needed (an
exception-raising if-block within said for-loop).


I assume you mean this to become a decorator or other option written 
into the code.




And when doing multi-threading, GIL-requiring calls are dispatched to a
master GIL-holding thread (which would not be a worker thread, i.e. on
4 cores you'd have 4 workers + 1 GIL-holding support thread). So the
advice for speeding up code is simply "make sure your code is all
typed", just like before, but people can follow that advice without
even having to learn about the GIL.


The GIL does not only protect the interpreter core. It also protects C 
level data structures in user code and keeps threaded code from 
running amok. Releasing and acquiring it doesn't come for free either, 
so besides likely breaking code that was not specifically written to 
be reentrant, releasing it automatically may also introduce a 
performance penalty for many users.


The intention was that the GIL would be acquired in exceptional 
circumstances (doesn't matter for overall performance) or during 
debugging (again don't care about performance). But I agree the idea 
needs more thought on the possible pitfalls.





I'm very happy the GIL exists, and I'm against anything that tries to 
disable it automatically. Threading is an extremely dangerous 
programming model. The GIL has its gotchas, too, but it still 
simplifies it quite a bit. Actually, threading is so complex and easy 
to get wrong, that any threaded code should always be written 
specifically to support threading. Explicitly acquiring and releasing 
the GIL is really just a minor issue on that path.


I guess the point is that OpenMP takes that "extremely dangerous 
programming model" and makes it tractable, at least for a class of 
trivial problems (not necessarily SIMD, but almost).


BTW, threading is often used simply because of how array data is laid
out in memory. A typical use case is every thread writing to different
non-overlapping blocks of the same array (and reading from the same
input arrays, which are not changed). Then you move on to step B, which
does the same, but perhaps blocks the arrays in a different way between
threads. Then step C blocks the data in yet another way, etc. But at
each step it's just "input arrays, non-overlapping blocks in output
arrays", global parameters, local loop counters.


(One doesn't need to use threads; there was another thread on
multiprocessing + shared memory arrays.)
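
A hedged sketch of the block-wise pattern described above (plain Python
threads; note that under CPython a real speedup requires the per-block
work to release the GIL):

    import threading
    import numpy as np

    def scale_blocked(inp, out, nthreads=4):
        # Each thread writes its own non-overlapping slice of 'out'
        # and only reads from 'inp', so no locking is needed.
        n = len(out)
        step = (n + nthreads - 1) // nthreads

        def worker(start, stop):
            out[start:stop] = 2.0 * inp[start:stop]

        threads = [threading.Thread(target=worker, args=(i, min(i + step, n)))
                   for i in range(0, n, step)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()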


Just saying that not all use of threads is "extremely dangerous", and 
OpenMP exists explicitly to dumb threading down for those cases.


Dag Sverre


Re: [Cython] 'with gil:' statement

2011-03-16 Thread Dag Sverre Seljebotn

On 03/16/2011 11:28 AM, mark florisson wrote:

I implemented the 'with gil:' statement, and have added error checking
for nested 'with gil' or 'with nogil' statements. For instance, with
the patch applied Cython will issue an error when you have e.g.

with nogil:
    with nogil:
        ...

(or the same thing for 'with gil:'). This is because nested 'nogil' or
'gil' blocks, without intermediate GIL acquiring/releasing, will abort
the Python interpreter. However, if people are acquiring the GIL
manually in nogil blocks and then want to release the GIL with a
'nogil' block, it will incorrectly issue an error. I do think this
kind of code is uncommon, but perhaps we want to issue a warning
instead?


I think we should make nested nogil-s noops, i.e.

with nogil:
    with nogil:  # => if True:

This is because one may want to change "with nogil" to "with gil" for 
debugging purposes (allow printing debug information).


Another feedback is that I wonder whether we should put the "gil" and
"nogil" pseudo-context managers both in the cython namespace, and sort
of deprecate the "global" nogil, rather than introduce yet another name
that can't be used safely for all kinds of variables.
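
A hedged sketch of the suggested spelling (hypothetical, not current
syntax):

    cimport cython

    def f(double x):
        with cython.nogil:
            x = x * 2
            with cython.gil:
                print("debug:", x)  # Python code, GIL temporarily held
        return x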


-- Dag


The 'with gil:' statement can now be used in the same way as 'with
nogil:'. Exceptions raised from GIL blocks will be propagated if
possible (i.e., if the return type is 'object'). Otherwise it will
jump to the end of the function and use the usual
__Pyx_WriteUnraisable; there's not really anything new there.

For functions declared 'nogil' that contain 'with gil:' statements, it
will safely lock around the initialization of any Python objects and
set up the refnanny context (with appropriate preprocessor guards). At
the end of the function it will safely lock around the teardown of the
refnanny context. With 'safely' I mean that it will create a thread
state if it was not already created and may be called even while the
GIL is already held (using PyGILState_Ensure()). This means variables
are declared and initialized in the same way as in normal GIL-holding
functions (except that there is additional locking), and of course the
GIL-checking code ensures that errors are issued if those variables are
attempted to be used outside any GIL blocks.

Could someone review the patch (which is attached)? Maybe check if I
haven't missed any side cases and such?




Re: [Cython] 'with gil:' statement

2011-03-16 Thread Dag Sverre Seljebotn

On 03/16/2011 12:54 PM, mark florisson wrote:

On 16 March 2011 11:58, Dag Sverre Seljebotn wrote:

On 03/16/2011 11:28 AM, mark florisson wrote:

I implemented the 'with gil:' statement, and have added error checking
for nested 'with gil' or 'with nogil' statements. For instance, with
the patch applied Cython will issue an error when you have e.g.

with nogil:
    with nogil:
        ...

(or the same thing for 'with gil:'). This is because nested 'nogil' or
'gil' blocks, without intermediate GIL acquiring/releasing, will abort
the Python interpreter. However, if people are acquiring the GIL
manually in nogil blocks and then want to release the GIL with a
'nogil' block, it will incorrectly issue an error. I do think this
kind of code is uncommon, but perhaps we want to issue a warning
instead?

I think we should make nested nogil-s noops, i.e.

with nogil:
    with nogil:  # => if True:

This is because one may want to change "with nogil" to "with gil" for
debugging purposes (allow printing debug information).

Interesting, that does sound convenient, but I'm not sure if mere
convenience should move us to simply ignore what is technically most
likely incorrect code (unless there is intermediate manual locking).


I'm not sure if I understand what you mean here. How does simply 
ignoring redundant "with [no]gil" statements cause incorrect code? Or do 
you mean this is a


I'm just trying to minimize the "language getting in your way" factor. 
It is pretty useless to write


if x:
    if x:
        ...

as well, but Python does allow it.

Warnings on nested "with nogil" is more the role of a "cylint" in my 
opinion.




In any case, I wouldn't really be against that. If you simply want to
allow this for debugging, we could also allow print statements in
nogil sections, by either rewriting it using 'with gil:', or by
inserting a simple printf (in which case you probably want to place a
few restrictions).


It's not only print statements. I.e., if I think something is wrong with 
an array, I'll stick in code like


print np.std(x), np.mean(x), np.any(np.isnan(x))

or something more complex that may require temporaries. Or even plot the 
vector:


plt.plot(x)
plt.show() # blocks until I close plot window

Or, launch a debugger:

if np.any(np.isnan(x)):
    import pdb; pdb.set_trace()

so I guess I should stop saying this is only about printing. In general, 
it's nice to use Python during debugging. I find myself replacing "with 
nogil" with "if True" in such situations, to avoid reindenting.


I guess I can soon start using "with gil" around the debug code though. 
Again, the restriction on nested nogil/gil is not a big problem, just an 
instance of "the language getting in your way".


Dag


Re: [Cython] 'with gil:' statement

2011-03-16 Thread Dag Sverre Seljebotn

On 03/16/2011 01:55 PM, mark florisson wrote:

On 16 March 2011 13:37, Dag Sverre Seljebotn wrote:

On 03/16/2011 12:54 PM, mark florisson wrote:

On 16 March 2011 11:58, Dag Sverre Seljebotn wrote:

On 03/16/2011 11:28 AM, mark florisson wrote:

I implemented the 'with gil:' statement, and have added error checking
for nested 'with gil' or 'with nogil' statements. For instance, with
the patch applied Cython will issue an error when you have e.g.

with nogil:
    with nogil:
        ...

(or the same thing for 'with gil:'). This is because nested 'nogil' or
'gil' blocks, without intermediate GIL acquiring/releasing, will abort
the Python interpreter. However, if people are acquiring the GIL
manually in nogil blocks and then want to release the GIL with a
'nogil' block, it will incorrectly issue an error. I do think this
kind of code is uncommon, but perhaps we want to issue a warning
instead?

I think we should make nested nogil-s noops, i.e.

with nogil:
    with nogil:  # => if True:

This is because one may want to change "with nogil" to "with gil" for
debugging purposes (allow printing debug information).

Interesting, that does sound convenient, but I'm not sure if mere
convenience should move us to simply ignore what is technically most
likely incorrect code (unless there is intermediate manual locking).

I'm not sure if I understand what you mean here. How does simply
ignoring redundant "with [no]gil" statements cause incorrect code? Or
do you mean this is a

I'm just trying to minimize the "language getting in your way" factor.
It is pretty useless to write

if x:
    if x:
        ...

as well, but Python does allow it.

Warnings on nested "with nogil" is more the role of a "cylint" in my
opinion.


Perhaps you're right. However, I just think it is important for users
to realize that in general, they cannot unblock threads recursively.
Currently the error checking code catches multiple nested 'with
(no)gil', but it doesn't catch this:

cdef void func() nogil:
    with nogil:
        pass

with nogil:
    func()

But the problem is that it does abort the interpreter. So I thought
that perhaps emphasizing that that code is incorrect for at least the
easy-to-catch cases, we might make users somewhat more aware. Because
if the above code aborts Python, but a nested 'with nogil:' is valid
code, there might be a source for confusion.


Ah, right. I guess I agree with disallowing nested "with nogil" 
statements for the time being then.


Dag Sverre


Re: [Cython] 'with gil:' statement

2011-03-16 Thread Dag Sverre Seljebotn

On 03/16/2011 02:17 PM, Pauli Virtanen wrote:

Wed, 16 Mar 2011 14:10:29 +0100, Dag Sverre Seljebotn wrote:

Ah, right. I guess I agree with disallowing nested "with nogil"
statements for the time being then.

Could you make the inner nested "with nogil" statements no-ops instead,
if the GIL is already released? Does the Cython compiler keep track of
whether the GIL is acquired or not?


That's what I initially suggested. See Mark's posted code -- when 
calling another function, you can't know whether the nogil is "nested" 
or not (without actually checking with CPython...)


Within-Cython solutions are bad because calls may cross between 
different Cython modules.


Dag



Re: [Cython] 'with gil:' statement

2011-03-17 Thread Dag Sverre Seljebotn

On 03/17/2011 12:24 AM, Greg Ewing wrote:

Stefan Behnel wrote:

I'm not sure if this is a good idea. "nogil" blocks don't have a way 
to handle exceptions, so simply jumping out of them because an inner 
'with gil' block raised an exception can have unexpected side effects.


Seems to me that the __Pyx_WriteUnraisable should be done at
the end of the 'with gil' block, and execution should then
continue from there.

In other words, the effect on exception handling should be
the same as if the 'with gil' block had been factored out into
a separate function having no exception return value.



-1.

I consider the fact that exceptions don't propagate from some functions 
a "currently unfixable bug". We should plan for it being fixed some day. 
Having a "with" statement alter execution flow in this way is totally 
unintuitive to me.


If you want this, it's better to introduce a new keyword like 
"trywithgil: ... except:" (not that I'm in favour of that).


We could perhaps fix exception propagation from nogil functions by using
some conventions + setjmp/longjmp. Mono does this when calling into
native code, and I recently did it manually in Cython to propagate
exceptions through the Fortran wrappers in SciPy. Also, the GIL may not
be around forever even in CPython? (All arguments I've seen for keeping
it have been along the lines of "it slows down serial code", not that it
is considered a good thing.)


Designing a language around the GIL feels like a dead-end to me. I'm OK 
with being practical in the face of the limitations of today; but let's 
keep "with gil" and "with nogil" something that can become noops in the 
future without too much pain. Yes, I know that if the GIL goes it will 
break Stefan's lxml code, and I'm sure other code -- I'm just saying 
that we shouldn't make the language design even more GIL-centric than it 
already is.


Anyway, I'm off to write some computational Fortran+OpenMP code because 
dealing with threading and Python is just more than I can deal with...


Dag Sverre


Re: [Cython] 'with gil:' statement

2011-03-17 Thread Dag Sverre Seljebotn

On 03/17/2011 08:38 AM, Dag Sverre Seljebotn wrote:

On 03/17/2011 12:24 AM, Greg Ewing wrote:

Stefan Behnel wrote:

I'm not sure if this is a good idea. "nogil" blocks don't have a way 
to handle exceptions, so simply jumping out of them because an inner 
'with gil' block raised an exception can have unexpected side effects.


Seems to me that the __Pyx_WriteUnraisable should be done at
the end of the 'with gil' block, and execution should then
continue from there.

In other words, the effect on exception handling should be
the same as if the 'with gil' block had been factored out into
a separate function having no exception return value.



-1.

I consider the fact that exceptions don't propagate from some 
functions a "currently unfixable bug". We should plan for it being 
fixed some day. Having a "with" statement alter execution flow in this 
way is totally unintuitive to me.


If you want this, it's better to introduce a new keyword like 
"trywithgil: ... except:" (not that I'm in favour of that).


We could perhaps fix exception propagation from nogil functions by
using some conventions + setjmp/longjmp. Mono does this when calling
into native code, and I recently did it manually in Cython to
propagate exceptions through the Fortran wrappers in SciPy. Also, the
GIL may not be around forever even in CPython? (All arguments I've
seen for keeping it have been along the lines of "it slows down serial
code", not that it is considered a good thing.)


Heh. I obviously meant that "removing it would slow down serial code".

Dag



Designing a language around the GIL feels like a dead-end to me. I'm 
OK with being practical in the face of the limitations of today; but 
let's keep "with gil" and "with nogil" something that can become noops 
in the future without too much pain. Yes, I know that if the GIL goes 
it will break Stefan's lxml code, and I'm sure other code -- I'm just 
saying that we shouldn't make the language design even more 
GIL-centric than it already is.


Anyway, I'm off to write some computational Fortran+OpenMP code 
because dealing with threading and Python is just more than I can deal 
with...


Dag Sverre


Re: [Cython] 'with gil:' statement

2011-03-17 Thread Dag Sverre Seljebotn

On 03/17/2011 09:27 AM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 17.03.2011 08:38:

On 03/17/2011 12:24 AM, Greg Ewing wrote:

Stefan Behnel wrote:

I'm not sure if this is a good idea. "nogil" blocks don't have a way to
handle exceptions, so simply jumping out of them because an inner 'with
gil' block raised an exception can have unexpected side effects.


Seems to me that the __Pyx_WriteUnraisable should be done at
the end of the 'with gil' block, and execution should then
continue from there.

In other words, the effect on exception handling should be
the same as if the 'with gil' block had been factored out into
a separate function having no exception return value.



-1.

I consider the fact that exceptions don't propagate from some functions
a "currently unfixable bug". We should plan for it being fixed some day.


It can't be fixed in general, because there are cases where exceptions 
simply cannot be propagated. Think of C callbacks, for example. C 
doesn't have a "normal" way of dealing with exceptions, so if an 
exception that originated from a callback simply leads to returning 
from the function, it may mean that the outer C code will simply 
continue to execute normally. Nothing's won in that case.


Yes, that's a good point. (This is what I used setjmp/longjmp to work 
around BTW, to longjmp across the calling Fortran code. I knew it wasn't 
doing any mallocs/frees, let alone any file handling etc., so this was 
safe.)


I'll admit that I'm mostly focused on code like

def f():
    with nogil:
        for ...:
            A
            if something_exceptional:
                with gil:
                    raise Exception(...)
            B
        C

where I'd say it's up to me to make sure that B and C can safely be 
skipped. It would be a major pain to have my raised exception here be 
"trapped" -- in fact, it would make the "with gil" statement unusable 
for my purposes.






In code:

cdef void c_callback(...) nogil:
    ... do some C stuff ...
    with gil:
        ... do some Python stuff ...
    ... do some more C stuff ...

So far, there are two proposed ways of doing this.

1) acquire the GIL on entry and exit, handling unraisable exceptions 
right before exiting.


2) keep all GIL requiring code inside of the "with gil" block, 
including unraisable exceptions.
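
Spelled out in the same pseudo-style as the snippet above, (2) would
look roughly like this sketch:

    cdef void c_callback(...) nogil:
        ... do some C stuff ...
        with gil:
            try:
                ... do some Python stuff ...
            except:
                ...  # handled here; nothing escapes the block
        ... do some more C stuff ...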


I find (2) a *lot* more intuitive, as well as much safer. We can't 
know what effects the surrounding "do C stuff" code has. It may 
contain thread-safe C level cleanup code for the "with gil" block, for 
example, or preparation code that enables returning into the calling C 
code. Simply jumping out of the GIL block without executing the 
trailing code may simply not work at all.


I think you find (2) more intuitive because you have a very detailed 
knowledge of Cython and CPython, but that somebody new to Cython would 
expect a "with" statement to have the same control flow logic as the 
Python with statement. Of course, I don't have any data for that.


How about this compromise: We balk on the code you wrote with:

Error line 345: Exceptions propagating from "with gil" block cannot be 
propagated out of function, please insert try/except and handle exception


So that we require this:

with gil:
    try:
        ...
    except:
        warnings.warn(...)  # or even cython.unraisable(e)

This keeps me happy about not abusing the with statement for strange 
control flow, and makes the "with gil" useful for raising exceptions 
inside regular def functions with nogil blocks.







We could perhaps fix exception propagation from nogil functions by using
some conventions + setjmp/longjmp. Mono does this when calling into
native code, and I recently did it manually in Cython to propagate
exceptions through the Fortran wrappers in SciPy.


Regardless of the topic of this thread, it would be nice to have 
longjmp support in Cython. Lupa, my Cython wrapper for LuaJIT, 
currently has to work around several quirks in that area.


Not sure what you mean here, I used longjmp (in a function without any 
Python objects) and it seems to work just fine. Did I miss anything?






Also, the GIL may not be around forever even in CPython? (All arguments
I've seen for keeping it have been along the lines of "it slows down
serial code", not that it is considered a good thing.)


If it ever gets removed, there will surely have to be an emulation
layer for C modules. Many of them simply use it as a thread lock, and
that's totally reasonable IMHO.


Good point. But there may be an option to disable said emulation layer 
that we want to make use of in Cython...


(This is relevant today for Cython-on-.NET, for instance.)





Designing a language around the GIL feels like a dead-end to me.

Re: [Cython] 'with gil:' statement

2011-03-17 Thread Dag Sverre Seljebotn

On 03/17/2011 11:16 AM, mark florisson wrote:

On 17 March 2011 10:08, Dag Sverre Seljebotn wrote:


How about this compromise: We balk on the code you wrote with:

Error line 345: Exceptions propagating from "with gil" block cannot be
propagated out of function, please insert try/except and handle exception

So that we require this:

with gil:
    try:
        ...
    except:
        warnings.warn(...)  # or even cython.unraisable(e)

This keeps me happy about not abusing the with statement for strange control
flow, and makes the "with gil" useful for raising exceptions inside regular
def functions with nogil blocks.


I agree with your previous statement, but not with your compromise :).
We have to differentiate between two cases, similar to Stefan's cases,
but different in a very important way that matters for nested GIL
blocks.

1) Exceptions can propagate to some outer GIL section (in or outside
the current function)
2) Exceptions can't propagate, because there is no outer GIL section
and the function has a non-object return type

With your compromise, with 1) exceptions cannot propagate, but with 2)
you win by forcing the user to be explicit. But then you still need to
write to some variable indicating that an exception occurred and
adjust control flow accordingly in your nogil section (unless you want
to clean up and return immediately).

If you have Python with-statement semantics, you can do the following,
for instance:

cdef void func() nogil:
    with gil:
        try:
            with nogil:
                with gil:
                    code that may raise an exception

            this is not executed

        except ExceptionRaisedFromInnerWithGilBlock:
            handle exception here

The point is, if you have case 2), and you want to use GIL code, you
need to handle exceptions in some way. Forcing the user to not
propagate anything doesn't sound right, unless this holds only for the
outermost 'with gil' block. I would be OK with that, although it would
be inconsistent with how exceptions in normal cdef functions with
non-object return work, so I would say that we'd have to force it in
the same manner there.


I think we should perhaps look at enforcing explicit exception-ignoring
everywhere... there are a lot of details to hash out, and there's the
issue of backwards compatibility, but it could be dealt with over a
couple of releases where we only raise a warning, and so on.


It could involve a *very* limited subset of exception handling for use
in nogil mode (i.e., only a bare "except:" statement allowed, where one
can either call "cython.unraisable()", use "pass", or set a flag).


Dag Sverre


Re: [Cython] 'with gil:' statement

2011-03-18 Thread Dag Sverre Seljebotn

On 03/18/2011 11:10 AM, Stefan Behnel wrote:

mark florisson, 18.03.2011 10:52:

On 18 March 2011 07:07, Stefan Behnel wrote:

Greg Ewing, 18.03.2011 01:18:


mark florisson wrote:


I think we could support it without having to acquire
the GIL in the finally clause.


That was the intention -- the code in the finally clause would
be subject to the same nogil restrictions as the rest of
the nogil block.

My point is that as long as you're allowing exceptions to be
tunnelled through nogil blocks, they should respect any finally
clauses that they pass through on the way.


+1


Ok, I will give it a go and try to allow it when they surround with
gil blocks. I would however like to reiterate that it is a
special-case, inconsistent with previous behaviour, and basically
extends the language and won't work for functions that are called and
declared 'with gil'. But it is convenient, so I can't help but like it
at the same time :]


I'm not sure I understand why you think it's so bad, and why it would 
be inconsistent with previous behaviour.


The only real problem I see is that you could do things like this:

with nogil:
    try:
        with gil: raise ...
    finally:
        with gil: raise ...

i.e. you could lose the original exception. Even worse:

with nogil:
    try:
        with gil: raise ...
    finally:
        with gil:
            try: raise
            except: pass

Here, it must be made sure that the original exception still gets 
raised properly at the end. That's a problem in Py2, where exceptions 
are badly scoped, i.e. Python code that runs in the interpreter could 
fail to reset the original exception after catching another one. But I 
guess these things are up to the "with gil" block/function, rather 
than the above "finally" clause.


Actually, I think I still find it more convenient to not provide *any* 
special exception paths through nogil code, i.e. to not let exceptions 
in "with gil" blocks exit from outer "nogil" blocks. That would avoid 
all of the semantic difficulties above.


Well, of course not supporting something is easier. But is it user
friendly? Relying on boolean flags to signal error states is a real
pain when one is used to using exceptions.


Dag Sverre


Re: [Cython] Message system refactoring

2011-03-21 Thread Dag Sverre Seljebotn

On 03/21/2011 11:45 AM, Vitja Makarov wrote:

Now error/warning messages are stored in global variables at
Cython.Compiler.Errors

I think it's much better to move error handling into some object,
Main.Context for example.

Some benefits:
  - reduce use of global variables
  - allow more than one cython compiler instance at a time
  - make it much easier to implement -Werror
  - cython directives can affect the message system (fast_fail, werror)
  - messages could be easily sorted


+1. I assume the reason this is not done is simply because it would be a 
lot of work and the payback is less than spending time on other stuff.


By attaching the error context to "env" and "code" one can avoid a lot 
of signature changes. I think transforms should take the context in 
their constructor.
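
For illustration, a hedged sketch of the shape such a refactoring might
take (CompileContext and its methods are hypothetical names, not the
actual Cython.Compiler API):

    class CompileContext:
        def __init__(self, werror=False):
            self.errors = []      # per-instance, not module-global
            self.werror = werror

        def error(self, position, message):
            self.errors.append((position, message))

        def warning(self, position, message):
            # -Werror support falls out naturally: promote warnings to
            # errors on this instance only.
            if self.werror:
                self.error(position, message)
            else:
                print("warning: %s: %s" % (position, message))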


Dag Sverre



[Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

CEP up at http://wiki.cython.org/enhancements/prange

"""
This spec is the result of a number of discussions at Cython workshop 1.
Quite a few different ways of expressing parallelism were looked at, and
finally we decided to split the problem in two:


 * A simple and friendly solution that covers, perhaps, 80% of the 
cases, based on simply replacing range with prange.


 * Less friendly solutions for the remaining cases. These cases may 
well not even require language support in Cython, or only in indirect 
ways (e.g., cdef closures if normal closures are too expensive).


This document focuses exclusively on the former solution and does not 
intend to cover all use-cases for parallel programming, only the most 
common ones.

"""

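For concreteness, a minimal hedged sketch of the "replace range with
prange" idea (reduction and privatization semantics are as specified in
the CEP, not here):

    from cython.parallel import prange
    cimport numpy as np

    def f(np.ndarray[double] x, double alpha):
        cdef double s = 0
        with nogil:
            for i in prange(x.shape[0]):
                # inplace operator: s is treated as a reduction variable
                s += alpha * x[i]
        return s
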
Note that Mark and I talked some more on the way to the airport, and I
also got a couple more ideas afterwards, so everybody interested should
probably take a read even if you were there for the discussions.


Main post-workshop changes:

 * cython.parallel.firstiteration()/lastiteration # for in-loop if-test 
for thread setup/teardown blocks


 * An idea for how to implement numthreads(), so that we can drop the 
rather complex Context idea.


 * More thoughts on firstprivate/lastprivate


Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 11:43 AM, Dag Sverre Seljebotn wrote:

CEP up at http://wiki.cython.org/enhancements/prange

"""
This spec is the result of a number of discussions at Cython workshop
1. Quite a few different ways of expressing parallelism were looked at,
and finally we decided to split the problem in two:


 * A simple and friendly solution that covers, perhaps, 80% of the 
cases, based on simply replacing range with prange.


 * Less friendly solutions for the remaining cases. These cases may 
well not even require language support in Cython, or only in indirect 
ways (e.g., cdef closures if normal closures are too expensive).


This document focuses exclusively on the former solution and does not 
intend to cover all use-cases for parallel programming, only the most 
common ones.

"""

Note that Mark and I talked some more on the way to the airport, and I
also got a couple more ideas afterwards, so everybody interested should
probably take a read even if you were there for the discussions.


To be more specific, here are the main post-workshop changes:

 * if cython.parallel.firstthreaditer()/lastthreaditer() # Use if-test 
in loop for thread setup/teardown


 * An idea for implementing threadnum() in a way so that we can drop 
the rather complex Context idea.


 * More thoughts on firstprivate/lastprivate

Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange


"""
Variable handling

Rather than explicit declaration of shared/private variables we rely 
on conventions:


* Thread-shared: Variables that are only read and not written in 
the loop body are shared across threads. Variables that are only used 
in the else block are considered shared as well.


* Thread-private: Variables that are assigned to in the loop body 
are thread-private. Obviously, the iteration counter is thread-private 
as well.


 * Reduction: Variables that are only used on the LHS of an inplace
operator, such as s above, are marked as targets for reduction. If the
variable is also used in other ways (LHS of assignment or in an
expression) it instead turns into a thread-private variable. Note:
This means that if one, e.g., inserts printf(... s) above, s is turned
into a thread-local variable. OTOH, there is simply no way to
correctly emulate the effect printf(... s) would have in a sequential
loop, so such code must be discouraged anyway.

"""

What about simply (ab-)using Python semantics and creating a new inner 
scope for the prange loop body? That would basically make the loop 
behave like a closure function, but with the looping header at the 
'right' place rather than after the closure.


I'm not quite sure what the concrete changes to the CEP this would lead 
to (assuming you mean this as a proposal for alternative semantics, and 
not an implementation detail).


How would we treat reduction variables? They need to be supported, but
there's nothing in Python semantics to support them; they are a rather
special case everywhere. I suppose keeping the reduction clause above,
or using the "nonlocal" keyword in the loop body...


Also there's the else:-block, although we could make that part of the 
scope. And the "lastprivate" functionality, although that could be 
dropped without much loss.




Also, in the example, the local variable declaration of "tmp" outside 
of the loop looks somewhat misplaced, although it's precedented by 
comprehensions (which also have their own local scope in Cython).


Well, depending on the decision of lastprivate, the declaration would 
need to be outside; I really like the idea of moving "cdef", and am 
prepared to drop lastprivate for this.


Being explicit about thread-local variables does make things a lot safer 
to use.


(One problem is that switching between serial and parallel one needs to 
move variable declarations. But that only happens once, and one can use 
"nthreads=1" to disable parallel after that.)


An example would then be:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0, globtmp
    with nogil:
        for i in prange(x.shape[0]):
            cdef double tmp  # thread-private
            tmp = alpha * i  # alpha available from global scope
            s += x[i] * tmp  # still automatic reduction for inplace operators
            # printf(...s) -> now leads to error, since s is not
            # declared thread-private but is read
        else:
            # tmp still available here...looks a bit strange, but useful
            s += tmp * 10
            globtmp = tmp  # we save tmp for later
    # tmp not available here, globtmp is
    return s

Or, we just drop support for the else block on these loops.

Dag Sverre


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 03:04 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 13:53:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange


"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the 
loop
body are shared across threads. Variables that are only used in the 
else

block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private 
as well.


* Reduction: Variables that are only used on the LHS of an inplace operator,
such as s above, are marked as targets for reduction. If the variable is
also used in other ways (LHS of assignment or in an expression) it
instead turns into a thread-private variable. Note: This means that if
one, e.g., inserts printf(... s) above, s is turned into a thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop 
behave
like a closure function, but with the looping header at the 'right' 
place

rather than after the closure.


I'm not quite sure what the concrete changes to the CEP this would 
lead to
(assuming you mean this as a proposal for alternative semantics, and 
not an

implementation detail).


What I would like to avoid is having to tell users "and now for 
something completely different". It looks like a loop, but then 
there's a whole page of new semantics for it. And this also cannot be 
used in plain Python code due to the differing scoping behaviour.


Well, at least it's better than the 300 pages of semantics for OpenMP :-)





How would we treat reduction variables? They need to be supported, and
there's nothing in Python semantics to support reduction variables, they
are a rather special case everywhere. I suppose keeping the reduction
clause above, or use the "nonlocal" keyword in the loop body...


That's what I thought, yes. It looks unexpected, sure. That's the 
clear advantage of using inner functions, which do not add anything 
new at all. But if we want to add something that looks more like a 
loop, we should at least make it behave like something that's easy to 
explain.


Sorry for not taking the opportunity to articulate my scepticism in 
the workshop discussion.



I like the idea of considering cdef/nonlocal in the prange blocks. But, 
yes, I do feel that opposing a parallel loop construct in general is 
rather late, or at least could have been done at a more convenient time...


All I know and care about is that a decorator-and-closure solution will 
be a lot more obscure among non-CS people who have no clue what a 
closure or decorator is, and those are exactly the people who need this 
kind of simple 80%-solution.  You and I don't really need any support 
from Cython at all to write multithreaded apps (leaving aesthetics and 
number of keystrokes to the side).


It'd be good to hear Robert's and Mark's opinions before going further; 
let's economise this thread a bit.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 03:27 PM, Nathaniel Smith wrote:

On Mon, Apr 4, 2011 at 3:17 AM, Dag Sverre Seljebotn
  wrote:

  * A simple and friendly solution that covers, perhaps, 80% of the cases,
based on simply replacing range with prange.

This is a "merely" aesthetic objection, while remaining agnostic on
the larger discussion, but -- 'for i in prange(...)' looks Just Wrong.
This is not a regular loop over a funny range, it's a funny loop over
a regular range. Surely it should be 'pfor i in range(...)'. Or better
yet, spell it 'parallel_for'.


I don't mind calling it "parallel_for" myself, if only a good place to 
provide scheduling parameters (numthreads, dynamic vs. static 
scheduling, chunksize) can be found. That would make it more obvious 
that scoping rules are different too.
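For reference, a sketch of where such parameters could live, using the 
keyword-argument spelling that eventually landed in cython.parallel (an 
assumption on my part; nothing here is from the CEP):

    from cython.parallel import prange

    def f(double[:] x):
        cdef double s = 0
        cdef Py_ssize_t i
        # scheduling parameters ride along on the range-like call itself
        for i in prange(x.shape[0], nogil=True,
                        schedule='dynamic', chunksize=16, num_threads=4):
            s += x[i]
        return s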


No sense in discussing this further until the higher-level discussion on 
whether to do it or not has completed though.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 03:04 PM, Stefan Behnel wrote:


That's what I thought, yes. It looks unexpected, sure. That's the 
clear advantage of using inner functions, which do not add anything 
new at all. But if we want to add something that looks more like a 
loop, we should at least make it behave like something that's easy to 
explain.


Sorry for not taking the opportunity to articulate my scepticism in 
the workshop discussion. Skipping through the CEP now, I think this 
feature adds quite some complexity to the language, and I'm not sure 
it's worth that when compared to the existing closures. The equivalent 
closure+decorator syntax is certainly easier to explain, and could 
translate into exactly the same code. But with the clear advantage 
that the scope of local, nonlocal and thread-configuring variables is 
immediately obvious.


Basically, your example would become

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0

    with cython.nogil:
        @cython.run_parallel_for_loop( range(x.shape[0]) )
        cdef threaded_loop(i):  # 'nogil' is inherited
            cdef double tmp = alpha * i
            nonlocal s
            s += x[i] * tmp
    s += alpha * (x.shape[0] - 1)
    return s

We likely agree that this is not beautiful. It's also harder to 
implement than a "simple" for-in-prange loop. But I find it at least 
easier to explain and semantically 'obvious'. And it would allow us to 
write a pure mode implementation for this based on the threading module.


Short clarification on this example: There is still magic going on here 
in the reduction variable -- one must have a version of "s" for each 
thread, and then reduce at the end.


(Stefan: I realize that you may know this, I'm just making sure 
everything is stated clearly in this discussion.)
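To make the mechanics concrete, here is a minimal pure-Python sketch of 
such a per-thread reduction, built only on the threading module 
(illustrative of what the generated code must do semantically, not of 
any actual Cython output):

    import threading

    def parallel_sum(x, num_threads=4):
        partials = [0.0] * num_threads          # one private 's' per thread
        def worker(tid):
            for i in range(tid, len(x), num_threads):
                partials[tid] += x[i]           # thread-private accumulation
        threads = [threading.Thread(target=worker, args=(t,))
                   for t in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sum(partials)                    # the reduction at the end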


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 05:22 PM, mark florisson wrote:

On 4 April 2011 13:53, Dag Sverre Seljebotn  wrote:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange

"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the
loop body are shared across threads. Variables that are only used in the
else block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as well.

* Reduction: Variables that are only used on the LHS of an inplace
operator, such as s above, are marked as targets for reduction. If the
variable is also used in other ways (LHS of assignment or in an expression)
it instead turns into a thread-private variable. Note: This means that
if one, e.g., inserts printf(... s) above, s is turned into a thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop behave
like a closure function, but with the looping header at the 'right' place
rather than after the closure.

I'm not quite sure what the concrete changes to the CEP this would lead to
(assuming you mean this as a proposal for alternative semantics, and not an
implementation detail).

How would we treat reduction variables? They need to be supported, and
there's nothing in Python semantics to support reduction variables, they are
a rather special case everywhere. I suppose keeping the reduction clause
above, or use the "nonlocal" keyword in the loop body...

Also there's the else:-block, although we could make that part of the scope.
And the "lastprivate" functionality, although that could be dropped without
much loss.


Also, in the example, the local variable declaration of "tmp" outside of
the loop looks somewhat misplaced, although it's precedented by
comprehensions (which also have their own local scope in Cython).

Well, depending on the decision of lastprivate, the declaration would need
to be outside; I really like the idea of moving "cdef", and am prepared to
drop lastprivate for this.

Being explicit about thread-local variables does make things a lot safer to
use.

(One problem is that switching between serial and parallel one needs to move
variable declarations. But that only happens once, and one can use
"nthreads=1" to disable parallel after that.)

An example would then be:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0, globtmp
    with nogil:
        for i in prange(x.shape[0]):
            cdef double tmp # thread-private
            tmp = alpha * i # alpha available from global scope
            s += x[i] * tmp # still automatic reduction for inplace operators
            # printf(...s) -> now leads to error, since s is not declared thread-private but is read
        else:
            # tmp still available here...looks a bit strange, but useful
            s += tmp * 10
            globtmp = tmp # we save tmp for later
        # tmp not available here, globtmp is
    return s

Or, we just drop support for the else block on these loops.

I think since we are disallowing break (yet) we shouldn't support the
else clause. Basically, I think we can make the CEP a tad more simple.

I think we could declare everything outside of the prange body. Then,
in the prange loop body:

 - if a variable is assigned to anywhere -> make it lastprivate
 - if a variable is read before assigned to -> make it firstprivate in
   addition to lastprivate (raise a compiler error if the variable is not
   initialized outside of the loop body)
 - if a variable is only ever read -> make it shared (the default for OpenMP)
 - if a variable has an inplace operator -> make it a reduction

There is really no reason to disallow reading of the reduction
variable (in e.g. a printf). The reduction should also be initialized
outside of the prange body.
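As a concrete illustration of these rules (a sketch only: it uses the 
prange spelling that eventually landed in cython.parallel, and 
firstprivate never made it in exactly this form):

    from cython.parallel import prange

    def f(double[:] x, double alpha):
        cdef double s = 0       # inplace operator below -> reduction
        cdef double tmp = 0     # assigned in the body -> lastprivate
        cdef Py_ssize_t i       # loop counter -> always thread-private
        for i in prange(x.shape[0], nogil=True):
            tmp = alpha * i     # 'alpha' is only ever read -> shared
            s += x[i] * tmp
        return s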


The reason for disallowing reading the reduction variable is that 
otherwise you have a contradiction above, since a reduction variable may 
also be a thread-local variable. Or, you disable inplace operators for 
thread-local variables? (ugh)


That's the main reason I'm leaning towards explicit declaring local 
variables using "cdef".


If we're reducing complexity BTW, I'd rather remove 
firstprivate/lastprivate altogether, see below.



Then prange() could be implemented in pure mode as simply the
sequential version, i.e. range() which some

Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/04/2011 09:26 PM, mark florisson wrote:

On 4 April 2011 19:18, Dag Sverre Seljebotn  wrote:

On 04/04/2011 05:22 PM, mark florisson wrote:

On 4 April 2011 13:53, Dag Sverre Seljebotn
  wrote:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange

"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the
loop body are shared across threads. Variables that are only used in the
else block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as
well.

* Reduction: Variables that are only used on the LHS of an inplace
operator, such as s above, are marked as targets for reduction. If the
variable is also used in other ways (LHS of assignment or in an
expression)
it instead turns into a thread-private variable. Note: This means
that
if one, e.g., inserts printf(... s) above, s is turned into a
thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop
behave
like a closure function, but with the looping header at the 'right'
place
rather than after the closure.

I'm not quite sure what the concrete changes to the CEP this would lead
to
(assuming you mean this as a proposal for alternative semantics, and not
an
implementation detail).

How would we treat reduction variables? They need to be supported, and
there's nothing in Python semantics to support reduction variables, they
are
a rather special case everywhere. I suppose keeping the reduction clause
above, or use the "nonlocal" keyword in the loop body...

Also there's the else:-block, although we could make that part of the
scope.
And the "lastprivate" functionality, although that could be dropped
without
much loss.


Also, in the example, the local variable declaration of "tmp" outside of
the loop looks somewhat misplaced, although it's precedented by
comprehensions (which also have their own local scope in Cython).

Well, depending on the decision of lastprivate, the declaration would
need
to be outside; I really like the idea of moving "cdef", and am prepared
to
drop lastprivate for this.

Being explicit about thread-local variables does make things a lot safer
to
use.

(One problem is that switching between serial and parallel one needs to
move
variable declarations. But that only happens once, and one can use
"nthreads=1" to disable parallel after that.)

An example would then be:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0, globtmp
    with nogil:
        for i in prange(x.shape[0]):
            cdef double tmp # thread-private
            tmp = alpha * i # alpha available from global scope
            s += x[i] * tmp # still automatic reduction for inplace operators
            # printf(...s) -> now leads to error, since s is not declared thread-private but is read
        else:
            # tmp still available here...looks a bit strange, but useful
            s += tmp * 10
            globtmp = tmp # we save tmp for later
        # tmp not available here, globtmp is
    return s

Or, we just drop support for the else block on these loops.

I think since we are disallowing break (yet) we shouldn't support the
else clause. Basically, I think we can make the CEP a tad more simple.

I think we could declare everything outside of the prange body. Then,
in the prange loop body:

 - if a variable is assigned to anywhere -> make it lastprivate
 - if a variable is read before assigned to -> make it firstprivate in
   addition to lastprivate (raise a compiler error if the variable is not
   initialized outside of the loop body)
 - if a variable is only ever read -> make it shared (the default for OpenMP)
 - if a variable has an inplace operator -> make it a reduction

There is really no reason to disallow reading of the reduction
variable (in e.g. a printf). The reduction should also be initialized
outside of the prange body.

The reason for disallowing reading the reduction variable is that otherwise
you have a contradiction above, since a reduction variable may also be a
thread-local variable. Or, you disable inplace operators for thread-local
variables? (ugh)

Yes, an inplace operator would make it a reduction variable, just like
assigning something makes it lastprivate, only reading makes it shared
and reading before writing makes it firstprivate in addition to
lastprivate. This is all implicit.

Re: [Cython] CEP: prange for parallel loops

2011-04-04 Thread Dag Sverre Seljebotn

On 04/05/2011 07:05 AM, Robert Bradshaw wrote:

On Mon, Apr 4, 2011 at 6:04 AM, Stefan Behnel  wrote:

Dag Sverre Seljebotn, 04.04.2011 13:53:

On 04/04/2011 01:23 PM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 04.04.2011 12:17:

CEP up at http://wiki.cython.org/enhancements/prange

"""
Variable handling

Rather than explicit declaration of shared/private variables we rely on
conventions:

* Thread-shared: Variables that are only read and not written in the loop
body are shared across threads. Variables that are only used in the else
block are considered shared as well.

* Thread-private: Variables that are assigned to in the loop body are
thread-private. Obviously, the iteration counter is thread-private as
well.

* Reduction: Variables that are only used on the LHS of an inplace operator,
such as s above, are marked as targets for reduction. If the variable is
also used in other ways (LHS of assignment or in an expression) it
instead turns into a thread-private variable. Note: This means that if
one, e.g., inserts printf(... s) above, s is turned into a thread-local
variable. OTOH, there is simply no way to correctly emulate the effect
printf(... s) would have in a sequential loop, so such code must be
discouraged anyway.
"""

What about simply (ab-)using Python semantics and creating a new inner
scope for the prange loop body? That would basically make the loop behave
like a closure function, but with the looping header at the 'right' place
rather than after the closure.

I'm not quite sure what the concrete changes to the CEP this would lead to
(assuming you mean this as a proposal for alternative semantics, and not
an
implementation detail).

What I would like to avoid is having to tell users "and now for something
completely different". It looks like a loop, but then there's a whole page
of new semantics for it. And this also cannot be used in plain Python code
due to the differing scoping behaviour.

The same could be said of OpenMP--it looks exactly like a loop except
for a couple of pragmas.

The proposed (as I'm reading the CEP now) semantics of what's shared
and first/last private and reduction would give it the semantics of a
normal, sequential loop (and if your final result changes based on how
many threads were involved then you've got incorrect code). Perhaps
reading of the reduction variable could be fine (though obviously
ill-defined, suitable only for debugging).


So would you disable inplace operators for thread-private variables? 
Otherwise a variable could be both a reduction variable and 
thread-private...


There's a reason I disabled reading the reduction variable (which I 
should have written down).


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


[Cython] Another CEP: Parallel block

2011-04-05 Thread Dag Sverre Seljebotn
There's a (much shorter) proposal for a more explicit parallelism 
construct at


http://wiki.cython.org/enhancements/parallelblock

This is a little more verbose for the simplest case, but makes the 
medium cases that need work buffers much simpler, and is also more 
explicit and difficult to get wrong.
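To give a flavour of the work-buffer case, here is a sketch using the 
cython.parallel API as it eventually landed (the wiki CEP's spelling 
differs in details; NULL check omitted for brevity):

    from cython.parallel import parallel, prange
    from libc.stdlib cimport malloc, free

    def f(double[:] x):
        cdef Py_ssize_t i
        cdef double s = 0
        cdef double* scratch
        with nogil, parallel():
            # assigned inside the parallel block -> one buffer per thread
            scratch = <double*> malloc(sizeof(double))
            for i in prange(x.shape[0]):
                scratch[0] = 2 * x[i]   # per-thread scratch space
                s += scratch[0]         # inplace operator -> reduction
            free(scratch)
        return s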


I am not sure myself which one I prefer of this and prange.

Justification for Cython-specific syntax: This is something that is 
really only useful if you can release the GIL *outside* of the loop. So 
I feel this is an area where a custom Cython solution is natural, sort 
of like "cdef extern", and the buffer access.


Since a similar pure-Python solution is rather useless, I also think 
there's less incentive for making something that works well in 
pure-Python mode.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Another CEP: Parallel block

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 11:01 AM, Stefan Behnel wrote:

mark florisson, 05.04.2011 10:44:

On 5 April 2011 10:34, Stefan Behnel wrote:

mark florisson, 05.04.2011 10:26:


On 5 April 2011 09:21, Dag Sverre Seljebotn wrote:


Justification for Cython-specific syntax: This is something that is
really
only useful if you can release the GIL *outside* of the loop. So I 
feel

this
is an area where a custom Cython solution is natural, sort of like 
"cdef

extern", and the buffer access.

Since a similar pure-Python solution is rather useless, I also think
there's
less incentive for making something that works well in pure-Python 
mode.


Which feature is Cython specific here? The 'with a, b as c:' thing?


No, the syntax is just Python. It's the scoping that's Cython specific,
including the local variable declarations inside of the "with" block.


Hmm, but you can use cython.declare() for that, no?


cython.declare() is a no-op (or just a plain assignment) in Python. 
But the thread-local scoping of these variables cannot be emulated in 
Python. So this would be a feature that cannot be used in pure Python 
mode, unlike closures.


The intention of prange was certainly to fall back to a normal 
single-threaded range in Python mode.


Because of the GIL there would rarely be any benefit in running the loop 
in parallel -- only if you immediately dispatch to a long-running task 
that itself releases the GIL, but in those cases you should rather stick 
to pure Python in the first place and not bother with prange.


I think the chance of seeing real-life code that both requires prange to 
run optimally in Cython, and that would not be made slower by more than 
one thread in Python, is pretty close to zero.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 04:53 PM, Robert Bradshaw wrote:

On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel  wrote:

mark florisson, 04.04.2011 21:26:

For clarity, I'll add an example:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0
    cdef double tmp = 2
    cdef double other = 6.6

    with nogil:
        for i in prange(x.shape[0]):
            # reading 'tmp' makes it firstprivate in addition to lastprivate
            # 'other' is only ever read, so it's shared
            printf("%lf %lf %lf\n", tmp, s, other)

So, adding a printf() to your code can change the semantics of your
variables? That sounds like a really bad design to me.

That's what I was thinking. Basically, if you do an inplace operation,
then it's a reduction variable, no matter what else you do to it
(including possibly a direct assignment, though we could make that a
compile-time error).


-1, I think that's too obscure. Not being able to use inplace operators 
for certain variables will at the very least be nagging.


I think we need to explicitly declare something. Either a simple 
prange(..., reduce="s:+"), or all-out declaration of thread-local variables.


Reduction isn't *that* common, so perhaps that is what should be 
explicit, unlike my other proposal...


Dag
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:

On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel  
wrote:

mark florisson, 04.04.2011 21:26:

For clarity, I'll add an example:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0
    cdef double tmp = 2
    cdef double other = 6.6

    with nogil:
        for i in prange(x.shape[0]):
            # reading 'tmp' makes it firstprivate in addition to lastprivate
            # 'other' is only ever read, so it's shared
            printf("%lf %lf %lf\n", tmp, s, other)

So, adding a printf() to your code can change the semantics of your
variables? That sounds like a really bad design to me.

That's what I was thinking. Basically, if you do an inplace operation,
then it's a reduction variable, no matter what else you do to it
(including possibly a direct assignment, though we could make that a
compile-time error).


-1, I think that's too obscure. Not being able to use inplace 
operators for certain variables will at the very least be nagging.


I think we need to explicitly declare something. Either a simple 
prange(..., reduce="s:+"), or all-out declaration of thread-local 
variables.


Sorry: prange(..., reduce="s"), or perhaps &s or cython.address(s). The 
+ is of course still specified in code.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] CEP: prange for parallel loops

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 05:26 PM, Robert Bradshaw wrote:

On Tue, Apr 5, 2011 at 8:02 AM, Dag Sverre Seljebotn
  wrote:

On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:

On 04/05/2011 04:53 PM, Robert Bradshaw wrote:

On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel
  wrote:

mark florisson, 04.04.2011 21:26:

For clarity, I'll add an example:

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0
    cdef double tmp = 2
    cdef double other = 6.6

    with nogil:
        for i in prange(x.shape[0]):
            # reading 'tmp' makes it firstprivate in addition to lastprivate
            # 'other' is only ever read, so it's shared
            printf("%lf %lf %lf\n", tmp, s, other)

So, adding a printf() to your code can change the semantics of your
variables? That sounds like a really bad design to me.

That's what I was thinking. Basically, if you do an inplace operation,
then it's a reduction variable, no matter what else you do to it
(including possibly a direct assignment, though we could make that a
compile-time error).

-1, I think that's too obscure. Not being able to use inplace operators
for certain variables will at the very least be nagging.

You could still use inplace operators to your heart's content--just
don't bother using the reduced variable outside the loop. (I guess I'm
assuming reducing a variable has negligible performance overhead,
which it should.) For the rare cases that you want the non-aggregated
private, make an assignment to another variable, or use non-inplace
operations.


Ahh! Of course! With some control flow analysis we could even eliminate 
the reduction if the variable isn't used after the loop, although I 
agree the cost should be trivial.




Not being able to mix inplace operators might be an annoyance. We
could also allow explicit declarations, as per Pauli's suggestion, but
not require them. Essentially, as long as we have


I think you should be able to mix them, but if you do, a reduction 
doesn't happen. This is slightly uncomfortable, but I believe control 
flow analysis and disabling firstprivate can solve it, see below.


I believe I'm back in the implicit-camp. And the CEP can probably be 
simplified a bit too, I'll try to do that tomorrow.


Two things:

 * It'd still be nice to have something like a parallel block for thread 
setup/teardown rather than "if firstthreaditeration():". So, a prange 
for the 50% simplest cases, followed by a parallel block for the next 30%.


 * Control flow analysis can help us tighten it up a bit: For loops where 
you actually depend on values of thread-private variables computed in 
the previous iteration (beyond reduction), it'd be nice to raise a 
warning unless the variable is explicitly declared thread-local or 
similar. There are uses for such variables but they'd be rather rare, 
and such a hint could be very helpful.


I'm still not sure if we want firstprivate, even if we can do it. It'd 
be good to see a use case for it. I'd rather have NaN and 0x7FFF 
personally, as relying on the firstprivate value is likely a bug -- yes, 
it makes the sequential case work, but that is exactly the case where 
parallelizing the sequential code would be wrong!


Grepping through 3 lines of heavily OpenMP-ified Fortran code here, 
there's no mention of firstprivate or lastprivate (although we certainly 
want lastprivate to align with the sequential case).


Dag Sverre

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


[Cython] prange CEP updated

2011-04-05 Thread Dag Sverre Seljebotn
I've done a pretty major revision to the prange CEP, bringing in a lot 
of the feedback.


Thread-private variables are now split in two cases:

 i) The safe cases, which really require very little technical 
knowledge -> automatically inferred


 ii) As an advanced feature, unsafe cases that requires some knowledge 
of threading -> must be explicitly declared


I think this split simplifies things a great deal.

I'm rather excited over this now; this could turn out to be a really 
user-friendly and safe feature that would not only allow us to support 
OpenMP-like threading, but be more convenient to use in a range of 
common cases.


http://wiki.cython.org/enhancements/prange 



Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-05 Thread Dag Sverre Seljebotn

On 04/05/2011 10:29 PM, Dag Sverre Seljebotn wrote:
I've done a pretty major revision to the prange CEP, bringing in a lot 
of the feedback.


Thread-private variables are now split in two cases:

 i) The safe cases, which really require very little technical 
knowledge -> automatically inferred


 ii) As an advanced feature, unsafe cases that requires some knowledge 
of threading -> must be explicitly declared


I think this split simplifies things a great deal.

I'm rather excited over this now; this could turn out to be a really 
user-friendly and safe feature that would not only allow us to support 
OpenMP-like threading, but be more convenient to use in a range of 
common cases.


http://wiki.cython.org/enhancements/prange 


As a digression: threadlocal(int) variables could also be supported 
elsewhere as syntax candy for the pythread.h Thread Local Storage, which 
would work well for fast TLS for any kind of threads (e.g., when using 
the threading module).
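For instance, a rough sketch of what such sugar might expand to, using 
the pythread.h key API (the expansion itself is my guess; only the 
PyThread_* calls are real CPython API):

    cdef extern from "pythread.h":
        int PyThread_create_key()
        int PyThread_set_key_value(int key, void *value)
        void* PyThread_get_key_value(int key)
        void PyThread_delete_key_value(int key)

    cdef int _key = PyThread_create_key()

    cdef void set_tls_int(int value):
        # the old key API does not reliably overwrite, so delete first
        PyThread_delete_key_value(_key)
        PyThread_set_key_value(_key, <void*> <Py_ssize_t> value)

    cdef int get_tls_int():
        return <int> <Py_ssize_t> PyThread_get_key_value(_key)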


Dag Sverre

(Sorry about the previous HTML-mail.)
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


[Cython] Cython paper in Computing in Science & Engineering

2011-04-06 Thread Dag Sverre Seljebotn
I just wanted to make everybody aware that there's a paper on Cython in 
this month's CiSE (http://cise.aip.org/).


http://dx.doi.org/10.1109/MCSE.2010.118 (paywall)

Researchers: Please consider citing this paper if Cython helps your 
research in non-trivial ways.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Cython paper in Computing in Science & Engineering

2011-04-06 Thread Dag Sverre Seljebotn

On 04/07/2011 02:12 AM, Robert Bradshaw wrote:

On Wed, Apr 6, 2011 at 4:40 PM, Zak Stone  wrote:

Researchers: Please consider citing this paper if Cython helps your
research in non-trivial ways.

Is this the canonical citation reference for Cython now?  If so, can this be
mentioned on the Cython webpage somewhere that is prominent enough to be
found?

On a related note, would it be possible to post a preprint somewhere
that isn't behind a paywall? If that's allowed, I would be delighted
to share the preprint with friends to introduce them to Cython.

Yes, I think we can post the pre-print, though I'm opposed to making
this the "canonical citation" just because of this paywall.


Is this for ideological or practical reasons?

This is probably the only paper in a "real" journal for some time, and 
citations are going to boost the authors' citation counts. Nobody would 
actually look up the citation anyway simply to learn about Cython, 
they'd just Google it. So unless we're trying to hide the existence of 
the paper, I think we should make it the default citation until there's 
something better.


Next time we've got anything to share in a paper, let's do it here:

http://www.openresearchcomputation.com/

Although that wasn't around when we started writing the paper.

Posting the pre-print is a matter of making the necessary references 
within it and formatting it.


http://www.sherpa.ac.uk/romeo/search.php?jrule=ISSN&search=1521-9615 



I'll fix it and post a link later today.

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


[Cython] CiSE Cython paper: Preprint up

2011-04-07 Thread Dag Sverre Seljebotn

I should have put this up right away, sorry:

http://folk.uio.no/dagss/cython_cise.pdf

It is actually post-review, so it contains everything but some 
stylistic improvements and final layout. Not sure about posting this on 
cython.org, but we could perhaps link to my webpage 
(http://folk.uio.no/dagss/) and say it is there...


The repo is here: https://github.com/dagss/cython-cise-postprint

If only the world could move to open access a bit quicker...

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Cython paper in Computing in Science & Engineering

2011-04-07 Thread Dag Sverre Seljebotn

On 04/07/2011 10:00 AM, Stefan Behnel wrote:

Dag Sverre Seljebotn, 07.04.2011 07:54:

On 04/07/2011 02:12 AM, Robert Bradshaw wrote:

On Wed, Apr 6, 2011 at 4:40 PM, Zak Stone wrote:

Researchers: Please consider citing this paper if Cython helps your
research in non-trivial ways.

Is this the canonical citation reference for Cython now? If so, can this be
mentioned on the Cython webpage somewhere that is prominent enough to be
found?

On a related note, would it be possible to post a preprint somewhere
that isn't behind a paywall? If that's allowed, I would be delighted
to share the preprint with friends to introduce them to Cython.

Yes, I think we can post the pre-print, though I'm opposed to making
this the "canonical citation" just because of this paywall.


Is this for ideological or practical reasons?


Both.



This is probably the only paper in a "real" journal for some time, and
citations are going to boost the authors' citation counts. Nobody would
actually look up the citation anyway simply to learn about Cython, they'd
just Google it.


Depends on the reference. If it's just cited as "you know, Cython", 
people will either look for "Cython" directly and be happy, or they 
may look up the paper, see that it's paid, and keep searching, either 
for the paper or for the project. If it's cited as "in that paper, you 
can read about doing X with Cython", then people will try even harder 
to get at the paper. In either case, chances are that they need to 
invest more time because of the reference, compared to a plain link in 
a footnote. So citing this article is likely to be an inconvenience 
for interested readers of papers that cite it.


I guess this depends on the paper and reader in question then. Myself 
I'd never bother with the paper but go right to the website. Citing is 
just "paying the authors of the software through improving their 
citation stats". Then again my field is unfortunately very much 
pyramid-scheme-inflicted.


I definitely think we should encourage giving a footnote as well.

How about just presenting the situation as it is in a "Citing Cython" 
section, and leave the decision up to who's citing Cython? ("If you 
don't like to cite a paywall paper, a website reference is OK. At any 
rate, please link to the website in a footnote the first time you 
mention Cython.")


Really, I hate the current situation as much as you do. But I see moving 
the world towards open access as the task of those who've already got a 
bit up the food chain; I'm just at the start of my PhD. (And it should 
be obvious I'm arguing with my own interests in mind here.)


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Cython paper in Computing in Science & Engineering

2011-04-07 Thread Dag Sverre Seljebotn

On 04/07/2011 10:01 AM, Robert Bradshaw wrote:

On Wed, Apr 6, 2011 at 10:54 PM, Dag Sverre Seljebotn
  wrote:

On 04/07/2011 02:12 AM, Robert Bradshaw wrote:

On Wed, Apr 6, 2011 at 4:40 PM, Zak Stonewrote:

Researchers: Please consider citing this paper if Cython helps your
research in non-trivial ways.

Is this the canonical citation reference for Cython now?  If so, can
this be
mentioned on the Cython webpage somewhere that is prominent enough to be
found?

On a related note, would it be possible to post a preprint somewhere
that isn't behind a paywall? If that's allowed, I would be delighted
to share the preprint with friends to introduce them to Cython.

Yes, I think we can post the pre-print, though I'm opposed to making
this the "canonical citation" just because of this paywall.

Is this for ideological or practical reasons?

Both.

Actually, opposed is probably too strong of a word here. I'm
disinclined, but there isn't really a better option. Currently, people
usually just cite the website, for whatever that's worth.
http://scholar.google.com/scholar?q=cython


And I don't think that's worth very much. To me it's really looking like 
CiSE citation or no citation at all.



Next time we've got anything to share in a paper, let's do it here:

http://www.openresearchcomputation.com/

Although that wasn't around when we started writing the paper.

Or at least look into this more carefully. Some of CiSE's papers are
open access, I (naively) thought ours wouldn't be hard to get to
either. It is a nice paper though and I think it'll hit a nice
audience (who primarily won't even be aware that they're paying
for it indirectly through university overhead and monolithic
library subscriptions).


I made the same mistake, because I couldn't see the paywall myself. At 
the time I actually had a hard time finding an internet connection that 
wouldn't transparently serve me the PDFs. And once I learned, I figured 
it was a bit too late to back out.


I've learned a lot since then.

DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] [cython-users] CiSE Cython paper: Preprint up

2011-04-07 Thread Dag Sverre Seljebotn

On 04/07/2011 11:37 AM, René Rex wrote:

Any more keywords to add?

What about "Python"? ;)



Done and done. Thanks for the patches.

DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Cython paper in Computing in Science & Engineering

2011-04-07 Thread Dag Sverre Seljebotn

On 04/07/2011 10:01 AM, Robert Bradshaw wrote:

On Wed, Apr 6, 2011 at 10:54 PM, Dag Sverre Seljebotn
  wrote:

On 04/07/2011 02:12 AM, Robert Bradshaw wrote:

On Wed, Apr 6, 2011 at 4:40 PM, Zak Stonewrote:

Researchers: Please consider citing this paper if Cython helps your
research in non-trivial ways.

Is this the canonical citation reference for Cython now?  If so, can
this be
mentioned on the Cython webpage somewhere that is prominent enough to be
found?

On a related note, would it be possible to post a preprint somewhere
that isn't behind a paywall? If that's allowed, I would be delighted
to share the preprint with friends to introduce them to Cython.

Yes, I think we can post the pre-print, though I'm opposed to making
this the "canonical citation" just because of this paywall.

Is this for ideological or practical reasons?

Both.

Actually, opposed is probably too strong of a word here. I'm
disinclined, but there isn't really a better option. Currently, people
usually just cite the website, for whatever that's worth.
http://scholar.google.com/scholar?q=cython


OK, I wrote this:

http://wiki.cython.org/FAQ#HowdoIciteCythoninanacademicpaper.3F

If any of you can think of something better than that, just do it -- I 
won't start an edit war :-)



Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] [GSoC] Python backend for Cython using PyPy's FFI

2011-04-07 Thread Dag Sverre Seljebotn

On 04/07/2011 05:01 PM, Romain Guillebert wrote:

Hi

I proposed the Summer of Code project regarding the Python backend for
Cython.

As I said in my proposal this would translate Cython code to Python +
FFI code (I don't know yet if it will use ctypes or something specific
to PyPy). PyPy's ctypes is now really fast and this will allow people to
port their Cython code to PyPy.

For the moment I've been mostly in touch with the PyPy people and they
seem happy with my proposal.

Of course I'm available for questions.


Disclaimer: I haven't read the proposal (don't have access yet but will 
soon). So perhaps the below is redundant.


This seems similar to Carl Witty's port of Cython to .NET/IronPython. An 
important insight from that project is that Cython code does NOT specify 
an ABI, only an API which requires a C compiler to make sense. That is: 
many wrapped C libraries have plenty of macros, we only require partial 
definitions of structs, we only require approximate typedefs, and so on.


In the .NET port, the consequence was that the original idea 
of generating C# code (with FFI specifications) was dropped, and one 
instead went with C++/CLR (which is a proper C++ compiler that really 
understands the C side on an API level, in addition to giving access to 
the .NET runtime).


There are two ways around this:

 a) In addition to Python code, generate C code that can take (the 
friendliest) APIs and probe for the ABIs (such as, for instance, getting 
the offset of each struct field from the base pointer). Of course, this 
must really be rerun for each platform/build of the wrapped library.


Essentially, you'd use Cython to generate C code that, in a target 
build, would generate Python code...
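A minimal sketch of what such a probe could look like (struct timeval 
stands in for whatever the wrapped library defines; the point is that 
sizes and offsets only become known after a C compile on the target):

    cdef extern from "sys/time.h":
        struct timeval:
            long tv_sec
            long tv_usec

    def probe():
        cdef timeval tmp
        # report the ABI facts the generated Python+FFI code would need
        return (sizeof(timeval),
                <size_t>(<char*>&tmp.tv_usec - <char*>&tmp))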


 b) Create a subset of the Cython language ("RCython" :-)), where you 
require explicit ABIs (essentially this means either disallowing "cdef 
extern from ...", or creating some new form of it). Most Cython 
extensions I know about would not work with this though, so there would 
need to be porting in each case. Ideally one should then have a similar 
mode for Cython+CPython so that one can debug with CPython as well.



Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-11 Thread Dag Sverre Seljebotn

On 04/11/2011 10:45 AM, mark florisson wrote:

On 5 April 2011 22:29, Dag Sverre Seljebotn  wrote:

I've done a pretty major revision to the prange CEP, bringing in a lot of
the feedback.

Thread-private variables are now split in two cases:

  i) The safe cases, which really require very little technical knowledge ->
automatically inferred

  ii) As an advanced feature, unsafe cases that requires some knowledge of
threading ->  must be explicitly declared

I think this split simplifies things a great deal.


Can't we obsolete the declaration entirely by assigning to variables
that need to have firstprivate behaviour inside the with parallel
block? Basically in the same way the scratch space is used. The only
problem with that is that it won't be lastprivate, so the value will
be undefined after the parallel block (but not after the worksharing
loop).

cdef int myvariable

with nogil, parallel:
    myvariable = 2
    for i in prange(...):
        use myvariable
        maybe assign to myvariable

    # myvariable is well-defined here

# myvariable is not well-defined here

If you still desperately want lastprivate behaviour you can simply
assign myvariable to another variable in the loop body.


I don't care about lastprivate, I don't think that is an issue, as you say.

My problem with this is that it means going into an area where possibly 
tricky things are implicit rather than explicit. I also see this as a 
rather special case that will be seldomly used, and implicit behaviour 
is more difficult to justify because of that.


(The other instance of thread-local variables I feel is still explicit: 
You use prange instead of range, which means that you declare that 
values created in one iteration do not leak to the next iteration. The 
rest is just optimization from there.)


As Robert said in his recent talk: A lot of languages are easy to write. 
The advantage of Python is that it is easy to *read*. That's what I feel 
is wrong with the proposal above: An assignment to a variable changes 
the semantics of it. Granted, it happens in a way so that it will almost 
always be correct, but I feel that reading the code, I'd spend some 
extra cycles to go "ah, so this variable is thread-local and therefore 
its values survive across a loop iteration".


If I even knew about the feature in the first place. In seeing 
"threadprivate" spelled out, it is either obvious what it means, or 
obvious that I should look up the docs.


There are *a lot* of things that can be made implicit in a programming 
language; Python/Cython simply usually leans towards the explicit side.


Oh, and we may want to support writable shared variables (and flush) 
eventually too, and the above doesn't easily differentiate there?


That's just my opinion, I'm happy to be overruled here.

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-11 Thread Dag Sverre Seljebotn

On 04/11/2011 11:41 AM, mark florisson wrote:

On 11 April 2011 11:10, Dag Sverre Seljebotn  wrote:

On 04/11/2011 10:45 AM, mark florisson wrote:


On 5 April 2011 22:29, Dag Sverre Seljebotn
  wrote:


I've done a pretty major revision to the prange CEP, bringing in a lot of
the feedback.

Thread-private variables are now split in two cases:

  i) The safe cases, which really require very little technical knowledge
->
automatically inferred

  ii) As an advanced feature, unsafe cases that requires some knowledge of
threading ->must be explicitly declared

I think this split simplifies things a great deal.


Can't we obsolete the declaration entirely by assigning to variables
that need to have firstprivate behaviour inside the with parallel
block? Basically in the same way the scratch space is used. The only
problem with that is that it won't be lastprivate, so the value will
be undefined after the parallel block (but not after the worksharing
loop).

cdef int myvariable

with nogil, parallel:
    myvariable = 2
    for i in prange(...):
        use myvariable
        maybe assign to myvariable

    # myvariable is well-defined here

# myvariable is not well-defined here

If you still desperately want lastprivate behaviour you can simply
assign myvariable to another variable in the loop body.


I don't care about lastprivate, I don't think that is an issue, as you say.

My problem with this is that it means going into an area where possibly
tricky things are implicit rather than explicit. I also see this as a rather
special case that will be seldom used, and implicit behaviour is more
difficult to justify because of that.


Indeed, I actually considered if we should support firstprivate at
all, as it's really about "being firstprivate and lastprivate".
Without any declaration, you can have firstprivate or lastprivate, but
not both :) So I agree that supporting such a (probably) uncommon case
is better left explicit. On the other hand it seems silly to have
support for such a weird case.


Well, I actually need to do the per-thread cache thing I described in 
the CEP in my own codes, so it's not *that* special; it'd be nice to 
support it.


OTOH I *could* work around it by having an array of scalars

cdef int[:] old_ell = int[:numthreads]()

...
if old_ell[threadid()] != ell: ...


So I guess, it's at least at the bottom of the list of priorities in that CEP.
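For concreteness, a slightly fleshed-out version of that workaround (a 
sketch: threadid() and prange are spelled as they eventually landed in 
cython.parallel, and omp_get_max_threads() stands in for the CEP's 
numthreads()):

    import numpy as np
    cimport openmp
    from cython.parallel import prange, threadid

    def g(long[:] ells):
        cdef Py_ssize_t i
        cdef int tid
        # one cache slot per thread; -1 marks 'nothing cached yet'
        # (np.int_ matches a C long on common platforms)
        cdef long[:] old_ell = np.full(openmp.omp_get_max_threads(), -1,
                                       dtype=np.int_)
        for i in prange(ells.shape[0], nogil=True):
            tid = threadid()
            if old_ell[tid] != ells[i]:
                old_ell[tid] = ells[i]   # refresh this thread's cached value
                # ... recompute whatever depends on the new ell here ...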

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-11 Thread Dag Sverre Seljebotn

On 04/11/2011 12:14 PM, mark florisson wrote:

On 11 April 2011 12:08, Dag Sverre Seljebotn  wrote:

On 04/11/2011 11:41 AM, mark florisson wrote:


On 11 April 2011 11:10, Dag Sverre Seljebotn
  wrote:


On 04/11/2011 10:45 AM, mark florisson wrote:


On 5 April 2011 22:29, Dag Sverre Seljebotn
  wrote:


I've done a pretty major revision to the prange CEP, bringing in a lot
of
the feedback.

Thread-private variables are now split in two cases:

  i) The safe cases, which really require very little technical
knowledge
->
automatically inferred

  ii) As an advanced feature, unsafe cases that requires some knowledge
of
threading ->  must be explicitly declared

I think this split simplifies things a great deal.


Can't we obsolete the declaration entirely by assigning to variables
that need to have firstprivate behaviour inside the with parallel
block? Basically in the same way the scratch space is used. The only
problem with that is that it won't be lastprivate, so the value will
be undefined after the parallel block (but not after the worksharing
loop).

cdef int myvariable

with nogil, parallel:
    myvariable = 2
    for i in prange(...):
        use myvariable
        maybe assign to myvariable

    # myvariable is well-defined here

# myvariable is not well-defined here

If you still desperately want lastprivate behaviour you can simply
assign myvariable to another variable in the loop body.


I don't care about lastprivate, I don't think that is an issue, as you
say.

My problem with this is that it means going into an area where possibly
tricky things are implicit rather than explicit. I also see this as a
rather
special case that will be seldom used, and implicit behaviour is more
difficult to justify because of that.


Indeed, I actually considered if we should support firstprivate at
all, as it's really about "being firstprivate and lastprivate".
Without any declaration, you can have firstprivate or lastprivate, but
not both :) So I agree that supporting such a (probably) uncommon case
is better left explicit. On the other hand it seems silly to have
support for such a weird case.


Well, I actually need to do the per-thread cache thing I described in the
CEP in my own codes, so it's not *that* special; it'd be nice to support it.


You need 'old_ell' and 'alpha' after the loop?



No...but I need the values to not be blanked out at the beginning of 
each loop iteration!


Note that in the CEP, the implicitly thread-local variables are *not 
available* before the first assignment in the loop. That is, code such 
as this is NOT allowed:


cdef double x
...
for i in prange(10):
    print x
    x = f(x)

We raise a compiler error in such cases if we can: The code above is 
violating the contract that the order of execution of loop bodies should 
not matter.


In cases where we can't raise an error (because we didn't bother or 
because it is not possible with a proof), we still initialize the 
variables to invalid values (NaN for double) at the beginning of the 
for-loop just to be sure the contract is satisfied.


This was added to answer Stefan's objection to new types of implicit 
scopes (and I agree with his concern).


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-11 Thread Dag Sverre Seljebotn

On 04/11/2011 01:02 PM, Dag Sverre Seljebotn wrote:

On 04/11/2011 12:14 PM, mark florisson wrote:

On 11 April 2011 12:08, Dag Sverre
Seljebotn wrote:

On 04/11/2011 11:41 AM, mark florisson wrote:


On 11 April 2011 11:10, Dag Sverre
Seljebotn
wrote:


On 04/11/2011 10:45 AM, mark florisson wrote:


On 5 April 2011 22:29, Dag Sverre
Seljebotn
wrote:


I've done a pretty major revision to the prange CEP, bringing in
a lot
of
the feedback.

Thread-private variables are now split in two cases:

i) The safe cases, which really require very little technical
knowledge
->
automatically inferred

ii) As an advanced feature, unsafe cases that requires some
knowledge
of
threading -> must be explicitly declared

I think this split simplifies things a great deal.


Can't we obsolete the declaration entirely by assigning to variables
that need to have firstprivate behaviour inside the with parallel
block? Basically in the same way the scratch space is used. The only
problem with that is that it won't be lastprivate, so the value will
be undefined after the parallel block (but not after the worksharing
loop).

cdef int myvariable

with nogil, parallel:
    myvariable = 2
    for i in prange(...):
        use myvariable
        maybe assign to myvariable

    # myvariable is well-defined here

# myvariable is not well-defined here

If you still desperately want lastprivate behaviour you can simply
assign myvariable to another variable in the loop body.


I don't care about lastprivate, I don't think that is an issue, as you
say.

My problem with this is that it means going into an area where
possibly
tricky things are implicit rather than explicit. I also see this as a
rather
special case that will be seldom used, and implicit behaviour is
more
difficult to justify because of that.


Indeed, I actually considered if we should support firstprivate at
all, as it's really about "being firstprivate and lastprivate".
Without any declaration, you can have firstprivate or lastprivate, but
not both :) So I agree that supporting such a (probably) uncommon case
is better left explicit. On the other hand it seems silly to have
support for such a weird case.


Well, I actually need to do the per-thread cache thing I described in
the
CEP in my own codes, so it's not *that* special; it'd be nice to
support it.


You need 'old_ell' and 'alpha' after the loop?



No...but I need the values to not be blanked out at the beginning of
each loop iteration!


Sorry, I now realize that re-reading your email I may have misunderstood 
you. Anyway, no, I don't need lastprivate at all anywhere.


Dag Sverre



Note that in the CEP, the implicitly thread-local variables are *not
available* before the first assignment in the loop. That is, code such
as this is NOT allowed:

cdef double x
...
for i in prange(10):
    print x
    x = f(x)

We raise a compiler error in such cases if we can: The code above is
violating the contract that the order of execution of loop bodies should
not matter.

In cases where we can't raise an error (because we didn't bother or
because it is not possible with a proof), we still initialize the
variables to invalid values (NaN for double) at the beginning of the
for-loop just to be sure the contract is satisfied.

This was added to answer Stefan's objection to new types of implicit
scopes (and I agree with his concern).

Dag Sverre


___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-11 Thread Dag Sverre Seljebotn

On 04/11/2011 01:12 PM, mark florisson wrote:

On 11 April 2011 13:03, Dag Sverre Seljebotn  wrote:

On 04/11/2011 01:02 PM, Dag Sverre Seljebotn wrote:


On 04/11/2011 12:14 PM, mark florisson wrote:


On 11 April 2011 12:08, Dag Sverre
Seljebotn  wrote:


On 04/11/2011 11:41 AM, mark florisson wrote:


On 11 April 2011 11:10, Dag Sverre
Seljebotn
wrote:


On 04/11/2011 10:45 AM, mark florisson wrote:


On 5 April 2011 22:29, Dag Sverre
Seljebotn
wrote:


I've done a pretty major revision to the prange CEP, bringing in
a lot
of
the feedback.

Thread-private variables are now split in two cases:

i) The safe cases, which really require very little technical
knowledge
->
automatically inferred

ii) As an advanced feature, unsafe cases that requires some
knowledge
of
threading ->  must be explicitly declared

I think this split simplifies things a great deal.


Can't we obsolete the declaration entirely by assigning to variables
that need to have firstprivate behaviour inside the with parallel
block? Basically in the same way the scratch space is used. The only
problem with that is that it won't be lastprivate, so the value will
be undefined after the parallel block (but not after the worksharing
loop).

cdef int myvariable

with nogil, parallel:
    myvariable = 2
    for i in prange(...):
        use myvariable
        maybe assign to myvariable

    # myvariable is well-defined here

# myvariable is not well-defined here

If you still desperately want lastprivate behaviour you can simply
assign myvariable to another variable in the loop body.


I don't care about lastprivate, I don't think that is an issue, as you
say.

My problem with this is that it means going into an area where
possibly
tricky things are implicit rather than explicit. I also see this as a
rather
special case that will be seldom used, and implicit behaviour is
more
difficult to justify because of that.


Indeed, I actually considered if we should support firstprivate at
all, as it's really about "being firstprivate and lastprivate".
Without any declaration, you can have firstprivate or lastprivate, but
not both :) So I agree that supporting such a (probably) uncommon case
is better left explicit. On the other hand it seems silly to have
support for such a weird case.


Well, I actually need to do the per-thread cache thing I described in
the
CEP in my own codes, so it's not *that* special; it'd be nice to
support it.


You need 'old_ell' and 'alpha' after the loop?



No...but I need the values to not be blanked out at the beginning of
each loop iteration!


Sorry, I now realize that re-reading your email I may have misunderstood
you. Anyway, no, I don't need lastprivate at all anywhere.


Right, so basically you can rewrite your example by introducing the
parallel block (which doesn't add an indentation level as you're
already using nogil) and assigning to your variables that need to be
firstprivate there. The only thing you miss out on is lastprivate
behaviour. So basically, the question is, do we want explicit syntax
for such a rare case (firstprivate + lastprivate)?


OK, we're on the same page here.


I must say, I found your previous argument of future shared
declarations persuasive enough to introduce explicit syntax.


OK, lets leave it at this then, we don't have to agree for the same 
reasons :-)


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-13 Thread Dag Sverre Seljebotn

On 04/13/2011 09:31 PM, mark florisson wrote:

On 5 April 2011 22:29, Dag Sverre Seljebotn  wrote:

I've done a pretty major revision to the prange CEP, bringing in a lot of
the feedback.

Thread-private variables are now split in two cases:

  i) The safe cases, which really require very little technical knowledge ->
automatically inferred

  ii) As an advanced feature, unsafe cases that requires some knowledge of
threading ->  must be explicitly declared

I think this split simplifies things a great deal.

I'm rather excited over this now; this could turn out to be a really
user-friendly and safe feature that would not only allow us to support
OpenMP-like threading, but be more convenient to use in a range of common
cases.

http://wiki.cython.org/enhancements/prange

Dag Sverre

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel




If we want to support cython.parallel.threadsavailable outside of
parallel regions (which does not depend on the schedule used for
worksharing constructs!), then we have to disable dynamic scheduling.
For instance, if OpenMP sees some OpenMP threads are already busy,
then with dynamic scheduling it dynamically establishes how many
threads to use for any parallel region.
So basically, if you put omp_get_num_threads() in a parallel region,
you have a race when you depend on that result in a subsequent
parallel region, because the number of busy OpenMP threads may have
changed.


Ah, I don't know why I thought there wouldn't be a race condition. I 
wonder if the whole threadsavailable() idea should just be ditched and 
we should think of something else. It's not a very common use case, and 
starting to disable some forms of scheduling just to, essentially, 
shoehorn in one particular syntax doesn't seem like the way to go.


Perhaps this calls for support for the critical(?) block then, after 
all. I'm at least +1 on dropping threadsavailable() and instead 
requiring that you call numthreads() in a critical block:


with parallel:
    with critical:
        # call numthreads() and allocate global buffer
        # calling threadid() not allowed, if we can manage that
    # get buffer slice for each thread
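Spelled out a little more (all hypothetical syntax, of course --
critical and numthreads() as proposed above, threadid() from the CEP;
malloc from libc.stdlib):

from libc.stdlib cimport malloc

cdef double *buf = NULL
cdef double *mybuf
cdef Py_ssize_t chunk = 1024

with parallel:
    with critical:
        if buf == NULL:
            # only the first thread through does the allocation
            buf = <double*>malloc(numthreads() * chunk * sizeof(double))
    mybuf = buf + threadid() * chunk   # each thread takes its own slice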


So basically, to make threadsavailable() work outside parallel
regions, we'd have to disable dynamic scheduling (omp_set_dynamic(0)).
Of course, when OpenMP cannot request the amount of threads desired
(because they are bounded by a configurable thread limit (and the OS
of course)), the behaviour will be implementation defined. So then we
could just put a warning in the docs for that, and users can check for
this in the parallel region using threadsavailable() if it's really
important.


Do you have any experience with what actually happens with, say, GNU 
OpenMP? I blindly assumed from the specs that it was an error condition 
("flag an error any way you like"), but I guess that may be wrong.


Just curious, I think we can just fall back to OpenMP behaviour; unless 
it terminates the interpreter in an error condition, in which case we 
should look into how expensive it is to check for the condition up front...



Dag Sverre

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-14 Thread Dag Sverre Seljebotn

On 04/13/2011 11:13 PM, mark florisson wrote:


Although there is omp_get_max_threads():

"The omp_get_max_threads routine returns an upper bound on the number
of threads that could be used to form a new team if a parallel region
without a num_threads clause were encountered after execution returns
from this routine."

So we could have threadsavailable() evaluate to that if encountered
outside a parallel region. Inside, it would evaluate to
omp_get_num_threads(). At worst, people would over-allocate a bit.


Well, over-allocating could well mean 1 GB, which could well mean 
getting an unnecessary MemoryError (or, like in my case, if I'm not 
careful to set ulimit, getting a SIGKILL sent to you 2 minutes after the 
fact by the cluster patrol process...)


But even ignoring this, we also have to plan for people misusing the 
feature. If we put it in there, somebody somewhere *will* write code 
like this:


nthreads = threadsavailable()
with parallel:
    for i in prange(nthreads):
        for j in range(100*i, 100*(i+1)): [...]

(Yes, they shouldn't. Yes, they will.)

Combined with a race condition that will only very seldomly trigger, 
this starts to sound like a very bad idea indeed.


So I agree with you that we should just leave it for now, and do 
single/barrier later.


DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-14 Thread Dag Sverre Seljebotn

On 04/14/2011 08:39 PM, mark florisson wrote:

On 14 April 2011 20:29, Dag Sverre Seljebotn  wrote:

On 04/13/2011 11:13 PM, mark florisson wrote:


Although there is omp_get_max_threads():

"The omp_get_max_threads routine returns an upper bound on the number
of threads that could be used to form a new team if a parallel region
without a num_threads clause were encountered after execution returns
from this routine."

So we could have threadsavailable() evaluate to that if encountered
outside a parallel region. Inside, it would evaluate to
omp_get_num_threads(). At worst, people would over-allocate a bit.


Well, over-allocating could well mean 1 GB, which could well mean getting an
unnecessary MemoryError (or, like in my case, if I'm not careful to set
ulimit, getting a SIGKILL sent to you 2 minutes after the fact by the
cluster patrol process...)


The upper bound is not "however many threads you think you can start",
but rather "how many threads are considered useful for your machine".
So if you use omp_set_num_threads(), it will return the value you set
there. Otherwise, if you have e.g. a quadcore, it will return 4. The
spec says:

"Note – The return value of the omp_get_max_threads routine can be
used to dynamically allocate sufficient storage for all threads in the
team formed at the subsequent active parallel region."

So this sounds like a viable option.


What would happen here: We have 8 cores. Some code has an OpenMP 
parallel section with maxthreads=2, and inside the section another 
function is called.


That called function uses threadsavailable(), and has a parallel block 
that wants as many threads as it can get.


I don't know the details as well as you do, but my uninformed guess is 
that in this case it'd be quite possible to have a race where 
omp_get_max_threads would return 7 in each case; the first one to reach 
the parallel section would get the 7 threads, and the remaining thread 
would then have allocated storage for 7 threads while only 1 thread runs.


BTW, I'm not sure what the difference is between the original idea and 
omp_get_max_threads -- in the absence of such races as above, my 
original idea of entering a parallel section (with the same scheduling 
parameters) just to see how many threads we get would work as well?


DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-14 Thread Dag Sverre Seljebotn

On 04/14/2011 08:42 PM, mark florisson wrote:

On 14 April 2011 20:29, Dag Sverre Seljebotn  wrote:

On 04/13/2011 11:13 PM, mark florisson wrote:


Although there is omp_get_max_threads():

"The omp_get_max_threads routine returns an upper bound on the number
of threads that could be used to form a new team if a parallel region
without a num_threads clause were encountered after execution returns
from this routine."

So we could have threadsavailable() evaluate to that if encountered
outside a parallel region. Inside, it would evaluate to
omp_get_num_threads(). At worst, people would over-allocate a bit.


Well, over-allocating could well mean 1 GB, which could well mean getting an
unnecessary MemoryError (or, like in my case, if I'm not careful to set
ulimit, getting a SIGKILL sent to you 2 minutes after the fact by the
cluster patrol process...)

But even ignoring this, we also have to plan for people misusing the
feature. If we put it in there, somebody somewhere *will* write code like
this:

nthreads = threadsavailable()
with parallel:
    for i in prange(nthreads):
        for j in range(100*i, 100*(i+1)): [...]

(Yes, they shouldn't. Yes, they will.)

Combined with a race condition that will only very seldomly trigger, this
starts to sound like a very bad idea indeed.

So I agree with you that we should just leave it for now, and do
single/barrier later.


omp_get_max_threads() doesn't have a race, as it returns the upper
bound. So e.g. if between your call and your parallel section fewer
OpenMP threads become available, then you might get fewer threads, but
never more.


Oh, now I'm following you.

Well, my argument was that I think erroring in that direction is pretty 
bad as well.


Also, even if we're not making it available in cython.parallel, we're 
not stopping people from calling omp_get_max_threads directly 
themselves, which should be OK for the people who know enough to do this 
safely...
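E.g., a sketch (assuming an openmp.pxd exposing the OpenMP C API, and
that you accept the upper-bound semantics; n and chunk defined
elsewhere):

cimport openmp
from cython.parallel import prange

cdef int nmax = openmp.omp_get_max_threads()   # upper bound, no race
# allocate nmax * chunk worth of scratch up front, then:
for i in prange(n):
    ...   # each thread touches at most its own slice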


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-14 Thread Dag Sverre Seljebotn

On 04/14/2011 09:08 PM, mark florisson wrote:

On 14 April 2011 20:58, Dag Sverre Seljebotn  wrote:

On 04/14/2011 08:42 PM, mark florisson wrote:


On 14 April 2011 20:29, Dag Sverre Seljebotn
  wrote:


On 04/13/2011 11:13 PM, mark florisson wrote:


Although there is omp_get_max_threads():

"The omp_get_max_threads routine returns an upper bound on the number
of threads that could be used to form a new team if a parallel region
without a num_threads clause were encountered after execution returns
from this routine."

So we could have threadsavailable() evaluate to that if encountered
outside a parallel region. Inside, it would evaluate to
omp_get_num_threads(). At worst, people would over-allocate a bit.


Well, over-allocating could well mean 1 GB, which could well mean getting
an
unnecessary MemoryError (or, like in my case, if I'm not careful to set
ulimit, getting a SIGKILL sent to you 2 minutes after the fact by the
cluster patrol process...)

But even ignoring this, we also have to plan for people misusing the
feature. If we put it in there, somebody somewhere *will* write code like
this:

nthreads = threadsavailable()
with parallel:
    for i in prange(nthreads):
        for j in range(100*i, 100*(i+1)): [...]

(Yes, they shouldn't. Yes, they will.)

Combined with a race condition that will only very seldomly trigger, this
starts to sound like a very bad idea indeed.

So I agree with you that we should just leave it for now, and do
single/barrier later.


omp_get_max_threads() doesn't have a race, as it returns the upper
bound. So e.g. if between your call and your parallel section fewer
OpenMP threads become available, then you might get fewer threads, but
never more.


Oh, now I'm following you.

Well, my argument was that I think erroring in that direction is pretty bad
as well.

Also, even if we're not making it available in cython.parallel, we're not
stopping people from calling omp_get_max_threads directly themselves, which
should be OK for the people who know enough to do this safely...


True, but it wouldn't be as easy to wrap in a #ifdef _OPENMP. In any
event, we could just put a warning in the docs stating that using
threadsavailable outside parallel sections returns an upper bound on
the actual number of threads in a subsequent parallel section.


I don't think outside or within makes a difference -- what about nested 
parallel sections? At least my intention in the CEP was that 
threadsavailable was always for the next section (so often it would be 1 
after entering the section).


Perhaps just calling it "maxthreads" instead solves the issue.

(Still, I favour just dropping threadsavailable/maxthreads for the time 
being. It is much simpler to add something later, when we've had some 
time to use it and reflect about it, than to remove something that 
shouldn't have been added.)


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Incompatibility with numpy-1.5.1 - fixed in master?!

2011-04-15 Thread Dag Sverre Seljebotn

On 04/15/2011 06:45 PM, Lisandro Dalcin wrote:

On 15 April 2011 11:20, Yury V. Zaytsev  wrote:

Hi folks!

I have just run into a buffer protocol incompatibility problem with
numpy-1.5.1, which led me to discover the following ticket (discussed
back in December 2010 on this list):

http://trac.cython.org/cython_trac/ticket/630

In despair, I was about to try to see if there is anything I can do to
fix it myself. To this end I cloned your git repository and set up my
Python environment to use the latest bleeding edge version from there.

To my surprise I discovered that my code started working and I don't
have the buffer interface problem that I was facing before anymore.

So my suggestion would be to maybe test it more thoroughly and if it is
indeed the case, close the ticket. I tried to subscribe to it or leave a
comment, but I need an account which I can't register on my own.



I'm opposed to closing the ticket. Cython cannot currently parse format
strings according to the full spec. It worked for you just because of
some quick fixes I've pushed for simple cases.


Pauli Virtanen fixed this and there's a pull request here:

https://github.com/cython/cython/pull/17

I'll get to it in a couple of days if nobody beats me to it.

BTW, did anyone figure out how to get emailed on pull requests? (Yes, I 
checked off all the "send me email" boxes etc., we were talking about 
this during the workshop -- it seems that org admins can't subscribe...)


DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Incompatibility with numpy-1.5.1 - fixed in master?!

2011-04-15 Thread Dag Sverre Seljebotn

On 04/15/2011 07:57 PM, Dag Sverre Seljebotn wrote:

On 04/15/2011 06:45 PM, Lisandro Dalcin wrote:

On 15 April 2011 11:20, Yury V. Zaytsev wrote:

Hi folks!

I have just run into a buffer protocol incompatibility problem with
numpy-1.5.1, which led me to discover the following ticket (discussed
back in December 2010 on this list):

http://trac.cython.org/cython_trac/ticket/630

In despair, I was about to try to see if there is anything I can do to
fix it myself. To this end I cloned your git repository and set up my
Python environment to use the latest bleeding edge version from there.

To my surprise I discovered that my code started working and I don't
have the buffer interface problem that I was facing before anymore.

So my suggestion would be to maybe test it more thoroughly and if it is
indeed the case, close the ticket. I tried to subscribe to it or leave a
comment, but I need an account which I can't register on my own.



I'm opposed to closing the ticket. Cython cannot currently parse format
strings according to the full spec. It worked for you just because of
some quick fixes I've pushed for simple cases.


Pauli Virtanen fixed this and there's a pull request here:

https://github.com/cython/cython/pull/17

I'll get to it in a couple of days if nobody beats me to it.

BTW, did anyone figure out how to get emailed on pull requests? (Yes, I
checked off all the "send me email" boxes etc., we were talking about
this during the workshop -- it seems that org admins can't subscribe...)


And BTW, the Cython test suite exposed a bug in NumPy as well -- you can 
see it in the test case comments. So there may be NumPy builds (and 
releases?) out there that fail the Cython test suite because of a bug in 
NumPy.


It only affects unpacked structs, though; I believe you're good with 
packed structs (and few use NumPy with unpacked structs).


DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-16 Thread Dag Sverre Seljebotn
(Moving discussion from http://markflorisson.wordpress.com/, where Mark 
said:)


"""
Started a new branch https://github.com/markflorisson88/cython/tree/openmp .

Now the question is whether sharing attributes should be propagated 
outwards. e.g. if you do


for i in prange(m):
    for j in prange(n):
        sum += i * j

then ‘sum’ is a reduction for the inner parallel loop, but not for the 
outer one. So the user would currently have to rewrite this to


for i in prange(m):
    for j in prange(n):
        sum += i * j
    sum += 0

which seems a bit silly. Of course, we could just disable nested 
parallelism, or tell the users to use a prange and a 'for from' in such 
cases.

"""

Dag: Interesting. The first one is definitely the behaviour we want, as 
long as it doesn't cause unintended consequences.


I don't really think it will -- the important thing is that the order 
of loop iteration evaluation must be unimportant. And that is still 
true (for the outer loop, as well as for the inner) in your first 
example.


Question: When you have nested pranges, what will happen is that two 
nested OpenMP parallel blocks are used, right? And do you know if there 
is complete freedom/"reentrancy", in that variables that are 
thread-private in an outer parallel block can be shared in an inner 
one, and vice versa?


If so I'd think that this algorithm should work and feel natural:

 - In each prange, for the purposes of variable 
private/shared/reduction inference, consider all internal "prange" just 
as if they had been "range"; no special treatment.


 - Recurse to children pranges.
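Concretely, for the first example above these two rules would give
(just a sketch; nested prange as in Mark's branch, which is the very
thing this thread is debating whether to disallow):

for i in prange(m):        # inner prange treated as plain range here,
    for j in prange(n):    # so 'sum' is inferred as a reduction for
        sum += i * j       # the outer loop; recursing, for the inner too

with no need for the "sum += 0" workaround.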

DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-18 Thread Dag Sverre Seljebotn
(apologies for top post)

This all seems to scream 'disallow' to me, in particular since some OpenMP 
implementations may not support it etc.

At any rate I feel 'parallel/parallel/prange/prange' is going too far; so the next 
step could be to only allow 'parallel/prange/parallel/prange'.

But really, my feeling is that if you really do need this then you can always 
write a separate function for the inner loop (I honestly can't think of a 
use case anyway...). So I'd really drop it; at least until the rest of the GSoC 
project is completed :)

DS
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

mark florisson  wrote:

On 16 April 2011 18:42, Dag Sverre Seljebotn  wrote:
> (Moving discussion from http://markflorisson.wordpress.com/, where Mark
> said:)

Ok, sure, it was just an issue I was wondering about at that moment,
but it's a tricky issue, so thanks.

> """
> Started a new branch https://github.com/markflorisson88/cython/tree/openmp .
>
> Now the question is whether sharing attributes should be propagated
> outwards. e.g. if you do
>
> for i in prange(m):
>     for j in prange(n):
>         sum += i * j
>
> then 'sum' is a reduction for the inner parallel loop, but not for the
> outer one. So the user would currently have to rewrite this to
>
> for i in prange(m):
>     for j in prange(n):
>         sum += i * j
>     sum += 0
>
> which seems a bit silly. Of course, we could just disable nested
> parallelism, or tell the users to use a prange and a 'for from' in such
> cases.
> """
>
> Dag: Interesting. The first one is definitely the behaviour we want, as
> long as it doesn't cause unintended consequences.
>
> I don't really think it will -- the important thing is that the order
> of loop iteration evaluation must be unimportant. And that is still
> true (for the outer loop, as well as for the inner) in your first
> example.
>
> Question: When you have nested pranges, what will happen is that two
> nested OpenMP parallel blocks are used, right? And do you know if there
> is complete freedom/"reentrancy", in that variables that are
> thread-private in an outer parallel block can be shared in an inner
> one, and vice versa?

An implementation may or may not support it, and if it is supported
the behaviour can be configured through omp_set_nested(). So we should
consider the case where it is supported and enabled.

If you have a lastprivate or reduction, then after the loop these are
(reduced and) assigned to the original variable. So if that happens
inside a parallel construct which does not declare the variable
private to the construct, you actually have a race. So e.g. the nested
prange currently races in the outer parallel range.

> If so I'd think that this algorithm should work and feel natural:
>
>  - In each prange, for the purposes of variable private/shared/reduction
> inference, consider all internal "prange" just as if they had been
> "range"; no special treatment.
>
>  - Recurse to children pranges.

Right, that is most natural. Algorithmically, reductions and
lastprivates (as those can have races if placed in inner parallel
constructs) propagate outwards towards the outermost parallel block,
or up to the first parallel with block, or up to the first construct
that already determined the sharing attribute.

e.g.

with parallel:
    with parallel:
        for i in prange(n):
            for j in prange(n):
                sum += i * j
        # sum is well-defined here
# sum is undefined here

Here 'sum' is a reduction for the two innermost loops. 'sum' is not
private for the inner parallel with block, as a prange in a parallel
with block is a worksharing loop that binds to that parallel with
block. However, the outermost parallel with block declares sum (and i
and j) private, so after that block all those variables become
undefined.

However, in the outermost parallel with block, sum will have to be
initialized to 0 before anything else, or be declared firstprivate,
otherwise 'sum' is undefined to begin with. Do you think declaring it
firstprivate would be the way to go, or should we make it private and
issue a warning or perhaps even an error?

> DS
> ___
> cython-devel mailing list
> cython-devel@python.org
> http://mail.python.org/mailman/listinfo/cython-devel

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-04-18 Thread Dag Sverre Seljebotn
Excellent! Sounds great! (as I won't have my laptop for some days I can't have 
a look yet but I will later)

You're right about (the current) buffers and the gil. A testcase explicitly for 
them would be good.

Firstprivate etc: I think it'd be nice myself, but it is probably better to 
take a break from it at this point so that we can think more about that and not 
do anything rash; perhaps open up a specific thread on them and ask for more 
general input. Perhaps you want to take a break or task-switch to something 
else (fused types?) until I can get around to review and merge what you have so 
far? You'll know best what works for you though. If you decide to implement 
explicit threadprivate variables because you've got the flow I certainly won't 
object myself.


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

mark florisson  wrote:

On 18 April 2011 13:06, mark florisson  wrote:
> On 16 April 2011 18:42, Dag Sverre Seljebotn  wrote:
>> (Moving discussion from http://markflorisson.wordpress.com/, where Mark
>> said:)
>
> Ok, sure, it was just an issue I was wondering about at that moment,
> but it's a tricky issue, so thanks.
>
>> """
>> Started a new branch https://github.com/markflorisson88/cython/tree/openmp .
>>
>> Now the question is whether sharing attributes should be propagated
>> outwards. e.g. if you do
>>
>> for i in prange(m):
>>     for j in prange(n):
>>         sum += i * j
>>
>> then 'sum' is a reduction for the inner parallel loop, but not for the
>> outer one. So the user would currently have to rewrite this to
>>
>> for i in prange(m):
>>     for j in prange(n):
>>         sum += i * j
>>     sum += 0
>>
>> which seems a bit silly. Of course, we could just disable nested
>> parallelism, or tell the users to use a prange and a 'for from' in such
>> cases.
>> """
>>
>> Dag: Interesting. The first one is definitely the behaviour we want, as
>> long as it doesn't cause unintended consequences.
>>
>> I don't really think it will -- the important thing is that the order
>> of loop iteration evaluation must be unimportant. And that is still
>> true (for the outer loop, as well as for the inner) in your first
>> example.
>>
>> Question: When you have nested pranges, what will happen is that two
>> nested OpenMP parallel blocks are used, right? And do you know if there
>> is complete freedom/"reentrancy", in that variables that are
>> thread-private in an outer parallel block can be shared in an inner
>> one, and vice versa?
>
> An implementation may or may not support it, and if it is supported
> the behaviour can be configured through omp_set_nested(). So we should
> consider the case where it is supported and enabled.
>
> If you have a lastprivate or reduction, then after the loop these are
> (reduced and) assigned to the original variable. So if that happens
> inside a parallel construct which does not declare the variable
> private to the construct, you actually have a race. So e.g. the nested
> prange currently races in the outer parallel range.
>
>> If so I'd think that this algorithm should work and feel natural:
>>
>>  - In each prange, for the purposes of variable private/shared/reduction
>> inference, consider all internal "prange" just as if they had been
>> "range"; no special treatment.
>>
>>  - Recurse to children pranges.
>
> Right, that is most natural. Algorithmically, reductions and
> lastprivates (as those can have races if placed in inner parallel
> constructs) propagate outwards towards the outermost parallel block,
> or up to the first parallel with block, or up to the first construct
> that already determined the sharing attribute.
>
> e.g.
>
> with parallel:
>     with parallel:
>         for i in prange(n):
>             for j in prange(n):
>                 sum += i * j
>         # sum is well-defined here
> # sum is undefined here
>
> Here 'sum' is a reduction for the two innermost loops. 'sum' is not
> private for the inner parallel with block, as a prange in a parallel
> with block is a worksharing loop that binds to that parallel with
> block. However, the outermost parallel with block declares sum (and i
> and j) private, so after that block all those variables become
> undefined.
>
> However, in the outermost parallel with block, sum will have to be
> initialized to 0 before anything else, or be declared firstprivate,
> otherwise 'sum' is undefined to begin with. Do you thin

Re: [Cython] prange CEP updated

2011-04-21 Thread Dag Sverre Seljebotn

On 04/21/2011 10:37 AM, Robert Bradshaw wrote:

On Mon, Apr 18, 2011 at 7:51 AM, mark florisson
  wrote:

On 18 April 2011 16:41, Dag Sverre Seljebotn  wrote:

Excellent! Sounds great! (as I won't have my laptop for some days I can't
have a look yet but I will later)

You're right about (the current) buffers and the gil. A testcase explicitly
for them would be good.

Firstprivate etc: I think it'd be nice myself, but it is probably better to
take a break from it at this point so that we can think more about that and
not do anything rash; perhaps open up a specific thread on them and ask for
more general input. Perhaps you want to take a break or task-switch to
something else (fused types?) until I can get around to review and merge
what you have so far? You'll know best what works for you though. If you
decide to implement explicit threadprivate variables because you've got the
flow I certainly won't object myself.


  Ok, cool, I'll move on :) I already included a test with a prange and
a numpy buffer with indexing.


Wow, you're just plowing away at this. Very cool.

+1 to disallowing nested prange, that seems to get really messy with
little benefit.

In terms of the CEP, I'm still unconvinced that firstprivate is not
safe to infer, but let's leave the initial values undefined rather than
specifying them to be NaNs (we can do that as an implementation if you
want), which will give us flexibility to change later once we've had a
chance to play around with it.


I don't see any technical issues with inferring firstprivate; the 
question is whether we want to. I suggest not inferring it in order to 
make this safer: one should be able to just try to change a loop from 
"range" to "prange", and either a) have things fail very hard, or b) 
just work correctly and be able to trust the results.


Note that when I suggest using NaN, it is as initial values for EACH 
ITERATION, not per-thread initialization. It is not about "firstprivate" 
or not, but about disabling thread-private variables entirely in favor 
of "per-iteration" variables.


I believe that by talking about "readonly" and "per-iteration" 
variables, rather than "thread-shared" and "thread-private" variables, 
this can be used much more safely and with virtually no knowledge of the 
details of threading. Again, what's in my mind are scientific 
programmers with (too) little training.


In the end it's a matter of taste and what is most convenient to more 
users. But I believe the case of needing real thread-private variables 
that preserve per-thread values across iterations (and thus can 
possibly benefit from firstprivate) is rare enough that an explicit 
declaration is OK, in particular when it buys us so much safety in 
the common case.


To be very precise,

cdef double x, z
for i in prange(n):
    x = f(x)
    z = f(i)
    ...

goes to

cdef double x, z
for i in prange(n):
    x = z = nan
    x = f(x)
    z = f(i)
    ...

and we leave it to the C compiler to (trivially) optimize away "z = 
nan". And, yes, it is a stopgap solution until we've got control flow 
analysis so that we can outright disallow such uses of x (without 
threadprivate declaration, which also gives firstprivate behaviour).






The "cdef threadlocal(int) foo" declaration syntax feels odd to me...
We also probably want some way of explicitly marking a variable as
shared and still be able to assign to/flush/sync it. Perhaps the
parallel context could be used for these declarations, i.e.

 with parallel(threadlocal=a, shared=(b,c)):
 ...

which would be considered an "expert" usecase.


I'm not set on the syntax for threadlocal variables; although your 
proposal feels funny/very unpythonic, almost like a C macro. For some 
inspiration, here's the Python solution (with no obvious place to put 
the type):


import threading
mydata = threading.local()
mydata.myvar = ... # value is threadprivate


For all the discussion of threadsavailable/threadid, the most common
usecase I see is for allocating a large shared buffer and partitioning
it. This seems better handled by allocating separate thread-local
buffers, no? I still like the context idea, but everything in a
parallel block before and after the loop(s) also seems like a natural
place to put any setup/teardown code (though the context has the
advantage that __exit__ is always called, even if exceptions are
raised, which makes cleanup a lot easier to handle).


I'd *really* like to have try/finally available in the cython.parallel 
block for this, although I realize that may have to wait for a while. A 
big part of our discussions at the workshop was about how to handle 
exceptions; I guess there'll be a "phase 2" of this where 
break/continue/raise is dealt with.
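(For reference, the "separate thread-local buffers" version needs no
thread count at all; each thread can simply allocate and free inside
the parallel block. A sketch, assuming the parallel block behaves as in
the CEP, with chunk and n defined elsewhere:

from libc.stdlib cimport malloc, free

cdef double *scratch
cdef int i
with nogil, parallel:
    scratch = <double*>malloc(chunk * sizeof(double))   # per-thread setup
    for i in prange(n):
        ...   # use scratch freely within the iteration
    free(scratch)   # per-thread teardown -- skipped if something raises,
                    # which is exactly why try/finally would be nice.)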


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-04-29 Thread Dag Sverre Seljebotn

On 04/29/2011 12:53 PM, mark florisson wrote:

On 29 April 2011 12:28, Pauli Virtanen  wrote:

Fri, 29 Apr 2011 11:30:19 +0200, mark florisson wrote:

On 29 April 2011 11:03, Pauli Virtanen  wrote:

[clip]

Are you planning to special-case the "real_t complex" syntax? Shooting
from the sidelines, one more generic solution might be, e.g.,


I'm sorry, I'm not sure what syntax you are referring to. Are you
talking about actual complex numbers?


This:

On 28 April 2011 23:30, Robert Bradshaw
wrote:

OK, I take back what I said, I was looking at the RHS, not the LHS. If
one needs to specialize in this manner, explicitly creating two
branches should typically be enough. The same for casting. The one
exception (perhaps) is "my_fused_type complex." Otherwise it's
starting to feel too much like C++ template magic and complexity for
little additional benefit.


That is, declaring a complex type matching a real one.


Ah, I see what you mean now.


ctypedef cython.fused_type(A, B) struct_t
ctypedef cython.fused_type(float, double, paired=struct_t) real_t
ctypedef cython.fused_type(int_t, string_t, paired=struct_t) var_t

and just restrict the specialization to cases that make sense.


The paired means you're declaring types of attributes?


No, just that real_t is specialized to float whenever struct_t is specialized
to A and to double when B. Or a more realistic example,

ctypedef cython.fused_type(float, double) real_t
ctypedef cython.fused_type(float complex, double complex) complex_t

cdef real_plus_one(complex_t a):
    cdef real_t b = a.real
    return b + 1

which I suppose would not be a very unusual thing in numerical codes.


Did you mean

ctypedef cython.fused_type(float complex, double complex,
   paired=real_t) complex_t

?


This would also allow writing the case you had earlier as

cdef cython.fused_type(string_t, int, paired=struct_t) attr_t

cdef func(struct_t mystruct, int i):
    cdef attr_t var

    if typeof(mystruct) is typeof(int):
        var = mystruct.attrib + i
        ...
    else:
        var = mystruct.attrib + i
        ...

Things would need to be done explicitly instead of implicitly, but it
would remove the need for any special handling of the "complex"
keyword.


I see, so it's like a mapping. So, I didn't realize that you can't do this:

def func(arbitrary_type complex x):
    ...



We could support this, but I don't think it is powerful enough. I see 
some code that requires pairings like this:

ctypedef cython.fused_type(float, double, float complex, \
 double complex) complex_or_float_t

ctypedef cython.fused_type(float, double, float, \
 double) only_float_t


cdef func(complex_or_float_t x):
    cdef only_float_t y
    ...

So IIUC, one could here add "paired=complex_or_float_t" to say that 
only_float_t links positionally to corresponding types in 
complex_or_float_t.


Perhaps "pair_up_with="? "given_by="?

Anyway, I'm wondering if the special case of complex could be handled by 
having magical built-in fused types for the floating point for these 
purposes, and that these would suffice *shrug*.
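For instance (a sketch of what such builtins might look like; a
cython.floating covering float and double, plus a hypothetical
positionally-matched cython.floatingcomplex counterpart):

cimport cython

cdef cython.floating real_plus_one(cython.floatingcomplex a):
    # with the pairing fixed by the language, the float complex
    # specialization would get a float b, the double complex one a double
    cdef cython.floating b = a.real
    return b + 1

The pairing between the two builtins would be fixed by the language
rather than declared by the user.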


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-04-30 Thread Dag Sverre Seljebotn

On 04/30/2011 08:39 AM, Robert Bradshaw wrote:

On Fri, Apr 29, 2011 at 3:53 AM, mark florisson
  wrote:

On 29 April 2011 12:28, Pauli Virtanen  wrote:

No, just that real_t is specialized to float whenever struct_t is specialized
to A and to double when B. Or a more realistic example,

ctypedef cython.fused_type(float, double) real_t
ctypedef cython.fused_type(float complex, double complex) complex_t

cdef real_plus_one(complex_t a):
    cdef real_t b = a.real
    return b + 1

which I suppose would not be a very unusual thing in numerical codes.
This would also allow writing the case you had earlier as

cdef cython.fused_type(string_t, int, paired=struct_t) attr_t

cdef func(struct_t mystruct, int i):
    cdef attr_t var

    if typeof(mystruct) is typeof(int):
        var = mystruct.attrib + i
        ...
    else:
        var = mystruct.attrib + i
        ...

Things would need to be done explicitly instead of implicitly, but it
would remove the need for any special handling of the "complex"
keyword.


If we're going to introduce pairing, another option would be

 ctypedef fused_type((double complex, double), (float complex,
float)) (complex_t, real_t)

though I'm not sure I like that either. We're not trying to create the
all-powerful templating system here, and anything that can be done
with pairing can be done (though less elegantly) via branching on the
types, or, as Pauli mentions, using a wider type is often (but not
always) a viable option.


Keeping the right balance is difficult. But at least there are some cases 
of needing this in various codebases when interfacing with LAPACK.


Most uses of templating with Cython code I've seen so far do a similar 
kind of "zip" as what you have above (as we discussed at the workshop). 
So at least the usage pattern you write above is very common.


float32 is not about to disappear; it really is twice as fast when 
you're memory-IO bound.


Using a wider type is actually quite often not possible; any time the 
type is involved as the base type of an array it is not possible, and 
that's a pretty common case. (With LAPACK you take the address of the 
variable and pass it to Fortran, so using a wider type is not possible 
there either, although I'll agree that's a more remote case.)


My proposal: Don't support either "real_t complex" or paired fused types 
for the time being. Then see.


But my vote is for paired fused types instead of "real_t complex".

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


[Cython] Pull request emails

2011-04-30 Thread Dag Sverre Seljebotn
Finally think I figured out how to get pull request emails (thanks to 
Gael V). From https://github.com/organizations/cython/teams/24445:


"""
Owners do not receive notifications for the organization's repos by 
default. To receive notifications, create a team and add the owners and 
repos for which notifications are desired.

"""

I created Reviewers and added me, Stefan & Robert for now.

https://github.com/organizations/cython/teams/54516

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-02 Thread Dag Sverre Seljebotn

On 05/01/2011 06:25 PM, Sturla Molden wrote:

On 01.05.2011 16:36, Stefan Behnel wrote:


Not everyone uses C++. And the C++ compiler cannot adapt the code to
specific Python object types.


Ok, that makes sense.

Second question: Why not stay with the current square-bracket syntax?
Does Cython
need a fused-type in addition?


There is currently no feature for templates in Cython, only 
interfacing with C++ templates, which is rather different.


I.e., your question is very vague.

You're welcome to draft your own proposal for full-blown templates in 
Cython, if that is what you mean. When we came up with this idea, we 
felt that bringing the full power of C++ templates (including pattern 
matching etc.) into Cython would be a bit too much; I think Cython devs 
are above average sceptical to C++ and the mixed blessings of templates.


E.g., one reason for not wanting to do it the C++ way is the need to 
stick large parts of your program in header files. With fused types, the 
valid instantiations are determined up front.


DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-02 Thread Dag Sverre Seljebotn

On 05/02/2011 03:00 PM, Sturla Molden wrote:

On 02.05.2011 11:15, Dag Sverre Seljebotn wrote:


I.e., your question is very vague.


Ok, what I wanted to ask was "why have one syntax for interfacing C++
templates and another for generics?" It seems like syntax bloat to me.


But we do that. The CEP specifies that if you have

def f(floating x): return x**2

then "f[double]" will refer to the specialization where 
floating==double, and calling f[double](3.4f) will make the float be 
upcast to a double.


There's no [] within the function definition, but there's no "prior art" 
for how that would look within Cython.
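In full, that example reads (a sketch, using the fused_type spelling
from the CEP):

cimport cython

ctypedef cython.fused_type(float, double) floating

def f(floating x):
    return x**2

# indexing selects the specialization; the argument is coerced as needed:
f[double](3.4)
f[float](3.4)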


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-02 Thread Dag Sverre Seljebotn

On 05/02/2011 03:00 PM, Sturla Molden wrote:

On 02.05.2011 11:15, Dag Sverre Seljebotn wrote:


I.e., your question is very vague.


Ok, what I wanted to ask was "why have one syntax for interfacing C++
templates and another for generics?" It seems like syntax bloat to me.



You're welcome to draft your own proposal for full-blown templates in
Cython, if that is what you mean. When we came up with this idea, we
felt that bringing the full power of C++ templates (including pattern
matching etc.) into Cython would be a bit too much; I think Cython
devs are above average sceptical to C++ and the mixed blessings of
templates.

E.g., one reason for not wanting to do it the C++ way is the need to
stick large parts of your program in header files. With fused types,
the valid instantiations are determined up front.


C++ templates are evil. They require huge header files (compiler
dependent, but they all do) and make debugging a nightmare. Template
metaprogramming in C++ is crazy; we have optimizing compilers for
avoiding that. Java and C# have a simpler form of generics, but even that
can be too general.

Java and C# can specialize code at run-time, because there is a
JIT-compiler. Cython must do this in advance, for which fused_types
will give us a combinatorial bloat of specialized code. That is why
I suggested using run-time type information from test runs to select
those we want.


Well, I think that what you see about "fused_types(object, list)" is 
mainly a theoretical exercise at this point.


When fused_types was discussed originally the focus was very much on 
just finding something that would allow people to specialise for 
"float,double", or real and complex.


IOW, the kind of specializations people would have generated themselves 
using a templating language anyway.


Myself, I see typing from profile-assisted compilation as a completely 
separate feature (and something that's internal to "cython 
optimization"), even though they may share most implementation details, 
and fused types makes such things easier (but so would C++-style 
templates have done).



Personally I solve this by "writing code that writes code". It is easy
to use a Python script to generate and print specialized C or Cython code.


fused_types is simply a proposal to make people resort to this a little 
less often (not everybody is comfortable generating source code -- I 
think everybody reading cython-devel is, though). Basically: we don't 
want C++ templates, but can we extend the language in a way that deals 
with the most common situations? And fused_types was the compromise we 
ended up with.


DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-03 Thread Dag Sverre Seljebotn

On 05/03/2011 09:59 AM, mark florisson wrote:

On 3 May 2011 00:21, Robert Bradshaw  wrote:

On Mon, May 2, 2011 at 1:56 PM, mark florisson
  wrote:

On 2 May 2011 18:24, Robert Bradshaw  wrote:

On Sun, May 1, 2011 at 2:38 AM, mark florisson
  wrote:

A remaining issue which I'm not quite certain about is the
specialization through subscripts, e.g. func[double]. How should this
work from Python space (assuming cpdef functions)? Would we want to
pass in cython.double etc? Because it would only work for builtin
types, so what about types that aren't exposed to Python but can still
be coerced to and from Python? Perhaps it would be better to pass in
strings instead. I also think e.g. "int *" reads better than
cython.pointer(cython.int).


That's why we offer cython.p_int. On that note, we should support
cython.astype("int *") or something like that. Generally, I don't like
encoding semantic information in strings.

OTOH, since it'll be a mapping of some sort, there's no reason we
can't support both. Most of the time it should dispatch (at runtime or
compile time) based on the type of the arguments.


If we have an argument type that is composed of a fused type, would we
want the indexing to specify the composed type or the fused type? e.g.

ctypedef floating *floating_p


How should we support this? It's clear in this case, but only because
you chose good names. Another option would be to require
parameterization of floating_p, with floating_p[floating] the
"as-yet-unparameterized" version. Explicit but redundant. (The same
applies to structs and classes as well as typedefs.) On the other hand,
the above is very succinct and clear in context, so I'm leaning
towards it. Thoughts?


Well, it is already supported. floating is fused, so any composition
of floating is also fused.


cdef func(floating_p x):
    ...

Then do we want

func[double](10.0)

or

func[double_p](10.0)

to specialize func?


The latter.


I'm really leaning towards the former. What if you write

cdef func(floating_p x, floating_p *y):
    ...

Then specializing floating_p using double_p sounds slightly
nonsensical, as you're also specializing floating_p *.


I made myself agree with both of you in turn, but in the end I think I'm 
with Robert here.


Robert's approach sounds perhaps slightly simpler if you think of it 
this way:


ctypedef fused_type(float, double) floating
ctypedef floating* floating_p

is really a short-hand for

ctypedef fused_type(float*, double*) floating_p

I.e., when using a fused_type in a typedef you simply get a new 
fused_type. This sounds in a sense simpler without extra complexity 
getting in the way ("which was my fused base type again...").


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-03 Thread Dag Sverre Seljebotn

On 05/03/2011 10:42 AM, mark florisson wrote:

On 3 May 2011 10:07, Dag Sverre Seljebotn  wrote:

On 05/03/2011 09:59 AM, mark florisson wrote:


On 3 May 2011 00:21, Robert Bradshawwrote:


On Mon, May 2, 2011 at 1:56 PM, mark florisson
wrote:


On 2 May 2011 18:24, Robert Bradshaw
  wrote:


On Sun, May 1, 2011 at 2:38 AM, mark florisson
wrote:


A remaining issue which I'm not quite certain about is the
specialization through subscripts, e.g. func[double]. How should this
work from Python space (assuming cpdef functions)? Would we want to
pass in cython.double etc? Because it would only work for builtin
types, so what about types that aren't exposed to Python but can still
be coerced to and from Python? Perhaps it would be better to pass in
strings instead. I also think e.g. "int *" reads better than
cython.pointer(cython.int).


That's why we offer cython.p_int. On that note, we should support
cython.astype("int *") or something like that. Generally, I don't like
encoding semantic information in strings.

OTOH, since it'll be a mapping of some sort, there's no reason we
can't support both. Most of the time it should dispatch (at runtime or
compile time) based on the type of the arguments.


If we have an argument type that is composed of a fused type, would we
want the indexing to specify the composed type or the fused type? e.g.

ctypedef floating *floating_p


How should we support this? It's clear in this case, but only because
you chose good names. Another option would be to require
parameterization of floating_p, with floating_p[floating] the
"as-yet-unparameterized" version. Explicit but redundant. (The same
applies to structs and classes as well as typedefs.) On the other hand,
the above is very succinct and clear in context, so I'm leaning
towards it. Thoughts?


Well, it is already supported. floating is fused, so any composition
of floating is also fused.


cdef func(floating_p x):
    ...

Then do we want

func[double](10.0)

or

func[double_p](10.0)

to specialize func?


The latter.


I'm really leaning towards the former. What if you write

cdef func(floating_p x, floating_p *y):
    ...

Then specializing floating_p using double_p sounds slightly
nonsensical, as you're also specializing floating_p *.


I made myself agree with both of you in turn, but in the end I think I'm
with Robert here.

Robert's approach sounds perhaps slightly simpler if you think of it this
way:

ctypedef fused_type(float, double) floating
ctypedef floating* floating_p

is really a short-hand for

ctypedef fused_type(float*, double*) floating_p

I.e., when using a fused_type in a typedef you simply get a new fused_type.
This sounds in a sense simpler without extra complexity getting in the way
("which was my fused base type again...").

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel



Ok, if those typedefs should be disallowed then specialization through
indexing should definitely get the types listed in the fused_type
typedef.


I'm not sure what you mean here. What is disallowed exactly?

DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-03 Thread Dag Sverre Seljebotn

On 05/03/2011 10:49 AM, mark florisson wrote:

On 3 May 2011 10:44, Dag Sverre Seljebotn  wrote:

On 05/03/2011 10:42 AM, mark florisson wrote:


On 3 May 2011 10:07, Dag Sverre Seljebotn
  wrote:


On 05/03/2011 09:59 AM, mark florisson wrote:


On 3 May 2011 00:21, Robert Bradshaw
  wrote:


On Mon, May 2, 2011 at 1:56 PM, mark florisson
  wrote:


On 2 May 2011 18:24, Robert Bradshaw
  wrote:


On Sun, May 1, 2011 at 2:38 AM, mark florisson
  wrote:


A remaining issue which I'm not quite certain about is the
specialization through subscripts, e.g. func[double]. How should
this
work from Python space (assuming cpdef functions)? Would we want to
pass in cython.double etc? Because it would only work for builtin
types, so what about types that aren't exposed to Python but can
still
be coerced to and from Python? Perhaps it would be better to pass in
strings instead. I also think e.g. "int *" reads better than
cython.pointer(cython.int).


That's why we offer cython.p_int. On that note, we should support
cython.astype("int *") or something like that. Generally, I don't like
encoding semantic information in strings.

OTOH, since it'll be a mapping of some sort, there's no reason we
can't support both. Most of the time it should dispatch (at runtime
or
compile time) based on the type of the arguments.


If we have an argument type that is composed of a fused type, would we
want the indexing to specify the composed type or the fused type? e.g.

ctypedef floating *floating_p


How should we support this? It's clear in this case, but only because
you chose good names. Another option would be to require
parameterization of floating_p, with floating_p[floating] the
"as-yet-unparameterized" version. Explicit but redundant. (The same
applies to structs and classes as well as typedefs.) On the other hand,
the above is very succinct and clear in context, so I'm leaning
towards it. Thoughts?


Well, it is already supported. floating is fused, so any composition
of floating is also fused.


cdef func(floating_p x):
    ...

Then do we want

func[double](10.0)

or

func[double_p](10.0)

to specialize func?


The latter.


I'm really leaning towards the former. What if you write

cdef func(floating_p x, floating_p *y):
    ...

Then specializing floating_p using double_p sounds slightly
nonsensical, as you're also specializing floating_p *.


I made myself agree with both of you in turn, but in the end I think I'm
with Robert here.

Robert's approach sounds perhaps slightly simpler if you think of it this
way:

ctypedef fused_type(float, double) floating
ctypedef floating* floating_p

is really a short-hand for

ctypedef fused_type(float*, double*) floating_p

I.e., when using a fused_type in a typedef you simply get a new
fused_type.
This sounds in a sense simpler without extra complexity getting in the
way
("which was my fused base type again...").

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel



Ok, if those typedefs should be disallowed then specialization through
indexing should definitely get the types listed in the fused_type
typedef.


I'm not sure what you mean here. What is disallowed exactly?


ctypedef cython.fused_type(float, double) floating
ctypedef floating *floating_p

That is what you meant, right? Because prohibiting that makes it easier
to see where a type is variable (as the entire type always is, and not
some base type of it).



No. I meant that the above is automatically transformed into

ctypedef cython.fused_type(float, double) floating
ctypedef cython.fused_type(float*, double*) floating_p


DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-03 Thread Dag Sverre Seljebotn

On 05/03/2011 03:51 PM, Stefan Behnel wrote:

mark florisson, 03.05.2011 15:17:

if you have

cdef func(floating x, floating y):
...

you get a "float, float" version, and a "double, double" version, but
not "float, double" or "double, float".


So, what would you have to do in order to get a "float, double" and
"double, float" version then? Could you get that with

ctypedef fused_type(double, float) floating_df
ctypedef fused_type(float, double) floating_fd

cdef func(floating_df x, floating_fd y):

?


Well, if you do something like

ctypedef fused_type(float, double) speed_t
ctypedef fused_type(float, double) acceleration_t

cdef func(speed_t x, acceleration_t y)

then you get 4 specializations. Each new typedef gives a new polymorphic 
type.


OTOH, with

ctypedef speed_t acceleration_t

I guess only 2 specializations.

Treating the typedefs in this way is slightly fishy of course. It may 
hint that "ctypedef" is the wrong way to declare a fused type *shrug*.


To only get the "cross-versions" you'd need something like what you 
wrote + Pauli's "paired"-suggestion.



Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-03 Thread Dag Sverre Seljebotn
I was wrong. We need

cdef f(floating x, floating_p y)

...to get 2 specializations, not 4. And the rest follows from there. So I'm 
with Robert's real stance :-)

I don't think we want flexibility, we want simplicity above all. You can always 
use a templating language.

Btw, we shouldn't count on pruning for the design of this; I think this will for 
a large part be used with def functions. And if you use a cdef function from 
another module through a pxd, you also need all versions.

DS
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

mark florisson  wrote:

On 3 May 2011 18:00, Robert Bradshaw  wrote:
> On Tue, May 3, 2011 at 12:59 AM, mark florisson
>  wrote:
>> On 3 May 2011 00:21, Robert Bradshaw  wrote:
>>> On Mon, May 2, 2011 at 1:56 PM, mark florisson
>>>  wrote:
>>>> On 2 May 2011 18:24, Robert Bradshaw  wrote:
>>>>> On Sun, May 1, 2011 at 2:38 AM, mark florisson
>>>>>  wrote:
>>>>>> A remaining issue which I'm not quite certain about is the
>>>>>> specialization through subscripts, e.g. func[double]. How should this
>>>>>> work from Python space (assuming cpdef functions)? Would we want to
>>>>>> pass in cython.double etc? Because it would only work for builtin
>>>>>> types, so what about types that aren't exposed to Python but can still
>>>>>> be coerced to and from Python? Perhaps it would be better to pass in
>>>>>> strings instead. I also think e.g. "int *" reads better than
>>>>>> cython.pointer(cython.int).
>>>>>
>>>>> That's why we offer cython.p_int. On that note, we should support
>>>>> cython.astype("int *") or something like that. Generally, I don't like
>>>>> encoding semantic information in strings.
>>>>>
>>>>> OTOH, since it'll be a mapping of some sort, there's no reason we
>>>>> can't support both. Most of the time it should dispatch (at runtime or
>>>>> compile time) based on the type of the arguments.
>>>>
>>>> If we have an argument type that is composed of a fused type, would we
>>>> want the indexing to specify the composed type or the fused type? e.g.
>>>>
>>>> ctypedef floating *floating_p
>>>
>>> How should we support this? It's clear in this case, but only because
>>> you chose good names. Another option would be to require
>>> parameterization of floating_p, with floating_p[floating] the
>>> "as-yet-unparameterized" version. Explicit but redundant. (The same
>>> applies to structs and classes as well as typedefs.) On the other hand,
>>> the above is very succinct and clear in context, so I'm leaning
>>> towards it. Thoughts?
>>
>> Well, it is already supported. floating is fused, so any composition
>> of floating is also fused.
>>
>>>> cdef func(floating_p x):
>>>>     ...
>>>>
>>>> Then do we want
>>>>
>>>> func[double](10.0)
>>>>
>>>> or
>>>>
>>>> func[double_p](10.0)
>>>>
>>>> to specialize func?
>>>
>>> The latter.
>>
>> I'm really leaning towards the former.
>
> Ugh. I totally changed the meaning of that when I refactored my email.
> I'm in agreement with you: func[double].

I see, however Dag just agreed on double_p :) So it depends, as Dag
said, we can view ctypedef floating *floating_p as a fused type with
variable part double * and float *. But you can also view the variable
part as double and float. Either way makes sense, but the former
allows you to differentiate floating from floating_p.

So I suppose that if we want func[double] to specialize 'cdef
func(floating_p x, floating y)', then it would specialize both
floating_p and floating. However, if we settle on Dag's proposal, we
can differentiate 'floating' from 'floating_p' and we could make
'speed_t' and 'acceleration_t' a ctypedef of floating. So I guess
Dag's proposal makes sense, because if you want a single
specialization, you'd write 'cdef func(floating *x, floating y)'. So
overall you get more flexibility.

>> What if you write
>>
>> cdef func(floating_p x, floating_p *y):
>>     ...
>>
>> Then specializing floating_p using double_p sounds slightly
>> nonsensical, as you're also specializing floating_p *.
>>
>>>> FYI, the type checking works like 'double_p is floating_p' and not
>>>> 'double is floating_p'. But for functions this is a little different.
>>>> On the one hand specifying the full types (double_p) makes sense as
>>>> you're kind of specifying a signature, but on the other hand you're
>>>> specializing fused types and you don't care how they are composed --
>>>> especially if they occur multiple times with different composition.
>>>> So I'm thinking we want 'func[double]'.
>>>
>>> That's what I'm thinking too. The type you're branching on is
>>> floating, and within that block you can declare variables as
>>> floating*, ndarray[dtype=floating], etc.
>>
>> What I actually meant there was "I think we want func[double] for the
>> func(floating_p x) signature".
>>
>> Right, people can already say 'cdef func(floating *p): ...' and then
>> use 'floating'. However, if you do 'cdef floating_p x): ...', then
>> 'floa

Re: [Cython] Fused Types

2011-05-03 Thread Dag Sverre Seljebotn

On 05/03/2011 08:19 PM, Robert Bradshaw wrote:


Btw we shouldn't count on pruning for the design of this, I think this will
for a large part be used with def functions. And if you use a cdef function
from another module through a pxd, you also need all versions.


Well, we'll want to avoid compiler warnings. E.g. floating might
include long double, but only float and double may be used. In pxd and
def functions, however, we will make all versions available.


Which is a reminder to hash out exactly how the dispatch will be 
resolved when coming from Python space (we do want to support "f(x, y)", 
without []-qualifier, when calling from Python, right?)


Fused types mostly make sense when used through PEP 3118 memory views 
(using the planned syntax for brevity):


def f(floating[:] x, floating y): ...

I'm thinking that in this kind of situation we let the array override 
how y is interpreted (y will always be a double here, but if x is passed 
as a float32 then use float32 for y as well and coerce y).


Does this make sense as a general rule -- if there's a conflict between 
array arguments and scalar arguments (array base type is narrower than 
the scalar type), the array argument wins? It makes sense because we can 
easily convert a scalar while we can't convert an array; and there's no 
"3.4f" notation in Python.


This makes less sense

def f(floating x): ...

as it can only ever resolve to double; although I guess we should allow 
it for consistency with use cases that do make sense, such as 
"real_or_complex" and "int_or_float"


The final and most difficult problem is what Python ints resolve to in 
this context. The widest integer type available in the fused type? 
Always Py_ssize_t? -1 on making the dispatch depend on the actual 
run-time value.
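
To illustrate the ambiguity (hypothetical fused type, nothing decided):

cimport cython

ctypedef cython.fused_type(short, int, long long) integral_t

def g(integral_t n):
    ...

# g(3) from Python: 3 fits all three types. Widest (long long)? Always
# Py_ssize_t? Deciding based on the run-time value would make g(3) and
# g(2**40) hit different specializations, hence the -1 above.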


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 01:07 AM, Greg Ewing wrote:

mark florisson wrote:


cdef func(floating x, floating y):
...

you get a "float, float" version, and a "double, double" version, but
not "float, double" or "double, float".


It's hard to draw conclusions from this example because
it's degenerate. You don't really need multiple versions of a
function like that, because of float <-> double coercions.

A more telling example might be

cdef double dot_product(floating *u, floating *v, int length)

By your current rules, this would give you one version that
takes two float vectors, and another that takes two double
vectors.

But if you want to find the dot product of a float vector and
a double vector, you're out of luck.


First, I'm open for your proposed syntax too...But in the interest of 
seeing how we got here:


The argument to the above goes that you *should* be out of luck. For 
instance, talking about dot products, BLAS itself has float-float and 
double-double, but not float-double AFAIK.


What you are saying that this does not have the full power of C++ 
templates. And the answer is that yes, this does not have the full power 
of C++ templates.


At the same time we discussed this, we also discussed better support for 
string-based templating languages (so that, e.g., compilation error 
messages could refer to the template file). The two are complementary.


Going back to Greg's syntax: What I don't like is that it makes the 
simple unambiguous cases, where this would actually be used in real 
life, less readable.


Would it be too complicated to have both? For instance;

 i) You are allowed to use a *single* fused_type on a *function* 
without declaration.


def f(floating x, floating *y): # ok

Turns into

def f[floating T](T x, T *y):

This is NOT ok:

def f(floating x, integral y):
# ERROR: Please explicitly declare fused types inside []

 ii) Using more than one fused type, or using it on a cdef class or 
struct, you need to use the [] declaration.



Finally: It is a bit uncomfortable that we seem to be hashing things out 
even as Mark is implementing this. Would it be feasible to have a Skype 
session sometimes this week where everybody interested in the outcome of 
this come together for an hour and actually decide on something?


Mark: How much does this discussion of syntax impact your development? 
Are you able to treat them just as polish on top and work on the 
"engine" undisturbed by this?


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 12:00 PM, mark florisson wrote:

On 21 April 2011 20:13, Dag Sverre Seljebotn  wrote:

On 04/21/2011 10:37 AM, Robert Bradshaw wrote:


On Mon, Apr 18, 2011 at 7:51 AM, mark florisson
wrote:


On 18 April 2011 16:41, Dag Sverre Seljebotn
  wrote:


Excellent! Sounds great! (as I won't have my laptop for some days I
can't
have a look yet but I will later)

You're right about (the current) buffers and the gil. A testcase
explicitly
for them would be good.

Firstprivate etc: i think it'd be nice myself, but it is probably better
to
take a break from it at this point so that we can think more about that
and
not do anything rash; perhaps open up a specific thread on them and ask
for
more general input. Perhaps you want to take a break or task-switch to
something else (fused types?) until I can get around to review and merge
what you have so far? You'll know best what works for you though. If you
decide to implement explicit threadprivate variables because you've got
the
flow I certainly won't object myself.


  Ok, cool, I'll move on :) I already included a test with a prange and
a numpy buffer with indexing.


Wow, you're just plowing away at this. Very cool.

+1 to disallowing nested prange, that seems to get really messy with
little benefit.

In terms of the CEP, I'm still unconvinced that firstprivate is not
safe to infer, but lets leave the initial values undefined rather than
specifying them to be NaNs (we can do that as an implementation if you
want), which will give us flexibility to change later once we've had a
chance to play around with it.


I don't see any technical issues with inferring firstprivate, the question
is whether we want to. I suggest not inferring it in order to make this
safer: One should be able to just try to change a loop from "range" to
"prange", and either a) have things fail very hard, or b) just work
correctly and be able to trust the results.

Note that when I suggest using NaN, it is as initial values for EACH
ITERATION, not per-thread initialization. It is not about "firstprivate" or
not, but about disabling thread-private variables entirely in favor of
"per-iteration" variables.

I believe that by talking about "readonly" and "per-iteration" variables,
rather than "thread-shared" and "thread-private" variables, this can be used
much more safely and with virtually no knowledge of the details of
threading. Again, what's in my mind are scientific programmers with (too)
little training.

In the end it's a matter of taste and what is most convenient to more users.
But I believe the case of needing real thread-private variables that
preserve per-thread values across iterations (and thus also can possibly
benefit from firstprivate) is seldomly enough used that an explicit
declaration is OK, in particular when it buys us so much in safety in the
common case.

To be very precise,

cdef double x, z
for i in prange(n):
    x = f(x)
    z = f(i)
    ...

goes to

cdef double x, z
for i in prange(n):
    x = z = nan
    x = f(x)
    z = f(i)
    ...

and we leave it to the C compiler to (trivially) optimize away "z = nan".
And, yes, it is a stopgap solution until we've got control flow analysis so
that we can outright disallow such uses of x (without threadprivate
declaration, which also gives firstprivate behaviour).



I think the preliminary OpenMP support is ready for review. It
supports 'with cython.parallel.parallel:' and 'for i in
cython.parallel.prange(...):'. It works in generators and closures and
the docs are updated. Support for break/continue/with gil isn't there
yet.

There are two remaining issue. The first is warnings for potentially
uninitialized variables for prange(). When you do

for i in prange(start, stop, step): ...

it generates code like

nsteps = (stop - start) / step;
#pragma omp parallel for lastprivate(i)
for (temp = 0; temp < nsteps; temp++) {
 i = start + temp * step;
 ...
}

So here it will complain about 'i' being potentially uninitialized, as
it might not be assigned to in the loop. However, simply assigning 0
to 'i' can't work either, as you expect zero iterations not to touch
it. So for now, we have a bunch of warnings, as I don't see a
__attribute__ to suppress it selectively.


Isn't this orthogonal to OpenMP -- even if it said "range", your 
testcase could get such a warning? If so, the fix is simply to 
initialize i in your testcase code.



The second is NaN-ing private variables, NaN isn't part of C. For gcc,
the docs ( http://www.delorie.com/gnu/docs/glibc/libc_407.html ) have
the following to say:

"You can use `#ifdef NAN' to test whether the machine supports NaN.
(Of course, you must arrange for GNU extensions to be visible, such as
by defining _GNU_SOURCE

Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 12:59 PM, mark florisson wrote:

On 4 May 2011 12:45, Dag Sverre Seljebotn  wrote:

On 05/04/2011 12:00 PM, mark florisson wrote:

There are two remaining issue. The first is warnings for potentially
uninitialized variables for prange(). When you do

for i in prange(start, stop, step): ...

it generates code like

nsteps = (stop - start) / step;
#pragma omp parallel for lastprivate(i)
for (temp = 0; temp < nsteps; temp++) {
 i = start + temp * step;
 ...
}

So here it will complain about 'i' being potentially uninitialized, as
it might not be assigned to in the loop. However, simply assigning 0
to 'i' can't work either, as you expect zero iterations not to touch
it. So for now, we have a bunch of warnings, as I don't see a
__attribute__ to suppress it selectively.


Isn't this orthogonal to OpenMP -- even if it said "range", your testcase
could get such a warning? If so, the fix is simply to initialize i in your
testcase code.


No, the problem is that 'i' needs to be lastprivate, and 'i' is
assigned to in the loop body. It's irrelevant whether 'i' is assigned
to before the loop. I think this is the case because the spec says
that lastprivate variables will get the value of the private variable
of the last sequential iteration, but it cannot at compile time know
whether there might be zero iterations, which I believe the spec
doesn't have anything to say about. So basically we could guard
against it by checking if nsteps > 0, but the compiler doesn't detect
this, so it will still issue a warning even if 'i' is initialized (the
warning is at the place of the lastprivate declaration).


Ah. But this is then more important than I initially thought it was. You 
are saying that this is the case:


cdef int i = 0
with nogil:
    for i in prange(n):
        ...
print i  # garbage when n == 0?

It would be in the interest of less semantic differences w.r.t. range to 
deal better with this case.


Will it silence the warning if we make "i" firstprivate as well as 
lastprivate? firstprivate would only affect the case of zero iterations, 
since we overwrite with NaN if the loop is entered...


Dag
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 01:30 PM, mark florisson wrote:

On 4 May 2011 13:15, Dag Sverre Seljebotn  wrote:

On 05/04/2011 12:59 PM, mark florisson wrote:


On 4 May 2011 12:45, Dag Sverre Seljebotn
  wrote:


On 05/04/2011 12:00 PM, mark florisson wrote:


There are two remaining issue. The first is warnings for potentially
uninitialized variables for prange(). When you do

for i in prange(start, stop, step): ...

it generates code like

nsteps = (stop - start) / step;
#pragma omp parallel for lastprivate(i)
for (temp = 0; temp < nsteps; temp++) {
 i = start + temp * step;
 ...
}

So here it will complain about 'i' being potentially uninitialized, as
it might not be assigned to in the loop. However, simply assigning 0
to 'i' can't work either, as you expect zero iterations not to touch
it. So for now, we have a bunch of warnings, as I don't see a
__attribute__ to suppress it selectively.


Isn't this orthogonal to OpenMP -- even if it said "range", your testcase
could get such a warning? If so, the fix is simply to initialize i in your
testcase code.


No, the problem is that 'i' needs to be lastprivate, and 'i' is
assigned to in the loop body. It's irrelevant whether 'i' is assigned
to before the loop. I think this is the case because the spec says
that lastprivate variables will get the value of the private variable
of the last sequential iteration, but it cannot at compile time know
whether there might be zero iterations, which I believe the spec
doesn't have anything to say about. So basically we could guard
against it by checking if nsteps > 0, but the compiler doesn't detect
this, so it will still issue a warning even if 'i' is initialized (the
warning is at the place of the lastprivate declaration).


Ah. But this is then more important than I initially thought it was. You are
saying that this is the case:

cdef int i = 0
with nogil:
    for i in prange(n):
        ...
print i  # garbage when n == 0?


I think it may be, depending on the implementation. With libgomp it
returns 0. With the check it should also return 0.


It would be in the interest of less semantic differences w.r.t. range to
deal better with this case.

Will it silence the warning if we make "i" firstprivate as well as
lastprivate? firstprivate would only affect the case of zero iterations,
since we overwrite with NaN if the loop is entered...


Well, it wouldn't be NaN, it would be start + step * temp :) But, yes,


Doh.


that works. So we need both the check and an initialization in there:

if (nsteps > 0) {
    i = 0;
    #pragma omp parallel for firstprivate(i) lastprivate(i)
    for (temp = 0; ...; ...) ...
}


Why do you need the if-test? Won't simply

#pragma omp parallel for firstprivate(i) lastprivate(i)
for (temp = 0; ...; ...) ...

do the job -- any initial value will be copied into all threads, 
including the "last" thread, even if there are no iterations?


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 01:41 PM, mark florisson wrote:

On 4 May 2011 13:39, Dag Sverre Seljebotn  wrote:

On 05/04/2011 01:30 PM, mark florisson wrote:


On 4 May 2011 13:15, Dag Sverre Seljebotn
  wrote:


On 05/04/2011 12:59 PM, mark florisson wrote:


On 4 May 2011 12:45, Dag Sverre Seljebotn
  wrote:


On 05/04/2011 12:00 PM, mark florisson wrote:


There are two remaining issue. The first is warnings for potentially
uninitialized variables for prange(). When you do

for i in prange(start, stop, step): ...

it generates code like

nsteps = (stop - start) / step;
#pragma omp parallel for lastprivate(i)
for (temp = 0; temp < nsteps; temp++) {
 i = start + temp * step;
 ...
}

So here it will complain about 'i' being potentially uninitialized, as
it might not be assigned to in the loop. However, simply assigning 0
to 'i' can't work either, as you expect zero iterations not to touch
it. So for now, we have a bunch of warnings, as I don't see a
__attribute__ to suppress it selectively.


Isn't this orthogonal to OpenMP -- even if it said "range", your testcase
could get such a warning? If so, the fix is simply to initialize i in your
testcase code.


No, the problem is that 'i' needs to be lastprivate, and 'i' is
assigned to in the loop body. It's irrelevant whether 'i' is assigned
to before the loop. I think this is the case because the spec says
that lastprivate variables will get the value of the private variable
of the last sequential iteration, but it cannot at compile time know
whether there might be zero iterations, which I believe the spec
doesn't have anything to say about. So basically we could guard
against it by checking if nsteps > 0, but the compiler doesn't detect
this, so it will still issue a warning even if 'i' is initialized (the
warning is at the place of the lastprivate declaration).


Ah. But this is then more important than I initially thought it was. You
are
saying that this is the case:

cdef int i = 0
with nogil:
    for i in prange(n):
        ...
print i  # garbage when n == 0?


I think it may be, depending on the implementation. With libgomp it
returns 0. With the check it should also return 0.


It would be in the interest of less semantic differences w.r.t. range to
deal better with this case.

Will it silence the warning if we make "i" firstprivate as well as
lastprivate? firstprivate would only affect the case of zero iterations,
since we overwrite with NaN if the loop is entered...


Well, it wouldn't be NaN, it would be start + step * temp :) But, yes,


Doh.


that works. So we need both the check and an initialization in there:

if (nsteps > 0) {
    i = 0;
    #pragma omp parallel for firstprivate(i) lastprivate(i)
    for (temp = 0; ...; ...) ...
}


Why do you need the if-test? Won't simply

#pragma omp parallel for firstprivate(i) lastprivate(i)
for (temp = 0; ...; ...) ...

do the job -- any initial value will be copied into all threads, including
the "last" thread, even if there are no iterations?


It will, but you don't expect your iteration variable to change with
zero iterations.


Look.

i = 42
for i in prange(n):
    f(i)
print i  # want 42 whenever n == 0

Now, translate this to:

i = 42;
#pragma omp parallel for firstprivate(i) lastprivate(i)
for (temp = 0; ...; ...) {
    i = ...
}
/* At this point, i == 42 if n == 0 */

Am I missing something?

DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 01:48 PM, mark florisson wrote:

On 4 May 2011 13:47, mark florisson  wrote:

On 4 May 2011 13:45, Dag Sverre Seljebotn  wrote:



Look.

i = 42
for i in prange(n):
    f(i)
print i  # want 42 whenever n == 0

Now, translate this to:

i = 42;
#pragma omp parallel for firstprivate(i) lastprivate(i)
for (temp = 0; ...; ...) {
    i = ...
}
/* At this point, i == 42 if n == 0 */

Am I missing something?


Yes, 'i' may be uninitialized with nsteps > 0 (this should be valid
code). So if nsteps > 0, we need to initialize 'i' to something to get
correct behaviour with firstprivate.


This I don't see. I think I need to be spoon-fed on this one.


  And of course, if you initialize 'i' unconditionally, you change 'i'
whereas you might have to leave it unaffected.


This I see.

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 01:59 PM, mark florisson wrote:

On 4 May 2011 13:54, Dag Sverre Seljebotn  wrote:

On 05/04/2011 01:48 PM, mark florisson wrote:


On 4 May 2011 13:47, mark florisson wrote:


On 4 May 2011 13:45, Dag Sverre Seljebotn
  wrote:



Look.

i = 42
for i in prange(n):
    f(i)
print i  # want 42 whenever n == 0

Now, translate this to:

i = 42;
#pragma omp parallel for firstprivate(i) lastprivate(i)
for (temp = 0; ...; ...) {
    i = ...
}
/* At this point, i == 42 if n == 0 */

Am I missing something?


Yes, 'i' may be uninitialized with nsteps > 0 (this should be valid
code). So if nsteps > 0, we need to initialize 'i' to something to get
correct behaviour with firstprivate.


This I don't see. I think I need to be spoon-fed on this one.


So assume this code

cdef int i

for i in prange(10): ...

Now if we transform this without the guard we get

int i;

#pragma omp parallel for firstprivate(i) lastprivate(i)
for (...) { ...}

This is invalid C code, but valid Cython code. So we need to
initialize 'i', but then we get our "leave it unaffected for 0
iterations" paradox. So we need a guard.


You mean C code won't compile if i is firstprivate and not initialized? 
(Sorry, I'm not aware of such things.)


My first instinct is to initialize i to 0xbadabada. After all, its value 
is not specified -- we're not violating any Cython specs by initializing 
it to garbage ourselves.


OTOH, I see that your approach with an if-test is more 
Valgrind-friendly, so I'm OK with that.


Would it work to do

if (nsteps > 0) {
    #pragma omp parallel
    i = 0;
    #pragma omp for lastprivate(i)
    for (temp = 0; ...) ...
    ...
}

instead, to get rid of the warning without using a firstprivate? Not 
sure if there's an efficiency difference here, I suppose a good C 
compiler could compile them to the same thing.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 02:17 PM, mark florisson wrote:

On 4 May 2011 14:10, Dag Sverre Seljebotn  wrote:

On 05/04/2011 01:59 PM, mark florisson wrote:


On 4 May 2011 13:54, Dag Sverre Seljebotn
  wrote:


On 05/04/2011 01:48 PM, mark florisson wrote:


On 4 May 2011 13:47, mark florisson  wrote:


On 4 May 2011 13:45, Dag Sverre Seljebotn
  wrote:



Look.

i = 42
for i in prange(n):
    f(i)
print i  # want 42 whenever n == 0

Now, translate this to:

i = 42;
#pragma omp parallel for firstprivate(i) lastprivate(i)
for (temp = 0; ...; ...) {
    i = ...
}
/* At this point, i == 42 if n == 0 */

Am I missing something?


Yes, 'i' may be uninitialized with nsteps > 0 (this should be valid
code). So if nsteps > 0, we need to initialize 'i' to something to get
correct behaviour with firstprivate.


This I don't see. I think I need to be spoon-fed on this one.


So assume this code

cdef int i

for i in prange(10): ...

Now if we transform this without the guard we get

int i;

#pragma omp parallel for firstprivate(i) lastprivate(i)
for (...) { ...}

This is invalid C code, but valid Cython code. So we need to
initialize 'i', but then we get our "leave it unaffected for 0
iterations" paradox. So we need a guard.


You mean C code won't compile if i is firstprivate and not initialized?
(Sorry, I'm not aware of such things.)


It will compile and warn, but it is technically invalid, as you're
reading an uninitialized variable, which has undefined behavior. If
e.g. the variable contains a trap representation on a certain
architecture, it might halt the program (I'm not sure which
architecture that would be, but I believe they exist).


My first instinct is to initialize i to 0xbadabada. After all, its value is
not specified -- we're not violating any Cython specs by initializing it to
garbage ourselves.


The problem is that we don't know whether the user has initialized the
variable. So if we want firstprivate to suppress warnings, we should
assume that the user hasn't and do it ourselves.


I meant that if we don't care about Valgrindability, we can initialize i 
at the top of our function (i.e. where it says "int __pyx_v_i").



OTOH, I see that your approach with an if-test is more Valgrind-friendly, so
I'm OK with that.

Would it work to do

if (nsteps > 0) {
    #pragma omp parallel
    i = 0;
    #pragma omp for lastprivate(i)
    for (temp = 0; ...) ...
    ...
}


I'm assuming you mean #pragma omp parallel private(i), otherwise you
have a race (I'm not sure how much that matters for assignment). In
any case, with the private() clause 'i' would be uninitialized
afterwards. In either case it won't do anything useful.


Sorry, I meant that lastprivate(i) should go on the parallel line.

if (nsteps > 0) {
    #pragma omp parallel lastprivate(i)
    i = 0;
    #pragma omp for
    for (temp = 0; ...) ...
    ...
}

won't this silence the warning? At any rate, it's obvious you have a 
better handle on this than me, so I'll shut up now and leave you to it :-)


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn
Moving pull request discussion 
(https://github.com/cython/cython/pull/28) over here:


First, I got curious why you'd have to strip off "-pthread" from CC. I'd 
think you could just execute it with "-pthread", which seems simpler.


Second: If parallel.parallel is not callable, how are scheduling 
parameters for parallel blocks handled? Is there a reason to not support 
that? Do you think it should stay this way, or will parallel take 
parameters in the future?


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 07:03 PM, mark florisson wrote:

On 4 May 2011 18:35, Dag Sverre Seljebotn  wrote:

Moving pull request discussion (https://github.com/cython/cython/pull/28)
over here:

First, I got curious why you'd have to strip off "-pthread" from CC. I'd
think you could just execute it with "-pthread", which seems simpler.


It needs to end up in a list of arguments, and it's not needed at all
as I only need the version. I guess I could do (cc + " -v").split()
but eh.


OK, that's reassuring; I thought perhaps you had encountered a strange gcc 
strain.





Second: If parallel.parallel is not callable, how are scheduling parameters
for parallel blocks handled? Is there a reason to not support that? Do you
think it should stay this way, or will parallel take parameters in the
future?


Well, as I mentioned a while back, you cannot schedule parallel
blocks, there is no worksharing involved. All a parallel block does is
execute a code block in however many threads there are available. The
scheduling parameters are valid for a worksharing for loop only, as
you schedule (read "distribute") the work among the threads.


Perhaps I used the wrong terms; but checking the specs, I guess I meant 
"num_threads", which definitely applies to parallel.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] prange CEP updated

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 08:07 PM, mark florisson wrote:

On 4 May 2011 19:44, Dag Sverre Seljebotn  wrote:

On 05/04/2011 07:03 PM, mark florisson wrote:


On 4 May 2011 18:35, Dag Sverre Seljebotn
  wrote:


Moving pull request discussion
(https://github.com/cython/cython/pull/28)
over here:

First, I got curious why you'd have to strip off "-pthread" from CC. I'd
think you could just execute it with "-pthread", which seems
simpler.


It needs to end up in a list of arguments, and it's not needed at all
as I only need the version. I guess I could do (cc + " -v").split()
but eh.


OK, that's reassuring; I thought perhaps you had encountered a strange gcc
strain.




Second: If parallel.parallel is not callable, how are scheduling
parameters
for parallel blocks handled? Is there a reason to not support that? Do
you
think it should stay this way, or will parallel take parameters in the
future?


Well, as I mentioned a while back, you cannot schedule parallel
blocks, there is no worksharing involved. All a parallel block does is
execute a code block in however many threads there are available. The
scheduling parameters are valid for a worksharing for loop only, as
you schedule (read "distribute") the work among the threads.


Perhaps I used the wrong terms; but checking the specs, I guess I meant
"num_threads", which definitely applies to parallel.


Ah, that level of scheduling :) Right, so it doesn't take that, but I
don't think it's a big issue. If dynamic scheduling is enabled, it's
only a suggestion, if dynamic scheduling is disabled (whether it's
turned on or off by default is implementation defined) it will give
the number of threads requested, if available.
The user can still use omp_set_num_threads(), although admittedly that
modifies a global setting.


Hmm...I'm not completely happy about this. For now I just worry about 
not shutting off the possibility of adding thread-pool-spawning 
parameters in the future. Specifying the number of threads can be 
useful, and omp_set_num_threads is a bad way of doing it, as you say.


And backends other than OpenMP may call for options we don't yet know 
about?


Anyway, all I'm asking is whether we should require trailing () on parallel:

with nogil, parallel(): ...

I think we should, to keep the window open for options. Unless, that is, 
we're OK both with and without trailing () down the line.
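
E.g., with the trailing () required, something like this stays possible
later (the num_threads parameter here is purely hypothetical, no such
option exists yet):

from cython.parallel import parallel

with nogil, parallel(num_threads=4):   # hypothetical pool parameter
    ...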


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Fused Types

2011-05-04 Thread Dag Sverre Seljebotn

On 05/04/2011 08:13 PM, Robert Bradshaw wrote:

On Wed, May 4, 2011 at 1:47 AM, mark florisson
  wrote:

On 4 May 2011 10:24, Dag Sverre Seljebotn  wrote:

On 05/04/2011 01:07 AM, Greg Ewing wrote:


mark florisson wrote:


cdef func(floating x, floating y):
...

you get a "float, float" version, and a "double, double" version, but
not "float, double" or "double, float".


It's hard to draw conclusions from this example because
it's degenerate. You don't really need multiple versions of a
function like that, because of float <-> double coercions.

A more telling example might be

cdef double dot_product(floating *u, floating *v, int length)

By your current rules, this would give you one version that
takes two float vectors, and another that takes two double
vectors.

But if you want to find the dot product of a float vector and
a double vector, you're out of luck.


First, I'm open for your proposed syntax too...But in the interest of seeing
how we got here:

The argument to the above goes that you *should* be out of luck. For
instance, talking about dot products, BLAS itself has float-float and
double-double, but not float-double AFAIK.

What you are saying that this does not have the full power of C++ templates.
And the answer is that yes, this does not have the full power of C++
templates.

At the same time we discussed this, we also discussed better support for
string-based templating languages (so that, e.g., compilation error messages
could refer to the template file). The two are complementary.

Going back to Greg's syntax: What I don't like is that it makes the simple
unambiguous cases, where this would actually be used in real life, less
readable.

Would it be too complicated to have both? For instance;

  i) You are allowed to use a *single* fused_type on a *function* without
declaration.

def f(floating x, floating *y): # ok

Turns into

def f[floating T](T x, T *y):

This is NOT ok:

def f(floating x, integral y):
# ERROR: Please explicitly declare fused types inside []

  ii) Using more than one fused type, or using it on a cdef class or struct,
you need to use the [] declaration.



I don't think it would be too complicated, but as you mention it's
probably not a very likely case, and if the user does need it, a new
(equivalent) fused type can be created. The current way reads a lot
nicer than the indexed one in my opinion. So I'd be fine with
implementing it, but I find the current way more elegant.


I was actually thinking of exactly the same thing--supporting syntax
(i) for the case of a single type parameter, but the drawback is the
introduction of two distinct syntaxes for essentially the same
feature. Something like this is necessary to give an ordering to the
types for structs and classes, or when a fused type is used for
intermediate results but not in the argument list. I really like the
elegance of the (much more common) single-parameter variant.

Another option is using the with syntax, which was also considered for
supporting C++ templates.


In particular since that will work in pure Python mode. One thing I 
worry about with the func[]()-syntax is that it is not Python compatible.


That's one thing I like about the CEP, that in time we can do

def f(x: floating) -> floating:
...

and have something that's nice in both Python and Cython.

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] [SciPy-User] Scikits.sparse build issue

2011-05-05 Thread Dag Sverre Seljebotn

On 05/05/2011 08:42 PM, Nathaniel Smith wrote:

On Thu, May 5, 2011 at 3:03 AM, Anand Patil
  wrote:


On May 4, 8:16 pm, Nathaniel Smith  wrote:

On Tue, May 3, 2011 at 10:10 AM, Nathaniel Smith  wrote:

On Tue, May 3, 2011 at 5:51 AM, Anand Patil
  wrote:

scikits/sparse/cholmod.c: In function
‘__pyx_f_7scikits_6sparse_7cholmod__py_sparse’:
scikits/sparse/cholmod.c:1713: error: storage size of ‘__pyx_t_10’
isn’t known



I've never used Cython and am having a hard time figuring this out.



Could you send me the file 'scikits/sparse/cholmod.c'? This means that
there's some C type that was forward-declared, but never actually
defined, and then we tried to instantiate an instance of it. But I'll
need to see the generated code to figure out which type '__pyx_t_10'
is supposed to be.


Huh, this appears to be some bad interaction between numpy and cython,
rather than anything to do with my code. The offending variable comes
from doing 'cimport numpy as np' and then referring to
'np.NPY_F_CONTIGUOUS' -- this is being translated to:
   enum requirements __pyx_t_10;
   __pyx_t_10 = NPY_F_CONTIGUOUS;
and then gcc is complaining that 'enum requirements' is an undefined type.

What version of Numpy and Cython do you have installed?


Cython 0.14.1, Numpy 1.5.1. Which versions do you have?


It looks like with Cython 0.12.1, which is what I was using before, it
happens not to generate a temporary variable in this case, but Cython
0.14.1 generates the temporary variable.

I've just committed a workaround to the scikits.sparse repository:
   
https://code.google.com/p/scikits-sparse/source/detail?r=ad106e9c2c2d55f2022a3fb8b9282003b55666fc#
(I believe it works -- it does compile -- but technically I can't
guarantee it since for me the tests are now failing with an "illegal
instruction" error inside BLAS. But I think this must be an unrelated
Ubuntu screwup. Yay software.)

And I'll see about poking Cython upstream to get this fixed...


Awh. Thanks!

https://github.com/cython/cython/commit/a6ec50077990a9767695896076a8b573a5bdccc0
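
For anyone hitting this before the fix lands: a user-level workaround is to
re-declare the constant yourself with a complete type, and use that instead
of np.NPY_F_CONTIGUOUS (a sketch; not necessarily what the scikits.sparse
commit does):

cdef extern from "numpy/arrayobject.h":
    enum:
        NPY_F_CONTIGUOUS   # anonymous enum member: Cython treats it as an
                           # int constant, so temporaries get a complete type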

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


[Cython] Git workflow, branches, pull requests

2011-05-05 Thread Dag Sverre Seljebotn
There was just a messup in git history: Mark's OpenMP pull request got 
merged twice; all commits show up two times.


It doesn't matter, since the two openmp branches with the same changes 
merged OK, but we shouldn't make this a habit. For instance, the openMP 
commits also show up as part of vitja's pull request, which is confusing.


In Mercurial speak: The openmp branch was used like you would use a 
Mercurial "patch queue" in one case, and as a branch in another case. In 
git they are the same technically and you rely on conventions to make 
sure you don't treat a "queue" as a "branch".


OPTION A) Either i) only branch from master, or ii) make sure you agree 
with whoever you're branching from that this is a "branch", not a "patch 
queue", so that it isn't rebased under your feet.


We could also, say, prepend all patch queues with an underscore (it's 
private).


OPTION B) Stop rebasing. I'd have a very hard time doing that myself, 
but nobody is pulling from dagss/cython these days anyway.


Opinions?

FYI,

The workflow me and Mark is currently using is:

 a) Fork off a feature branch from master (with master I'll always 
refer to cython/master)


 b) When one gets in sync with master, do NOT merge master, but rather 
rebase on top of it:


 git pull --rebase origin master

 c) Continue rebasing, and eventually .
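
In command form, roughly (assuming "origin" is cython/cython and "myfork" 
is your own GitHub remote):

 git checkout -b feature master     # a) fork off a feature branch
 git pull --rebase origin master    # b) replay the branch on top of master
 git push -f myfork feature         # publish the rewritten commits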

The advantage of this approach is that ugly merges disappear from 
history, since commits are rewritten. And the history graph looks very 
nice and is easy to follow.


BUT, if the result is duplication, we should avoid this practice, and 
rather always merge.



Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-05 Thread Dag Sverre Seljebotn

On 05/05/2011 10:09 PM, mark florisson wrote:

On 5 May 2011 21:52, Dag Sverre Seljebotn  wrote:

There was just a messup in git history: Mark's OpenMP pull request got
merged twice; all commits show up two times.

It doesn't matter, since the two openmp branches with the same changes
merged OK, but we shouldn't make this a habit. For instance, the openMP
commits also show up as part of vitja's pull request, which is confusing.

In Mercurial speak: The openmp branch was used like you would use a
Mercurial "patch queue" in one case, and as a branch in another case. In git
they are the same technically and you rely on conventions to make sure you
don't treat a "queue" as a "branch".

OPTION A) Either i) only branch from master, or ii) make sure you agree with
whoever you're branching from that this is a "branch", not a "patch queue",
so that it isn't rebased under your feet.

We could also, say, prepend all patch queues with an underscore (it's
private).

OPTION B) Stop rebasing. I'd have a very hard time doing that myself, but
nobody is pulling from dagss/cython these days anyway.

Opinions?

FYI,

The workflow me and Mark is currently using is:

  a) Fork off a feature branch from master (with master I'll always refer to
cython/master)

  b) When one gets in sync with master, do NOT merge master, but rather
rebase on top of it:

 git pull --rebase origin master

  c) Continue rebasing, and eventually .

The advantage of this approach is that ugly merges disappear from history,
since commits are rewritten. And the history graph looks very nice and is
easy to follow.

BUT, if the result is duplication, we should avoid this practice, and rather
always merge.


Dag Sverre


I think the rebasing is pretty elegant, so I'm +1 on that, as long as
everyone agrees because those duplicated commits are nasty. I'm
surprised git didn't issue an error to prevent this.


Going OT:

I guess a principle of git is to be as dumb as possible, so that you can 
predict what it does.


When you rebase, you really get entirely new versions of all your 
commits. There's no way to link the old commits with the new commits, 
except to compare the commit message. And not even that, since you can 
change commit messages during rebases.


Git doesn't even store commits by their diff, it stores each commit by 
the resulting full tree contents, so even if the patch is the exact 
same, git couldn't really know (without going through your entire 
history and checking for "similar changes"... ugh)


What I wish they did was to add a simple "rebased-from" header field on 
commits, which could be trivially checked on merges to issue a warning. 
I guess the reason it is not there is because usually you're about to 
(automatically) throw away the commits, so the only purpose it would 
have would be sort of pedagogical.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-05 Thread Dag Sverre Seljebotn
Yes, that is the only time it happens.

Do we agree on a) ask before you pull anything that is not in cython/* (ie in 
private repos), b) document it in hackerguide?

DS


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Robert Bradshaw  wrote:

On Thu, May 5, 2011 at 1:22 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 05.05.2011 21:52:
>> There was just a messup in git history: Mark's OpenMP pull request got
>> merged twice; all commits show up two times.
>
> What (I think) happened, was that Vitja pulled in Mark's changes into his
> unreachable code removal branch, and they ended up in his pull request. I
> guess I was assuming that git wouldn't care too much about branch
> duplication, so I just accepted the pull request via the web interface.
> Apparently, it did care.
>
> I tend to rebase my local change sets before pushing them, and I think it
> makes sense to continue doing that.

+1, I think for as-yet-unpublished changes, it makes the most sense to
rebase, but for a longer-term branch, merging isn't as disruptive to the
history (in fact is probably more reflective of what's going on) and is
much better than duplication.

To clarify, is this only a problem when we have

  A cloned from master
  B cloned from A (or from master and then pulls in A)
  A rebases
  A+B merged into master

? If this is the case, then we could simply make the rule that you should
ask before hacking a clone atop anything but master. (Multiple people can
share a repeatedly-rebased branch, right.) We could also use the underscore
(or another) convention to mean "this branch is being used as a queue,
puller beware." Surely other projects have dealt with this.

- Robert
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-05 Thread Dag Sverre Seljebotn

On 05/06/2011 08:20 AM, Vitja Makarov wrote:

2011/5/6 Robert Bradshaw:

I don't like the default to be "don't pull from me"--I'd rather there
be some convention to indicate a branch is being used as a queue.
Maybe even foo-queue, or a leading underscore if people like that.

On Thu, May 5, 2011 at 2:03 PM, Dag Sverre Seljebotn
  wrote:

Yes, that is the only time it happens.

Do we agree on a) ask before you pull anything that is not in cython/* (ie
in private repos), b) document it in hackerguide?

DS


--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Robert Bradshaw  wrote:


On Thu, May 5, 2011 at 1:22 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 05.05.2011 21:52:
>> There was just a messup in git history: Mark's OpenMP pull request got
>> merged twice; all commits show up two times.
>
> What (I think) happened, was that Vitja pulled in Mark's changes into his
> unreachable code removal branch, and they ended up in his pull request. I
> guess I was assuming that git wouldn't care too much about branch
> duplication, so I just accepted the pull request via the web interface.
> Apparently, it did care.
>
> I tend to rebase my local change sets before pushing them, and I think it
> makes sense to continue doing that.

+1, I think for as-yet-unpublished changes, it makes the most sense to
rebase, but for a longer-term branch, merging isn't as disruptive to the
history (in fact is probably more reflective of what's going on) and is
much better than duplication. To clarify, is this only a problem when we
have

  A cloned from master
  B cloned from A (or from master and then pulls in A)
  A rebases
  A+B merged into master

? If this is the case, then we could simply make the rule that you should
ask before hacking a clone atop anything but master. (Multiple people can
share a repeatedly-rebased branch, right.) We could also use the underscore
(or another) convention to mean "this branch is being used as a queue,
puller beware." Surely other projects have dealt with this.

- Robert



About my branch:

I've rebased it from upstream/master at home and made a "forced push".
At work I pulled it back and rebased from origin, then I tried to
rebase it again from upstream/master.


Do I understand correctly that you:

 a) You make local changes at home
 b) Rebase them on cython/master
 c) Force-push to vitja/somebranch
 d) Go to work, where you have other local changes
 e) Rebase your work changes at work on top of vitja/somebranch

If this is correct; then this can't work. The reason is that after the 
force-push in c), there are no shared commits (apart from what's shared 
from cython/master) between your work computer and vitja/somebranch.


So the rule is: If you rebase a branch, then if you have other copies of 
that branch (like on a work computer), destroy them (e.g., git branch 
-D)!  And then fetch new copies of the branches.
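
Concretely (assuming the branch is called somebranch and the remote is 
origin):

 git branch -D somebranch                      # drop the stale local copy
 git fetch origin
 git checkout -b somebranch origin/somebranch  # re-create it from the remote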


(And as you say, if you do have different changes in many places then 
you can recover from an unfortunate rebase by cherry-picking. And you 
can always undo a rebase by looking at "git reflog" and manually check 
out the old HEAD.)
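
E.g. (the reflog entry shown is just an example):

 git reflog                   # find the pre-rebase commit, say HEAD@{5}
 git reset --hard HEAD@{5}    # move the branch back to it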


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-06 Thread Dag Sverre Seljebotn

On 05/05/2011 11:07 PM, Robert Bradshaw wrote:

I don't like the default to be "don't pull from me"--I'd rather there
be some convention to indicate a branch is being used as a queue.
Maybe even foo-queue, or a leading underscore if people like that.


I've seen leading underscore being used by other people on github, so 
let's settle on that for now.


Of course, if you do pull from a non-master branch, you should be 
communicating a lot about that fact anyway; it's a bad idea for a lot of 
other reasons as well.


I've updated http://wiki.cython.org/HackerGuide.

Here's an example of prior art in git workflows, developed I think 
primarily for IPython:


https://github.com/matthew-brett/gitwash

It's essentially some Sphinx documentation with replaceable names ("To 
contribute to PROJECTNAME, you should get an account on github...") that 
one can merge into one's own documentation. If anybody is interested in 
looking at that, I'm all for it.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-06 Thread Dag Sverre Seljebotn

On 05/06/2011 09:14 AM, Dag Sverre Seljebotn wrote:

On 05/05/2011 11:07 PM, Robert Bradshaw wrote:

I don't like the default to be "don't pull from me"--I'd rather there
be some convention to indicate a branch is being used as a queue.
Maybe even foo-queue, or a leading underscore if people like that.


I've seen leading underscore being used by other people on github, so
let's settle on that for now.

Of course, if you do pull from a non-master branch, you should be
communicating a lot about that fact anyway; it's a bad idea for a lot of
other reasons as well.

I've updated http://wiki.cython.org/HackerGuide.

Here's an example of prior art in git workflows, developed I think
primarily for IPython:

https://github.com/matthew-brett/gitwash

It's essentially some Sphinx documentation with replaceable names ("To
contribute to PROJECTNAME, you should get an account on github...") that
one can merge into one's own documentation. If anybody is interested in
looking at that, I'm all for it.


Here's an example of resulting docs from "gitwash-dumper.py":

http://nipy.sourceforge.net/nipy/stable/devel/guidelines/gitwash/index.html

DS
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-06 Thread Dag Sverre Seljebotn

On 05/06/2011 09:24 AM, Vitja Makarov wrote:

2011/5/6 Dag Sverre Seljebotn:

On 05/06/2011 08:20 AM, Vitja Makarov wrote:


2011/5/6 Robert Bradshaw:


I don't like the default to be "don't pull from me"--I'd rather there
be some convention to indicate a branch is being used as a queue.
Maybe even foo-queue, or a leading underscore if people like that.

On Thu, May 5, 2011 at 2:03 PM, Dag Sverre Seljebotn
wrote:


Yes, that is the only time it happens.

Do we agree on a) ask before you pull anything that is not in cython/*
(ie
in private repos), b) document it in hackerguide?

DS


--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Robert Bradshaw wrote:

On Thu, May 5, 2011 at 1:22 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 05.05.2011 21:52:
>> There was just a messup in git history: Mark's OpenMP pull request got
>> merged twice; all commits show up two times.
>
> What (I think) happened, was that Vitja pulled in Mark's changes into his
> unreachable code removal branch, and they ended up in his pull request. I
> guess I was assuming that git wouldn't care too much about branch
> duplication, so I just accepted the pull request via the web interface.
> Apparently, it did care.
>
> I tend to rebase my local change sets before pushing them, and I think it
> makes sense to continue doing that.

+1, I think for as-yet-unpublished changes, it makes the most sense to
rebase, but for a longer-term branch, merging isn't as disruptive to the
history (in fact is probably more reflective of what's going on) and is
much better than duplication. To clarify, is this only a problem when we
have

  A cloned from master
  B cloned from A (or from master and then pulls in A)
  A rebases
  A+B merged into master

? If this is the case, then we could simply make the rule that you should
ask before hacking a clone atop anything but master. (Multiple people can
share a repeatedly-rebased branch, right.) We could also use the underscore
(or another) convention to mean "this branch is being used as a queue,
puller beware." Surely other projects have dealt with this.

- Robert



About my branch:

I've rebased it from upstream/master at home and made a "forced push".
At work I pulled it back and rebased from origin, then I tried to
rebase it again from upstream/master.


Do I understand correctly that you:

  a) You make local changes at home
  b) Rebase them on cython/master
  c) Force-push to vitja/somebranch
  d) Go to work, where you have other local changes
  e) Rebase your work changes at work on top of vitja/somebranch




Right.


If this is correct; then this can't work. The reason is that after the
force-push in c), there are no shared commits (apart from what's shared from
cython/master) between your work computer and vitja/somebranch.

So the rule is: If you rebase a branch, then if you have other copies of
that branch (like on a work computer), destroy them (e.g., git branch -D)!
  And then fetch new copies of the branches.

(And as you say, if you do have different changes in many places then you
can recover from an unfortunate rebase by cherry-picking. And you can always
undo a rebase by looking at "git reflog" and manually check out the old
HEAD.)



Thank you for explanation.

So btw, when I rebase and my changes were already pushed, I have to
use a forced push.
Is a forced push OK?


Forced push is trivially OK if the commits you are "overwriting" have not 
been fetched anywhere else (or, you plan to immediately erase them at 
their other location). Otherwise, you really need to pay attention.


In general, if you make the following cycle "atomic", you're OK:

 - Fetch branch from github
 - Make some commits
 - Force-push back to github

However, if you interrupt the cycle in the middle, you'll need to spend 
time to recover from your "race" :-)


Here's a similar question which lists some convenient commands:

http://stackoverflow.com/questions/3815193/how-can-i-safely-use-git-rebase-when-working-on-multiple-computers

(Although I'm not sure I recommend getting into the habit of doing a bare 
"git push -f"; explicitly typing "git push -f origin mybranch" seems a lot safer.)


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] buffers and flow-control

2011-05-09 Thread Dag Sverre Seljebotn

On 05/09/2011 09:29 AM, Vitja Makarov wrote:

I've never been using buffers so my question is:

Should uninitialized buffer access raise an UnboundLocalError?

Like this:

def foo():
  cdef object  bar
  print bar



"object[int]" should behave exactly the same way as "object" does during 
control flow analysis.
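
I.e., the buffer variant of your example should raise too (a sketch of the
expected behaviour):

def foo():
    cdef object[int] bar   # buffer-typed local, never assigned
    print bar              # should raise UnboundLocalError, like plain object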


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-12 Thread Dag Sverre Seljebotn

On 05/13/2011 12:36 AM, Ondrej Certik wrote:

Hi,

On Thu, May 5, 2011 at 12:52 PM, Dag Sverre Seljebotn
  wrote:

There was just a messup in git history: Mark's OpenMP pull request got
merged twice; all commits show up two times.

It doesn't matter, since the two openmp branches with the same changes
merged OK, but we shouldn't make this a habit. For instance, the openMP
commits also show up as part of vitja's pull request, which is confusing.

In Mercurial speak: The openmp branch was used like you would use a
Mercurial "patch queue" in one case, and as a branch in another case. In git
they are the same technically and you rely on conventions to make sure you
don't treat a "queue" as a "branch".

OPTION A) Either i) only branch from master, or ii) make sure you agree with
whoever you're branching from that this is a "branch", not a "patch queue",
so that it isn't rebased under your feet.

We could also, say, prepend all patch queue names with an underscore (it's
private).

OPTION B) Stop rebasing. I'd have a very hard time doing that myself, but
nobody is pulling from dagss/cython these days anyway.


What about:

OPTION C) The one who pushes things into the master knows master
enough to see whether or not it makes sense to merge this, or if it
was already in, he/she will simply comment into the pull request and
close it manually


This doesn't make sense to me. Are you sure you read the scenario correctly?

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Git workflow, branches, pull requests

2011-05-13 Thread Dag Sverre Seljebotn

On 05/13/2011 09:05 AM, Ondrej Certik wrote:

On Thu, May 12, 2011 at 11:34 PM, Dag Sverre Seljebotn
  wrote:

On 05/13/2011 12:36 AM, Ondrej Certik wrote:


Hi,

On Thu, May 5, 2011 at 12:52 PM, Dag Sverre Seljebotn
wrote:


There was just a messup in git history: Mark's OpenMP pull request got
merged twice; all commits show up two times.

It doesn't matter, since the two openmp branches with the same changes
merged OK, but we shouldn't make this a habit. For instance, the openMP
commits also show up as part of vitja's pull request, which is confusing.

In Mercurial speak: The openmp branch was used like you would use a
Mercurial "patch queue" in one case, and as a branch in another case. In
git
they are the same technically and you rely on conventions to make sure
you
don't treat a "queue" as a "branch".

OPTION A) Either i) only branch from master, or ii) make sure you agree
with
whoever you're branching from that this is a "branch", not a "patch
queue",
so that it isn't rebased under your feet.

We could also, say, prepend all patch queue names with an underscore (it's
private).

OPTION B) Stop rebasing. I'd have a very hard time doing that myself, but
nobody is pulling from dagss/cython these days anyway.


What about:

OPTION C) The one who pushes things into the master knows master
enough to see whether or not it makes sense to merge this, or if it
was already in, he/she will simply comment into the pull request and
close it manually


This doesn't make sense to me. Are you sure you read the scenario correctly?


You wrote:

"
There was just a messup in git history: Mark's OpenMP pull request got
merged twice; all commits show up two times.
"

So somebody pushed in Mark's patches twice. My OPTION C) is that the
one who pushes patches in is responsible for making sure that they only
get pushed in once.

That's what we do in sympy: we don't have any formal option A or B,
but people with push access must prove that they are capable of using
git without breaking (or messing up) things. Of course, everybody can
make a mistake though.

It seems to be working just great, so I just wanted to share our
experience. Let me know what doesn't make sense.


Ah, OK. So in this case, the reviewer would have to request that the 
second pull request be fixed/rebased. I guess that is still the safety 
mechanism, but it's nice to also discuss how not to get into those 
situations in the first place. I'm not saying that there will be serious 
repercussions if one doesn't follow the rules; I was talking more about 
guidelines for staying out of trouble without having to learn all of git.


Note that a big part of this thread was to actually make sure everybody 
(in particular the core devs) knew about how Git rebasing works. The 
mistake was made in the first place because the reviewer assumed that 
"git will figure this out". Keep in mind that we just switched, and I 
think some core devs are still using hg-git, for instance.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] nonecheck directive

2011-05-21 Thread Dag Sverre Seljebotn

On 05/21/2011 07:57 AM, Stefan Behnel wrote:

Robert Bradshaw, 20.05.2011 17:33:

On Fri, May 20, 2011 at 8:13 AM, Stefan Behnel wrote:

why is the "nonecheck" directive set to False by default? Shouldn't it
rather be a "I know what I'm doing" option that allows advanced users to
trade speed for safety?


Erm, trade safety for speed, obviously ...



The reason I'm asking is that I just enabled its evaluation in
NoneCheckNode and immediately got crashes in the test suite. So it's
currently only half-heartedly safe, because it's not being evaluated in
a lot of places. That's a rather fragile situation, not only for
refactorings.


The reasoning was that we didn't want to have a major performance
regression on existing code that was written with these semantics in
mind, and also that we eventually plan to solve this more gracefully
using control flow.


I can see that there could have been a slight, potential performance
regression due to additional None checks, even considering that the C
compiler can often drop many of them due to its own control flow
analysis, and even though the CPU's branch prediction can be expected to
handle this quite well even in loops.

However, for users, it's hard to predict where Cython can avoid None
checks and where it cannot, so having to explicitly tell it to do None
checks in a specific code section means that users encounter and analyse
a crash first, potentially when switching to a newer Cython version. The
opt-out way would have allowed them to disable it only for code sections
where it is really getting in the way, and would have made it clear in
their own code that something potentially unsafe is happening where they
are on their own.
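
For concreteness, a minimal sketch of the opt-in usage under discussion 
(the extension type and attribute names are made up):

    cimport cython

    cdef class Point:
        cdef public double x

    @cython.nonecheck(True)
    def get_x(Point p):
        # With nonecheck enabled, accessing "p.x" raises an exception
        # when p is None, instead of dereferencing a NULL pointer and
        # crashing the process.
        return p.x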

I think that even in the face of future control flow analysis in Cython,
it would still have been better to make it an opt-out rather than opt-in
option, but I would expect that we can still switch the default setting
when a suitable CFA step becomes available.

In the future, I think we should be more careful with potentially
harmful options, and always prefer safety over speed - *especially* when
we know that the safe way will improve at some point.


There wasn't a point where anybody wasn't careful about this; it is 
simply something that was inherited from Pyrex. The nonecheck directive 
came much later.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] local variable handling in generators

2011-05-22 Thread Dag Sverre Seljebotn

On 05/22/2011 02:33 PM, Stefan Behnel wrote:

Hi,

I've been looking at the nqueens benchmark for a while, and I think it's
actually not that bad a benchmark for generators.

http://hg.python.org/benchmarks/file/tip/performance/bm_nqueens.py

A better implementation only for Py2.7/Py3 is here:

https://github.com/cython/cython/blob/master/Demos/benchmarks/nqueens.py

Cython currently runs the first implementation about as fast as Py3.3:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-pybenchmarks-py3k/lastSuccessfulBuild/artifact/chart.html


and the second one more than 3x as fast:

https://sage.math.washington.edu:8091/hudson/view/bench/job/cython-devel-cybenchmarks-py3k/lastSuccessfulBuild/artifact/chart.html


However, I think there's still some space for improvements, and local
variables are part of that. For generator functions that do non-trivial
things between yields, I think that local variables will quickly become
a bottleneck. Currently, they are always closure fields, so any access
to them will use a pointer indirection to a foreign struct, originally
passed in as an argument to the function. Given that generators often do
Python object manipulation through C-API calls, any such call will
basically require the C compiler to assume that all values in the
closure may have changed, thus disabling any optimisations for them. The
same applies to many other object related operations or pointer
operations (even DECREF!), as the C compiler cannot know that the
generator function owns the closure during its lifetime exclusively.

I think it would be worth changing the current implementation to use
local C variables for local Cython variables in the generator, and to
copy the values back into/from the closure around yields. I'd even let
local Python references start off as NULL when the generator is created,
given that Vitja's branch can eliminate None initialisations now.
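
For illustration, a rough Python-level sketch of the proposed shape of 
the generated code (the real transformation would of course happen in 
the generated C; all names here are made up):

    def generator_body(closure):
        # On entry/resume: copy closure fields into fast locals that
        # the C compiler can optimise freely.
        a = closure.a
        b = closure.b
        b = compute(a, b)    # work happens on the locals
        # Right before a yield: write the (live) locals back.
        closure.a = a
        closure.b = b
        yield b
        # After resuming: reload the locals from the closure.
        a = closure.a
        b = closure.b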


Keep in mind that if speed is the objective, another idea is to use real 
C coroutines. This would likely be faster than anything we can make up 
ourselves; a single stack jump is bound to be faster than copying things 
in and out of the stack.


Of course, it's probably more work. And then there's portability, as the 
API is separate for Windows (fibers) and POSIX (makecontext). But I don't 
think there's a lack of compatibility layer libraries which unite the 
platform-specific APIs.


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] local variable handling in generators

2011-05-23 Thread Dag Sverre Seljebotn

On 05/23/2011 10:50 AM, Vitja Makarov wrote:

2011/5/23 Stefan Behnel:

Vitja Makarov, 23.05.2011 10:13:


With live variable analysis it should be easy to save/restore only
the active variables at the yield point.


"Active" in the sense of "modified", I suppose? That's what I was expecting.



Active means that the variable's value will still be used. In my example,
after 'print a', a isn't used anymore.




Btw, only reaching definitions analysis is implemented now. I'm going
to optimize by replacing sets with bitsets, and then try to implement
live variables.

I'm going to delete variable references using active-variable info, but
that could introduce a small incompatibility with CPython:
a = X
print a #<- a will be decrefed here
print 'the end'


That incompatibility is not small at all. It breaks this code:

x = b'abc'
cdef char* c = x

Even if 'x' is no longer used after this point, it *must not* get freed
before 'c' goes away as well. That's basically impossible to decide, as
users may pass 'c' into a function that stores it away for later use.



Yeah. That's hard to detect. But x could be marked as "don't decref
when not-active"



def f(object o):
    cdef char* buf
    buf = get_buffer_of_obj(o)
    call_c_func(buf)

So there are a lot of variables that would have to be marked this way 
(but not all of them, I can see that).


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] [cython] Initial startswith / endswith optimization (#35)

2011-05-26 Thread Dag Sverre Seljebotn

On 05/26/2011 10:12 AM, Stefan Behnel wrote:

Robert Bradshaw, 26.05.2011 09:40:

the pattern of swapping out builtin methods (and perhaps
functions) for more optimized C versions is something that it would
perhaps be good to be able to do more generally, rather than hard-coding
the list into Optimize.py.


Right. All that would really be needed is a way to define default values
for arguments of builtin methods. Then most of the method optimisations
could be moved into Builtin.py.


BTW, the idea of the overlay stuff Robert referred to was that we could 
add syntax to pxd files so that the "unicode" type and its alternative 
method implementations could be fleshed out in a pxd file (and the same 
with other standard library or third-party types that are not written 
with Cython support in mind, but may have a C API that we want to 
dispatch to instead of their Python API).
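
To make that concrete, a purely hypothetical sketch of what such an 
overlay declaration might look like in a pxd file (none of this syntax 
exists; it is only meant to illustrate the intent):

    # hypothetical overlay pxd for the builtin unicode type
    cdef overlay class unicode:
        # default argument values declared here would let the method
        # optimisations move out of Optimize.py and into declarations
        bint startswith(self, prefix, Py_ssize_t start=0, Py_ssize_t end=-1)
        bint endswith(self, prefix, Py_ssize_t start=0, Py_ssize_t end=-1)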


Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel

