Re: [Cython] compiler performance issue for extended utility code
On 8 October 2011 10:22, mark florisson wrote:
> On 8 October 2011 08:03, Stefan Behnel wrote:
>> Vitja Makarov, 07.10.2011 18:01:
>>> 2011/10/7 Stefan Behnel:
>>>> Vitja Makarov, 06.10.2011 23:12:
>>>>> Here is a small comparison on compiling urllib.py with cython:
>>>>>
>>>>> ((e8527c5...)) vitja@mchome:~/work/cython-vitek-git/zzz$ time python ../cython.py urllib.py
>>>>>
>>>>> real    0m1.699s
>>>>> user    0m1.650s
>>>>> sys     0m0.040s
>>>>>
>>>>> (master) vitja@mchome:~/work/cython-vitek-git/zzz$ time python ../cython.py urllib.py
>>>>>
>>>>> real    0m2.830s
>>>>> user    0m2.790s
>>>>> sys     0m0.030s
>>>>>
>>>>> It's about 1.5 times slower.
>>>>
>>>> That's a pretty serious regression for plain Python code then. Again, this needs proper profiling.
>>>
>>> I've added a return statement at the top of CythonScope.test_cythonscope, and now I have these timings:
>>>
>>> (master) vitja@mchome:~/work/cython-vitek-git/zzz$ time python ../cython.py urllib.py
>>>
>>> real    0m1.764s
>>> user    0m1.700s
>>> sys     0m0.060s
>>
>> Ok, then it's only a bug. "create_testscope" is on by default in Main.py, Context.__init__(). I don't know what it does exactly, but my guess is that the option should a) be off by default and b) rather be passed in by the test runner as part of the compile options than as a parameter of the Context class. AFAICT, it's currently only used in TreeFragment.py, where it is being switched off explicitly for parsing code snippets.
>>
>> Stefan
>
> It turns it off to avoid infinite recursion. This basically means that you cannot use stuff from the Cython scope in your Cython utilities. So in your Cython utilities, you have to declare the C version of it (which you declared with the @cname decorator).
>
> This is not really something that can just be avoided by not loading the scope like this. Perhaps one solution could be to load the test scope when you do a lookup in the cython scope for which no entry is found. But really, libcython and serializing entries will solve all this, so I suppose the real question is: do we want to do a release before we support such functionality?
>
> Anyway, the cython scope lookup would be a simple hack worth a try.

I applied the hack, i.e. defer loading the scope until the first entry in the cython scope can't be found:
https://github.com/markflorisson88/cython/commit/ad4cf6303d1bf8a81e3afccc9572559a34827a3b

[0] [11:16] ~ ➤ time cython urllib.py   # conditionally load scope
cython urllib.py  2.75s user 0.14s system 99% cpu 2.893 total

[0] [11:17] ~ ➤ time cython urllib.py   # always load scope
cython urllib.py  4.08s user 0.16s system 99% cpu 4.239 total
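For illustration, the deferred-loading idea described above could look roughly like this in plain Python. This is only a sketch of the pattern, not the code from the linked commit; the class and method names here are placeholders.

    class CythonScope(object):
        """Sketch: load the expensive utility/test scope lazily."""

        def __init__(self, context):
            self.context = context
            self.entries = {}
            self._utility_scope_loaded = False

        def lookup(self, name):
            entry = self.entries.get(name)
            if entry is None and not self._utility_scope_loaded:
                # Only pay for loading the extended utility code once a name
                # actually misses in the cython scope.
                self._utility_scope_loaded = True
                self.load_utility_scope()
                entry = self.entries.get(name)
            return entry

        def load_utility_scope(self):
            # Populate self.entries from the Cython utility code (expensive).
            pass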
[Cython] cython.parallel tasks, single, master, critical, barriers
Hey,

So far people have been enthusiastic about the cython.parallel features, I think we should introduce some new features. I propose the following, assuming parallel has been imported from cython:

with parallel.master():
    this is executed in the master thread in a parallel (non-prange) section

with parallel.single():
    same as master, except any thread may do the execution

    An optional keyword argument 'nowait' specifies whether there will be a barrier at the end. The default is to wait.

with parallel.task():
    create a task to be executed by some thread in the team
    once a thread takes up the task it shall only be executed by that thread and no other thread (so the task will be tied to the thread)

    C variables will be firstprivate
    Python objects will be shared

parallel.taskwait()  # wait on any direct descendant tasks to finish

with parallel.critical():
    this section of code is mutually exclusive with other critical sections
    optional keyword argument 'name' specifies a name for the critical section, which means all sections with that name will exclude each other, but not critical sections with different names

    Note: all threads that encounter the section will execute it, just not at the same time

with parallel.barrier():
    all threads wait until everyone has reached the barrier
    either no one or everyone should encounter the barrier
    shared variables are flushed

Unfortunately, gcc again manages to horribly break master and single constructs in loops (versions 4.2 through 4.6), so I suppose I'll first file a bug report. Other (better) compilers like Portland (and I'm sure Intel) work fine. I suppose a warning in the documentation will suffice there.

If we at some point implement vector/SIMD operations we could also try out the Fortran OpenMP workshare construct.

What do you guys think?

Mark
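For concreteness, a sketch of how the proposed constructs might be used together. This is hypothetical syntax from the proposal above, not something current Cython compiles; buf, n and the called functions are placeholders, and GIL handling is left out.

    from cython import parallel

    with parallel.parallel():
        with parallel.single():             # any one thread runs this block,
            prepare_input(buf)              # the others wait at the implicit barrier

        for i in parallel.prange(n):        # work-shared loop over the team
            process(buf, i)

        with parallel.critical(name='io'):  # all threads execute this,
            write_result(buf)               # but only one at a time

        with parallel.barrier():            # everyone synchronises here;
            pass                            # shared variables are flushed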
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On 10/09/2011 02:11 PM, mark florisson wrote:
> Hey,
>
> So far people have been enthusiastic about the cython.parallel features,
> I think we should introduce some new features. I propose the following,

Great!!

I only have time for a very short feedback now, perhaps more will follow.

> assume parallel has been imported from cython:
>
> with parallel.master():
>     this is executed in the master thread in a parallel (non-prange) section
>
> with parallel.single():
>     same as master, except any thread may do the execution
>
>     An optional keyword argument 'nowait' specifies whether there will be a
>     barrier at the end. The default is to wait.
>
> with parallel.task():
>     create a task to be executed by some thread in the team
>     once a thread takes up the task it shall only be executed by that
>     thread and no other thread (so the task will be tied to the thread)
>
>     C variables will be firstprivate
>     Python objects will be shared
>
> parallel.taskwait()  # wait on any direct descendant tasks to finish

Regarding tasks, I think this is mapping OpenMP too close to Python. Closures are excellent for the notion of a task, so I think something based on the futures API would work better. I realize that makes the mapping to OpenMP and the implementation a bit more difficult, but I think it is worth it in the long run.

> with parallel.critical():
>     this section of code is mutually exclusive with other critical sections
>     optional keyword argument 'name' specifies a name for the critical section,
>     which means all sections with that name will exclude each other, but not
>     critical sections with different names
>
>     Note: all threads that encounter the section will execute it, just
>     not at the same time
>
> with parallel.barrier():
>     all threads wait until everyone has reached the barrier
>     either no one or everyone should encounter the barrier
>     shared variables are flushed
>
> Unfortunately, gcc again manages to horribly break master and single
> constructs in loops (versions 4.2 through 4.6), so I suppose I'll first
> file a bug report. Other (better) compilers like Portland (and I'm sure
> Intel) work fine. I suppose a warning in the documentation will suffice there.
>
> If we at some point implement vector/SIMD operations we could also try
> out the Fortran OpenMP workshare construct.

I'm starting to teach myself OpenCL as part of a course. It's very neat for some kinds of parallelism. What I'm saying is that, at least in the case of SIMD, we should not lock ourselves into Fortran+OpenMP thinking too early, but also look forward to coming architectures (e.g., AMD's GPU-and-CPU on same die design).

Dag Sverre
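For reference, the flavour of API that the futures suggestion points at, shown here with CPython's existing concurrent.futures module purely as an analogy, not as a proposal for the exact cython.parallel spelling:

    from concurrent.futures import ThreadPoolExecutor

    def work(i):
        return i * i

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(work, i) for i in range(10)]  # each submit() creates a task
        results = [f.result() for f in futures]              # waiting on the futures is the "taskwait"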
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
> On 10/09/2011 02:11 PM, mark florisson wrote:
>> Hey,
>>
>> So far people have been enthusiastic about the cython.parallel features,
>> I think we should introduce some new features. I propose the following,
>
> Great!!
>
> I only have time for a very short feedback now, perhaps more will follow.
>
>> assume parallel has been imported from cython:
>>
>> with parallel.master():
>>     this is executed in the master thread in a parallel (non-prange) section
>>
>> with parallel.single():
>>     same as master, except any thread may do the execution
>>
>>     An optional keyword argument 'nowait' specifies whether there will be a
>>     barrier at the end. The default is to wait.

I like

    if parallel.is_master():
        ...
    explicit_barrier_somehow()  # see below

better as a Pythonization. One could easily support is_master to be used in other contexts as well, simply by assigning a status flag in the master block.

Using an if-test flows much better with Python I feel, but that naturally leads to making the barrier explicit. But I like the barrier always being explicit, rather than having it as a predicate on all the different constructs like in OpenMP.

I'm less sure about single, since making it a function indicates one could use it in other contexts and the whole thing becomes too magic (since it's tied to the position of invocation). I'm tempted to suggest

    for _ in prange(1):
        ...

as our syntax for single.

>> with parallel.task():
>>     create a task to be executed by some thread in the team
>>     once a thread takes up the task it shall only be executed by that
>>     thread and no other thread (so the task will be tied to the thread)
>>
>>     C variables will be firstprivate
>>     Python objects will be shared
>>
>> parallel.taskwait()  # wait on any direct descendant tasks to finish
>
> Regarding tasks, I think this is mapping OpenMP too close to Python. Closures are excellent for the notion of a task, so I think something based on the futures API would work better. I realize that makes the mapping to OpenMP and the implementation a bit more difficult, but I think it is worth it in the long run.
>
>> with parallel.critical():
>>     this section of code is mutually exclusive with other critical sections
>>     optional keyword argument 'name' specifies a name for the critical section,
>>     which means all sections with that name will exclude each other, but not
>>     critical sections with different names
>>
>>     Note: all threads that encounter the section will execute it, just
>>     not at the same time

Yes, this works well as a with-statement...

...except that it is slightly magic in that it binds to the call position (unlike anything in Python). I.e. this would be more "correct", or at least Pythonic:

    with parallel.critical(__file__, __line__):
        ...

>> with parallel.barrier():
>>     all threads wait until everyone has reached the barrier
>>     either no one or everyone should encounter the barrier
>>     shared variables are flushed

I have problems with requiring a no-op with block...

I'd much rather write

    parallel.barrier()

However, that ties a function call to the place of invocation, and suggests that one could do

    if rand() > .5:
        barrier()
    else:
        i += 3
        barrier()

and have the same barrier in each case. Again,

    barrier(__file__, __line__)

gets us purity at the cost of practicality. Another way is the pthreads approach (although one may have to use pthreads rather than OpenMP to get it, unless there are named barriers?):

    barrier_a = parallel.barrier()
    barrier_b = parallel.barrier()
    with parallel:
        barrier_a.wait()
        if rand() > .5:
            barrier_b.wait()
        else:
            i += 3
            barrier_b.wait()

I'm really not sure here.

>> Unfortunately, gcc again manages to horribly break master and single
>> constructs in loops (versions 4.2 through 4.6), so I suppose I'll first
>> file a bug report. Other (better) compilers like Portland (and I'm sure
>> Intel) work fine. I suppose a warning in the documentation will suffice there.
>>
>> If we at some point implement vector/SIMD operations we could also try
>> out the Fortran OpenMP workshare construct.
>
> I'm starting to teach myself OpenCL as part of a course. It's very neat for some kinds of parallelism. What I'm saying is that, at least in the case of SIMD, we should not lock ourselves into Fortran+OpenMP thinking too early, but also look forward to coming architectures (e.g., AMD's GPU-and-CPU on same die design).

Dag Sverre
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On 9 October 2011 13:18, Dag Sverre Seljebotn wrote:
> On 10/09/2011 02:11 PM, mark florisson wrote:
>> Hey,
>>
>> So far people have been enthusiastic about the cython.parallel features,
>> I think we should introduce some new features. I propose the following,
>
> Great!!
>
> I only have time for a very short feedback now, perhaps more will follow.
>
>> assume parallel has been imported from cython:
>>
>> with parallel.master():
>>     this is executed in the master thread in a parallel (non-prange) section
>>
>> with parallel.single():
>>     same as master, except any thread may do the execution
>>
>>     An optional keyword argument 'nowait' specifies whether there will be a
>>     barrier at the end. The default is to wait.
>>
>> with parallel.task():
>>     create a task to be executed by some thread in the team
>>     once a thread takes up the task it shall only be executed by that
>>     thread and no other thread (so the task will be tied to the thread)
>>
>>     C variables will be firstprivate
>>     Python objects will be shared
>>
>> parallel.taskwait()  # wait on any direct descendant tasks to finish
>
> Regarding tasks, I think this is mapping OpenMP too close to Python. Closures are excellent for the notion of a task, so I think something based on the futures API would work better. I realize that makes the mapping to OpenMP and the implementation a bit more difficult, but I think it is worth it in the long run.

Hmm, that would be cool as well. Something like parallel.submit_task(myclosure)? The problem I see with that is that parallel stuff can't have the GIL, and you can only have 'def' closures at the moment. I realize that you won't actually have to use closure support here, though, and could just transform the inner function to OpenMP task code. This would maybe look inconsistent with other closures, though, and you'd also have to restrict the use of such a closure to parallel.submit_task(). Anyway, perhaps you have a concrete proposal that addresses these problems.

>> with parallel.critical():
>>     this section of code is mutually exclusive with other critical sections
>>     optional keyword argument 'name' specifies a name for the critical section,
>>     which means all sections with that name will exclude each other, but not
>>     critical sections with different names
>>
>>     Note: all threads that encounter the section will execute it, just
>>     not at the same time
>>
>> with parallel.barrier():
>>     all threads wait until everyone has reached the barrier
>>     either no one or everyone should encounter the barrier
>>     shared variables are flushed
>>
>> Unfortunately, gcc again manages to horribly break master and single
>> constructs in loops (versions 4.2 through 4.6), so I suppose I'll first
>> file a bug report. Other (better) compilers like Portland (and I'm sure
>> Intel) work fine. I suppose a warning in the documentation will suffice there.
>>
>> If we at some point implement vector/SIMD operations we could also try
>> out the Fortran OpenMP workshare construct.
>
> I'm starting to teach myself OpenCL as part of a course. It's very neat for some kinds of parallelism. What I'm saying is that, at least in the case of SIMD, we should not lock ourselves into Fortran+OpenMP thinking too early, but also look forward to coming architectures (e.g., AMD's GPU-and-CPU on same die design).

Oh, definitely. The good thing is that code generation backends needn't be that hard. If you figure out all the semantics in the Python code, you could, based on the backend, load a different utility template as a string. It's probably not that easy, but the point is that as long as your code semantics don't prevent other backends, you keep your options open. In the end I want to be able to write a parallel program almost serially and have Cython compile it to OpenMP, MPI, GPUs or whatever else I need. At the same time I need to stay in touch with reality, so it's one step at a time :)

> Dag Sverre
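A sketch of what the closure-based spelling mentioned here might look like. parallel.submit_task and parallel.taskwait are hypothetical names from this discussion, blocks and compute are placeholders, and GIL handling is ignored.

    from cython import parallel

    def fill(block):                        # a 'def' function used as the task body
        for i in range(len(block)):
            block[i] = compute(i)

    with parallel.parallel():
        with parallel.single():             # one thread creates the tasks ...
            for block in blocks:
                parallel.submit_task(fill, block)
        parallel.taskwait()                 # ... and every thread helps execute them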
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On 9 October 2011 13:57, Dag Sverre Seljebotn wrote:
> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>> Hey,
>>>
>>> So far people have been enthusiastic about the cython.parallel features,
>>> I think we should introduce some new features. I propose the following,
>>
>> Great!!
>>
>> I only have time for a very short feedback now, perhaps more will follow.
>>
>>> assume parallel has been imported from cython:
>>>
>>> with parallel.master():
>>>     this is executed in the master thread in a parallel (non-prange) section
>>>
>>> with parallel.single():
>>>     same as master, except any thread may do the execution
>>>
>>>     An optional keyword argument 'nowait' specifies whether there will be a
>>>     barrier at the end. The default is to wait.
>
> I like
>
>     if parallel.is_master():
>         ...
>     explicit_barrier_somehow()  # see below
>
> better as a Pythonization. One could easily support is_master to be used in other contexts as well, simply by assigning a status flag in the master block.
>
> Using an if-test flows much better with Python I feel, but that naturally leads to making the barrier explicit. But I like the barrier always being explicit, rather than having it as a predicate on all the different constructs like in OpenMP.

Hmm, that might mean you also want the barrier for a prange in a parallel block to be explicit. I like the 'if' test though, although it wouldn't make sense for 'single'.

> I'm less sure about single, since making it a function indicates one could use it in other contexts and the whole thing becomes too magic (since it's tied to the position of invocation). I'm tempted to suggest
>
>     for _ in prange(1):
>         ...
>
> as our syntax for single.

I think that syntax is absolutely terrible :) Perhaps single is not so important and one can just use master instead (or, if really needed, master + a task with the actual work).

>>> with parallel.task():
>>>     create a task to be executed by some thread in the team
>>>     once a thread takes up the task it shall only be executed by that
>>>     thread and no other thread (so the task will be tied to the thread)
>>>
>>>     C variables will be firstprivate
>>>     Python objects will be shared
>>>
>>> parallel.taskwait()  # wait on any direct descendant tasks to finish
>>
>> Regarding tasks, I think this is mapping OpenMP too close to Python. Closures are excellent for the notion of a task, so I think something based on the futures API would work better. I realize that makes the mapping to OpenMP and the implementation a bit more difficult, but I think it is worth it in the long run.
>>
>>> with parallel.critical():
>>>     this section of code is mutually exclusive with other critical sections
>>>     optional keyword argument 'name' specifies a name for the critical section,
>>>     which means all sections with that name will exclude each other, but not
>>>     critical sections with different names
>>>
>>>     Note: all threads that encounter the section will execute it, just
>>>     not at the same time
>
> Yes, this works well as a with-statement...
>
> ...except that it is slightly magic in that it binds to the call position (unlike anything in Python). I.e. this would be more "correct", or at least Pythonic:
>
>     with parallel.critical(__file__, __line__):
>         ...

I'm not entirely sure what you mean here. Critical is really about the block contained within, not about a position in a file. Not all threads have to encounter the critical region, and not specifying a name means you exclude with *all other* unnamed critical sections (not just this one).

>>> with parallel.barrier():
>>>     all threads wait until everyone has reached the barrier
>>>     either no one or everyone should encounter the barrier
>>>     shared variables are flushed
>
> I have problems with requiring a no-op with block...
>
> I'd much rather write
>
>     parallel.barrier()

Although in OpenMP it doesn't have any associated code, we could give it those semantics: apply the barrier at the end of the block of code. The con is that the barrier is at the top while it only affects leaving the block; you would write:

    with parallel.barrier():
        if rand() > .5:
            ...
        else:
            ...
    # the barrier is here

> However, that ties a function call to the place of invocation, and suggests that one could do
>
>     if rand() > .5:
>         barrier()
>     else:
>         i += 3
>         barrier()
>
> and have the same barrier in each case. Again,
>
>     barrier(__file__, __line__)
>
> gets us purity at the cost of practicality.

In this case (unlike the critical construct), yes. I think a warning in the docs stating that either all or none of the threads must encounter the barrier should suffice.

> Another way is the pthreads approach (although one may have to use pthreads rather than OpenMP to get it, unless there are named barriers?):
>
>     barrier_a = parallel.barrier()
>     barrier_b = parallel.barrier()
>     with parallel:
>         barrier_a.wa
[Cython] PyCon-DE wrap-up by Kay Hayen
Hi,

Kay Hayen wrote a blog post about his view of the first PyCon-DE, including a bit on the discussions I had with him about Nuitka.

http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/

It was interesting to see that Nuitka actually comes from the other side, meaning that it tries to be a pure Python compiler, but should at some point start to support (Python) type hints for the compiler. Cython made static types a language feature from the very beginning and is now fixing up the Python compatibility. So both systems will eventually become rather similar in what they achieve, with Cython being essentially a superset of the feature set of Nuitka due to its additional focus on talking to external libraries efficiently and supporting things like parallel loops or the PEP-3118 buffer interface.

One of the impressions I took out of the technical discussions with Kay is that there isn't really a good reason why Cython should refuse to duplicate some of the inner mechanics of CPython for optimisation purposes. Nuitka appears to be somewhat more aggressive here, partly because Kay doesn't currently care all that much about portability (e.g. to Python 3).

I was previously very opposed to that (you may remember my opposition to the list.pop() optimisation), but now I think that we have to fix up the generated code for each new major CPython release anyway, so it won't make a difference if we have to rework some more of the code because a bit of those inner workings changed. They sure won't change for released CPython versions anymore, and many implementation details are unlikely enough to change for years to come. It's good to continue to be considerate about such changes, but some of them may well bring another serious bit of performance without introducing real portability risks. Changes like the Unicode string restructuring in PEP-393 show that even relying on official and long standing parts of the C-API isn't enough to guarantee that code still works as expected in new releases, so we may just as well start digging deeper.

Stefan
Re: [Cython] PyCon-DE wrap-up by Kay Hayen
On 9 October 2011 18:35, Stefan Behnel wrote:
> Hi,
>
> Kay Hayen wrote a blog post about his view of the first PyCon-DE, including a bit on the discussions I had with him about Nuitka.
>
> http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/
>
> It was interesting to see that Nuitka actually comes from the other side, meaning that it tries to be a pure Python compiler, but should at some point start to support (Python) type hints for the compiler. Cython made static types a language feature from the very beginning and is now fixing up the Python compatibility. So both systems will eventually become rather similar in what they achieve, with Cython being essentially a superset of the feature set of Nuitka due to its additional focus on talking to external libraries efficiently and supporting things like parallel loops or the PEP-3118 buffer interface.
>
> One of the impressions I took out of the technical discussions with Kay is that there isn't really a good reason why Cython should refuse to duplicate some of the inner mechanics of CPython for optimisation purposes. Nuitka appears to be somewhat more aggressive here, partly because Kay doesn't currently care all that much about portability (e.g. to Python 3).

Interesting. What kind of (significant) optimizations could be made by duplicating code? Do you want to duplicate entire functions or do you want to inline parts of those? I actually think we should not get too tied to CPython, e.g. what if PyPy gets a CPython compatible API, or possibly a subset like PEP 384?

> I was previously very opposed to that (you may remember my opposition to the list.pop() optimisation), but now I think that we have to fix up the generated code for each new major CPython release anyway, so it won't make a difference if we have to rework some more of the code because a bit of those inner workings changed. They sure won't change for released CPython versions anymore, and many implementation details are unlikely enough to change for years to come. It's good to continue to be considerate about such changes, but some of them may well bring another serious bit of performance without introducing real portability risks. Changes like the Unicode string restructuring in PEP-393 show that even relying on official and long standing parts of the C-API isn't enough to guarantee that code still works as expected in new releases, so we may just as well start digging deeper.
>
> Stefan
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn wrote:
>>> with parallel.single():
>>>     same as master, except any thread may do the execution
>>>
>>>     An optional keyword argument 'nowait' specifies whether there will be a
>>>     barrier at the end. The default is to wait.
>
> I like
>
>     if parallel.is_master():
>         ...
>     explicit_barrier_somehow()  # see below
>
> better as a Pythonization. One could easily support is_master to be used in other contexts as well, simply by assigning a status flag in the master block.
>
> Using an if-test flows much better with Python I feel, but that naturally leads to making the barrier explicit. But I like the barrier always being explicit, rather than having it as a predicate on all the different constructs like in OpenMP.

Personally, I think I'd prefer context managers as a very readable way to deal with parallelism, similar to the "threading" module:

http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement
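The threading idiom referred to above, for comparison; this is the existing CPython threading API, not cython.parallel:

    import threading

    lock = threading.Lock()
    counter = 0

    def worker():
        global counter
        with lock:               # acquired on entry, released on exit, even on error
            counter += 1

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()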
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On 9 October 2011 19:54, Jon Olav Vik wrote:
> On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn wrote:
>>>> with parallel.single():
>>>>     same as master, except any thread may do the execution
>>>>
>>>>     An optional keyword argument 'nowait' specifies whether there will be a
>>>>     barrier at the end. The default is to wait.
>>
>> I like
>>
>>     if parallel.is_master():
>>         ...
>>     explicit_barrier_somehow()  # see below
>>
>> better as a Pythonization. One could easily support is_master to be used in other contexts as well, simply by assigning a status flag in the master block.
>>
>> Using an if-test flows much better with Python I feel, but that naturally leads to making the barrier explicit. But I like the barrier always being explicit, rather than having it as a predicate on all the different constructs like in OpenMP.
>
> Personally, I think I'd prefer context managers as a very readable way to deal with parallelism, similar to the "threading" module:
>
> http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement

Yeah, it makes a lot of sense for mutual exclusion, but 'master' really means "only the master thread executes this piece of code, even though other threads encounter the same code", which is more akin to 'if' than 'with'.
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On Sun, Oct 9, 2011 at 9:01 PM, mark florisson wrote:
> On 9 October 2011 19:54, Jon Olav Vik wrote:
>> Personally, I think I'd prefer context managers as a very readable way to deal with parallelism
>
> Yeah, it makes a lot of sense for mutual exclusion, but 'master' really means "only the master thread executes this piece of code, even though other threads encounter the same code", which is more akin to 'if' than 'with'.

I see your point. However, another similarity with "with" statements, as an encapsulated "try..finally", is when there's a barrier at the end of the block. I can live with some magic if it saves me from having a boilerplate line of "barrier" everywhere 8-)
Re: [Cython] cython.parallel tasks, single, master, critical, barriers
On 9 October 2011 21:48, Jon Olav Vik wrote:
> On Sun, Oct 9, 2011 at 9:01 PM, mark florisson wrote:
>> On 9 October 2011 19:54, Jon Olav Vik wrote:
>>> Personally, I think I'd prefer context managers as a very readable way to deal with parallelism
>>
>> Yeah, it makes a lot of sense for mutual exclusion, but 'master' really means "only the master thread executes this piece of code, even though other threads encounter the same code", which is more akin to 'if' than 'with'.
>
> I see your point. However, another similarity with "with" statements, as an encapsulated "try..finally", is when there's a barrier at the end of the block. I can live with some magic if it saves me from having a boilerplate line of "barrier" everywhere 8-)

Hm, indeed. I just noticed that unlike single constructs, master constructs don't have barriers. Both are also not allowed to be closely nested in worksharing constructs. I think the single directive is more useful with respect to tasks, e.g. have a single thread generate tasks and have the other threads waiting at the barrier execute them. In that sense I suppose 'if parallel.is_master():' makes sense (no barrier, master thread) and 'with single():' (with barrier, any thread).

We could still support single in prange, though, if we simply have the master thread execute it ('if (omp_get_thread_num() == 0)') and put a barrier after the block. This makes me wonder what the point of master was supposed to be...
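A sketch of the lowering described above for single inside a prange, written in the same hypothetical Python-level terms used in this thread rather than in the generated C:

    # Hypothetical: "with parallel.single():" lowered to master-plus-barrier.
    if parallel.is_master():      # only the master thread executes the block ...
        generate_tasks()
    parallel.barrier()            # ... and all threads synchronise here afterwards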