On 12/06/2010 11:35 AM, Stefan Behnel wrote:
> Vitja Makarov, 06.12.2010 10:56:
>> 2010/12/6 Stefan Behnel:
>>> Vitja Makarov, 06.12.2010 09:56:
>>>> 2010/12/6 Stefan Behnel:
>>>>> Vitja Makarov, 05.12.2010 20:12:
>>>>>> About temps:
>>>>>> now temps are allocated in closure using declare_var() and it works fine.
>>>>>
>>>>> I think that could seriously slow down things inside of the generator,
>>>>> though. It means that it needs to jump through the closure indirection for
>>>>> every little value that it keeps around temporarily, including lots of
>>>>> intermediate C results. Also, I would expect that a lot more temps are used
>>>>> inside of the generator body than what actually needs to be kept alive
>>>>> during yields, so a single yield inside of a lengthy generator body could
>>>>> have a tremendous impact on the overall runtime and closure size.
>>>>>
>>>>> If you only store away the live temp values before a yield, you don't even
>>>>> need to do any reference counting for them. It's just a quick bunch of C
>>>>> assignments on yield and resume. That's about the smallest impact you can
>>>>> get.
>>>>>
>>>>> BTW, if you change the object struct of the generator closure anyway, maybe
>>>>> a dynamically sized PyVarObject would be a way to incorporate the temps?
>>>>
>>>> I don't understand why you expect performance problems.
>>>>
>>>> It seems to me that if temps are stored in a tuple (or another kind of
>>>> object) you have to manually save/restore them, and this code will
>>>> really slow down the generator.
>>>
>>> But it only has to be done when suspending/resuming, not every time a temp
>>> is used. Given that there is quite some overhead involved in executing a
>>> yield/resume cycle, it doesn't matter if we add a couple of fast C
>>> assignments to that.
>>>
>>>> On the other hand, when all temps are declared in the closure you just have
>>>> to say "closure->pyx_temp_v1"
>>>>
>>>> Do you think that closure-> will be much slower than stack operations?
>>>
>>> Yes, *much* slower. Temps are not meant to live on the stack. Most of the
>>> time, they will just live in a CPU register, especially on x86_64 and other
>>> architectures with plenty of registers. When you put them into the closure,
>>> the C compiler must assume that it's an external operation when accessing
>>> them and can't know that the next time it reads the variable it still has
>>> the same value that it just wrote. That kills tons of possible C
>>> optimisations.
>>
>> Hmm. It is not really necessary to check each value each time,
>> it's not defined as volatile.
>
> My knowledge of C is admittedly limited. Why should the C compiler be
> allowed to assume that a value stored somewhere in a foreign memory
> location doesn't change its value between separate store and read operations?
>
Unless you call another function, the C compiler can tell just from the
actions of the executing thread that the value has not changed. (You can't
use a variable/struct field to e.g. communicate between threads... unless
perhaps if you mark it volatile, but one should use locking instead.)

But, of course, the closure may well call other functions in libraries
where the C compiler has no access. So I believe Stefan is right.

Keeping temps in local variables corresponds to providing hints to the
compiler that "no external code will modify this memory".

Dag Sverre
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
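
For illustration, here is a minimal C sketch of the two approaches discussed
in this thread. It is not the code Cython generates: closure_t, its field
names, noop_callback and all three functions are hypothetical names made up
for this example. The first two functions contrast a temp stored in the
closure with a temp held in a local across a call the compiler cannot see
into; the third shows the "copy live temps only on yield and resume" idea.

```c
#include <stddef.h>

/* Hypothetical closure layout; these field names are illustrative only. */
typedef struct {
    int    resume_state;  /* 0 = fresh, 1 = suspended at the yield point */
    size_t saved_i;       /* live temps, written only when suspending */
    long   saved_acc;
    long   temp_acc;      /* a temp kept in the closure (variant A) */
} closure_t;

typedef void (*opaque_fn)(void *);

/* Stand-in for a library call. In real code it would live in another
   translation unit, so the compiler could not see what it touches. */
void noop_callback(void *p) { (void)p; }

/* Variant A: the temp lives in the closure. After each call through f the
   compiler has to assume c->temp_acc may have changed and reload it. */
long sum_closure_temp(closure_t *c, const long *data, size_t n, opaque_fn f)
{
    c->temp_acc = 0;
    for (size_t i = 0; i < n; i++) {
        f(c);                    /* may modify *c as far as the compiler knows */
        c->temp_acc += data[i];  /* load, add, store through the pointer */
    }
    return c->temp_acc;
}

/* Variant B: the temp is a local whose address never escapes, so the
   compiler is free to keep it in a register across the call. */
long sum_local_temp(closure_t *c, const long *data, size_t n, opaque_fn f)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        f(c);
        acc += data[i];
    }
    return acc;
}

/* Generator-style running sum: temps are locals while the body runs and are
   copied to the closure only at the yield. Returns 1 and stores the next
   running sum in *out, or returns 0 once the input is exhausted. */
int running_sum_next(closure_t *c, const long *data, size_t n, long *out)
{
    size_t i = 0;
    long acc = 0;

    if (c->resume_state == 1) {  /* resume: a couple of plain C assignments */
        i = c->saved_i;
        acc = c->saved_acc;
    }
    if (i >= n)
        return 0;

    acc += data[i];
    i++;

    c->saved_i = i;              /* suspend: store only the live temps */
    c->saved_acc = acc;
    c->resume_state = 1;
    *out = acc;
    return 1;
}
```

Both sum functions return the same result; the difference is only in what
the optimiser is allowed to assume. How much that matters in practice
depends on the compiler and the surrounding code, so it is worth measuring
rather than taking this sketch as proof.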
