Re: [Numpy-discussion] automatically avoiding temporary arrays
All,

On Wed, Oct 5, 2016 at 11:46 AM, Francesc Alted wrote:
> 2016-10-05 8:45 GMT+02:00 srean:
>> Good discussion, but I was surprised by the absence of numexpr in the
>> discussion, given how relevant it (numexpr) is to the topic.
>>
>> Is the goal to fold the numexpr functionality (and beyond) into Numpy?
>
> Yes, the question of merging numexpr into numpy is something that
> periodically shows up on this list. I think almost everyone agrees that
> it is a good idea, but things are not so easy, and so far nobody has
> provided a good patch for this. Also, the fact that numexpr relies on
> grouping an expression by using a string (e.g.
> y = ne.evaluate("x**3 + tanh(x**2) + 4")) does not play well with the way
> in which numpy evaluates expressions, so something should be suggested to
> cope with this too.

As Francesc said, Numexpr gets most of its power by grouping a series of
operations so it can send blocks to the CPU cache and run the entire series
of operations on the cache before returning the block to system memory. If
it were just used as a back-end for NumPy, it would only gain from the
multi-threading portion inside each function call. I'm not sure how one
would go about grouping successive numpy expressions without modifying the
Python interpreter.

I put a bit of effort into extending numexpr to use 4-byte word opcodes
instead of 1-byte ones. Progress has been slow due to time constraints, but
I have most of the numpy data types (u[1-4], i[1-4], f[4,8], c[8,16],
S[1-4], U[1-4]). On Tuesday I finished writing a Python generator script
that writes all the C-side opcode macros for opcodes.hpp. Now I have about
900 opcodes, and this could easily grow into thousands if more functions
are added, so I also built a reverse lookup tree (based on
collections.defaultdict) for the Python side of numexpr.

Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch
robbmcl...@gmail.com
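[A minimal sketch of the string-based grouping described above; the input
array and sizes here are illustrative, not taken from the thread:]

    import numpy as np
    import numexpr as ne

    x = np.linspace(0.0, 1.0, 10**7)  # illustrative input array

    # Plain numpy: each operator allocates a full-size temporary array.
    y_np = x**3 + np.tanh(x**2) + 4

    # numexpr: the whole expression is compiled from the string and
    # evaluated block by block, so temporaries stay CPU-cache-sized.
    y_ne = ne.evaluate("x**3 + tanh(x**2) + 4")

    assert np.allclose(y_np, y_ne)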
Re: [Numpy-discussion] automatically avoiding temporary arrays
Thanks Francesc, Robert, for giving me a broader picture of where this fits
in. I believe numexpr does not handle slicing, so that might be another
thing to look at.

On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod wrote:
> As Francesc said, Numexpr gets most of its power by grouping a series of
> operations so it can send blocks to the CPU cache and run the entire
> series of operations on the cache before returning the block to system
> memory. If it were just used as a back-end for NumPy, it would only gain
> from the multi-threading portion inside each function call.

Is that so?

I thought numexpr also cuts down on the number of temporary buffers that
get filled (in other words, copy operations) if the same expression were
written as a series of operations. My understanding may be wrong, and I
would appreciate a correction.

The 'out' parameter in ufuncs can eliminate extra temporaries, but it's not
composable. Right now I have to manually carry along the array where the
in-place operations take place. I think the goal here is to eliminate that.
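[A minimal sketch of the manual out= bookkeeping being described; the
variable names and expression are illustrative:]

    import numpy as np

    x = np.linspace(0.0, 1.0, 10**7)  # illustrative input

    # Composable but wasteful: every intermediate allocates a new array.
    y = x**3 + np.tanh(x**2) + 4

    # With out=, temporaries are reused, but the buffers must be threaded
    # through every call by hand -- the expressions no longer compose.
    tmp = np.empty_like(x)
    out = np.empty_like(x)
    np.multiply(x, x, out=tmp)    # tmp = x**2
    np.tanh(tmp, out=tmp)         # tmp = tanh(x**2)
    np.power(x, 3, out=out)       # out = x**3
    np.add(out, tmp, out=out)     # out = x**3 + tanh(x**2)
    np.add(out, 4, out=out)       # out = x**3 + tanh(x**2) + 4

    assert np.allclose(y, out)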
Re: [Numpy-discussion] automatically avoiding temporary arrays
On Wed, Oct 5, 2016 at 1:11 PM, srean wrote:
> Thanks Francesc, Robert, for giving me a broader picture of where this
> fits in. I believe numexpr does not handle slicing, so that might be
> another thing to look at.

Dereferencing would be relatively simple to add to numexpr, as it would
just be some getattr() calls. Personally I will add that at some point
because it will clean up my code. Slicing, maybe only for contiguous blocks
in memory? I.e. imageStack[0,:,:] would be possible, but
imageStack[:, ::2, ::2] would not be trivial (I think...). I seem to
remember someone asked David Cooke about slicing and he said something
along the lines of, "that's what Numba is for." Perhaps NumPy back-ended by
Numba is closer to what you are looking for, as it hooks into the Python
bytecode. The main advantage of numexpr is that a series of numpy
operations can be enclosed in ne.evaluate("") and it provides a big
acceleration for little programmer effort, but it's not nearly as
sophisticated as Numba or PyPy.

> On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod wrote:
>> As Francesc said, Numexpr gets most of its power by grouping a series of
>> operations so it can send blocks to the CPU cache and run the entire
>> series of operations on the cache before returning the block to system
>> memory. If it were just used as a back-end for NumPy, it would only gain
>> from the multi-threading portion inside each function call.
>
> Is that so?
>
> I thought numexpr also cuts down on the number of temporary buffers that
> get filled (in other words, copy operations) if the same expression were
> written as a series of operations. My understanding may be wrong, and I
> would appreciate a correction.
>
> The 'out' parameter in ufuncs can eliminate extra temporaries, but it's
> not composable. Right now I have to manually carry along the array where
> the in-place operations take place. I think the goal here is to eliminate
> that.

The numexpr virtual machine does create temporaries where needed when it
parses the abstract syntax tree for all the operations it has to do. I
believe the main advantage is that the temporaries are created in the CPU
cache, not in system memory. It's certainly true that numexpr doesn't
create many OP_COPY operations; rather, it's optimized to minimize them, so
it probably performs fewer operations than naive successive calls to numpy
within Python. I'm unsure, though, whether there's any difference in
operation count between hand-optimized numpy with out= set and numexpr;
numexpr just does it for you.

This blog post from Tim Hochberg is useful for understanding the
performance advantages of blocking versus multithreading:
http://www.bitsofbits.com/2014/09/21/numpy-micro-optimization-and-numexpr/

Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch
robbmcl...@gmail.com
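[A quick way to see the contiguity distinction Robert raises above; the
array name imageStack and its shape are illustrative:]

    import numpy as np

    imageStack = np.zeros((10, 256, 256))  # illustrative image stack

    # A leading-index slice is one contiguous block of memory, so a
    # block-based virtual machine could stream over it directly.
    print(imageStack[0, :, :].flags['C_CONTIGUOUS'])      # True

    # A strided slice is not contiguous; streaming it in cache-sized
    # blocks would need per-block stride arithmetic.
    print(imageStack[:, ::2, ::2].flags['C_CONTIGUOUS'])  # False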