On Wed, Jan 8, 2014 at 3:40 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Wed, Jan 8, 2014 at 12:13 PM, Julian Taylor
> <jtaylor.deb...@googlemail.com> wrote:
>> On 18.07.2013 15:36, Nathaniel Smith wrote:
>>> On Wed, Jul 17, 2013 at 5:57 PM, Frédéric Bastien <no...@nouiz.org> wrote:
>>>> On the usefulness of doing only 1 memory allocation: on our old GPU
>>>> ndarray, we were doing 2 allocations on the GPU, one for metadata and
>>>> one for data. I removed this, as it was a bottleneck. Allocations on
>>>> the CPU are faster than on the GPU, but this is still something that
>>>> is slow unless you reuse memory. Does PyMem_Malloc reuse previous
>>>> small allocations?
>>>
>>> Yes, at least in theory PyMem_Malloc is highly-optimized for small
>>> buffer re-use. (For requests >256 bytes it just calls malloc().) And
>>> it's possible to define type-specific freelists; not sure if there's
>>> any value in doing that for PyArrayObjects. See Objects/obmalloc.c in
>>> the Python source tree.
>>
>> PyMem_Malloc is just a wrapper around malloc, so it's only as optimized
>> as the C library is (glibc is not good for small allocations).
>> PyObject_Malloc uses a small object allocator for requests smaller than
>> 512 bytes (256 in Python 2).
>
> Right, I meant PyObject_Malloc of course.
>
>> I filed a pull request [0] replacing a few functions which I think are
>> safe to convert to this API: the nditer allocation, which is completely
>> encapsulated, and the construction of the scalar and array Python
>> objects, which are deleted via the tp_free slot (we really should not
>> support third party libraries using PyMem_Free on Python objects
>> without checks).
>>
>> This already gives up to 15% improvements for scalar operations compared
>> to glibc 2.17 malloc.
>> Do I understand the discussions here right that we could replace
>> PyDimMem_NEW, which is used for strides in PyArray, with the small
>> object allocation too?
>> It would still allow swapping the stride buffer, but every application
>> must then delete it with PyDimMem_FREE, which should be a reasonable
>> requirement.
>
> That sounds reasonable to me.
>
> If we wanted to get even more elaborate, we could by default stick the
> shape/strides into the same allocation as the PyArrayObject, and then
> defer allocating a separate buffer until someone actually calls
> PyArray_Resize. (With a new flag, similar to OWNDATA, that tells us
> whether we need to free the shape/stride buffer when deallocating the
> array.) It's got to be a vanishingly small proportion of arrays where
> PyArray_Resize is actually called, so for most arrays, this would let
> us skip the allocation entirely, and the only cost would be that for
> arrays where PyArray_Resize *is* called to add new dimensions, we'd
> leave the original buffers sitting around until the array was freed,
> wasting a tiny amount of memory. Given that no-one has noticed that
> currently *every* array wastes 50% of this much memory (see upthread),
> I doubt anyone will care...
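
For readers following the thread, here is a minimal sketch in plain C of the layout proposed in the quoted text: shape and strides stored in the same allocation as the array header, with a separate buffer allocated only when a resize adds dimensions. It does not use the NumPy or Python C API. The names sketch_array, sketch_new, sketch_resize, sketch_free and the flag SKETCH_OWNS_DIMS are hypothetical, invented for illustration; ptrdiff_t stands in for npy_intp and malloc for whichever allocator would actually be chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag, similar in spirit to OWNDATA: set once dims/strides
 * live in a separately allocated buffer that must be freed on dealloc. */
#define SKETCH_OWNS_DIMS 0x1

typedef struct {
    int nd;
    int flags;
    ptrdiff_t *dims;         /* points into inline_buf or a separate buffer */
    ptrdiff_t *strides;      /* always dims + nd in this sketch */
    ptrdiff_t inline_buf[];  /* room for the original 2 * nd entries */
} sketch_array;

/* One allocation covers the header plus shape and strides. */
sketch_array *sketch_new(int nd, const ptrdiff_t *dims, const ptrdiff_t *strides)
{
    assert(nd >= 0);
    sketch_array *a = malloc(sizeof(*a) + 2 * (size_t)nd * sizeof(ptrdiff_t));
    if (a == NULL) {
        return NULL;
    }
    a->nd = nd;
    a->flags = 0;
    a->dims = a->inline_buf;
    a->strides = a->inline_buf + nd;
    if (nd > 0) {
        memcpy(a->dims, dims, (size_t)nd * sizeof(ptrdiff_t));
        memcpy(a->strides, strides, (size_t)nd * sizeof(ptrdiff_t));
    }
    return a;
}

/* A separate buffer is allocated only when dimensions are added. The inline
 * slots then sit unused until the array is freed, which is the small waste
 * the proposal accepts. Returns 0 on success, -1 on allocation failure. */
int sketch_resize(sketch_array *a, int new_nd,
                  const ptrdiff_t *dims, const ptrdiff_t *strides)
{
    assert(new_nd >= 0);
    if (new_nd > a->nd) {
        ptrdiff_t *buf = malloc(2 * (size_t)new_nd * sizeof(ptrdiff_t));
        if (buf == NULL) {
            return -1;
        }
        if (a->flags & SKETCH_OWNS_DIMS) {
            free(a->dims);  /* drop an earlier separate buffer */
        }
        a->dims = buf;
        a->flags |= SKETCH_OWNS_DIMS;
    }
    a->strides = a->dims + new_nd;
    if (new_nd > 0) {
        memcpy(a->dims, dims, (size_t)new_nd * sizeof(ptrdiff_t));
        memcpy(a->strides, strides, (size_t)new_nd * sizeof(ptrdiff_t));
    }
    a->nd = new_nd;
    return 0;
}

/* The flag tells us whether the shape/stride buffer needs its own free. */
void sketch_free(sketch_array *a)
{
    if (a == NULL) {
        return;
    }
    if (a->flags & SKETCH_OWNS_DIMS) {
        free(a->dims);
    }
    free(a);
}
```

In this sketch an array that is never resized costs a single malloc and a single free. Only the resize path that adds dimensions pays for a second allocation, which matches the trade-off described in the quoted proposal.
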
Seems like a good plan. When is it planned to remove the old interface? I
don't think we can do this before then.

Fred
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion