[Numpy-discussion] Re: advice for using datetime64 in C binding
Hello,

How can I create np.dtype("datetime64[ms]") in C to get the corresponding PyArray_Descr? Thanks in advance.

On Wed, 2022-01-19 at 10:08 +0100, Benoit Gschwind wrote:
> Hello Sebastian,
>
> Thanks for the clarification.
>
> Best regards
>
> On Tue, 2022-01-18 at 11:52 -0600, Sebastian Berg wrote:
> > On Tue, 2022-01-18 at 18:29 +0100, Benoit Gschwind wrote:
> > > Hello Sebastian,
> > >
> > > Thanks for the details.
> > >
> > > The last call has the NPY_ARRAY_C_CONTIGUOUS and NPY_ARRAY_ALIGNED flags, thus I guess it can be a no-op most of the time, but for some unknown case it may be safer?
> >
> > Ah, yes, you are right, and it should indeed be very quick anyway. My guess is that you can do _only_ the last call, with the appropriate `datetime64[ms]` descriptor passed in. (Since whatever C code follows is allowed to work with datetime64 as if it were int64.)
> >
> > Cheers,
> >
> > Sebastian
> >
> > > Best regards
> > >
> > > On Tue, 2022-01-18 at 09:22 -0600, Sebastian Berg wrote:
> > > > On Tue, 2022-01-18 at 14:56 +0100, Benoit Gschwind wrote:
> > > > > Hello,
> > > > >
> > > > > I am using the following code:
> > > > >
> > > > > if (PyArray_TYPE(arr1) == NPY_DATETIME) {
> > > > >     // Ensure datetime64[ms]
> > > > >     auto tmp = reinterpret_cast<PyArrayObject *>(PyObject_CallMethod(
> > > > >         reinterpret_cast<PyObject *>(arr1), "astype", "(s)", "datetime64[ms]"));
> > > > >     std::swap(arr1, tmp);
> > > > >     Py_XDECREF(tmp);
> > > > >     // Ensure integer
> > > > >     tmp = reinterpret_cast<PyArrayObject *>(PyObject_CallMethod(
> > > > >         reinterpret_cast<PyObject *>(arr1), "astype", "(s)", "i8"));
> > > > >     std::swap(arr1, tmp);
> > > > >     Py_XDECREF(tmp);
> > > > >     tmp = reinterpret_cast<PyArrayObject *>(PyArray_FromArray(arr1,
> > > > >         PyArray_DescrFromType(NPY_INT64), NPY_ARRAY_IN_ARRAY));
> > > > >     std::swap(arr1, tmp);
> > > > >     Py_XDECREF(tmp);
> > > > > }
> > > > >
> > > > > First, if something is wrong with my code let me know. Then I wonder whether I can take a safe shortcut to avoid converting datetime64 to i8. I guess the internal data of datetime64[ms] is i8, so copying and casting the array may be avoided.
> > > >
> > > > Yes, you can assume datetime64 is stored as an i8 with the unit and possible byte swapping, both of which your initial cast will ensure.
> > > >
> > > > The internal data is i8, except for the special NaT value.
> > > >
> > > > Code-wise, you can avoid calling `astype`, but if you do call it (also in Python), I suggest passing `copy=False`, so that it does not copy if it clearly is not necessary (I am hoping to improve on the "clearly" here at some point).
> > > >
> > > > The last call again seems to be a no-op? Just the last call with the correct `datetime64[ms]` descriptor could be enough.
> > > >
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > > > Thanks in advance
> > > > > Best regards.
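For reference, the datetime64/int64 equivalence described above can be checked from Python with a short sketch (added here for illustration; the example timestamp is made up):

    import numpy as np

    # A datetime64[ms] array and its raw storage, reinterpreted without copying.
    a = np.array(["2022-01-18T14:56:00.123", "NaT"], dtype="datetime64[ms]")
    as_int = a.view("i8")
    print(as_int)                            # milliseconds since the epoch
    print(as_int[1] == np.iinfo("i8").min)   # True: NaT is stored as the minimum int64

    # When a cast is requested anyway, copy=False avoids a copy if none is needed.
    b = a.astype("datetime64[ms]", copy=False)
    print(b is a or np.shares_memory(a, b))  # True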
[Numpy-discussion] Re: Performance mystery
On Wed, Jan 19, 2022 at 7:48 PM Francesc Alted wrote:
> On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert wrote:
> > Given that this seems to be Linux only, is this related to how glibc does large allocations (>128kB) using mmap()?
> >
> > https://stackoverflow.com/a/33131385
>
> That's a good point. As MMAP_THRESHOLD is 128 KB, and the size of `z` is almost 4 MB, the mmap machinery is probably getting involved here. Also, as pages acquired via anonymous mmap are not actually allocated until you access them the first time, that would explain why the first access is slow. What puzzles me is that the timeit loops access the `z` data 3*1 times, which is plenty of time for doing the allocation (it should require just a single iteration).

I think I have more evidence that what is happening here has to do with how the malloc mechanism works in Linux. I find the following explanation to be really good:

https://sourceware.org/glibc/wiki/MallocInternals

In addition, this excerpt of the mallopt manpage (https://man7.org/linux/man-pages/man3/mallopt.3.html) is very significant:

    Note: Nowadays, glibc uses a dynamic mmap threshold by default. The initial value of the threshold is 128*1024, but when blocks larger than the current threshold and less than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted upward to the size of the freed block. When dynamic mmap thresholding is in effect, the threshold for trimming the heap is also dynamically adjusted to be twice the dynamic mmap threshold. Dynamic adjustment of the mmap threshold is disabled if any of the M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.

This description matches closely what is happening here: after `z` is freed (replaced by another random array in the second part of the calculation), the dynamic mmap threshold kicks in and the threshold is increased by 2x of the freed block (~4 MB in this case). So, for the second part, the program break (i.e. where the heap ends) is increased instead, which is faster because this memory does not need to be zeroed before use.

Interestingly, the M_MMAP_THRESHOLD for the system malloc can be set with the MALLOC_MMAP_THRESHOLD_ environment variable. For example, the original times are:

$ python mmap-numpy.py
numpy version 1.20.3
635.4752 microseconds
635.8906 microseconds
636.0661 microseconds
144.7238 microseconds
143.9147 microseconds
144.0621 microseconds

but if we force mmap to always be used:

$ MALLOC_MMAP_THRESHOLD_=0 python mmap-numpy.py
numpy version 1.20.3
628.8890 microseconds
628.0965 microseconds
628.7590 microseconds
640.9369 microseconds
641.5104 microseconds
642.4027 microseconds

the first and second parts execute at the same (slow) speed. And, if we set the threshold to be exactly 4 MB:

$ MALLOC_MMAP_THRESHOLD_=4194304 python mmap-numpy.py
numpy version 1.20.3
630.7381 microseconds
631.3634 microseconds
632.2200 microseconds
382.6925 microseconds
380.1790 microseconds
380.0340 microseconds

we see how performance is improved for the second part (although not as much as without specifying the threshold manually; probably this manual setting prevents other optimizations from kicking in).
As a final check, if we use other malloc implementations, like the excellent mimalloc (https://github.com/microsoft/mimalloc), we can get really good performance for both parts:

$ LD_PRELOAD=/usr/local/lib/libmimalloc.so python mmap-numpy.py
numpy version 1.20.3
147.5968 microseconds
146.9028 microseconds
147.1794 microseconds
148.0905 microseconds
147.7667 microseconds
147.5180 microseconds

However, as this avoids the mmap() calls, this approach probably uses more memory, especially when large arrays need to be handled.

All in all, this is a testament to how much memory handling can affect performance on modern computers. Perhaps it is time for testing different memory allocation strategies in NumPy and coming up with suggestions for users.

Francesc
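The mmap-numpy.py script itself is not included in the thread, so the following is only a rough sketch of the kind of two-part benchmark being discussed (the `generate_sample` helper and the array size are assumptions); `z` is rebuilt between the two parts so that glibc's dynamic mmap threshold can come into play:

    import timeit
    import numpy as np

    print("numpy version", np.__version__)

    n = 262_144  # 262144 complex128 values is roughly the 4 MB discussed above

    def generate_sample(n, rng):
        return rng.standard_normal(n) + 1j * rng.standard_normal(n)

    rng = np.random.default_rng()
    expr = "z.real**2 + z.imag**2"

    # First part: the buffers behind the temporaries come from fresh mmap'ed
    # pages, which must be faulted in (and zeroed) on first access.
    z = generate_sample(n, rng)
    for _ in range(3):
        t = timeit.timeit(expr, globals=globals(), number=100)
        print(f"{1e6 * t / 100:.4f} microseconds")

    # Second part: the old buffer of z has been freed; with glibc's dynamic
    # mmap threshold, later allocations can be served from the heap instead.
    z = generate_sample(n, rng)
    for _ in range(3):
        t = timeit.timeit(expr, globals=globals(), number=100)
        print(f"{1e6 * t / 100:.4f} microseconds")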
[Numpy-discussion] Re: Performance mystery
On Thu, 2022-01-20 at 14:41 +0100, Francesc Alted wrote:
> On Wed, Jan 19, 2022 at 7:48 PM Francesc Alted wrote:
> > On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert wrote:
> > > Given that this seems to be Linux only, is this related to how glibc does large allocations (>128kB) using mmap()?
> > >
> > > https://stackoverflow.com/a/33131385
> >
> > That's a good point. As MMAP_THRESHOLD is 128 KB, and the size of `z` is almost 4 MB, the mmap machinery is probably getting involved here. Also, as pages acquired via anonymous mmap are not actually allocated until you access them the first time, that would explain why the first access is slow. What puzzles me is that the timeit loops access the `z` data 3*1 times, which is plenty of time for doing the allocation (it should require just a single iteration).
>
> I think I have more evidence that what is happening here has to do with how the malloc mechanism works in Linux. I find the following explanation to be really good:
>
> https://sourceware.org/glibc/wiki/MallocInternals

Thanks for figuring this out! It has been bugging me a lot. So it rather depends on how `malloc` works, and not on the kernel. It is surprising how "random" this can look, but I suppose some examples just happen to sit almost exactly at the threshold.

It might be interesting if we could tweak `mallopt` parameters for typical NumPy usage. But unless it is very clear, maybe a standalone module is better?

> In addition, this excerpt of the mallopt manpage (https://man7.org/linux/man-pages/man3/mallopt.3.html) is very significant:
>
> All in all, this is a testament to how much memory handling can affect performance on modern computers. Perhaps it is time for testing different memory allocation strategies in NumPy and coming up with suggestions for users.

You are probably aware, but Matti and Elias now added the ability to customize array data allocation in NumPy, so it should be straightforward to write a small package/module that tweaks the allocation strategy here.

Cheers,

Sebastian

> Francesc
>
> > > On Wed, Jan 19, 2022 at 9:06 AM Sebastian Berg wrote:
> > > > On Wed, 2022-01-19 at 11:49 +0100, Francesc Alted wrote:
> > > > > On Wed, Jan 19, 2022 at 7:33 AM Stefan van der Walt wrote:
> > > > > > On Tue, Jan 18, 2022, at 21:55, Warren Weckesser wrote:
> > > > > > > expr = 'z.real**2 + z.imag**2'
> > > > > > >
> > > > > > > z = generate_sample(n, rng)
> > > > > >
> > > > > > 🤔 If I duplicate the `z = ...` line, I get the fast result throughout. If, however, I use `generate_sample(1, rng)` (or any other value than `n`), it does not improve matters.
> > > > > >
> > > > > > Could this be a memory caching issue?
> > > >
> > > > Yes, it is a caching issue for sure. We have seen similar random fluctuations before. You can prove that it is a cache page-fault issue by running it with `perf --stat`. I did this twice, once with the second loop removed (page-faults only):
> > > >
> > > > 28333629 page-faults # 936.234 K/sec
> > > > 28362718 page-faults # 1.147 M/sec
> > > >
> > > > The number of page faults is low.
> > > > Running only the second one (or running the first one only once, rather) gave me:
> > > >
> > > > 15024 page-faults # 1.837 K/sec
> > > >
> > > > So that is the *reason*. I had tried before to figure out why the page faults differ so much, or whether we can do something about it. But I never had any reasonable lead on it.
> > > >
> > > > In general, these fluctuations are pretty random, in the sense that unrelated code changes and recompilation can swap the behaviour easily. As Andras noted, he sees the opposite.
> > > >
> > > > I would love to have an idea if there is a way to figure out why the page faults are so imbalanced between the two.
> > > >
> > > > (I have not looked at CPU cache misses this time, but since page faults happen, I assume that should not matter?)
> > > >
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > > > I can also reproduce that, but only on my Linux boxes. My MacMini does not notice the difference.
> > > > >
> > > > > Interestingly enough, you don't even need an additional call to `generate_sample(n, rng)`. If one uses `z = np.empty(...)` and then does an assignment, like:
> > > > >
> > > > > z = np.empty(n, dtype=np.complex128)
> > > > > z[:] = generate_sample(n, rng)
> > > > >
> > > > > then
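For readers without access to `perf`, the page-fault counts discussed above can also be watched from Python itself; the following is a small sketch added for illustration (not from the thread), using the standard resource module on Linux:

    import resource
    import numpy as np

    def minor_faults():
        # Minor page faults of the current process so far.
        return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

    n = 262_144  # ~4 MB of complex128, as in the benchmark being discussed
    z = np.ones(n, dtype=np.complex128)

    before = minor_faults()
    for _ in range(100):
        z.real**2 + z.imag**2  # temporaries are allocated and freed each pass
    print("minor page faults during the loop:", minor_faults() - before)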
[Numpy-discussion] Re: Performance mystery
On Thu, Jan 20, 2022 at 4:10 PM Sebastian Berg wrote:
> Thanks for figuring this out! It has been bugging me a lot. So it rather depends on how `malloc` works, and not on the kernel. It is surprising how "random" this can look, but I suppose some examples just happen to sit almost exactly at the threshold.

Also, one should be aware that temporaries can appear in many situations in NumPy, and it is not exactly easy to predict when this is going to bite you. In Warren's example he already noticed that the difference in performance was only noticeable when a complex expression was used ('z.real**2 + z.imag**2', so with temporaries), whereas a simple expression ('z.real + z.imag', no temporaries) did not trigger the behavior.

> It might be interesting if we could tweak `mallopt` parameters for typical NumPy usage. But unless it is very clear, maybe a standalone module is better?

Well, I can see `mallopt` handling being useful just for NumPy purposes, but it is up to you to decide where to put this logic.

> > All in all, this is a testament to how much memory handling can affect performance on modern computers. Perhaps it is time for testing different memory allocation strategies in NumPy and coming up with suggestions for users.
>
> You are probably aware, but Matti and Elias now added the ability to customize array data allocation in NumPy, so it should be straightforward to write a small package/module that tweaks the allocation strategy here.

That's good to hear.

Cheers,

-- Francesc Alted
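To illustrate the point about temporaries, here is a small sketch added for illustration (it assumes a recent NumPy, where array-data allocations are reported to Python's tracemalloc): the squared form materializes extra full-size float64 temporaries before the final add, while the plain sum only needs the result array.

    import tracemalloc
    import numpy as np

    n = 262_144  # ~4 MB of complex128
    z = np.ones(n, dtype=np.complex128)

    for expr in ("z.real**2 + z.imag**2", "z.real + z.imag"):
        tracemalloc.start()
        eval(expr)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f"{expr}: peak ~{peak / 1e6:.1f} MB")  # the squared form peaks higher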
[Numpy-discussion] Fit 2D points to closed curve
Hi,

I use a CAD package called Rhino which lets users write Python scripts that run in the CAD environment. I have a closed curve that is a design surface border, and I have a sparse data set of 2D surveyed points taken around that border, so some error is included in the 2D data set. I would like to know what Python function I could use to do a 2D best fit of the points to the curve.

There are plenty of examples of fitting curves to points, but not of best-fitting points to a closed curve whose formula I do not know.

Any help or advice greatly appreciated.
[Numpy-discussion] Re: Fit 2D points to closed curve
On Thu, Jan 20, 2022 at 1:15 PM wrote:
> Hi,
> I use a CAD package called Rhino which lets users write Python scripts that run in the CAD environment. I have a closed curve that is a design surface border, and I have a sparse data set of 2D surveyed points taken around that border, so some error is included in the 2D data set. I would like to know what Python function I could use to do a 2D best fit of the points to the curve.
> There are plenty of examples of fitting curves to points, but not of best-fitting points to a closed curve whose formula I do not know.

You can use a parametric spline (where X and Y are each functions of a "time" coordinate `t`) with periodic boundary conditions. See this StackOverflow answer for an example using scipy.interpolate:

https://stackoverflow.com/a/31466013/163633

Whether our spline representation can be converted to Rhino NURBS, or whether Rhino itself can do a periodic parametric NURBS and do the fitting itself, I don't know.

--
Robert Kern
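To make that concrete, here is a minimal sketch in the spirit of the linked StackOverflow answer (the sample points and the smoothing factor are made-up values for illustration): a smoothing periodic parametric spline fitted through noisy points taken around a closed border.

    import numpy as np
    from scipy.interpolate import splprep, splev

    # Noisy 2D points around a closed (roughly elliptical) border, in order.
    rng = np.random.default_rng(42)
    t = np.sort(rng.uniform(0, 2 * np.pi, 40))
    x = 10 * np.cos(t) + rng.normal(scale=0.2, size=t.size)
    y = 5 * np.sin(t) + rng.normal(scale=0.2, size=t.size)

    # Close the loop so the periodic fit sees matching endpoints.
    x = np.r_[x, x[0]]
    y = np.r_[y, y[0]]

    # per=True imposes periodic boundary conditions (a closed curve);
    # s > 0 smooths instead of interpolating, absorbing the survey error.
    tck, u = splprep([x, y], s=2.0, per=True)

    # Evaluate the fitted closed curve on a fine parameter grid.
    u_fine = np.linspace(0, 1, 200)
    x_fit, y_fit = splev(u_fine, tck)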
[Numpy-discussion] Re: Fit 2D points to closed curve
On Thu, Jan 20, 2022 at 11:28 AM Robert Kern wrote:
> On Thu, Jan 20, 2022 at 1:15 PM wrote:
> > Hi,
> > I use a CAD package called Rhino which lets users write Python scripts that run in the CAD environment. I have a closed curve that is a design surface border, and I have a sparse data set of 2D surveyed points taken around that border, so some error is included in the 2D data set. I would like to know what Python function I could use to do a 2D best fit of the points to the curve.
> > There are plenty of examples of fitting curves to points, but not of best-fitting points to a closed curve whose formula I do not know.
>
> You can use a parametric spline (where X and Y are each functions of a "time" coordinate `t`) with periodic boundary conditions. See this StackOverflow answer for an example using scipy.interpolate:
>
> https://stackoverflow.com/a/31466013/163633
>
> Whether our spline representation can be converted to Rhino NURBS, or whether Rhino itself can do a periodic parametric NURBS and do the fitting itself, I don't know.
>
> --
> Robert Kern

Fitting a curve means choosing a family of parametrized curves and picking the best one in the family. The number of parameters should not exceed the number of data points. Picking that family is a tradeoff between expectation (theory) and convenience. The splines Robert mentions are one family of curves which have nice properties and are good when there is little theory. If you know what the curve *should* be, you can try adding perturbations, i.e., effectively fitting the deviations. It is a bit of an art.

Chuck
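As one hypothetical illustration of fitting the deviations (added here, not part of the original reply): if theory says the border should be roughly circular, a low-order Fourier perturbation of the radius can be fitted by linear least squares.

    import numpy as np

    def fit_perturbed_circle(x, y, n_harmonics=3):
        """Fit r(theta) = c0 + sum_k (a_k cos(k*theta) + b_k sin(k*theta))
        around the centroid of the points, by linear least squares.
        With n_harmonics=3 this uses 7 parameters, so it needs at least
        7 data points."""
        xc, yc = x.mean(), y.mean()
        theta = np.arctan2(y - yc, x - xc)
        r = np.hypot(x - xc, y - yc)

        # Design matrix: [1, cos(k*theta), sin(k*theta)] for k = 1..n_harmonics.
        cols = [np.ones_like(theta)]
        for k in range(1, n_harmonics + 1):
            cols += [np.cos(k * theta), np.sin(k * theta)]
        A = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(A, r, rcond=None)
        return (xc, yc), coef

    def eval_perturbed_circle(center, coef, n=200):
        """Evaluate the fitted closed curve on a dense grid of angles."""
        n_harmonics = (len(coef) - 1) // 2
        theta = np.linspace(0, 2 * np.pi, n)
        r = np.full_like(theta, coef[0])
        for k in range(1, n_harmonics + 1):
            r += coef[2 * k - 1] * np.cos(k * theta) + coef[2 * k] * np.sin(k * theta)
        return center[0] + r * np.cos(theta), center[1] + r * np.sin(theta)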
[Numpy-discussion] Re: Fit 2D points to closed curve
Hi Roger,

On Thu, Jan 20, 2022, at 07:13, roger.dav...@geo-spatial.co.uk wrote:
> I use a CAD package called Rhino which lets users write Python scripts that run in the CAD environment. I have a closed curve that is a design surface border and I have a sparse data set of 2D surveyed points taken around that border, so some error is included in the 2D data set. I would like to know what Python function I could use to do a 2D best fit of the points to the curve.

This isn't exactly what you asked for, but in case you are looking to refine these points into a smoother curve, I've found subdivision splines very handy. Getting the boundary conditions right is usually just a matter of repeating the start and end points enough times to cover the subdivision mask.

Here is an example implementation of Catmull-Clark subdivision:

https://github.com/matplotlib/viscm/blob/master/viscm/bezierbuilder.py#L323

You can choose subdivision coefficients so that the resulting curve goes exactly through your data points or not.

Best regards,
Stéfan
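To sketch the general idea (this is a generic uniform cubic B-spline refinement step for a closed polygon, added for illustration; it is not the exact scheme used in the linked viscm code): each round doubles the number of points, and because the polygon is closed, periodic indexing takes care of the boundary conditions.

    import numpy as np

    def subdivide_closed(points, rounds=4):
        """Refine a closed polygon with the uniform cubic B-spline rules:
        move each vertex to (P[i-1] + 6*P[i] + P[i+1]) / 8 and insert a new
        point at (P[i] + P[i+1]) / 2 on each edge.  This approximates rather
        than interpolates the input points."""
        pts = np.asarray(points, dtype=float)
        for _ in range(rounds):
            prev = np.roll(pts, 1, axis=0)   # P[i-1], wrapping around the closed loop
            nxt = np.roll(pts, -1, axis=0)   # P[i+1]
            vertex = (prev + 6.0 * pts + nxt) / 8.0
            edge = (pts + nxt) / 2.0
            out = np.empty((2 * len(pts), pts.shape[1]))
            out[0::2] = vertex
            out[1::2] = edge
            pts = out
        return pts

    # Example: refine a coarse closed hexagon into a smooth closed curve.
    theta = np.linspace(0, 2 * np.pi, 6, endpoint=False)
    coarse = np.column_stack([np.cos(theta), np.sin(theta)])
    smooth = subdivide_closed(coarse)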