[Numpy-discussion] Re: advice for using datetime64 in C binding

2022-01-20 Thread Benoit Gschwind
Hello,

How can I create np.dtype("datetime64[ms]") in C to get the
corresponding PyArray_Descr?

Thank you in advance
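
For what it's worth, a minimal sketch of what I think could work (untested;
the helper names are mine), using the documented `PyArray_DescrConverter`
plus the single `PyArray_FromArray` call Sebastian suggests below — is this
the right approach?

#include <Python.h>
#include <numpy/arrayobject.h>

/* Build the PyArray_Descr corresponding to np.dtype("datetime64[ms]") by
 * letting NumPy parse the same string np.dtype() would.  Assumes
 * import_array() has run in the module init.  PyArray_DescrConverter
 * accepts anything np.dtype() accepts and returns a new reference in
 * `descr` on success (NPY_SUCCEED). */
static PyArray_Descr *datetime64_ms_descr(void)
{
    PyObject *s = PyUnicode_FromString("datetime64[ms]");
    if (s == NULL) {
        return NULL;
    }
    PyArray_Descr *descr = NULL;
    int ok = PyArray_DescrConverter(s, &descr);
    Py_DECREF(s);
    return ok == NPY_SUCCEED ? descr : NULL;
}

/* The "single call with the right descriptor" conversion; note that
 * PyArray_FromArray steals the reference to the descriptor. */
static PyArrayObject *as_datetime64_ms(PyArrayObject *arr)
{
    PyArray_Descr *descr = datetime64_ms_descr();
    if (descr == NULL) {
        return NULL;
    }
    return reinterpret_cast<PyArrayObject *>(
        PyArray_FromArray(arr, descr, NPY_ARRAY_IN_ARRAY));
}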

On Wed, 2022-01-19 at 10:08 +0100, Benoit Gschwind wrote:
> Hello Sebastian,
> 
> Thanks for the clarification
> 
> Best regards
> 
> On Tue, 2022-01-18 at 11:52 -0600, Sebastian Berg wrote:
> > On Tue, 2022-01-18 at 18:29 +0100, Benoit Gschwind wrote:
> > > Hello Sebastian,
> > > 
> > > Thanks for the details.
> > > 
> > > The last call has the NPY_ARRAY_C_CONTIGUOUS and NPY_ARRAY_ALIGNED
> > > flags, so I guess it can be a no-op most of the time, but for some
> > > unknown case it may be safer?
> > 
> > Ah, yes, you are right, and it should indeed be very quick anyway.
> > My guess is that you can do _only_ the last call, with the appropriate
> > `datetime64[ms]` descriptor passed in.
> > (Since whatever C code follows is allowed to work with datetime64
> > as if it were int64.)
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > > 
> > > Best regards
> > > 
> > > On Tue, 2022-01-18 at 09:22 -0600, Sebastian Berg wrote:
> > > > On Tue, 2022-01-18 at 14:56 +0100, Benoit Gschwind wrote:
> > > > > Hello,
> > > > > 
> > > > > I am using the following code:
> > > > > 
> > > > > if (PyArray_TYPE(arr1) == NPY_DATETIME) {
> > > > >     // Ensure datetime64[ms]
> > > > >     auto tmp = reinterpret_cast<PyArrayObject *>(
> > > > >         PyObject_CallMethod(reinterpret_cast<PyObject *>(arr1),
> > > > >                             "astype", "(s)", "datetime64[ms]"));
> > > > >     std::swap(arr1, tmp);
> > > > >     Py_XDECREF(tmp);
> > > > >     // Ensure integer
> > > > >     tmp = reinterpret_cast<PyArrayObject *>(
> > > > >         PyObject_CallMethod(reinterpret_cast<PyObject *>(arr1),
> > > > >                             "astype", "(s)", "i8"));
> > > > >     std::swap(arr1, tmp);
> > > > >     Py_XDECREF(tmp);
> > > > >     tmp = reinterpret_cast<PyArrayObject *>(PyArray_FromArray(arr1,
> > > > >         PyArray_DescrFromType(NPY_INT64), NPY_ARRAY_IN_ARRAY));
> > > > >     std::swap(arr1, tmp);
> > > > >     Py_XDECREF(tmp);
> > > > > }
> > > > > 
> > > > > First, if something is wrong with my code let me know. Then I
> > > > > wonder if I can have a safe shortcut to avoid converting
> > > > > datetime64 to i8. I guess the internal data of datetime64[ms] is
> > > > > i8, so copying and casting the array may be avoided.
> > > > 
> > > > Yes, you can assume datetime64 is stored as an i8, up to the unit
> > > > and possible byte swapping, both of which your initial cast will
> > > > ensure.
> > > > 
> > > > The internal data is i8, except for the special NaT value.
> > > > 
> > > > Code-wise, you can avoid calling `astype`, but if you do (also in
> > > > Python), I suggest passing `copy=False`, so that it does not copy
> > > > if it clearly is not necessary (I am hoping to improve on the
> > > > "clearly" here at some point).
> > > > 
> > > > The last call again seems to be a no-op?  Just the last call
> > > > with the correct `datetime64[ms]` descriptor could be enough.
> > > > 
> > > > Cheers,
> > > > 
> > > > Sebastian
> > > >  
> > > > 
> > > > > 
> > > > > Thanks in advance
> > > > > Best regards.



[Numpy-discussion] Re: Performance mystery

2022-01-20 Thread Francesc Alted
On Wed, Jan 19, 2022 at 7:48 PM Francesc Alted  wrote:

> On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert 
> wrote:
>
>> Given that this seems to be Linux only, is this related to how glibc does
>> large allocations (>128kB) using mmap()?
>>
>> https://stackoverflow.com/a/33131385
>>
>
> That's a good point. As MMAP_THRESHOLD is 128 KB, and the size of `z` is
> almost 4 MB, mmap machinery is probably getting involved here.  Also, as
> pages acquired via anonymous mmap are not actually allocated until you
> access them the first time, that would explain why the first access is
> slow.  What puzzles me is that the timeit loops access `z` data 3*1
> times, which is plenty of time for doing the allocation (it should
> require just a single iteration).
>

I think I have more evidence that what is happening here has to do with
how the malloc mechanism works in Linux.  I find the following explanation
to be really good:

https://sourceware.org/glibc/wiki/MallocInternals

In addition, this excerpt of the mallopt manpage (
https://man7.org/linux/man-pages/man3/mallopt.3.html) is very significant:

  Note: Nowadays, glibc uses a dynamic mmap threshold by
  default.  The initial value of the threshold is 128*1024,
  but when blocks larger than the current threshold and less
  than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the
  threshold is adjusted upward to the size of the freed
  block.  When dynamic mmap thresholding is in effect, the
  threshold for trimming the heap is also dynamically
  adjusted to be twice the dynamic mmap threshold.  Dynamic
  adjustment of the mmap threshold is disabled if any of the
  M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or
  M_MMAP_MAX parameters is set.

This description matches closely what is happening here: after `z` is freed
(replaced by another random array in the second part of the calculation),
the dynamic mmap threshold kicks in and is adjusted upward to the size of
the freed block (~4 MB in this case, with the trim threshold set to twice
that), so for the second part the program break (i.e. where the heap ends)
is increased instead, which is faster because this memory does not need to
be zeroed before use.

Interestingly, the M_MMAP_THRESHOLD for system malloc can be set by using
the MALLOC_MMAP_THRESHOLD_ environment variable.  For example, the original
times are:

$ python mmap-numpy.py
numpy version 1.20.3

 635.4752 microseconds
 635.8906 microseconds
 636.0661 microseconds

 144.7238 microseconds
 143.9147 microseconds
 144.0621 microseconds

but if we force mmap to always be used:

$ MALLOC_MMAP_THRESHOLD_=0 python mmap-numpy.py
numpy version 1.20.3

 628.8890 microseconds
 628.0965 microseconds
 628.7590 microseconds

 640.9369 microseconds
 641.5104 microseconds
 642.4027 microseconds

so the first and second parts execute at the same (slow) speed.  And if we
set the threshold to exactly 4 MB:

$ MALLOC_MMAP_THRESHOLD_=4194304 python mmap-numpy.py
numpy version 1.20.3

 630.7381 microseconds
 631.3634 microseconds
 632.2200 microseconds

 382.6925 microseconds
 380.1790 microseconds
 380.0340 microseconds

we see how performance improves for the second part (although not as much
as without specifying the threshold manually; this manual setting probably
prevents other optimizations from kicking in).

As a final check, if we use another malloc implementation, like the
excellent mimalloc (https://github.com/microsoft/mimalloc), we can get
really good performance for both parts:

$ LD_PRELOAD=/usr/local/lib/libmimalloc.so  python mmap-numpy.py
numpy version 1.20.3

 147.5968 microseconds
 146.9028 microseconds
 147.1794 microseconds

 148.0905 microseconds
 147.7667 microseconds
 147.5180 microseconds

However, as this avoids the mmap() calls, this approach probably uses more
memory, especially when large arrays need to be handled.

All in all, this is a testament to how much memory handling can affect
performance on modern computers.  Perhaps it is time to test different
memory allocation strategies in NumPy and come up with suggestions for
users.

Francesc



>
>
>>
>>
>> On Wed, Jan 19, 2022 at 9:06 AM Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>> On Wed, 2022-01-19 at 11:49 +0100, Francesc Alted wrote:
>>> > On Wed, Jan 19, 2022 at 7:33 AM Stefan van der Walt
>>> > 
>>> > wrote:
>>> >
>>> > > On Tue, Jan 18, 2022, at 21:55, Warren Weckesser wrote:
>>> > > > expr = 'z.real**2 + z.imag**2'
>>> > > >
>>> > > > z = generate_sample(n, rng)
>>> > >
>>> > > 🤔 If I duplicate the `z = ...` line, I get the fast result
>>> > > throughout.
>>> > > If, however, I use `generate_sample(1, rng)` (or any value other
>>> > > than `n`), it does not improve matters.
>>> > >
>>> > > Could this be a memory caching issue?
>>> > >
>>>
>>> Yes, it is a caching issue for sure.  We have seen similar random
>>> fluctuations before.
>>> You can prove that it is a cache page-fault issue by running it with
>>> `perf --stat`.

[Numpy-discussion] Re: Performance mystery

2022-01-20 Thread Sebastian Berg
On Thu, 2022-01-20 at 14:41 +0100, Francesc Alted wrote:
> On Wed, Jan 19, 2022 at 7:48 PM Francesc Alted 
> wrote:
> 
> > On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert
> > 
> > wrote:
> > 
> > > Given that this seems to be Linux only, is this related to how
> > > glibc does
> > > large allocations (>128kB) using mmap()?
> > > 
> > > https://stackoverflow.com/a/33131385
> > > 
> > 
> > That's a good point. As MMAP_THRESHOLD is 128 KB, and the size of
> > `z` is almost 4 MB, mmap machinery is probably getting involved here.
> > Also, as pages acquired via anonymous mmap are not actually allocated
> > until you access them the first time, that would explain why the first
> > access is slow.  What puzzles me is that the timeit loops access `z`
> > data 3*1 times, which is plenty of time for doing the allocation (it
> > should require just a single iteration).
> > 
> 
> I think I have more evidence that what is happening here has to do
> with how the malloc mechanism works in Linux.  I find the following
> explanation to be really good:
> 
> https://sourceware.org/glibc/wiki/MallocInternals
> 

Thanks for figuring this out!  It has been bugging me a lot.  So it
depends on how `malloc` works rather than on the kernel.  It is
surprising how "random" this can look, but I suppose some examples just
happen to sit almost exactly at the threshold.

It might be interesting if we could tweak `mallopt` parameters for
typical NumPy usage.  But unless the benefit is very clear, maybe a
standalone module is better?


> In addition, this excerpt of the mallopt manpage (
> https://man7.org/linux/man-pages/man3/mallopt.3.html) is very
> significant:




> All in all, this is a testament to how much memory handling can
> affect performance on modern computers.  Perhaps it is time to test
> different memory allocation strategies in NumPy and come up with
> suggestions for users.

You are probably aware, but Matti and Elias now added the ability to
customize array data allocation in NumPy, so it should be straightforward
to write a small package/module that tweaks the allocation strategy here.
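
A rough sketch of those hooks (interface per NEP 49, NumPy >= 1.22; the
wrapper names are made up, and the struct layout and "mem_handler" capsule
name should be double-checked against the NumPy headers):

#include <Python.h>
#include <numpy/arrayobject.h>
#include <cstdlib>

// Trivial pass-through wrappers; a real module would change the strategy.
static void *my_malloc(void *ctx, size_t size)
{ return std::malloc(size); }
static void *my_calloc(void *ctx, size_t n, size_t elsize)
{ return std::calloc(n, elsize); }
static void *my_realloc(void *ctx, void *p, size_t size)
{ return std::realloc(p, size); }
static void my_free(void *ctx, void *p, size_t size)
{ std::free(p); }

static PyDataMem_Handler my_handler = {
    "example_allocator",                                  // name
    1,                                                    // version
    { NULL, my_malloc, my_calloc, my_realloc, my_free },  // ctx + functions
};

// Install the handler for subsequent array-data allocations;
// PyDataMem_SetHandler returns the previously installed handler.
static int install_handler(void)
{
    PyObject *capsule = PyCapsule_New(&my_handler, "mem_handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    PyObject *old = PyDataMem_SetHandler(capsule);
    Py_DECREF(capsule);
    if (old == NULL) {
        return -1;
    }
    Py_DECREF(old);
    return 0;
}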

Cheers,

Sebastian


> 
> Francesc
> 
> 
> 
> > 
> > 
> > > 
> > > 
> > > On Wed, Jan 19, 2022 at 9:06 AM Sebastian Berg <
> > > sebast...@sipsolutions.net> wrote:
> > > 
> > > > On Wed, 2022-01-19 at 11:49 +0100, Francesc Alted wrote:
> > > > > On Wed, Jan 19, 2022 at 7:33 AM Stefan van der Walt
> > > > > 
> > > > > wrote:
> > > > > 
> > > > > > On Tue, Jan 18, 2022, at 21:55, Warren Weckesser wrote:
> > > > > > > expr = 'z.real**2 + z.imag**2'
> > > > > > > 
> > > > > > > z = generate_sample(n, rng)
> > > > > > 
> > > > > > 🤔 If I duplicate the `z = ...` line, I get the fast result
> > > > > > throughout.
> > > > > > If, however, I use `generate_sample(1, rng)` (or any value
> > > > > > other than `n`), it does not improve matters.
> > > > > > 
> > > > > > Could this be a memory caching issue?
> > > > > > 
> > > > 
> > > > Yes, it is a caching issue for sure.  We have seen similar
> > > > random
> > > > fluctuations before.
> > > > You can prove that it is a cache page-fault issue by running it
> > > > with `perf --stat`.  I did this twice, once with the second loop
> > > > removed (page-faults only):
> > > > 
> > > > 28333629  page-faults   #  936.234 K/sec
> > > > 28362718  page-faults   #    1.147 M/sec
> > > > 
> > > > The number of page faults is huge.  Running only the second one
> > > > (or running the first one only once, rather), gave me:
> > > > 
> > > > 15024  page-faults   #    1.837 K/sec
> > > > 
> > > > So that is the *reason*.  I had tried before to figure out why
> > > > the page faults differ so much, or if we can do something about
> > > > it.  But I never had any reasonable lead on it.
> > > > 
> > > > In general, these fluctuations are pretty random, in the sense
> > > > that unrelated code changes and recompilation can swap the
> > > > behaviour easily.  As Andras noted, he sees the opposite.
> > > > 
> > > > I would love to have an idea if there is a way to figure out why
> > > > the page-faults are so imbalanced between the two.
> > > > 
> > > > (I have not looked at CPU cache misses this time, but since
> > > > page-faults happen, I assume that should not matter?)
> > > > 
> > > > Cheers,
> > > > 
> > > > Sebastian
> > > > 
> > > > 
> > > > > 
> > > > > I can also reproduce that, but only on my Linux boxes.  My
> > > > > MacMini does not notice the difference.
> > > > > 
> > > > > Interestingly enough, you don't even need an additional call to
> > > > > `generate_sample(n, rng)`. If one uses `z = np.empty(...)` and
> > > > > then does an assignment, like:
> > > > > 
> > > > > z = np.empty(n, dtype=np.complex128)
> > > > > z[:] = generate_sample(n, rng)
> > > > > 
> > > > > then

[Numpy-discussion] Re: Performance mystery

2022-01-20 Thread Francesc Alted
On Thu, Jan 20, 2022 at 4:10 PM Sebastian Berg 
wrote:


> Thanks for figuring this out!  It has been bugging me a lot.  So it
> depends on how `malloc` works rather than on the kernel.  It is
> surprising how "random" this can look, but I suppose some examples just
> happen to sit almost exactly at the threshold.
>

Also, one should be aware that temporaries can appear in many situations in
NumPy; it is not exactly easy to predict when this is going to bite you.  In
Warren's example he already noticed that the difference in performance was
only noticeable when a complex expression was used ('z.real**2 +
z.imag**2', so with temporaries), whereas a simple expression ('z.real +
z.imag', no temporaries) did not trigger the behavior.


>
> It might be interesting if we could tweak `mallopt` parameters for
> typical NumPy usage.  But unless the benefit is very clear, maybe a
> standalone module is better?
>

Well, I can see `mallopt` handling being useful just for NumPy purposes,
but it is up to you to decide where to put this logic.

> > All in all, this is a testament to how much memory handling can
> > affect performance on modern computers.  Perhaps it is time to test
> > different memory allocation strategies in NumPy and come up with
> > suggestions for users.
>
> You are probably aware, but Matti and Elias now added the ability to
> customize array data allocation in NumPy, so it should be straightforward
> to write a small package/module that tweaks the allocation strategy here.
>

That's good to hear.

Cheers,
-- 
Francesc Alted


[Numpy-discussion] Fit 2D points to closed curve

2022-01-20 Thread roger . davies
Hi,
I use a CAD package called Rhino which lets users write Python scripts that run
in the CAD environment. I have a closed curve that is a design surface border
and I have a sparse data set of 2D surveyed points taken around that border, so
some error is included in the 2D data set. I would like to know what Python
function I could use to do a 2D best fit of the points to the curve.
There are plenty of examples of fitting curves to points, but not of best
fitting points to a closed curve for which I do not know the formula.
Any help or advice greatly appreciated.


[Numpy-discussion] Re: Fit 2D points to closed curve

2022-01-20 Thread Robert Kern
On Thu, Jan 20, 2022 at 1:15 PM  wrote:

> Hi,
> I use a CAD package called Rhino which lets users write Python scripts
> that run in the CAD environment. I have a closed curve that is a design
> surface border and I have a sparse data set of 2D surveyed points taken
> around that border, so some error is included in the 2D data set. I would
> like to know what Python function I could use to do a 2D best fit of the
> points to the curve.
> There are plenty of examples of fitting curves to points, but not of best
> fitting points to a closed curve for which I do not know the formula.
>

You can use a parametric spline (where X and Y are each functions of a
"time" coordinate `t`) with periodic boundary conditions. See this
StackOverflow answer for an example using scipy.interpolate:

https://stackoverflow.com/a/31466013/163633

Whether that spline representation can be converted to Rhino NURBS, or
whether Rhino itself can do a periodic parametric NURBS and do the fitting
itself, I don't know.
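
If it helps to see the parametric idea spelled out without scipy, here is a
small self-contained sketch in C++ (an interpolating closed Catmull-Rom
curve, so it illustrates the periodic parametric representation only, not
the least-squares smoothing that `splprep` does):

#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

// Evaluate a closed (periodic) Catmull-Rom curve through `pts` at global
// parameter u in [0, n); segment i spans [i, i+1) and indices wrap around,
// so X and Y are each periodic functions of the "time" coordinate u.
static Pt eval_closed_catmull_rom(const std::vector<Pt> &pts, double u)
{
    const int n = static_cast<int>(pts.size());
    const int i = static_cast<int>(std::floor(u)) % n;
    const double t = u - std::floor(u);
    const Pt &p0 = pts[(i - 1 + n) % n];
    const Pt &p1 = pts[i];
    const Pt &p2 = pts[(i + 1) % n];
    const Pt &p3 = pts[(i + 2) % n];
    const double t2 = t * t, t3 = t2 * t;
    // Standard Catmull-Rom basis, applied to x and y independently.
    auto blend = [&](double a, double b, double c, double d) {
        return 0.5 * (2.0 * b + (-a + c) * t
                      + (2.0 * a - 5.0 * b + 4.0 * c - d) * t2
                      + (-a + 3.0 * b - 3.0 * c + d) * t3);
    };
    return { blend(p0.x, p1.x, p2.x, p3.x), blend(p0.y, p1.y, p2.y, p3.y) };
}

int main()
{
    // Stand-ins for surveyed border points (here: a unit square's corners).
    const std::vector<Pt> pts = { {0, 0}, {1, 0}, {1, 1}, {0, 1} };
    for (double u = 0.0; u < 4.0; u += 0.5) {
        const Pt p = eval_closed_catmull_rom(pts, u);
        std::printf("u=%.2f -> (%.3f, %.3f)\n", u, p.x, p.y);
    }
    return 0;
}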

-- 
Robert Kern


[Numpy-discussion] Re: Fit 2D points to closed curve

2022-01-20 Thread Charles R Harris
On Thu, Jan 20, 2022 at 11:28 AM Robert Kern  wrote:

> On Thu, Jan 20, 2022 at 1:15 PM  wrote:
>
>> Hi,
>> I use a CAD package called Rhino which lets users write Python scripts
>> that run in the CAD environment. I have a closed curve that is a design
>> surface border and I have a sparse data set of 2D surveyed points taken
>> around that border, so some error is included in the 2D data set. I would
>> like to know what Python function I could use to do a 2D best fit of the
>> points to the curve.
>> There are plenty of examples of fitting curves to points, but not of best
>> fitting points to a closed curve for which I do not know the formula.
>>
>
> You can use a parametric spline (where X and Y are each functions of a
> "time" coordinate `t`) with periodic boundary conditions. See this
> StackOverflow answer for an example using scipy.interpolate:
>
> https://stackoverflow.com/a/31466013/163633
>
> Whether that spline representation can be converted to Rhino NURBS, or
> whether Rhino itself can do a periodic parametric NURBS and do the fitting
> itself, I don't know.
>
> --
> Robert Kern
>

Fitting a curve is choosing a family of parametrized curves and picking the
best one in the family. The number of parameters should not exceed the
number of data points. Picking that family is a tradeoff between
expectation (theory) and convenience. The splines Robert mentions are one
family of curves which have nice properties and are good when there is
little theory. If you know what the curve *should* be, you can try adding
perturbations, i.e., effectively fitting the deviations. It is a bit of an
art.

Chuck


[Numpy-discussion] Re: Fit 2D points to closed curve

2022-01-20 Thread Stefan van der Walt
Hi Roger,

On Thu, Jan 20, 2022, at 07:13, roger.dav...@geo-spatial.co.uk wrote:
> I use a CAD package called Rhino which lets users write Python scripts
> that run in the CAD environment. I have a closed curve that is a design
> surface border and I have a sparse data set of 2D surveyed points taken
> around that border, so some error is included in the 2D data set. I
> would like to know what Python function I could use to do a 2D best fit
> of the points to the curve.

This isn't exactly what you asked for, but in case you are looking to refine
these points into a smoother curve, I've found subdivision splines very handy.
Getting the boundary conditions right is usually just a matter of repeating the
start and end points enough times to cover the subdivision mask.

Here is an example implementation of Catmull-Clark subdivision:
https://github.com/matplotlib/viscm/blob/master/viscm/bezierbuilder.py#L323

You can choose the subdivision coefficients so that the resulting curve goes
exactly through your data points or not.
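
As a concrete example of that last point, here is a sketch (mine, not the
viscm code) of one round of the interpolating four-point scheme
(Dyn-Levin-Gregory) on a closed polyline; with a closed curve, index
wrap-around takes the place of repeating the endpoints:

#include <cstdio>
#include <vector>

struct Pt { double x, y; };

// One subdivision round: every original point is kept (interpolating), and
// a new point is inserted on each edge with the mask (-1, 9, 9, -1) / 16.
static std::vector<Pt> subdivide_closed(const std::vector<Pt> &p)
{
    const int n = static_cast<int>(p.size());
    std::vector<Pt> out;
    out.reserve(2 * n);
    for (int i = 0; i < n; ++i) {
        const Pt &a = p[(i - 1 + n) % n];
        const Pt &b = p[i];
        const Pt &c = p[(i + 1) % n];
        const Pt &d = p[(i + 2) % n];
        out.push_back(b);
        out.push_back({ (-a.x + 9.0 * b.x + 9.0 * c.x - d.x) / 16.0,
                        (-a.y + 9.0 * b.y + 9.0 * c.y - d.y) / 16.0 });
    }
    return out;
}

int main()
{
    std::vector<Pt> curve = { {0, 0}, {1, 0}, {1, 1}, {0, 1} };
    for (int round = 0; round < 3; ++round) {
        curve = subdivide_closed(curve);  // each round doubles the points
    }
    std::printf("%zu points after 3 rounds\n", curve.size());
    return 0;
}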

Best regards,
Stéfan