[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-19 Thread Ronald van Elburg
I think ultimately the copy is unnecessary.

That being said, introducing prepend and append functions concentrates the 
complexity of the mapping in one place. Trying to avoid the extra copy would 
probably lead to a more complex implementation of accumulate.

How, in your view, would the prepend interface differ from concatenation or 
stacking?
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-08 Thread Ronald van Elburg
Our NumPy arrays are pickled when they are transported over Pipes between 
Processors (using multiprocessing). Just to point out that there are uses of 
pickling that do not involve files. Would that affect your analysis?
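A minimal sketch of the file-free case, assuming a small array and both ends of the pipe living in one process purely for illustration (in real use the receiving end would sit in another process):

```python
import pickle
from multiprocessing import Pipe

import numpy as np

array = np.arange(6, dtype=np.float64).reshape(2, 3)

# Connection.send pickles the object and Connection.recv unpickles it,
# so no file is involved at any point.
sender, receiver = Pipe()
sender.send(array)
received = receiver.recv()

# The same round trip done by hand with the pickle module.
restored = pickle.loads(pickle.dumps(array))
```

Both `received` and `restored` should compare equal to `array`.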


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-08 Thread Ronald van Elburg
If needed I can try to construct a minimal example for testing purposes.


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-09 Thread Ronald van Elburg
OK. Then we will just wait for 2.x and test then.


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-10 Thread Ronald van Elburg
I have one more use case to consider from our ecosystem.

We dump numpy arrays into a MongoDB using GridFS for subsequent visualization, 
some snippets:

'''Python
# Serialize the array to bytes in memory, then store the bytes in GridFS.
with BytesIO() as BIO:
    np.save(BIO, numpy_array)
    serialized_A = BIO.getvalue()
filehandle_id = self.representations_files.put(serialized_A)
'''

and then restore them in the other application:

'''Python
numpy_array = np.load(BytesIO(serialized_A))
'''
For us this is for development work only and I am less concerned about having 
mixed versions in my database, but in principle that is a scenario. But it 
seems to me that for this to work the reading application needs to be migrated 
to version 2 and temporarily extended with the NumpyUnpickler before the 
writing application is migrated. Or they need to be migrated at the same time. 
Is that correct?
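A self-contained version of this round trip, without the database, can help when testing across versions. This is only a sketch: the helper names `serialize_array` and `deserialize_array` are made up here, and it assumes arrays with a plain numeric dtype. Passing `allow_pickle=False` makes both calls refuse to use pickle, so if this runs without error then pickling is not involved for these arrays.

```python
from io import BytesIO

import numpy as np


def serialize_array(array):
    """Hypothetical helper: return the .npy bytes for array, refusing pickle."""
    with BytesIO() as buffer:
        np.save(buffer, array, allow_pickle=False)
        return buffer.getvalue()


def deserialize_array(payload):
    """Hypothetical helper: rebuild the array from .npy bytes, refusing pickle."""
    return np.load(BytesIO(payload), allow_pickle=False)


original = np.linspace(0.0, 1.0, 5)
payload = serialize_array(original)
round_tripped = deserialize_array(payload)
```

Arrays of dtype object would raise an error in `serialize_array`, because they can only be saved with pickle.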


[Numpy-discussion] Re: Switching default order to column-major

2023-11-15 Thread Ronald van Elburg
My Cython code and my SWIG-wrapped C++ code assume C-ordering and a 
contiguous layout, which allows for super fast code. I guess making it agnostic 
to the ordering would require implementing everything twice and then switching 
between the two based on what comes in. That is a lot of work for no gain. 
Rewriting it for F-ordering would also be a pain.
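For illustration, this is roughly how such code can guard its layout assumption at the Python boundary. It is a sketch, not the actual wrapper code; `as_c_contiguous` is a made-up name. The price is an extra copy whenever an F-ordered array comes in.

```python
import numpy as np


def as_c_contiguous(array):
    """Return a C-ordered, contiguous array; copies only when needed."""
    return np.ascontiguousarray(array)


f_ordered = np.asfortranarray(np.arange(6.0).reshape(2, 3))
c_ordered = as_c_contiguous(f_ordered)

# An array that is already C-contiguous is passed through without a copy.
passed_through = as_c_contiguous(c_ordered)
```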


[Numpy-discussion] Re: Enhancement: np.convolve(..., mode="normalized")

2023-11-22 Thread Ronald van Elburg
I wonder whether you are looking for the solution in the right direction. Is 
there theory for the shape of the curve? In that case it might be better to see 
the problem as a fitting problem.

Other than that, I think option 2 is too ad hoc for scientific work. I would opt 
for simply not showing the smoothed curve where it is not available. The convol 
function you specified here is a very narrow Gaussian; is that the function you 
actually used?

Note: The code you provided cannot be executed.


[Numpy-discussion] mean_std function returning both mean and std

2023-06-01 Thread Ronald van Elburg
I created a solution for ENH: Computing std/var and mean at the same time, 
issue #23741. The solution can be found here: 
https://github.com/soundappraisal/numpy/tree/stdmean-dev-001 

I still need to add tests, and the solution does touch the implementation of 
var. But before starting a pull request I would like to check whether mean_std 
is a welcome addition.

Also, I was struggling with the internally needed format of the arrays 
containing mean and std and the format produced as a result. This makes me 
uncertain whether the chosen solution is correct for all cases.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-01 Thread Ronald van Elburg
Steps to make this complete:
- move resize of the mean array out of _mean_var and into the calling 
mean_std function (to reduce the impact of the code changes on existing 
functions)
- establish whether numpy/core/_add_newdocs.py needs to be updated (What is 
the function of this file?)
- add tests at numpy/core/tests/test_numeric.py
- add tests that establish whether the specified out matrix returns as 
output (It is easy to make mistakes and introduce changes which are not in 
place.)

Should we add mean_var as well?


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-02 Thread Ronald van Elburg
Mean_var, mean_std and tests are now ready. 
(https://github.com/soundappraisal/numpy/tree/stdmean-dev-001)

Some decisions made during implementation:
  - the output shape of mean follows the output shape of the variance or the 
standard deviation. So it responds in the same way to the keepdims flag as the 
variance and the standard deviation.
  - the resizing of the mean is placed in _mean_var; the overhead on the old 
functions std and var is minimal as they set mean_out to None.
  - the intermediate mean used can not be replaced with the mean produced by 
_mean as the output of the latter can not be broadcast to the incoming data.
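The decisions above can be illustrated with a pure-NumPy sketch. This is not the code in the branch: `mean_std_sketch` is a hypothetical name, and the body only mimics the described behaviour (the intermediate mean keeps its reduced axes so that it broadcasts against the input, and the returned mean is resized to follow the shape of the standard deviation).

```python
import numpy as np


def mean_std_sketch(a, axis=None, ddof=0, keepdims=False):
    """Hypothetical helper: return (mean, std), computing the mean only once."""
    a = np.asarray(a, dtype=float)
    # Keep the reduced axes so the mean can be broadcast against the input.
    mean = a.mean(axis=axis, keepdims=True)
    squared_deviation = np.square(a - mean)
    # Number of input elements that contribute to each output element.
    n = a.size // mean.size
    std = np.sqrt(squared_deviation.sum(axis=axis, keepdims=keepdims) / (n - ddof))
    if not keepdims:
        # Resize the mean so that its shape follows the shape of std.
        mean = mean.reshape(std.shape)
    return mean, std


data = np.arange(12.0).reshape(3, 4)
mean, std = mean_std_sketch(data, axis=1)
mean_kept, std_kept = mean_std_sketch(data, axis=1, keepdims=True)
```

With `keepdims=True` both outputs have shape (3, 1); without it both have shape (3,).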


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-02 Thread Ronald van Elburg
I think I left those aspects of the implementation untouched. But having 
someone more experienced look at it is probably a good idea.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-02 Thread Ronald van Elburg
Aha, the unnecessary copy mentioned in the paper 
https://dbs.ifi.uni-heidelberg.de/files/Team/eschubert/publications/SSDBM18-covariance-authorcopy.pdf 
is a copy of the input. Here it is about discarding a valuable output 
(the mean) and then calculating that result separately.

Not throwing the mean away saves about 20% computation time. Or, phrased 
differently, the calculation of the variance spends about 25% of its 
computation time on calculating the mean.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-02 Thread Ronald van Elburg
I am agnostic to the order of those changes. Also this is my first attempt to 
contribute to numpy, so I am not aware of all the ongoing discussions. I'll try 
to read the issue you just mentioned.

But in the code I rewrote replacing _mean_var with a faster version would 
benefit var, std, mean_var and mean_std because they all call _mean_var. 

The mean function is untouched.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-02 Thread Ronald van Elburg
I had a closer look at the paper. When I have more brain and time I may check 
the mathematics. The focus is however more on streaming data, which is an 
application with completely different demands. I think that here we can not 
afford to sample the data, which is an option in streaming database systems.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-03 Thread Ronald van Elburg
I had a look at the C-solution. It delegates the summation over one axis from 
the axis tuple to the C-helper, and then the remaining axes are summed from 
_methods.py. Worst case: if the axis delegated to the helper is very short 
compared to the other axes, I would expect hardly any speed-up, and savings on 
memory usage would also be limited. 

Sticking with this solution, it would be better from the point of view of 
speed and memory use to delegate the longest axis from the axis tuple to 
C-code. 

In my view a solution with which many would be happier  
(https://github.com/numpy/numpy/pull/13263#issuecomment-1048122467) would 
probably delegate all the axes to the helper function.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-05 Thread Ronald van Elburg
I had a look at what it would take to improve the C-solution. However, I find 
that it is beyond my C-programming skills. 

The gufunc definition seems to be at odds with the current working of the axis 
keyword for mean, std and var. The latter support computation over multiple 
axes, whereas the gufunc only seems to support calculation over a single axis.  
As the behaviour of std, mean and var is largely inherited from ufuncs those 
might offer a better starting point.

If the operator used in the ufunc could take a parameter from the outer loop, 
in this case the mean, then it would be possible to calculate the 
required intermediate quantities. This should be a possibility, as somewhere the 
out array is also accessed in the correct manner and we should step through 
both arrays in the same way.

Instead of:

'''Pseudocode
result = np.full(result_shape, op.identity) # op = ufunc

loop_outer_axes_result_array:
    loop_over_inner_axes_input_array:
        result[outer_axes] = op(result[outer_axes],
                                InArray[outer_axes + inner_axes])
'''

we would then get:


'''Pseudocode
result = np.full(result_shape, op.identity) # op = ufunc

loop_outer_axes_result_array:
    loop_over_inner_axes_input_array:
        result[outer_axes] = op(result[outer_axes],
                                InArray[outer_axes + inner_axes],
                                ParameterArray[outer_axes])
'''

Using for op:

'''Pseudocode
op(a,b,c) = a+b-c
'''

and for b the original data and for c the mean (M_1) you would obtain the Neely 
correction for the mean.

Similarly using:

'''Pseudocode
op(a,b,c) = a+(b-c)^2
'''
you would obtain the sum of squared errors.
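The pseudocode above can be tried out in plain Python. The following is only a slow illustrative sketch, not a proposal for the C implementation: `reduce_with_parameter` is a hypothetical name, the reduction is restricted to the last axis, and the identity is passed in explicitly.

```python
import numpy as np


def reduce_with_parameter(op, identity, in_array, parameter_array):
    """Hypothetical sketch: reduce the last axis of in_array with a
    three-argument op; parameter_array has the shape of the result."""
    result = np.full(in_array.shape[:-1], identity, dtype=float)
    for outer in np.ndindex(result.shape):           # loop over outer axes
        for inner in range(in_array.shape[-1]):      # loop over the inner axis
            result[outer] = op(result[outer],
                               in_array[outer + (inner,)],
                               parameter_array[outer])
    return result


data = np.array([[1.0, 2.0, 3.0, 6.0],
                 [2.0, 4.0, 4.0, 6.0]])
mean = data.mean(axis=-1)

# op(a, b, c) = a + b - c: sum of the residuals around the mean.
residual_sum = reduce_with_parameter(lambda a, b, c: a + b - c, 0.0, data, mean)

# op(a, b, c) = a + (b - c)^2: sum of squared errors.
squared_errors = reduce_with_parameter(lambda a, b, c: a + (b - c) ** 2,
                                       0.0, data, mean)
```

Dividing `squared_errors` by the number of elements along the reduced axis gives the same values as `data.var(axis=-1)`. Only the result array is allocated; the mean is read once per outer index.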


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-05 Thread Ronald van Elburg
Note: the suggested solution requires no allocation of memory beyond that 
needed for storing the result.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-05 Thread Ronald van Elburg
2nd note: I implicitly based this on the reduce function.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-07 Thread Ronald van Elburg
I have a pull request, but I am stuck for a day now on how to handle the masked 
arrays. 

I made some progress by calling the MaskedArray methods, but in some cases 
those methods call back the ndarray methods via their super class. The method 
_mean_var for ndarray needs to resize the produced mean to align the shape of 
the mean and variance or standard deviation, but if the incoming and therefore 
the outgoing object is a MaskedArray that is not allowed.  

Also, I sometimes see unpredictable behavior, which gives me the feeling I 
am looking at pointer problems.

python runtests.py -t numpy/core/tests/test_numeric.py 
passes now

python runtests.py -t numpy/ma/tests/
is failing with weird errors on complex masked arrays, particularly:
 test_varstd
 test_complex


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-07 Thread Ronald van Elburg
OK, the same two tests fail on main (50984037) as well.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-06-08 Thread Ronald van Elburg
Issue #23896 is the cause of these two failing tests. 

With  CFLAGS="NPY_DISABLE_OPTIMIZATION=1" the tests pass.


[Numpy-discussion] Re: mean_std function returning both mean and std

2023-07-06 Thread Ronald van Elburg
Second attempt after the triage review of last week: ENH: add mean keyword to 
std and var #24126 (https://github.com/numpy/numpy/pull/24126)


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Ronald van Elburg
Ilhan Polat wrote:

> I think all these point to the missing convenient functionality that
> extends arrays. In matlab "[0 arr 10]" nicely extends the array to a new
> one but in NumPy you need to punch quite some code and some courage to
> remember whether it is hstack or vstack or concat or block as the correct
> naming which decreases the "code morale". 

Not having a convenient workaround is not the only problem. The workaround is 
wasteful with memory and involves unnecessary copying of an array. Having a 
keyword implemented with these concerns in mind might avoid this.


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Ronald van Elburg
I was trying to get a feel for how often the workaround occurs. I found three 
clear examples in SciPy and one unclear case, one case in holoviews, two in 
numpy, and one from soundappraisal's code base.

Next to prepending to the output, I also see prepending to the input as a 
workaround.

Some examples of workarounds:

scipy: (prepending to the output)

scipy/scipy/sparse/construct.py:

'''Python
row_offsets = np.append(0, np.cumsum(brow_lengths))
col_offsets = np.append(0, np.cumsum(bcol_lengths))
'''

scipy/scipy/sparse/dia.py:

'''Python
indptr = np.zeros(num_cols + 1, dtype=idx_dtype)
indptr[1:offset_len+1] = np.cumsum(mask.sum(axis=0))
'''

scipy/scipy/sparse/csgraph/_tools.pyx:

'''Python
indptr = np.zeros(N + 1, dtype=ITYPE)
indptr[1:] = mask.sum(1).cumsum()
'''

Not sure whether this is also an example:

scipy/scipy/stats/_hypotests_pythran.py
'''Python
# Now fill in the values. We cannot use cumsum, unfortunately.
val = 0.0 if minj == 0 else 1.0
for jj in range(maxj - minj):
    j = jj + minj
    val = (A[jj + minj - lastminj] * i + val * j) / (i + j)
    A[jj] = val
'''

holoviews: (prepending to the input)

'''Python
# We add a zero in the begging for the cumulative sum
points = np.zeros((areas_in_radians.shape[0] + 1))
points[1:] = areas_in_radians
points = points.cumsum()
'''


numpy (prepending to the input):

numpy/numpy/lib/_iotools.py :

'''Python
idx = np.cumsum([0] + list(delimiter))
'''

numpy/numpy/lib/histograms.py

'''Python
cw = np.concatenate((zero, sw.cumsum()))
'''



soundappraisal own code: (prepending to the output)

'''Python
def get_cumulativepixelareas(whiteboard):
    whiteboard['cumulativepixelareas'] = \
        np.concatenate((np.array([0, ]),
                        np.cumsum(whiteboard['pixelareas'])))
    return True
'''


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Ronald van Elburg
> Whether it's necessary to have other keywords to prepend anything other
> than zero, or append rather than prepend, is a lot less clear. Did you find
> a clear need for those things?

No, I haven't found them. For streaming data there might be use cases for 
starting with an initial offset, but I expect there might be no need for a 
returned offset there.

What is notable is that all examples above are 1D.  

To get the behavior of the API right, the simplest solution is to make the 
workaround part of the implementation. What I was pondering on is whether it is 
desirable to allocate the memory once and avoid copying the data. What is the 
price to pay in terms of code complexity and developer time? Also, if the 
accumulation would run in place on a copy of the input data, then prepending to 
the input might be a good option introducing very little new overhead.
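To make the trade-off concrete, here is a sketch of both variants for the 1D case. The function names are made up for illustration. The second variant allocates the result once and lets cumsum write directly into a view of it, so the accumulated values are not copied a second time.

```python
import numpy as np


def cumsum_from_zero_concat(values):
    """The workaround: cumsum allocates one array, concatenate copies it."""
    return np.concatenate(([0], np.cumsum(values)))


def cumsum_from_zero_prealloc(values):
    """Allocate the result once and accumulate straight into result[1:]."""
    values = np.asarray(values)
    # Use the dtype cumsum itself would produce, taken from an empty slice.
    out_dtype = np.cumsum(values[:0]).dtype
    result = np.zeros(values.shape[0] + 1, dtype=out_dtype)
    np.cumsum(values, out=result[1:])
    return result


lengths = np.array([3, 1, 4])
offsets_concat = cumsum_from_zero_concat(lengths)
offsets_prealloc = cumsum_from_zero_prealloc(lengths)
```

For `lengths = [3, 1, 4]` both variants return the offsets 0, 3, 4, 8.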


[Numpy-discussion] Re: Automatic binning for np.histogram

2025-03-07 Thread Ronald van Elburg
I don't think there is an automatic method for correct binning.

The methods mentioned in the pull request and related issue are all based on 
the assumption that the underlying distribution is Gaussian. There is 
absolutely no reason to assume that.  

Reasonable expectations for automatic binning:
- it will be wrong most of the time.

Reasonable number of bins for a sample of size n:
- max(10, sqrt(n)) to make sure there is a large number of filled bins, 
while still providing information about the data values for low numbers.

The documentation could point out that automatic binning should only be used 
for exploring a single data set, as it is unsuited for comparing two different 
data sets. Automatically binned data is also not suited for later use in 
testing distribution similarity.
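As a sketch of the rule of thumb above: `rule_of_thumb_bins` is a made-up name, and the exponential sample is just an arbitrary non-Gaussian example. The last lines show one way to compare two data sets, namely by fixing the bin edges once and reusing them for both.

```python
import math

import numpy as np


def rule_of_thumb_bins(n):
    """Number of bins for a sample of size n: max(10, sqrt(n))."""
    return max(10, int(math.sqrt(n)))


rng = np.random.default_rng(0)
sample_a = rng.exponential(size=400)
sample_b = rng.exponential(size=400)

counts_a, edges = np.histogram(sample_a, bins=rule_of_thumb_bins(sample_a.size))

# Reuse the same edges for the second data set; values of sample_b that
# fall outside the range of the edges are not counted.
counts_b, _ = np.histogram(sample_b, bins=edges)
```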