[Numpy-discussion] Port Powersort Improvement of Timsort

2023-06-14 Thread Ben Weston
Hi all,

I'm working with the University of Liverpool to implement Munro & Wild's 
Powersort in various libraries.

Powersort has been included in CPython 3.11 and PyPy as a replacement for 
Timsort (https://github.com/python/cpython/issues/78742), therefore porting 
Powersort to numpy is a logical next step. Compared to Timsort's original merge 
policy, Powersort finds a near-optimal order of merges without measurable 
overhead and so seems like a no-detriment improvement. (See also benchmarks 
referenced in the GitHub issue).

I wonder if you'd be receptive to a PR bringing the same change to numpy.  I do 
have an existing implementation in C++ to work from 
(https://github.com/sebawild/powersort), in addition to the C implementations 
in CPython and PyPy and am willing to invest time into porting Powersort for 
numpy.

Being a new contributor to this project, any pointers to relevant processes, 
documentation, and source files I should bear in mind would greatly appreciated.

Kind regards,
Ben
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Port Powersort Improvement of Timsort

2023-06-14 Thread Stefan van der Walt
Hi Ben,

Thanks for your proposal!

On Wed, Jun 14, 2023, at 03:24, Ben Weston wrote:
> I wonder if you'd be receptive to a PR bringing the same change to 
> numpy.  I do have an existing implementation in C++ to work from 
> (https://github.com/sebawild/powersort), in addition to the C 
> implementations in CPython and PyPy and am willing to invest time into 
> porting Powersort for numpy.

Powersort looks interesting.  I see in our docs:

```
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
Sorting algorithm. The default is 'quicksort'. Note that both 'stable'
and 'mergesort' use timsort or radix sort under the covers and, in general,
the actual implementation will vary with data type. The 'mergesort' option
is retained for backwards compatibility.
```
Given that timsort is used "underneath the hood", it may be an option to 
improve it, especially given that it does not change fundamental aspects such 
as stability.

You can take a look at src/npysort/timsort.cpp to see what we currently have (a 
whole bunch of code for handling various object types).

Best regards,
Stéfan
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com