[Numpy-discussion] ndarray.sort x86 dispatch

2023-01-04 Thread Peter Schneider-Kamp
Hi guys,

I am trying to understand how the x86 dispatch for ndarray sort works. The 
following call in Line 137 of numpy/core/src/npysort/quicksort.cpp returns 0 
for my test cases:

if (x86_dispatch::quicksort(start, num))
    return 0;

I have tried to compile with --cpu-dispatch="AVX512_KNL AVX512_CLX AVX512_CNL 
AVX512_ICL AVX512_SKX", but for dtype=uint64 (or int64, uint8, float32, or 
float64) the result is always the same, i.e., the standard quicksort is used 
instead of the AVX512 one with bitonic sorting base cases.
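
The control flow around that call can be pictured roughly as follows. This is only a toy model in Python, not NumPy's code: the kernel table, the feature names attached to each dtype, and all function names are hypothetical and chosen purely for illustration. It shows the "try the specialised kernel, otherwise fall through to the generic sort" pattern.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical table: dtype name -> (CPU feature the kernel needs, kernel).
# The pairing of dtypes and features is an assumption, not taken from NumPy.
KERNELS: Dict[str, Tuple[str, Callable[[List], None]]] = {
    "uint64": ("AVX512_SKX", lambda data: data.sort()),
    "float64": ("AVX512_SKX", lambda data: data.sort()),
}


def dispatch_quicksort(data: List, dtype: str, cpu_features: Set[str]) -> bool:
    """Return True only if a specialised kernel sorted `data` in place."""
    entry = KERNELS.get(dtype)
    if entry is None:
        return False  # no specialised kernel exists for this dtype
    required, kernel = entry
    if required not in cpu_features:
        return False  # kernel exists, but the running CPU lacks the feature
    kernel(data)
    return True


def quicksort(data: List, dtype: str, cpu_features: Set[str]) -> str:
    """Sort in place and report which path was taken."""
    if dispatch_quicksort(data, dtype, cpu_features):
        return "simd"
    data.sort()  # stand-in for the generic quicksort
    return "generic"
```

In this model, a falsy return from the dispatch call simply means "not handled here", either because no kernel was built for the dtype or because the CPU at run time does not report the required feature.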

What do I have to do to be able to use the AVX512 implementation?

I am currently compiling on a MacBook Pro with Monterey. I have all kinds of 
Linux machines available, if that should be a requirement.

Thanks in advance for any insights!

Cheers,
Peter
--
Peter Schneider-Kamp
Professor in Artificial Intelligence
Department of Mathematics & Computer Science
University of Southern Denmark

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: ndarray.sort x86 dispatch

2023-01-05 Thread Peter Schneider-Kamp
Hi Sebastian,

This is what I get when I compile with just “python3.8 setup.py build -j 16”:

- BEGIN -
### CLIB COMPILER OPTIMIZATION ###
INFO: Platform  :
  Architecture: x64
  Compiler: gcc

CPU baseline  :
  Requested   : 'min'
  Enabled : SSE SSE2 SSE3
  Flags   : -msse -msse2 -msse3
  Extra checks: none

CPU dispatch  :
  Requested   : 'max -xop -fma4'
  Enabled : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD 
AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
  Generated   : none
- END -

I was wondering why it says “Generated  : none” for the CPU dispatch?

This is the output of “np.show_runtime()”:

- BEGIN -
[{'numpy_version': '0+untagged.31149.g6d474f2',
  'python': '3.8.10 (default, Nov 14 2022, 12:59:47) \n[GCC 9.4.0]',
  'uname': uname_result(system='Linux', node='lambda', 
release='5.4.0-135-generic', version='#152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 
2022', machine='x86_64', processor='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
  'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX'],
  'not_found': ['AVX512_KNL',
'AVX512_KNM',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}}]
- END -
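
A minimal sketch of how a report with this shape can be checked programmatically. The dictionary literal is copied by hand (abbreviated) from the printed output above, and the helper name `dispatch_targets_found` is made up for this example, not part of NumPy.

```python
def dispatch_targets_found(runtime_info, wanted):
    """Map each wanted target name to True if it is in 'baseline' or 'found'."""
    for entry in runtime_info:
        simd = entry.get("simd_extensions")
        if simd is not None:
            available = set(simd["baseline"]) | set(simd["found"])
            return {name: name in available for name in wanted}
    raise ValueError("no 'simd_extensions' entry in runtime info")


# Abbreviated, hand-copied version of the report quoted above.
runtime_info = [
    {"numpy_version": "0+untagged.31149.g6d474f2"},
    {"simd_extensions": {
        "baseline": ["SSE", "SSE2", "SSE3"],
        "found": ["SSSE3", "SSE41", "POPCNT", "SSE42", "AVX", "F16C",
                  "FMA3", "AVX2", "AVX512F", "AVX512CD", "AVX512_SKX"],
        "not_found": ["AVX512_KNL", "AVX512_KNM", "AVX512_CLX",
                      "AVX512_CNL", "AVX512_ICL"],
    }},
]

print(dispatch_targets_found(runtime_info, ["AVX512_SKX", "AVX512_ICL"]))
# {'AVX512_SKX': True, 'AVX512_ICL': False}
```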

Apparently, the CPU supports AVX512_SKX, i.e., avx512bw, avx512dq, avx512vl 
(bold face is mine, rest is from /proc/cpuinfo):

- BEGIN -
model name : Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx 
est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe 
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch 
cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp 
tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 
smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap 
clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 
xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln 
pts hwp hwp_act_window hwp_epp hwp_pkg_req md_clear flush_l1d arch_capabilities
- END -
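
The same check can be scripted instead of read off by eye. The sketch below assumes that the AVX512_SKX group corresponds to the kernel flags avx512f, avx512cd, avx512vl, avx512bw and avx512dq; the function names are invented for this example.

```python
# Kernel flag names assumed to make up the AVX512_SKX feature group.
AVX512_SKX_FLAGS = {"avx512f", "avx512cd", "avx512vl", "avx512bw", "avx512dq"}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_for_skx(cpuinfo_text):
    """Return the AVX512_SKX flags that the CPU does not report."""
    return AVX512_SKX_FLAGS - parse_flags(cpuinfo_text)


# Shortened sample in the format shown above.
sample = (
    "model name : Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz\n"
    "flags   : fpu avx2 avx512f avx512dq avx512cd avx512bw avx512vl\n"
)
print(sorted(missing_for_skx(sample)))  # []
```

On a Linux machine the real input would be the contents of /proc/cpuinfo; an empty result means every flag in the assumed group is present.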

Am I building numpy incorrectly?

I really need to be able to execute the AVX512 quicksort implementation as part 
of a research project, where we generate efficient sorting implementations that 
we would like to contribute to numpy to the degree that they improve upon 
existing solutions.

Any help is highly appreciated!

Cheers,
Peter

From: Sebastian Berg 
Date: Friday, 6 January 2023 at 08.04
To: numpy-discussion@python.org 
Subject: [Numpy-discussion] Re: ndarray.sort x86 dispatch

On Wed, 2023-01-04 at 04:06 +, Peter Schneider-Kamp wrote:
> Hi guys,
>
> I am trying to understand how the x86 dispatch for ndarray sort
> works. The following call in Line 137 of
> numpy/core/src/npysort/quicksort.cpp returns 0 for my test cases:
>
> if (x86_dispatch::quicksort(start, num))
>     return 0;
>
> I have tried to compile with --cpu-dispatch="AVX512_KNL AVX512_CLX
> AVX512_CNL AVX512_ICL AVX512_SKX" but for dtype=uint64 (or int64 or
> uint8 or float32 or float64) it is always the same result, i.e., the
> standard quicksort is used instead of the AVX512 one with bitonic
> sorting base cases.
>
> What do I have to do to be able to use the AVX512 implementation?


You can check what is found with:

np.show_runtime()

Also just google your CPU or check `cat /proc/cpuinfo`.  AVX512 currently
exists only on high-end Intel CPUs (IIRC), so presumably you simply do
not have the hardware.

E.g. on M1, `show_runtime()

[Numpy-discussion] Re: ENH: Efficient operations on already sorted arrays

2024-01-16 Thread Peter Schneider-Kamp via NumPy-Discussion
Dear all,

I second Yagiz’s proposal. I do, however, see that we need to ensure 
consistency in code style (and probably in other respects) before merging these 
new functions in. Particularly important is adherence to NumPy's conventions 
for function and method names.

Cheers,
Peter

From: Yağız Ölmez 
Date: Monday, 15 January 2024 at 14.16
To: numpy-discussion@python.org 
Subject: [Numpy-discussion] ENH: Efficient operations on already sorted arrays
Dear Numpy Community

It has come to my attention that there is no function in Numpy to merge two 
sorted arrays. There was a request for it in 2014, but it did not go anywhere:

https://github.com/numpy/numpy/issues/5000

I have come across this package by Frank Sauerburger, which implements this and 
many other operations on sorted arrays:

https://gitlab.sauerburger.com/frank/sortednp

This package is distributed under the MIT License, so it can be merged into Numpy.
Please let me know what you think!
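
For readers unfamiliar with the operation being requested, merging two already sorted inputs can be sketched as below. This is a plain-Python illustration of the idea on lists; it is not the sortednp API and not a proposed NumPy implementation, and the name `merge_sorted` is made up for this example.

```python
def merge_sorted(a, b):
    """Merge two ascending sequences into one ascending list.

    Runs in O(len(a) + len(b)) by walking both inputs once. On ties the
    element from `a` is taken first, so the merge is stable.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
    # At most one of these tails is non-empty.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


print(merge_sorted([1, 3, 5], [2, 3, 6]))  # [1, 2, 3, 3, 5, 6]
```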

Best
Yagiz Olmez