On Mon, Mar 20, 2023 at 10:48:45AM +0100, Jan Beulich wrote:
> On 17.03.2023 13:26, Andrew Cooper wrote:
> > On 17/03/2023 11:22 am, Roger Pau Monné wrote:
> >> On Mon, Jul 15, 2019 at 02:39:04PM +0000, Jan Beulich wrote:
> >>> This is faster than using the software implementation, and the insn is
> >>> available on all half-way recent hardware. Therefore convert
> >>> generic_hweight<N>() to out-of-line functions (without affecting Arm)
> >>> and use alternatives patching to replace the function calls.
> >>>
> >>> Note that the approach doesn't work for clang, due to it not recognizing
> >>> -ffixed-*.
> >> I've been giving this a look, and I wonder if it would be fine to
> >> simply push and pop the scratch registers in the 'call' path of the
> >> alternative, as that won't require any specific compiler option.
>
> Hmm, ...
>
> > It's been a long while, and in that time I've learnt a lot more about
> > performance, but my root objection to the approach taken here still
> > stands - it is penalising the common case to optimise some pointless
> > corner cases.
> >
> > Yes - on the call path, an extra push/pop pair (or few) to get temp
> > registers is basically free.
>
> ... what is "a few"? We'd need to push/pop all call-clobbered registers
> except %rax, i.e. a total of eight. I consider this too much. Unless,
> as you suggest further down, we wrote the fallback in assembly. Which I
> have to admit I'm surprised you propose when we strive to reduce the
> amount of assembly we have to maintain.

AMD added popcnt in 2007 and Intel in 2008. While we shouldn't mandate
popcnt support, I also don't think we should worry overly about the
non-popcnt path.

Also, how can you be sure that the code generated with the scratch
registers made unavailable won't be worse than pushing and popping those
registers on the stack and letting the fallback routines use all
registers freely?

I would much rather have a less optimal non-popcnt path, but popcnt
support that works with both gcc and clang, without requiring any
special compiler options.

Thanks, Roger.
