On 14/03/2024 2:30 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>      if ( contig_mask )
>>      {
>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
> introduced).
It's sad that there are competing APIs with different bit-labelling, but
the optimiser does cancel the -1 against arch_ffs() (at least on x86 and
ARM, which I studied in detail).
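To illustrate the two labellings, here is a minimal sketch. It is not the
Xen implementation: one_based_ffsl() and zero_based_ffsl() are hypothetical
names, and __builtin_ffsl() is the GCC/Clang builtin standing in for
whatever arch_ffs() ends up doing.

```c
/*
 * Hypothetical stand-ins for illustration only, not the Xen helpers.
 *
 * ffs()-style labelling: bits are numbered from 1, and 0 means
 * "no bit set", so the result is well defined for every input.
 */
static inline unsigned int one_based_ffsl(unsigned long x)
{
    return (unsigned int)__builtin_ffsl((long)x);
}

/*
 * find_first_set_bit()-style labelling: bits are numbered from 0.
 * The caller must guarantee x != 0; for x == 0 the subtraction
 * wraps around to UINT_MAX, which is not a valid bit index.
 */
static inline unsigned int zero_based_ffsl(unsigned long x)
{
    return one_based_ffsl(x) - 1;
}
```

For example, with x = 0x8 the first helper yields 4 and the second yields 3,
which is the shift the hunk above is after.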
I firmly believe that having fewer APIs which are fully well-defined (and
which can be optimised based on the compiler's idea of safety) is still
better than a maze of APIs with different behaviours.
> But what I first of all would like to have clarification on is what your
> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
> Potential side-by-side uses would be odd now, and would continue to be odd
> if the difference in bit labeling was retained. Since we're switching to
> a consolidated set of basic helpers, such an anomaly would better not
> survive imo.
I honestly hadn't got that far yet. I was mainly trying to disentangle
the existing mess so RISC-V wasn't making it worse still.
But yes - it warrants thinking about.
I was intending to do fls() next, then popcnt(). The latter has quite a
lot of cleanup wanting to come with it and is more architecturally
invasive, and I know I've got a years-old outstanding piece of work to
do popcnt more nicely on x86.
I have wanted ffz() in the past. I think I just went with explicit ~
because I didn't want to continue this debate at the time.
However, what I want even less is a situation where ffs() and ffz() have
different bit-labellings.
There are no builtins for ffz(), and having now studied the architectures
we care about... https://godbolt.org/z/KasP41n1e ...not even x86 has a
"count leading/trailing ones" instruction.
So using ffs(~val) really will get you the best code generation
available, and seeing as it halves the number of bitops to maintain, I
think this is the best tradeoff overall.
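As a rough sketch of what "ffs(~val)" means in practice (again not Xen
code: sketch_ffzl() is a hypothetical name, and __builtin_ffsl() is the
GCC/Clang builtin used as a stand-in), an ffz() with the same bit-labelling
as ffs() is a one-liner:

```c
/*
 * Hypothetical helper for illustration only.
 *
 * Returns the 1-based index of the lowest clear bit in x, or 0 if
 * every bit is set.  This is the same labelling as ffs(), because it
 * is just ffs() applied to the inverted value.
 */
static inline unsigned int sketch_ffzl(unsigned long x)
{
    return (unsigned int)__builtin_ffsl((long)~x);
}
```

With this labelling, sketch_ffzl(0x7) is 4 and sketch_ffzl(~0UL) is 0, so
side-by-side uses of ffs() and ffz() agree on how bits are numbered.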
I intend to put ffz() and __ffs() into linux-compat.h and leave them
there to discourage their use generally.
~Andrew