Re: [PATCH 3/9] all: replace bitmap_weigth() with bitmap_{empty,full,eq,gt,le}

2021-11-28 Thread Greg Kroah-Hartman
On Sat, Nov 27, 2021 at 07:56:58PM -0800, Yury Norov wrote:
> bitmap_weight() counts all set bits in the bitmap unconditionally.
> However in some cases we can traverse a part of bitmap when we
> only need to check if number of set bits is greater, less or equal
> to some number.
> 
> This patch replaces bitmap_weight() with one of
> bitmap_{empty,full,eq,gt,le), as appropriate.
> 
> In some places driver code has been optimized further, where it's
> trivial.
> 
> Signed-off-by: Yury Norov 
> ---
>  arch/nds32/kernel/perf_event_cpu.c |  4 +---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 ++--
>  arch/x86/kvm/hyperv.c  |  8 
>  drivers/crypto/ccp/ccp-dev-v5.c|  5 +
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c   |  2 +-
>  drivers/iio/adc/mxs-lradc-adc.c|  3 +--
>  drivers/iio/dummy/iio_simple_dummy_buffer.c|  4 ++--
>  drivers/iio/industrialio-buffer.c  |  2 +-
>  drivers/iio/industrialio-trigger.c |  2 +-
>  drivers/memstick/core/ms_block.c   |  4 ++--
>  drivers/net/dsa/b53/b53_common.c   |  2 +-
>  drivers/net/ethernet/broadcom/bcmsysport.c |  6 +-
>  drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c   |  4 ++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  2 +-
>  .../ethernet/marvell/octeontx2/nic/otx2_ethtool.c  |  2 +-
>  .../ethernet/marvell/octeontx2/nic/otx2_flows.c|  8 
>  .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c   |  2 +-
>  drivers/net/ethernet/mellanox/mlx4/cmd.c   | 10 +++---
>  drivers/net/ethernet/mellanox/mlx4/eq.c|  4 ++--
>  drivers/net/ethernet/mellanox/mlx4/main.c  |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  2 +-
>  drivers/net/ethernet/qlogic/qed/qed_dev.c  |  3 +--
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c |  4 ++--
>  drivers/net/ethernet/qlogic/qed/qed_roce.c |  2 +-
>  drivers/perf/arm-cci.c |  2 +-
>  drivers/perf/arm_pmu.c |  4 ++--
>  drivers/perf/hisilicon/hisi_uncore_pmu.c   |  2 +-
>  drivers/perf/thunderx2_pmu.c   |  3 +--
>  drivers/perf/xgene_pmu.c   |  2 +-
>  drivers/pwm/pwm-pca9685.c  |  2 +-
>  drivers/staging/media/tegra-video/vi.c |  2 +-
>  drivers/thermal/intel/intel_powerclamp.c   | 10 --
>  fs/ocfs2/cluster/heartbeat.c   | 14 +++---
>  33 files changed, 57 insertions(+), 75 deletions(-)

After you get the new functions added to the kernel tree, this patch
should be broken up into one-patch-per-subsystem and submitted through
the various subsystem trees.

thanks,

greg k-h

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

2021-11-28 Thread Nicholas Piggin
Excerpts from Yury Norov's message of November 28, 2021 1:56 pm:
> In many cases people use bitmap_weight()-based functions like this:
> 
>   if (num_present_cpus() > 1)
>   do_something();
> 
> This may take considerable amount of time on many-cpus machines because
> num_present_cpus() will traverse every word of underlying cpumask
> unconditionally.
> 
> We can significantly improve on it for many real cases if stop traversing
> the mask as soon as we count present cpus to any number greater than 1:
> 
>   if (num_present_cpus_gt(1))
>   do_something();
> 
> To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> functions together with corresponding wrappers in cpumask and nodemask.

There would be no change to callers if you maintain counters like what
is done for num_online_cpus() today. Maybe some fixes to arch code that
does not use set_cpu_possible() etc APIs required, but AFAIKS it would
be better to fix such cases anyway.

Thanks,
Nick

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Joe Perches
On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> with one of new functions where appropriate. This allows num_*_cpus_*()
> to return earlier depending on the condition.
[]
> diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
[]
> @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>* if platform didn't set the present map already, do it now
>* boot cpu is set to present already by init/main.c
>*/
> - if (num_present_cpus() <= 1)
> + if (num_present_cpus_le(2))
>   init_cpu_present(cpu_possible_mask);

?  is this supposed to be 2 or 1

> diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
[]
> @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
>   return ret;
>   }
>  
> - if (num_present_cpus() > 4) {
> + if (num_present_cpus_gt(4)) {
>   pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
>   pr_err("%s: Too many CPUs, dynamic performance scaling 
> disabled\n",
>  __func__);

It looks as if the present variants should be using the same values
so the _le test above with 1 changed to 2 looks odd.



___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > with one of new functions where appropriate. This allows num_*_cpus_*()
> > to return earlier depending on the condition.
> []
> > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> []
> > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> >  * if platform didn't set the present map already, do it now
> >  * boot cpu is set to present already by init/main.c
> >  */
> > -   if (num_present_cpus() <= 1)
> > +   if (num_present_cpus_le(2))
> > init_cpu_present(cpu_possible_mask);
> 
> ?  is this supposed to be 2 or 1

X <= 1 is the equivalent of X < 2.

> > diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
> []
> > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > return ret;
> > }
> >  
> > -   if (num_present_cpus() > 4) {
> > +   if (num_present_cpus_gt(4)) {
> > pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > pr_err("%s: Too many CPUs, dynamic performance scaling 
> > disabled\n",
> >__func__);
> 
> It looks as if the present variants should be using the same values
> so the _le test above with 1 changed to 2 looks odd.
 

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Dennis Zhou
Hello,

On Sun, Nov 28, 2021 at 09:43:20AM -0800, Yury Norov wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > >* if platform didn't set the present map already, do it now
> > >* boot cpu is set to present already by init/main.c
> > >*/
> > > - if (num_present_cpus() <= 1)
> > > + if (num_present_cpus_le(2))
> > >   init_cpu_present(cpu_possible_mask);
> > 
> > ?  is this supposed to be 2 or 1
> 
> X <= 1 is the equivalent of X < 2.
> 
> > > diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
> > []
> > > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > >   return ret;
> > >   }
> > >  
> > > - if (num_present_cpus() > 4) {
> > > + if (num_present_cpus_gt(4)) {
> > >   pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > >   pr_err("%s: Too many CPUs, dynamic performance scaling 
> > > disabled\n",
> > >  __func__);
> > 
> > It looks as if the present variants should be using the same values
> > so the _le test above with 1 changed to 2 looks odd.
>  

I think the confusion comes from le meaning less than rather than lt.
Given the general convention of: lt (<), le (<=), eg (=), ge (>=),
gt (>), I'd consider renaming your le to lt.

Thanks,
Dennis

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 7/9] lib/cpumask: add num_{possible, present, active}_cpus_{eq, gt, le}

2021-11-28 Thread Emil Renner Berthing
On Sun, 28 Nov 2021 at 18:43, Yury Norov  wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > >  * if platform didn't set the present map already, do it now
> > >  * boot cpu is set to present already by init/main.c
> > >  */
> > > -   if (num_present_cpus() <= 1)
> > > +   if (num_present_cpus_le(2))
> > > init_cpu_present(cpu_possible_mask);
> >
> > ?  is this supposed to be 2 or 1
>
> X <= 1 is the equivalent of X < 2.

Ah, then the function is confusing. Usually it's lt = less than and lt
= less than or equal. Same idea for gt vs ge.

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Joe Perches
On Sun, 2021-11-28 at 09:43 -0800, Yury Norov wrote:
> On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus()
> > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > to return earlier depending on the condition.
> > []
> > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > []
> > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > >* if platform didn't set the present map already, do it now
> > >* boot cpu is set to present already by init/main.c
> > >*/
> > > - if (num_present_cpus() <= 1)
> > > + if (num_present_cpus_le(2))
> > >   init_cpu_present(cpu_possible_mask);
> > 
> > ?  is this supposed to be 2 or 1
> 
> X <= 1 is the equivalent of X < 2.

True. The call though is _le not _lt



___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 12:54:00PM -0500, Dennis Zhou wrote:
> Hello,
> 
> On Sun, Nov 28, 2021 at 09:43:20AM -0800, Yury Norov wrote:
> > On Sun, Nov 28, 2021 at 09:07:52AM -0800, Joe Perches wrote:
> > > On Sat, 2021-11-27 at 19:57 -0800, Yury Norov wrote:
> > > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace 
> > > > num_*_cpus()
> > > > with one of new functions where appropriate. This allows num_*_cpus_*()
> > > > to return earlier depending on the condition.
> > > []
> > > > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
> > > []
> > > > @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > > >  * if platform didn't set the present map already, do it now
> > > >  * boot cpu is set to present already by init/main.c
> > > >  */
> > > > -   if (num_present_cpus() <= 1)
> > > > +   if (num_present_cpus_le(2))
> > > > init_cpu_present(cpu_possible_mask);
> > > 
> > > ?  is this supposed to be 2 or 1
> > 
> > X <= 1 is the equivalent of X < 2.
> > 
> > > > diff --git a/drivers/cpufreq/pcc-cpufreq.c 
> > > > b/drivers/cpufreq/pcc-cpufreq.c
> > > []
> > > > @@ -593,7 +593,7 @@ static int __init pcc_cpufreq_init(void)
> > > > return ret;
> > > > }
> > > >  
> > > > -   if (num_present_cpus() > 4) {
> > > > +   if (num_present_cpus_gt(4)) {
> > > > pcc_cpufreq_driver.flags |= 
> > > > CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> > > > pr_err("%s: Too many CPUs, dynamic performance scaling 
> > > > disabled\n",
> > > >__func__);
> > > 
> > > It looks as if the present variants should be using the same values
> > > so the _le test above with 1 changed to 2 looks odd.
> >  
> 
> I think the confusion comes from le meaning less than rather than lt.
> Given the general convention of: lt (<), le (<=), eg (=), ge (>=),
> gt (>), I'd consider renaming your le to lt.

Ok, makes sense. I'll rename in v2 and add <= and >= versions.

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 09:08:41PM +1000, Nicholas Piggin wrote:
> Excerpts from Yury Norov's message of November 28, 2021 1:56 pm:
> > In many cases people use bitmap_weight()-based functions like this:
> > 
> > if (num_present_cpus() > 1)
> > do_something();
> > 
> > This may take considerable amount of time on many-cpus machines because
> > num_present_cpus() will traverse every word of underlying cpumask
> > unconditionally.
> > 
> > We can significantly improve on it for many real cases if stop traversing
> > the mask as soon as we count present cpus to any number greater than 1:
> > 
> > if (num_present_cpus_gt(1))
> > do_something();
> > 
> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> > functions together with corresponding wrappers in cpumask and nodemask.
> 
> There would be no change to callers if you maintain counters like what
> is done for num_online_cpus() today. Maybe some fixes to arch code that
> does not use set_cpu_possible() etc APIs required, but AFAIKS it would
> be better to fix such cases anyway.

Thanks, Nick. I'll try to do this.

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

2021-11-28 Thread Yury Norov
On Sun, Nov 28, 2021 at 07:03:41PM +0100, mirq-t...@rere.qmqm.pl wrote:
> On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
> > In many cases people use bitmap_weight()-based functions like this:
> > 
> > if (num_present_cpus() > 1)
> > do_something();
> > 
> > This may take considerable amount of time on many-cpus machines because
> > num_present_cpus() will traverse every word of underlying cpumask
> > unconditionally.
> > 
> > We can significantly improve on it for many real cases if stop traversing
> > the mask as soon as we count present cpus to any number greater than 1:
> > 
> > if (num_present_cpus_gt(1))
> > do_something();
> > 
> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> > functions together with corresponding wrappers in cpumask and nodemask.
> 
> Having slept on it I have more structured thoughts:
> 
> First, I like substituting bitmap_empty/full where possible - I think
> the change stands on its own, so could be split and sent as is.

Ok, I can do it.

> I don't like the proposed API very much. One problem is that it hides
> the comparison operator and makes call sites less readable:
> 
>   bitmap_weight(...) > N
> 
> becomes:
> 
>   bitmap_weight_gt(..., N)
> 
> and:
>   bitmap_weight(...) <= N
> 
> becomes:
> 
>   bitmap_weight_lt(..., N+1)
> or:
>   !bitmap_weight_gt(..., N)
> 
> I'd rather see something resembling memcmp() API that's known enough
> to be easier to grasp. For above examples:
> 
>   bitmap_weight_cmp(..., N) > 0
>   bitmap_weight_cmp(..., N) <= 0
>   ...

bitmap_weight_cmp() cannot be efficient. Consider this example:

bitmap_weight_lt(1000   , 1) == false
 ^
 stop here

bitmap_weight_cmp(1000   , 1) == 0
 ^
 stop here

I agree that '_gt' is less verbose than '>', but the advantage of 
'_gt' over '>' is proportional to length of bitmap, and it means
that this API should exist.

> This would also make the implementation easier in not having to
> copy and paste the code three times. Could also use a simple
> optimization reducing code size:

In the next version I'll reduce code duplication like this:

bool bitmap_eq(..., N);
bool bitmap_ge(..., N);

#define bitmap_weight_gt(..., N)  bitmap_weight_ge(..., N + 1)
#define bitmap_weight_lt(..., N) !bitmap_weight_ge(..., N)
#define bitmap_weight_le(..., N) !bitmap_weight_gt(..., N)

Thanks,
Yury

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc