[PATCH 1/9] lib/bitmap: add bitmap_weight_{eq,gt,le}
Many kernel users call bitmap_weight() only to compare the result against some number or expression: if (bitmap_weight(...) > 1) do_something(); This works, but for large bitmaps it can be significantly improved: if the first few words already count more set bits than the given number, we can stop counting and return immediately. The same idea works in the other direction: if the number of set bits counted so far is small enough that it cannot reach the required number even with every remaining bit set, we can also return early. This patch adds the new bitmap_weight_{eq,gt,le} functions to allow this optimization; the following patches apply them where appropriate. Signed-off-by: Yury Norov --- include/linux/bitmap.h | 33 ++ lib/bitmap.c | 63 ++ 2 files changed, 96 insertions(+) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 7dba0847510c..996041f771c8 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -51,6 +51,9 @@ struct device; * bitmap_empty(src, nbits) Are all bits zero in *src? * bitmap_full(src, nbits) Are all bits set in *src? 
* bitmap_weight(src, nbits) Hamming Weight: number set bits + * bitmap_weight_eq(src, nbits, num) Hamming Weight is equal to num + * bitmap_weight_gt(src, nbits, num) Hamming Weight is greater than num + * bitmap_weight_le(src, nbits, num) Hamming Weight is less than num * bitmap_set(dst, pos, nbits) Set specified bit area * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area @@ -162,6 +165,9 @@ int __bitmap_intersects(const unsigned long *bitmap1, int __bitmap_subset(const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits); +bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int nbits, unsigned int num); +bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int nbits, unsigned int num); +bool __bitmap_weight_le(const unsigned long *bitmap, unsigned int nbits, unsigned int num); void __bitmap_set(unsigned long *map, unsigned int start, int len); void __bitmap_clear(unsigned long *map, unsigned int start, int len); @@ -403,6 +409,33 @@ static __always_inline int bitmap_weight(const unsigned long *src, unsigned int return __bitmap_weight(src, nbits); } +static __always_inline bool bitmap_weight_eq(const unsigned long *src, + unsigned int nbits, unsigned int num) +{ + if (small_const_nbits(nbits)) + return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) == num; + + return __bitmap_weight_eq(src, nbits, num); +} + +static __always_inline bool bitmap_weight_gt(const unsigned long *src, + unsigned int nbits, unsigned int num) +{ + if (small_const_nbits(nbits)) + return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) > num; + + return __bitmap_weight_gt(src, nbits, num); +} + +static __always_inline bool bitmap_weight_le(const unsigned long *src, + unsigned int nbits, unsigned int num) +{ + if (small_const_nbits(nbits)) + return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) < 
num; + + return __bitmap_weight_le(src, nbits, num); +} + static __always_inline void bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits) { diff --git a/lib/bitmap.c b/lib/bitmap.c index 926408883456..72e7ab2d7bdd 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -348,6 +348,69 @@ int __bitmap_weight(const unsigned long *bitmap, unsigned int bits) } EXPORT_SYMBOL(__bitmap_weight); +bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int bits, unsigned int num) +{ + unsigned int k, w, lim = bits / BITS_PER_LONG; + + for (k = 0, w = 0; k < lim; k++) { + if (w + bits - k * BITS_PER_LONG < num) + return false; + + w += hweight_long(bitmap[k]); + + if (w > num) + return false; + } + + if (bits % BITS_PER_LONG) + w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits)); + + return w == num; +} +EXPORT_SYMBOL(__bitmap_weight_eq); + +bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int bits, unsigned int num) +{ + unsigned int k, w, lim = bits / BITS_PER_LONG; + + for (k = 0, w = 0; k < lim; k++) { + if (w + bits - k * BITS_PER_LONG <= num) + return false; + + w += hweight_long(bitmap[k]); + + if (w > num) + return true; + } + + if (bits %
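To make the stop-early logic concrete, here is a self-contained userspace sketch of the gt-variant from this patch. BITS_PER_LONG, hweight_long() and BITMAP_LAST_WORD_MASK() below are minimal stand-ins for the kernel's definitions, and bitmap_weight_gt_sketch() is a hypothetical name, not the kernel symbol:

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Stand-in for the kernel's hweight_long(): number of bits set in one word. */
static unsigned int hweight_long(unsigned long w)
{
	return (unsigned int)__builtin_popcountl(w);
}

/* Stand-in for the kernel's BITMAP_LAST_WORD_MASK(). */
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/* True iff the bitmap has more than @num bits set. Mirrors the
 * __bitmap_weight_gt() loop above: bail out as soon as the answer
 * is known in either direction. */
bool bitmap_weight_gt_sketch(const unsigned long *bitmap,
			     unsigned int bits, unsigned int num)
{
	unsigned int k, w, lim = bits / BITS_PER_LONG;

	for (k = 0, w = 0; k < lim; k++) {
		/* Even if every remaining bit were set, we could not exceed num. */
		if (w + bits - k * BITS_PER_LONG <= num)
			return false;

		w += hweight_long(bitmap[k]);

		/* Already above num: no need to read the rest. */
		if (w > num)
			return true;
	}

	if (bits % BITS_PER_LONG)
		w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));

	return w > num;
}
```

Either early exit fires at most one word into the region where the answer became determined, so for skewed bitmaps only a fraction of the words is read.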
[PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage
In many cases people use bitmap_weight()-based functions like this: if (num_present_cpus() > 1) do_something(); This may take a considerable amount of time on machines with many CPUs, because num_present_cpus() unconditionally traverses every word of the underlying cpumask. For many real cases we can do significantly better by stopping the traversal as soon as the number of present CPUs counted exceeds 1: if (num_present_cpus_gt(1)) do_something(); To implement this idea, the series adds bitmap_weight_{eq,gt,le} functions together with corresponding wrappers in cpumask and nodemask. Yury Norov (9): lib/bitmap: add bitmap_weight_{eq,gt,le} lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq() all: replace bitmap_weight() with bitmap_{empty,full,eq,gt,le} tools: sync bitmap_weight() usage with the kernel lib/cpumask: add cpumask_weight_{eq,gt,le} lib/nodemask: add nodemask_weight_{eq,gt,le} lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le} lib/nodemask: add num_node_state_eq() MAINTAINERS: add cpumask and nodemask files to BITMAP_API MAINTAINERS | 4 ++ arch/alpha/kernel/process.c | 2 +- arch/arc/kernel/smp.c | 2 +- arch/arm/kernel/machine_kexec.c | 2 +- arch/arm/mach-exynos/exynos.c | 2 +- arch/arm/mm/cache-b15-rac.c | 2 +- arch/arm64/kernel/smp.c | 2 +- arch/arm64/mm/context.c | 2 +- arch/csky/mm/asid.c | 2 +- arch/csky/mm/context.c| 2 +- arch/ia64/kernel/setup.c | 2 +- arch/ia64/mm/tlb.c| 8 +-- arch/mips/cavium-octeon/octeon-irq.c | 4 +- arch/mips/kernel/crash.c | 2 +- arch/mips/kernel/i8253.c | 2 +- arch/mips/kernel/perf_event_mipsxx.c | 4 +- arch/mips/kernel/rtlx-cmp.c | 2 +- arch/mips/kernel/smp.c| 4 +- arch/mips/kernel/vpe-cmp.c| 2 +- .../loongson2ef/common/cs5536/cs5536_mfgpt.c | 2 +- arch/mips/mm/context.c| 2 +- arch/mips/mm/tlbex.c | 2 +- arch/nds32/kernel/perf_event_cpu.c| 4 +- arch/nios2/kernel/cpuinfo.c | 2 +- arch/powerpc/kernel/smp.c | 2 +- arch/powerpc/kernel/watchdog.c| 4 +- arch/powerpc/platforms/85xx/smp.c | 2 +- 
arch/powerpc/platforms/pseries/hotplug-cpu.c | 4 +- arch/powerpc/sysdev/mpic.c| 2 +- arch/powerpc/xmon/xmon.c | 10 +-- arch/riscv/kvm/vmid.c | 2 +- arch/s390/kernel/perf_cpum_cf.c | 2 +- arch/sparc/kernel/mdesc.c | 6 +- arch/x86/events/amd/core.c| 2 +- arch/x86/kernel/alternative.c | 8 +-- arch/x86/kernel/apic/apic.c | 4 +- arch/x86/kernel/apic/apic_flat_64.c | 2 +- arch/x86/kernel/apic/probe_32.c | 2 +- arch/x86/kernel/cpu/mce/dev-mcelog.c | 2 +- arch/x86/kernel/cpu/resctrl/rdtgroup.c| 18 +++--- arch/x86/kernel/hpet.c| 2 +- arch/x86/kernel/i8253.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/kvmclock.c| 2 +- arch/x86/kernel/smpboot.c | 4 +- arch/x86/kernel/tsc.c | 2 +- arch/x86/kvm/hyperv.c | 8 +-- arch/x86/mm/amdtopology.c | 2 +- arch/x86/mm/mmio-mod.c| 2 +- arch/x86/mm/numa_emulation.c | 4 +- arch/x86/platform/uv/uv_nmi.c | 2 +- arch/x86/xen/smp_pv.c | 2 +- arch/x86/xen/spinlock.c | 2 +- drivers/acpi/numa/srat.c | 2 +- drivers/clk/samsung/clk-exynos4.c | 2 +- drivers/clocksource/ingenic-timer.c | 3 +- drivers/cpufreq/pcc-cpufreq.c | 2 +- drivers/cpufreq/qcom-cpufreq-hw.c | 2 +- drivers/cpufreq/scmi-cpufreq.c| 2 +- drivers/crypto/ccp/ccp-dev-v5.c | 5 +- drivers/dma/mv_xor.c | 5 +- drivers/firmware/psci/psci_checker.c | 2 +- drivers/gpu/drm/i810/i810_drv.c | 2 +- drivers/gpu/drm/i915/i915_pmu.c | 2 +- drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +- drivers/hv/channel_mgmt.c | 4 +- drivers/iio/adc/mxs-lradc-adc.c | 3 +- drivers/iio/dummy/iio_simple_dummy_buffer.c | 4 +- drivers/iio/industrialio-buffer.c | 2 +- drivers/iio/industrialio-trigger.c
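The saving described in the cover letter is easy to see with a little instrumentation. In this userspace toy model (word_weight(), full_weight() and weight_gt() are made-up stand-ins, not kernel code), a full count in the style of bitmap_weight() reads every word of a 64-word mask, while a gt-style check whose answer sits in the first word reads only one:

```c
#include <stdbool.h>

static unsigned int words_read;	/* instrumentation: words examined so far */

static unsigned int word_weight(unsigned long w)
{
	words_read++;
	return (unsigned int)__builtin_popcountl(w);
}

/* Full count, as bitmap_weight() does: always traverses the whole mask. */
unsigned int full_weight(const unsigned long *map, unsigned int words)
{
	unsigned int k, w = 0;

	for (k = 0; k < words; k++)
		w += word_weight(map[k]);
	return w;
}

/* Early-exit "weight > num" check in the spirit of bitmap_weight_gt(). */
bool weight_gt(const unsigned long *map, unsigned int words, unsigned int num)
{
	unsigned int k, w = 0;

	for (k = 0; k < words; k++) {
		w += word_weight(map[k]);
		if (w > num)
			return true;	/* answer known: stop reading */
	}
	return false;
}
```

For the num_present_cpus() > 1 pattern above, the common case on a live system is that the first word of the present mask already has two bits set, so the gt-style check touches a single word regardless of NR_CPUS.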
[PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()
Now that we have bitmap_weight_eq(), switch bitmap_full() and bitmap_empty() to using it. Signed-off-by: Yury Norov --- include/linux/bitmap.h | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 996041f771c8..2d951e4dc814 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -386,22 +386,6 @@ static inline int bitmap_subset(const unsigned long *src1, return __bitmap_subset(src1, src2, nbits); } -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) -{ - if (small_const_nbits(nbits)) - return !(*src & BITMAP_LAST_WORD_MASK(nbits)); - - return find_first_bit(src, nbits) == nbits; -} - -static inline bool bitmap_full(const unsigned long *src, unsigned int nbits) -{ - if (small_const_nbits(nbits)) - return !(~(*src) & BITMAP_LAST_WORD_MASK(nbits)); - - return find_first_zero_bit(src, nbits) == nbits; -} - static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits) { if (small_const_nbits(nbits)) @@ -436,6 +420,16 @@ static __always_inline bool bitmap_weight_le(const unsigned long *src, return __bitmap_weight_le(src, nbits, num); } +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits) +{ + return bitmap_weight_eq(src, nbits, 0); +} + +static __always_inline bool bitmap_full(const unsigned long *src, unsigned int nbits) +{ + return bitmap_weight_eq(src, nbits, nbits); +} + static __always_inline void bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits) { -- 2.25.1 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
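The reduction in this patch can be sketched in plain userspace C. Here bitmap_weight_eq_sketch() is a hypothetical stand-in for the kernel's __bitmap_weight_eq() (with the same two early exits), and empty/full become one-line wrappers:

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/* Stand-in for __bitmap_weight_eq(): true iff exactly @num bits are set. */
bool bitmap_weight_eq_sketch(const unsigned long *bitmap,
			     unsigned int bits, unsigned int num)
{
	unsigned int k, w, lim = bits / BITS_PER_LONG;

	for (k = 0, w = 0; k < lim; k++) {
		/* Too few bits remain to ever reach num. */
		if (w + bits - k * BITS_PER_LONG < num)
			return false;

		w += (unsigned int)__builtin_popcountl(bitmap[k]);

		/* Already past num. */
		if (w > num)
			return false;
	}

	if (bits % BITS_PER_LONG)
		w += (unsigned int)__builtin_popcountl(bitmap[k] &
						       BITMAP_LAST_WORD_MASK(bits));

	return w == num;
}

/* With weight_eq available, empty and full reduce to single calls. */
bool bitmap_empty_sketch(const unsigned long *src, unsigned int nbits)
{
	return bitmap_weight_eq_sketch(src, nbits, 0);
}

bool bitmap_full_sketch(const unsigned long *src, unsigned int nbits)
{
	return bitmap_weight_eq_sketch(src, nbits, nbits);
}
```

Note how the two exits cover both wrappers: for empty (num = 0) the first set bit triggers w > num, and for full (num = nbits) the first word with a clear bit makes the remaining maximum drop below num.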
[PATCH 4/9] tools: sync bitmap_weight() usage with the kernel
bitmap_weight() counts all set bits in the bitmap unconditionally. However, in some cases we only need to check whether the number of set bits is greater than, less than, or equal to some number, and can stop traversing the bitmap as soon as the answer is known. This patch adds bitmap_weight_{eq,gt,le}, reimplements bitmap_{empty,full}, and replaces bitmap_weight() where appropriate. Signed-off-by: Yury Norov --- tools/include/linux/bitmap.h | 42 +++-- tools/lib/bitmap.c | 60 tools/perf/builtin-c2c.c | 4 +-- tools/perf/util/pmu.c| 2 +- 4 files changed, 96 insertions(+), 12 deletions(-) diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h index ea97804d04d4..eb2831f7e5a7 100644 --- a/tools/include/linux/bitmap.h +++ b/tools/include/linux/bitmap.h @@ -12,6 +12,9 @@ unsigned long name[BITS_TO_LONGS(bits)] int __bitmap_weight(const unsigned long *bitmap, int bits); +bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int nbits, unsigned int num); +bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int nbits, unsigned int num); +bool __bitmap_weight_le(const unsigned long *bitmap, unsigned int nbits, unsigned int num); void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1, const unsigned long *bitmap2, int bits); int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1, @@ -45,27 +48,48 @@ static inline void bitmap_fill(unsigned long *dst, unsigned int nbits) dst[nlongs - 1] = BITMAP_LAST_WORD_MASK(nbits); } -static inline int bitmap_empty(const unsigned long *src, unsigned nbits) +static inline int bitmap_weight(const unsigned long *src, unsigned int nbits) { if (small_const_nbits(nbits)) - return !
(*src & BITMAP_LAST_WORD_MASK(nbits)); + return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)); + return __bitmap_weight(src, nbits); +} - return find_first_bit(src, nbits) == nbits; +static __always_inline bool bitmap_weight_eq(const unsigned long *src, + unsigned int nbits, unsigned int num) +{ + if (small_const_nbits(nbits)) + return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) == num; + + return __bitmap_weight_eq(src, nbits, num); } -static inline int bitmap_full(const unsigned long *src, unsigned int nbits) +static __always_inline bool bitmap_weight_gt(const unsigned long *src, + unsigned int nbits, unsigned int num) { if (small_const_nbits(nbits)) - return ! (~(*src) & BITMAP_LAST_WORD_MASK(nbits)); + return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) > num; - return find_first_zero_bit(src, nbits) == nbits; + return __bitmap_weight_gt(src, nbits, num); } -static inline int bitmap_weight(const unsigned long *src, unsigned int nbits) +static __always_inline bool bitmap_weight_le(const unsigned long *src, + unsigned int nbits, unsigned int num) { if (small_const_nbits(nbits)) - return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)); - return __bitmap_weight(src, nbits); + return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)) < num; + + return __bitmap_weight_le(src, nbits, num); +} + +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits) +{ + return bitmap_weight_eq(src, nbits, 0); +} + +static __always_inline bool bitmap_full(const unsigned long *src, unsigned int nbits) +{ + return bitmap_weight_eq(src, nbits, nbits); } static inline void bitmap_or(unsigned long *dst, const unsigned long *src1, diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c index db466ef7be9d..3aaf1767d237 100644 --- a/tools/lib/bitmap.c +++ b/tools/lib/bitmap.c @@ -18,6 +18,66 @@ int __bitmap_weight(const unsigned long *bitmap, int bits) return w; } +bool __bitmap_weight_eq(const unsigned long *bitmap, unsigned int bits, unsigned int 
num) +{ + unsigned int k, w, lim = bits / BITS_PER_LONG; + + for (k = 0, w = 0; k < lim; k++) { + if (w + bits - k * BITS_PER_LONG < num) + return false; + + w += hweight_long(bitmap[k]); + + if (w > num) + return false; + } + + if (bits % BITS_PER_LONG) + w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits)); + + return w == num; +} + +bool __bitmap_weight_gt(const unsigned long *bitmap, unsigned int bits, unsigned int num) +{ + unsigned int k, w, lim = bits / BITS_PER_LONG; + + for (k = 0, w = 0; k < lim; k++) { + if (w + bits - k * BITS_PER_LONG <= num) + return false; + + w += hweight_long(bitmap[k]); + + if (w > num) + return true; + } + + if (bits % BITS_PER_LONG) + w += hweigh
[PATCH 3/9] all: replace bitmap_weight() with bitmap_{empty,full,eq,gt,le}
bitmap_weight() counts all set bits in the bitmap unconditionally. However, in some cases we only need to check whether the number of set bits is greater than, less than, or equal to some number, and can stop traversing the bitmap early. This patch replaces bitmap_weight() with one of bitmap_{empty,full,eq,gt,le}, as appropriate. In some places the driver code has been optimized further, where it was trivial. Signed-off-by: Yury Norov --- arch/nds32/kernel/perf_event_cpu.c | 4 +--- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++-- arch/x86/kvm/hyperv.c | 8 drivers/crypto/ccp/ccp-dev-v5.c| 5 + drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +- drivers/iio/adc/mxs-lradc-adc.c| 3 +-- drivers/iio/dummy/iio_simple_dummy_buffer.c| 4 ++-- drivers/iio/industrialio-buffer.c | 2 +- drivers/iio/industrialio-trigger.c | 2 +- drivers/memstick/core/ms_block.c | 4 ++-- drivers/net/dsa/b53/b53_common.c | 2 +- drivers/net/ethernet/broadcom/bcmsysport.c | 6 +- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++-- drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 2 +- .../ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +- .../ethernet/marvell/octeontx2/nic/otx2_flows.c| 8 .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +- drivers/net/ethernet/mellanox/mlx4/cmd.c | 10 +++--- drivers/net/ethernet/mellanox/mlx4/eq.c| 4 ++-- drivers/net/ethernet/mellanox/mlx4/main.c | 2 +- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_dev.c | 3 +-- drivers/net/ethernet/qlogic/qed/qed_rdma.c | 4 ++-- drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +- drivers/perf/arm-cci.c | 2 +- drivers/perf/arm_pmu.c | 4 ++-- drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +- drivers/perf/thunderx2_pmu.c | 3 +-- drivers/perf/xgene_pmu.c | 2 +- drivers/pwm/pwm-pca9685.c | 2 +- drivers/staging/media/tegra-video/vi.c | 2 +- drivers/thermal/intel/intel_powerclamp.c | 10 -- fs/ocfs2/cluster/heartbeat.c | 14 +++--- 33 files changed, 57 insertions(+), 75 deletions(-) diff --git 
a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c index a78a879e7ef1..05a1cd258356 100644 --- a/arch/nds32/kernel/perf_event_cpu.c +++ b/arch/nds32/kernel/perf_event_cpu.c @@ -695,10 +695,8 @@ static void nds32_pmu_enable(struct pmu *pmu) { struct nds32_pmu *nds32_pmu = to_nds32_pmu(pmu); struct pmu_hw_events *hw_events = nds32_pmu->get_hw_events(); - int enabled = bitmap_weight(hw_events->used_mask, - nds32_pmu->num_events); - if (enabled) + if (!bitmap_empty(hw_events->used_mask, nds32_pmu->num_events)) nds32_pmu->start(nds32_pmu); } diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index b57b3db9a6a7..94e7e6b420e4 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -2749,10 +2749,10 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s, cfg->new_ctrl = cbm_ensure_valid(cfg->new_ctrl, r); /* * Assign the u32 CBM to an unsigned long to ensure that -* bitmap_weight() does not access out-of-bound memory. +* bitmap_weight_le() does not access out-of-bound memory. 
*/ tmp_cbm = cfg->new_ctrl; - if (bitmap_weight(&tmp_cbm, r->cache.cbm_len) < r->cache.min_cbm_bits) { + if (bitmap_weight_le(&tmp_cbm, r->cache.cbm_len, r->cache.min_cbm_bits)) { rdt_last_cmd_printf("No space on %s:%d\n", s->name, d->id); return -ENOSPC; } diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index 5e19e6e4c2ce..8b72c896e0f1 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -90,7 +90,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic, { struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic); struct kvm_hv *hv = to_kvm_hv(vcpu->kvm); - int auto_eoi_old, auto_eoi_new; + bool auto_eoi_old, auto_eoi_new; if (vector < HV_SYNIC_FIRST_VALID_VECTOR) return; @@ -100,16 +100,16 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic, else __clear_bit(vector, synic->vec_bitmap); - auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256); + auto_eoi_old = bitmap_empty(synic->auto_eoi_bitmap, 256); if (synic_has_vector_auto_eoi(synic, vector)) __set_bit(vector, syn
[PATCH 5/9] lib/cpumask: add cpumask_weight_{eq,gt,le}
Add cpumask_weight_{eq,gt,le} and replace cpumask_weight() with cpumask_empty() or one of the new cpumask_weight_{eq,gt,le} where appropriate. This allows cpumask_weight_*() to return earlier depending on the condition. Signed-off-by: Yury Norov --- arch/alpha/kernel/process.c | 2 +- arch/ia64/kernel/setup.c | 2 +- arch/ia64/mm/tlb.c | 2 +- arch/mips/cavium-octeon/octeon-irq.c | 4 +-- arch/mips/kernel/crash.c | 2 +- arch/powerpc/kernel/smp.c| 2 +- arch/powerpc/kernel/watchdog.c | 4 +-- arch/powerpc/xmon/xmon.c | 4 +-- arch/s390/kernel/perf_cpum_cf.c | 2 +- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 +-- arch/x86/kernel/smpboot.c| 4 +-- arch/x86/mm/mmio-mod.c | 2 +- arch/x86/platform/uv/uv_nmi.c| 2 +- drivers/cpufreq/qcom-cpufreq-hw.c| 2 +- drivers/cpufreq/scmi-cpufreq.c | 2 +- drivers/firmware/psci/psci_checker.c | 2 +- drivers/gpu/drm/i915/i915_pmu.c | 2 +- drivers/hv/channel_mgmt.c| 4 +-- drivers/infiniband/hw/hfi1/affinity.c| 13 +- drivers/infiniband/hw/qib/qib_file_ops.c | 2 +- drivers/infiniband/hw/qib/qib_iba7322.c | 2 +- drivers/infiniband/sw/siw/siw_main.c | 3 +-- drivers/irqchip/irq-bcm6345-l1.c | 2 +- drivers/scsi/lpfc/lpfc_init.c| 2 +- drivers/soc/fsl/qbman/qman_test_stash.c | 2 +- include/linux/cpumask.h | 32 kernel/irq/affinity.c| 2 +- kernel/padata.c | 2 +- kernel/rcu/tree_nocb.h | 4 +-- kernel/rcu/tree_plugin.h | 2 +- kernel/sched/core.c | 10 kernel/sched/topology.c | 4 +-- kernel/time/clockevents.c| 2 +- kernel/time/clocksource.c| 2 +- mm/vmstat.c | 4 +-- 35 files changed, 89 insertions(+), 59 deletions(-) diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c index 5f8527081da9..0d4bc60828bf 100644 --- a/arch/alpha/kernel/process.c +++ b/arch/alpha/kernel/process.c @@ -125,7 +125,7 @@ common_shutdown_1(void *generic_ptr) /* Wait for the secondaries to halt. 
*/ set_cpu_present(boot_cpuid, false); set_cpu_possible(boot_cpuid, false); - while (cpumask_weight(cpu_present_mask)) + while (!cpumask_empty(cpu_present_mask)) barrier(); #endif diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c index 5010348fa21b..fd6301eafa9d 100644 --- a/arch/ia64/kernel/setup.c +++ b/arch/ia64/kernel/setup.c @@ -572,7 +572,7 @@ setup_arch (char **cmdline_p) #ifdef CONFIG_ACPI_HOTPLUG_CPU prefill_possible_map(); #endif - per_cpu_scan_finalize((cpumask_weight(&early_cpu_possible_map) == 0 ? + per_cpu_scan_finalize((cpumask_empty(&early_cpu_possible_map) ? 32 : cpumask_weight(&early_cpu_possible_map)), additional_cpus > 0 ? additional_cpus : 0); #endif /* CONFIG_ACPI_NUMA */ diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c index 135b5135cace..a5bce13ab047 100644 --- a/arch/ia64/mm/tlb.c +++ b/arch/ia64/mm/tlb.c @@ -332,7 +332,7 @@ __flush_tlb_range (struct vm_area_struct *vma, unsigned long start, preempt_disable(); #ifdef CONFIG_SMP - if (mm != current->active_mm || cpumask_weight(mm_cpumask(mm)) != 1) { + if (mm != current->active_mm || !cpumask_weight_eq(mm_cpumask(mm), 1)) { ia64_global_tlb_purge(mm, start, end, nbits); preempt_enable(); return; diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c index 844f882096e6..914871f15fb7 100644 --- a/arch/mips/cavium-octeon/octeon-irq.c +++ b/arch/mips/cavium-octeon/octeon-irq.c @@ -763,7 +763,7 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data) if (!cpumask_test_cpu(cpu, mask)) return; - if (cpumask_weight(mask) > 1) { + if (cpumask_weight_gt(mask, 1)) { /* * It has multi CPU affinity, just remove this CPU * from the affinity set. @@ -795,7 +795,7 @@ static int octeon_irq_ciu_set_affinity(struct irq_data *data, * This removes the need to do locking in the .ack/.eoi * functions. 
*/ - if (cpumask_weight(dest) != 1) + if (!cpumask_weight_eq(dest, 1)) return -EINVAL; if (!enable_one) diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c index 81845ba04835..4c35004754db 100644 --- a/arch/mips/kernel/crash.c +++ b/arch/mips/kernel/crash.c @@ -72,7 +72,7 @@ static void crash_kexec_prepare_cpus(void) */ pr_emerg("Sending IPI to other cpus...\n"); msecs = 1; - while ((cpumask_weight(&cpus_
[PATCH 6/9] lib/nodemask: add nodemask_weight_{eq,gt,le}
Add nodemask_weight_{eq,gt,le} and replace nodemask_weight() where appropriate. This allows nodemask_weight_*() to return earlier depending on the condition. Signed-off-by: Yury Norov --- arch/x86/mm/amdtopology.c| 2 +- arch/x86/mm/numa_emulation.c | 4 ++-- drivers/acpi/numa/srat.c | 2 +- include/linux/nodemask.h | 24 mm/mempolicy.c | 2 +- 5 files changed, 29 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c index 058b2f36b3a6..b3ca7d23e4b0 100644 --- a/arch/x86/mm/amdtopology.c +++ b/arch/x86/mm/amdtopology.c @@ -154,7 +154,7 @@ int __init amd_numa_init(void) node_set(nodeid, numa_nodes_parsed); } - if (!nodes_weight(numa_nodes_parsed)) + if (nodes_empty(numa_nodes_parsed)) return -ENOENT; /* diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c index 1a02b791d273..9a9305367fdd 100644 --- a/arch/x86/mm/numa_emulation.c +++ b/arch/x86/mm/numa_emulation.c @@ -123,7 +123,7 @@ static int __init split_nodes_interleave(struct numa_meminfo *ei, * Continue to fill physical nodes with fake nodes until there is no * memory left on any of them. */ - while (nodes_weight(physnode_mask)) { + while (!nodes_empty(physnode_mask)) { for_each_node_mask(i, physnode_mask) { u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN); u64 start, limit, end; @@ -270,7 +270,7 @@ static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei, * Fill physical nodes with fake nodes of size until there is no memory * left on any of them. 
*/ - while (nodes_weight(physnode_mask)) { + while (!nodes_empty(physnode_mask)) { for_each_node_mask(i, physnode_mask) { u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN); u64 start, limit, end; diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c index 66a0142dc78c..c4f80d2d85bf 100644 --- a/drivers/acpi/numa/srat.c +++ b/drivers/acpi/numa/srat.c @@ -67,7 +67,7 @@ int acpi_map_pxm_to_node(int pxm) node = pxm_to_node_map[pxm]; if (node == NUMA_NO_NODE) { - if (nodes_weight(nodes_found_map) >= MAX_NUMNODES) + if (nodes_weight_gt(nodes_found_map, MAX_NUMNODES - 1)) return NUMA_NO_NODE; node = first_unset_node(nodes_found_map); __acpi_map_pxm_to_node(pxm, node); diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 567c3ddba2c4..3801ec5b06f4 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -38,6 +38,9 @@ * int nodes_empty(mask) Is mask empty (no bits sets)? * int nodes_full(mask) Is mask full (all bits sets)? * int nodes_weight(mask) Hamming weight - number of set bits + * bool nodes_weight_eq(src, nbits, num) Hamming Weight is equal to num + * bool nodes_weight_gt(src, nbits, num) Hamming Weight is greater than num + * bool nodes_weight_le(src, nbits, num) Hamming Weight is less than num * * void nodes_shift_right(dst, src, n) Shift right * void nodes_shift_left(dst, src, n) Shift left @@ -240,6 +243,27 @@ static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits) return bitmap_weight(srcp->bits, nbits); } +#define nodes_weight_eq(nodemask, num) __nodes_weight_eq(&(nodemask), MAX_NUMNODES, (num)) +static inline int __nodes_weight_eq(const nodemask_t *srcp, + unsigned int nbits, unsigned int num) +{ + return bitmap_weight_eq(srcp->bits, nbits, num); +} + +#define nodes_weight_gt(nodemask, num) __nodes_weight_gt(&(nodemask), MAX_NUMNODES, (num)) +static inline int __nodes_weight_gt(const nodemask_t *srcp, + unsigned int nbits, unsigned int num) +{ + return bitmap_weight_gt(srcp->bits, nbits, num); +} + 
+#define nodes_weight_le(nodemask, num) __nodes_weight_le(&(nodemask), MAX_NUMNODES, (num)) +static inline int __nodes_weight_le(const nodemask_t *srcp, + unsigned int nbits, unsigned int num) +{ + return bitmap_weight_le(srcp->bits, nbits, num); +} + #define nodes_shift_right(dst, src, n) \ __nodes_shift_right(&(dst), &(src), (n), MAX_NUMNODES) static inline void __nodes_shift_right(nodemask_t *dstp, diff --git a/mm/mempolicy.c b/mm/mempolicy.c index b1fcdb4d25d6..4a48ce5b86cf 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1154,7 +1154,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, * [0-7] - > [3,4,5] moves only 0,1,2,6,7. */ - if ((nodes_weight(*from) != nodes_weight(*to)) && + if (!nodes
[PATCH 8/9] lib/nodemask: add num_node_state_eq()
Add num_node_state_eq() and replace num_node_state() with it in page_alloc_init(). Signed-off-by: Yury Norov --- include/linux/nodemask.h | 5 + mm/page_alloc.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 3801ec5b06f4..b68ee2a80164 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -455,6 +455,11 @@ static inline int num_node_state(enum node_states state) return nodes_weight(node_states[state]); } +static inline int num_node_state_eq(enum node_states state, unsigned int num) +{ + return nodes_weight_eq(node_states[state], num); +} + #define for_each_node_state(__node, __state) \ for_each_node_mask((__node), node_states[__state]) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 91c1105a9efe..81d55ffb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8323,7 +8323,7 @@ void __init page_alloc_init(void) int ret; #ifdef CONFIG_NUMA - if (num_node_state(N_MEMORY) == 1) + if (num_node_state_eq(N_MEMORY, 1)) hashdist = 0; #endif -- 2.25.1
[PATCH 9/9] MAINTAINERS: add cpumask and nodemask files to BITMAP_API
cpumask and nodemask APIs are thin wrappers around the basic bitmap API, and the corresponding files are not formally maintained. This patch adds them to the BITMAP_API section, so that the bitmap folks will take a closer look at them. Signed-off-by: Yury Norov --- MAINTAINERS | 4 1 file changed, 4 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 00ad0cb5cb05..ceeffcd81fa4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3375,10 +3375,14 @@ R: Andy Shevchenko R: Rasmus Villemoes S: Maintained F: include/linux/bitmap.h +F: include/linux/cpumask.h F: include/linux/find.h +F: include/linux/nodemask.h F: lib/bitmap.c +F: lib/cpumask.c F: lib/find_bit.c F: lib/find_bit_benchmark.c +F: lib/nodemask.c F: lib/test_bitmap.c F: tools/include/linux/bitmap.h F: tools/include/linux/find.h -- 2.25.1
[PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}
Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus() with one of new functions where appropriate. This allows num_*_cpus_*() to return earlier depending on the condition. Signed-off-by: Yury Norov --- arch/arc/kernel/smp.c | 2 +- arch/arm/kernel/machine_kexec.c | 2 +- arch/arm/mach-exynos/exynos.c | 2 +- arch/arm/mm/cache-b15-rac.c | 2 +- arch/arm64/kernel/smp.c | 2 +- arch/arm64/mm/context.c | 2 +- arch/csky/mm/asid.c | 2 +- arch/csky/mm/context.c| 2 +- arch/ia64/mm/tlb.c| 6 ++--- arch/mips/kernel/i8253.c | 2 +- arch/mips/kernel/perf_event_mipsxx.c | 4 ++-- arch/mips/kernel/rtlx-cmp.c | 2 +- arch/mips/kernel/smp.c| 4 ++-- arch/mips/kernel/vpe-cmp.c| 2 +- .../loongson2ef/common/cs5536/cs5536_mfgpt.c | 2 +- arch/mips/mm/context.c| 2 +- arch/mips/mm/tlbex.c | 2 +- arch/nios2/kernel/cpuinfo.c | 2 +- arch/powerpc/platforms/85xx/smp.c | 2 +- arch/powerpc/platforms/pseries/hotplug-cpu.c | 4 ++-- arch/powerpc/sysdev/mpic.c| 2 +- arch/powerpc/xmon/xmon.c | 6 ++--- arch/riscv/kvm/vmid.c | 2 +- arch/sparc/kernel/mdesc.c | 6 ++--- arch/x86/events/amd/core.c| 2 +- arch/x86/kernel/alternative.c | 8 +++ arch/x86/kernel/apic/apic.c | 4 ++-- arch/x86/kernel/apic/apic_flat_64.c | 2 +- arch/x86/kernel/apic/probe_32.c | 2 +- arch/x86/kernel/cpu/mce/dev-mcelog.c | 2 +- arch/x86/kernel/hpet.c| 2 +- arch/x86/kernel/i8253.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/kvmclock.c| 2 +- arch/x86/kernel/tsc.c | 2 +- arch/x86/xen/smp_pv.c | 2 +- arch/x86/xen/spinlock.c | 2 +- drivers/clk/samsung/clk-exynos4.c | 2 +- drivers/clocksource/ingenic-timer.c | 3 +-- drivers/cpufreq/pcc-cpufreq.c | 2 +- drivers/dma/mv_xor.c | 5 ++-- drivers/gpu/drm/i810/i810_drv.c | 2 +- drivers/irqchip/irq-gic.c | 2 +- drivers/net/caif/caif_virtio.c| 2 +- .../cavium/liquidio/cn23xx_vf_device.c| 2 +- drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 +- .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +- drivers/net/wireless/ath/ath9k/hw.c | 2 +- drivers/net/wireless/marvell/mwifiex/main.c 
| 4 ++-- drivers/net/wireless/st/cw1200/queue.c| 3 +-- drivers/nvdimm/region.c | 2 +- drivers/nvme/host/pci.c | 2 +- drivers/perf/arm_pmu.c| 2 +- .../intel/speed_select_if/isst_if_common.c| 6 ++--- drivers/soc/bcm/brcmstb/biuctrl.c | 2 +- drivers/soc/fsl/dpio/dpio-service.c | 4 ++-- drivers/spi/spi-dw-bt1.c | 2 +- drivers/virt/acrn/hsm.c | 2 +- fs/xfs/xfs_sysfs.c| 2 +- include/linux/cpumask.h | 23 +++ include/linux/kdb.h | 2 +- kernel/debug/kdb/kdb_bt.c | 2 +- kernel/printk/printk.c| 2 +- kernel/reboot.c | 4 ++-- kernel/time/clockevents.c | 2 +- mm/percpu.c | 6 ++--- mm/slab.c | 2 +- 67 files changed, 110 insertions(+), 90 deletions(-) diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c index 78e6d069b1c1..d4f2765755c9 100644 --- a/arch/arc/kernel/smp.c +++ b/arch/arc/kernel/smp.c @@ -103,7 +103,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus) * if platform didn't set the present map already, do it now * boot cpu is set to present already by init/main.c */ - if (num_present_cpus() <= 1) + if (num_present_cpus_le(2)) init_cpu_present(cpu_possible_mask); } diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index f567032a09c0..8875e2ee0083 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -44,7 +44,7 @@ int machine_kexec_prepare(struct kimage *image) * and implements CPU hotplug for the current HW. If not, we won't be * able to kexec reliably, so fail the prepare operation. */ - if (num_possible_cpus() > 1 && platform_can_secondary_
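One subtlety to keep in mind when reading conversions like num_present_cpus_le(2) above: in this series, le tests strictly-less-than ("Hamming Weight is less than num" per the bitmap.h comment), so a condition such as num_present_cpus() <= 1 translates to ..._le(2), not ..._le(1). The toy stand-in below (weight_le_sketch() is a made-up name, not a kernel symbol) shows the off-by-one trap:

```c
#include <stdbool.h>

/* Toy stand-in mirroring the series' "_le" semantic: weight < num,
 * with an early exit once the count can no longer stay below num. */
bool weight_le_sketch(const unsigned long *map, unsigned int words,
		      unsigned int num)
{
	unsigned int k, w = 0;

	for (k = 0; k < words; k++) {
		w += (unsigned int)__builtin_popcountl(map[k]);
		if (w >= num)
			return false;	/* already too many bits to be < num */
	}
	return w < num;
}
```

With a mask containing a single set bit, weight_le_sketch(mask, words, 2) is true (1 < 2, matching the old "<= 1" test), while weight_le_sketch(mask, words, 1) is false.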
Re: [PATCH 2/9] lib/bitmap: implement bitmap_{empty,full} with bitmap_weight_eq()
On Sun, Nov 28, 2021 at 05:37:19AM +0100, Michał Mirosław wrote: > On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote: > > Now as we have bitmap_weight_eq(), switch bitmap_full() and > > bitmap_empty() to using it. > [...] > > -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) > > -{ > > - if (small_const_nbits(nbits)) > > - return !(*src & BITMAP_LAST_WORD_MASK(nbits)); > > - > > - return find_first_bit(src, nbits) == nbits; > > -} > [...] > > +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits) > > +{ > > + return bitmap_weight_eq(src, nbits, 0); > > +} > [..] > > What's the speed difference? Have you benchmarked this? bitmap_weight_eq() should be faster than find_first_bit(), but the difference is a few cycles, so I didn't bother measuring it. The new version just looks better.
Re: [PATCH 7/9] lib/cpumask: add num_{possible,present,active}_cpus_{eq,gt,le}
(restore CC list) On Sun, Nov 28, 2021 at 05:56:51AM +0100, Michał Mirosław wrote: > On Sat, Nov 27, 2021 at 07:57:02PM -0800, Yury Norov wrote: > > Add num_{possible,present,active}_cpus_{eq,gt,le} and replace num_*_cpus() > > with one of new functions where appropriate. This allows num_*_cpus_*() > > to return earlier depending on the condition. > [...] > > @@ -3193,7 +3193,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size, > > > > /* allocate pages */ > > j = 0; > > - for (unit = 0; unit < num_possible_cpus(); unit++) { > > + for (unit = 0; num_possible_cpus_gt(unit); unit++) { > > This looks dubious. Only this? > The old version I could hope the compiler would call > num_possible_cpus() only once if it's marked const or pure, but the > alternative is going to count the bits every time making this a guaranteed > O(n^2) even though the bitmap doesn't change. num_possible_cpus() is neither const nor pure. This is O(n^2) before and after.
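The disagreement above is easy to make concrete with a toy model (all names below are hypothetical; toy_weight() stands in for any mask-scanning counter such as num_possible_cpus() or the num_possible_cpus_gt() form). With the predicate in the loop condition, the mask is re-traversed once per iteration; hoisting the count leaves a single traversal:

```c
static unsigned long scans;	/* how many times the mask was fully traversed */

static unsigned int toy_weight(const unsigned long *mask, unsigned int words)
{
	unsigned int k, w = 0;

	scans++;
	for (k = 0; k < words; k++)
		w += (unsigned int)__builtin_popcountl(mask[k]);
	return w;
}

/* Pattern questioned above: the condition re-counts on every iteration,
 * so the loop does O(n^2) bit-counting work in total. */
unsigned long loop_recount(const unsigned long *mask, unsigned int words)
{
	unsigned int unit;

	scans = 0;
	for (unit = 0; unit < toy_weight(mask, words); unit++)
		;	/* per-unit work would go here */
	return scans;
}

/* Hoisted variant: the mask is traversed exactly once. */
unsigned long loop_hoisted(const unsigned long *mask, unsigned int words)
{
	unsigned int unit, nr;

	scans = 0;
	nr = toy_weight(mask, words);
	for (unit = 0; unit < nr; unit++)
		;	/* per-unit work would go here */
	return scans;
}
```

For a mask with weight 5, loop_recount() evaluates the condition six times (once per iteration plus the final failing check), while loop_hoisted() scans once, which is the shape of the fix neither version of the pcpu loop has.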