Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-23 Thread Alexander Monakov
Hello, On Wed, 12 Jun 2024, Paolo Bonzini wrote: > I didn't do this because of RHEL9, I did it because it's silly that > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to > compute the x86 parity flag (and POPCNT was introduced at the same > time as SSE4.2). I do not see where the

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
On Wed, 12 Jun 2024, Daniel P. Berrangé wrote: > I learnt that FESCo approved a surprisingly loose rule saying > > "Libraries packaged in Fedora may require ISA extensions, > however any packaged application must not crash on any > officially supported architecture, either by providing >

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
On Wed, 12 Jun 2024, Paolo Bonzini wrote: > On Wed, Jun 12, 2024 at 3:34 PM Alexander Monakov wrote: > > On Wed, 12 Jun 2024, Paolo Bonzini wrote: > > > > I found out from the mailing list. My Core2-based desktop would be > > > > affected. > > > >

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
On Wed, 12 Jun 2024, Paolo Bonzini wrote: > > I found out from the mailing list. My Core2-based desktop would be affected. > > Do you run QEMU on it? With KVM or TCG? Excuse me? Are you going to ask for SSH access to ensure my computer really exists and is in working order? Can you tell me wh

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
On Wed, 12 Jun 2024, Paolo Bonzini wrote: > Ahah, nice. :) I'm pretty sure that, when I tested "pf = > (__builtin_popcount(x) & 1) * 4;", it was generating a call to > __builtin_popcountsi2. Why write '__builtin_popcount(x) & 1' when you can write '__builtin_parity(x)' in the first place? > S
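
The two spellings compared in this message can be put side by side. This is only a sketch, assuming a compiler that provides the GCC/Clang builtins `__builtin_popcount` and `__builtin_parity`; the helper names are hypothetical and are not taken from QEMU.

```c
#include <stdint.h>

/* Hypothetical helper: parity computed as the low bit of the population count. */
static inline unsigned parity_via_popcount(uint32_t x)
{
    return (unsigned)__builtin_popcount(x) & 1u;
}

/* Hypothetical helper: the same value, asked for directly.
 * __builtin_parity(x) returns the number of set bits in x modulo 2. */
static inline unsigned parity_via_builtin(uint32_t x)
{
    return (unsigned)__builtin_parity(x);
}
```

Both helpers return the same value for every input. Asking for parity directly leaves the compiler free to choose an instruction sequence that does not depend on POPCNT being available on the host.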

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
On Wed, 12 Jun 2024, Paolo Bonzini wrote: > On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov wrote: > > On Wed, 12 Jun 2024, Paolo Bonzini wrote: > > > I didn't do this because of RHEL9, I did it because it's silly that > > > QEMU cannot use POPCNT and

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
On Wed, 12 Jun 2024, Paolo Bonzini wrote: > I didn't do this because of RHEL9, I did it because it's silly that > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to > compute the x86 parity flag (and POPCNT was introduced at the same > time as SSE4.2). From looking at that POPCNT

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
On Wed, 12 Jun 2024, Daniel P. Berrangé wrote: > On Wed, Jun 12, 2024 at 01:55:20PM +0300, Alexander Monakov wrote: > > Hello, > > > > I'm sending straightforward reverts to recent patches that bumped minimum > > required x86 instruction set to SSE4.2. The older

[PATCH 4/5] Revert "host/i386: assume presence of CMOV"

2024-06-12 Thread Alexander Monakov
This reverts commit e68e97ce55b3d17af22dd62c3b3dc72f761b0862. Revert in preparation to rolling back x86_64-v2 ISA requirement. Signed-off-by: Alexander Monakov --- host/include/i386/host/cpuinfo.h | 1 + tcg/i386/tcg-target.c.inc| 15 ++- util/cpuinfo-i386.c

[PATCH 1/5] Revert "host/i386: assume presence of POPCNT"

2024-06-12 Thread Alexander Monakov
This reverts commit 45ccdbcb24baf99667997fac5cf60318e5e7db51. Revert in preparation to rolling back x86_64-v2 ISA requirement. Signed-off-by: Alexander Monakov --- host/include/i386/host/cpuinfo.h | 1 + tcg/i386/tcg-target.h| 5 +++-- util/cpuinfo-i386.c | 1 + 3

[PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov
very minor gains from the baseline raise, I'm honestly not sure why it happened. It seems better to let distributions handle that. Alexander Monakov (5): Revert "host/i386: assume presence of POPCNT" Revert "host/i386: assume presence of SSSE3" Revert "host/i386

[PATCH 3/5] Revert "host/i386: assume presence of SSE2"

2024-06-12 Thread Alexander Monakov
This reverts commit b18236897ca15c3db1506d8edb9a191dfe51429c. Revert in preparation to rolling back x86_64-v2 ISA requirement. Signed-off-by: Alexander Monakov --- host/include/i386/host/cpuinfo.h | 1 + util/bufferiszero.c | 4 ++-- util/cpuinfo-i386.c | 1 + 3 files

[PATCH 5/5] Revert "meson: assume x86-64-v2 baseline ISA"

2024-06-12 Thread Alexander Monakov
This reverts commit 294ac64e459aca023f43441651d860980c9784f1. Reinstate the ability to use Qemu on x86 hosts that do not meet x86_64-v2 ISA baseline. Signed-off-by: Alexander Monakov --- meson.build | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/meson.build b

[PATCH 2/5] Revert "host/i386: assume presence of SSSE3"

2024-06-12 Thread Alexander Monakov
This reverts commit 433cd6d94a8256af70a5200f236dc8047c3c1468. Revert in preparation to rolling back x86_64-v2 ISA requirement. Signed-off-by: Alexander Monakov --- util/cpuinfo-i386.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/util/cpuinfo-i386.c b/util/cpuinfo

Re: [PATCH 0/6] host/i386: require x86-64-v2 ISA

2024-06-06 Thread Alexander Monakov
Hi, On Fri, 31 May 2024, Paolo Bonzini wrote: > x86-64-v2 processors were released in 2008, assume that we have one. > This provides CMOV on 32-bit processors, and also POPCNT and various > vector ISA extensions. If my contributions to recent cleanups and speedups for buffer_is_zero count for so

Re: [PATCH v6 02/10] util/bufferiszero: Remove AVX512 variant

2024-04-29 Thread Alexander Monakov
On Mon, 29 Apr 2024, Daniel P. Berrangé wrote: > On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote: > > From: Alexander Monakov > > > > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD > > routines are invoked much more rar

Re: [PATCH v5 06/10] util/bufferiszero: Improve scalar variant

2024-02-17 Thread Alexander Monakov
On Fri, 16 Feb 2024, Richard Henderson wrote: > Split less-than and greater-than 256 cases. > Use unaligned accesses for head and tail. > Avoid using out-of-bounds pointers in loop boundary conditions. I guess it did not carry typedef uint64_t uint64_a __attribute__((may_alias)); along the

Re: [PATCH v5 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-17 Thread Alexander Monakov
On Fri, 16 Feb 2024, Richard Henderson wrote: > Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely > double-check with the compiler flags for __ARM_NEON and don't bother with > a runtime check. Otherwise, model the loop after the x86 SSE2 function, > and use VADDV to reduc

Re: [PATCH v5 10/10] tests/bench: Add bufferiszero-bench

2024-02-17 Thread Alexander Monakov
On Fri, 16 Feb 2024, Richard Henderson wrote: > Benchmark each acceleration function vs an aligned buffer of zeros. > > Signed-off-by: Richard Henderson > --- > + > +static void test(const void *opaque) > +{ > +size_t len = 64 * KiB; This exceeds L1 cache capacity, so the performance ceil

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-16 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > On 2/15/24 13:37, Alexander Monakov wrote: > > Ah, I guess you might be running at low perf_event_paranoid setting that > > allows unprivileged sampling of kernel events? In our submissions the > > percentage was for perf_

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1 and > > v2, > > are you saying they did not reach your inbox? > > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmroma...@ispras.ru/ > > https://lore.kernel.org/qe

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > On 2/14/24 22:57, Alexander Monakov wrote: > > > > On Wed, 14 Feb 2024, Richard Henderson wrote: > > > >> v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/ > >> > >> Changes fo

Re: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-15 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > On 2/14/24 22:47, Alexander Monakov wrote: > > > > On Wed, 14 Feb 2024, Richard Henderson wrote: > > > >> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely > >> double-check with the

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15 Thread Alexander Monakov
e speedup the patchset was bringing, doesn't it? Is there some concern I am not seeing? > - Split out a >= 256 integer routine. > - Simplify acceleration selection for testing. > - Add function pointer typedef. > - Implement new aarch64 accelerations. > > > r~ >

Re: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-15 Thread Alexander Monakov
On Wed, 14 Feb 2024, Richard Henderson wrote: > Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely > double-check with the compiler flags for __ARM_NEON and don't bother with > a runtime check. Otherwise, model the loop after the x86 SSE2 function, > and use VADDV to reduc

Re: [PATCH v3 2/6] util/bufferiszero: introduce an inline wrapper

2024-02-06 Thread Alexander Monakov
On Wed, 7 Feb 2024, Richard Henderson wrote: > On 2/7/24 06:48, Alexander Monakov wrote: > > Make buffer_is_zero a 'static inline' function that tests up to three > > bytes from the buffer before handing off to an unrolled loop. This > > eliminates call overhead
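
The inline-wrapper idea quoted above might look roughly like the following. This is a sketch, not the patch itself: the choice of which three bytes to probe (first, last, middle) is an assumption, the names `sketch_buffer_is_zero` and `sketch_is_zero_slow` are hypothetical, and a plain byte loop stands in for the unrolled out-of-line routine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the out-of-line routine: a plain byte loop. */
static bool sketch_is_zero_slow(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical inline wrapper: probe up to three bytes before calling out. */
static inline bool sketch_buffer_is_zero(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    if (len == 0) {
        return true;
    }
    /* Assumed probe positions: first, last, and middle byte. */
    if (buf[0] | buf[len - 1] | buf[len / 2]) {
        return false;
    }
    /* For len <= 3 the three probes already covered every byte. */
    if (len <= 3) {
        return true;
    }
    return sketch_is_zero_slow(vbuf, len);
}
```

With this shape, a buffer that is non-zero at any probed position is rejected without a function call, and the out-of-line routine may assume len >= 4.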

Re: [PATCH v3 3/6] util/bufferiszero: remove AVX512 variant

2024-02-06 Thread Alexander Monakov
On Tue, 6 Feb 2024, Elena Ufimtseva wrote: > Hello Alexander > > On Tue, Feb 6, 2024 at 12:50 PM Alexander Monakov > wrote: > > > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD > > routines are invoked much more rarely in normal use when mo

[PATCH v3 1/6] util/bufferiszero: remove SSE4.1 variant

2024-02-06 Thread Alexander Monakov
ncy is not important, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- util/bufferiszero.c | 29 - 1 file c

[PATCH v3 6/6] util/bufferiszero: improve scalar variant

2024-02-06 Thread Alexander Monakov
Take into account that the inline wrapper ensures len >= 4. Use __attribute__((may_alias)) for accesses via non-char pointers. Avoid using out-of-bounds pointers in loop boundary conditions by reformulating the 'for' loop as 'if (...) do { ... } while (...)'. Signed-of
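
The two techniques named in this commit message, a `may_alias` word type and an 'if (...) do { ... } while (...)' loop, can be illustrated as follows. This is a sketch under stated assumptions rather than the patch's code: it assumes GCC or Clang, uses the hypothetical names `sketch_scalar_is_zero` and `word_a`, reads the head and tail with `memcpy`, and falls back to a byte loop below 8 bytes instead of relying on the wrapper's len >= 4 guarantee.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word type that may alias any object, so loads through it are not
 * subject to strict-aliasing assumptions (GCC/Clang attribute). */
typedef uint64_t word_a __attribute__((may_alias));

/* Hypothetical scalar check; not the routine from the patch. */
static bool sketch_scalar_is_zero(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;
    uint64_t head, tail;

    if (len < 8) {
        for (size_t i = 0; i < len; i++) {
            if (p[i]) {
                return false;
            }
        }
        return true;
    }

    /* Unaligned head and tail words; they may overlap each other. */
    memcpy(&head, p, 8);
    memcpy(&tail, p + len - 8, 8);
    if (head | tail) {
        return false;
    }

    /* Aligned words between them: w lies in (p, p + 8], e in (p + len - 8, p + len],
     * so both bounds stay inside the buffer or at one past its end, and w <= e. */
    const word_a *w = (const word_a *)(((uintptr_t)p + 8) & ~(uintptr_t)7);
    const word_a *e = (const word_a *)(((uintptr_t)p + len) & ~(uintptr_t)7);

    if (w < e) {
        do {
            if (*w) {
                return false;
            }
        } while (++w < e);
    }
    return true;
}
```

The bytes before `w` are covered by the head load and the bytes from `e` onward by the tail load, so the loop only ever dereferences aligned words that lie fully inside the buffer.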

[PATCH v3 2/6] util/bufferiszero: introduce an inline wrapper

2024-02-06 Thread Alexander Monakov
se in Qemu). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- include/qemu/cutils.h | 28 +++- util/bufferiszero.c | 76 --- 2 files changed, 47 insertions(+), 57 deletions(-) diff --git a/include/qemu/cutils.h b/include/qem

[PATCH v3 3/6] util/bufferiszero: remove AVX512 variant

2024-02-06 Thread Alexander Monakov
performance, as described in https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html Signed-off-by: Mikhail Romanov Signed-off-by: Alexander Monakov --- util/bufferiszero.c | 36 ++-- 1 file changed, 2 insertions(+), 34 deletions(-) diff --git a/util

[PATCH v3 4/6] util/bufferiszero: remove useless prefetches

2024-02-06 Thread Alexander Monakov
in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index c037d11d04

[PATCH v3 5/6] util/bufferiszero: optimize SSE2 and AVX2 variants

2024-02-06 Thread Alexander Monakov
variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- util/bufferiszero.c | 108 1 file changed, 69 insertions(+), 39 deletions(-) diff --git

[PATCH v3 0/6] Optimize buffer_is_zero

2024-02-06 Thread Alexander Monakov
. Changed for v3: - separate into 6 patches - fix an oversight which would break the build on non-x86 hosts - properly avoid out-of-bounds pointers in the scalar variant Alexander Monakov (6): util/bufferiszero: remove SSE4.1 variant util/bufferiszero: introduce an inline wrapper util

Re: [PATCH v2] Optimize buffer_is_zero

2024-01-14 Thread Alexander Monakov
On Tue, 9 Jan 2024, Daniel P. Berrangé wrote: > On Thu, Nov 09, 2023 at 03:52:38PM +0300, Alexander Monakov wrote: > > I'd like to ping this patch on behalf of Mikhail. > > > > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ > > > >

Re: [PATCH v2] Optimize buffer_is_zero

2024-01-09 Thread Alexander Monakov
Ping^3. On Thu, 14 Dec 2023, Alexander Monakov wrote: > Ping^2. > > On Thu, 9 Nov 2023, Alexander Monakov wrote: > > > I'd like to ping this patch on behalf of Mikhail. > > > > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ > >

Re: [PATCH v2] Optimize buffer_is_zero

2023-12-14 Thread Alexander Monakov
Ping^2. On Thu, 9 Nov 2023, Alexander Monakov wrote: > I'd like to ping this patch on behalf of Mikhail. > > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ > > If this needs to be split up a bit to ease review, please let us know. > > On

Re: [PATCH v2] Optimize buffer_is_zero

2023-11-09 Thread Alexander Monakov
I'd like to ping this patch on behalf of Mikhail. https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ If this needs to be split up a bit to ease review, please let us know. On Fri, 27 Oct 2023, Mikhail Romanov wrote: > Improve buffer_is_zero function which is often used in qem

[Qemu-devel] [GSoC?] Board autoconfiguration based on DTB info

2018-01-22 Thread Alexander Monakov
Hello, Is it feasible to consume a DTB file in Qemu itself to make the board match the DeviceTree hardware description? For example on Arm there are quite a few .dts files in Linux tree for various boards; having a "generic" Arm board in Qemu that could [to what degree?] emulate any of those soun