Hello,
On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> I didn't do this because of RHEL9, I did it because it's silly that
> QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> compute the x86 parity flag (and POPCNT was introduced at the same
> time as SSE4.2).
I do not see where the
On Wed, 12 Jun 2024, Daniel P. Berrangé wrote:
> I learnt that FESCo approved a surprisingly loose rule saying
>
> "Libraries packaged in Fedora may require ISA extensions,
> however any packaged application must not crash on any
> officially supported architecture, either by providing
>
On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 3:34 PM Alexander Monakov wrote:
> > On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > > I found out from the mailing list. My Core2-based desktop would be
> > > > affected.
> > >
>
On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > I found out from the mailing list. My Core2-based desktop would be affected.
>
> Do you run QEMU on it? With KVM or TCG?
Excuse me? Are you going to ask for SSH access to ensure my computer really
exists and is in working order?
Can you tell me wh
On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> Ahah, nice. :) I'm pretty sure that, when I tested "pf =
> (__builtin_popcount(x) & 1) * 4;", it was generating a call to
> __builtin_popcountsi2.
Why write '__builtin_popcount(x) & 1' when you can write
'__builtin_parity(x)' in the first place?
> S
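A minimal sketch of the point about the two builtins, assuming GCC or Clang. The helper names, PF_MASK, and the masking to the low byte are hypothetical and only for illustration; the sketch compares the two expressions and does not show how QEMU maps the result onto the x86 parity flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the multiplier 4 from the quoted expression. */
enum { PF_MASK = 0x04 };

/* The quoted form: take the population count, keep its lowest bit. */
static inline uint32_t odd_bits_via_popcount(uint32_t x)
{
    return (uint32_t)(__builtin_popcount(x & 0xff) & 1) * PF_MASK;
}

/* The suggested form: __builtin_parity returns the bit count modulo 2. */
static inline uint32_t odd_bits_via_parity(uint32_t x)
{
    return (uint32_t)__builtin_parity(x & 0xff) * PF_MASK;
}

/* Returns 1 if both forms agree for every byte value, 0 otherwise. */
static int forms_agree(void)
{
    for (uint32_t x = 0; x < 256; x++) {
        if (odd_bits_via_popcount(x) != odd_bits_via_parity(x)) {
            return 0;
        }
    }
    return 1;
}
```

Both helpers compute the same value; the second leaves the compiler free to pick whatever parity sequence suits the target instead of being tied to a population count.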
On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov wrote:
> > On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > I didn't do this because of RHEL9, I did it because it's silly that
> > > QEMU cannot use POPCNT and
On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> I didn't do this because of RHEL9, I did it because it's silly that
> QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> compute the x86 parity flag (and POPCNT was introduced at the same
> time as SSE4.2).
From looking at that POPCNT
On Wed, 12 Jun 2024, Daniel P. Berrangé wrote:
> On Wed, Jun 12, 2024 at 01:55:20PM +0300, Alexander Monakov wrote:
> > Hello,
> >
> > I'm sending straightforward reverts to recent patches that bumped minimum
> > required x86 instruction set to SSE4.2. The older
This reverts commit e68e97ce55b3d17af22dd62c3b3dc72f761b0862.
Revert in preparation to rolling back x86_64-v2 ISA requirement.
Signed-off-by: Alexander Monakov
---
host/include/i386/host/cpuinfo.h | 1 +
tcg/i386/tcg-target.c.inc | 15 ++-
util/cpuinfo-i386.c
This reverts commit 45ccdbcb24baf99667997fac5cf60318e5e7db51.
Revert in preparation to rolling back x86_64-v2 ISA requirement.
Signed-off-by: Alexander Monakov
---
host/include/i386/host/cpuinfo.h | 1 +
tcg/i386/tcg-target.h | 5 +++--
util/cpuinfo-i386.c | 1 +
3
very minor gains from the baseline raise, I'm honestly not
sure why it happened. It seems better to let distributions handle that.
Alexander Monakov (5):
Revert "host/i386: assume presence of POPCNT"
Revert "host/i386: assume presence of SSSE3"
Revert "host/i386
This reverts commit b18236897ca15c3db1506d8edb9a191dfe51429c.
Revert in preparation to rolling back x86_64-v2 ISA requirement.
Signed-off-by: Alexander Monakov
---
host/include/i386/host/cpuinfo.h | 1 +
util/bufferiszero.c | 4 ++--
util/cpuinfo-i386.c | 1 +
3 files
This reverts commit 294ac64e459aca023f43441651d860980c9784f1.
Reinstate the ability to use Qemu on x86 hosts that do not meet
x86_64-v2 ISA baseline.
Signed-off-by: Alexander Monakov
---
meson.build | 10 +++---
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/meson.build b
This reverts commit 433cd6d94a8256af70a5200f236dc8047c3c1468.
Revert in preparation to rolling back x86_64-v2 ISA requirement.
Signed-off-by: Alexander Monakov
---
util/cpuinfo-i386.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo
Hi,
On Fri, 31 May 2024, Paolo Bonzini wrote:
> x86-64-v2 processors were released in 2008, assume that we have one.
> This provides CMOV on 32-bit processors, and also POPCNT and various
> vector ISA extensions.
If my contributions to recent cleanups and speedups for buffer_is_zero
count for so
On Mon, 29 Apr 2024, Daniel P. Berrangé wrote:
> On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote:
> > From: Alexander Monakov
> >
> > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> > routines are invoked much more rar
On Fri, 16 Feb 2024, Richard Henderson wrote:
> Split less-than and greater-than 256 cases.
> Use unaligned accesses for head and tail.
> Avoid using out-of-bounds pointers in loop boundary conditions.
I guess it did not carry
typedef uint64_t uint64_a __attribute__((may_alias));
along the
On Fri, 16 Feb 2024, Richard Henderson wrote:
> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
> double-check with the compiler flags for __ARM_NEON and don't bother with
> a runtime check. Otherwise, model the loop after the x86 SSE2 function,
> and use VADDV to reduc
On Fri, 16 Feb 2024, Richard Henderson wrote:
> Benchmark each acceleration function vs an aligned buffer of zeros.
>
> Signed-off-by: Richard Henderson
> ---
> +
> +static void test(const void *opaque)
> +{
> +size_t len = 64 * KiB;
This exceeds L1 cache capacity, so the performance ceil
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/15/24 13:37, Alexander Monakov wrote:
> > Ah, I guess you might be running at low perf_event_paranoid setting that
> > allows unprivileged sampling of kernel events? In our submissions the
> > percentage was for perf_
On Thu, 15 Feb 2024, Richard Henderson wrote:
> > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1 and v2,
> > are you saying they did not reach your inbox?
> > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmroma...@ispras.ru/
> > https://lore.kernel.org/qe
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/14/24 22:57, Alexander Monakov wrote:
> >
> > On Wed, 14 Feb 2024, Richard Henderson wrote:
> >
> >> v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/
> >>
> >> Changes fo
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/14/24 22:47, Alexander Monakov wrote:
> >
> > On Wed, 14 Feb 2024, Richard Henderson wrote:
> >
> >> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
> >> double-check with the
e
speedup the patchset was bringing, doesn't it? Is there some concern I am
not seeing?
> - Split out a >= 256 integer routine.
> - Simplify acceleration selection for testing.
> - Add function pointer typedef.
> - Implement new aarch64 accelerations.
>
>
> r~
>
On Wed, 14 Feb 2024, Richard Henderson wrote:
> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
> double-check with the compiler flags for __ARM_NEON and don't bother with
> a runtime check. Otherwise, model the loop after the x86 SSE2 function,
> and use VADDV to reduc
On Wed, 7 Feb 2024, Richard Henderson wrote:
> On 2/7/24 06:48, Alexander Monakov wrote:
> > Make buffer_is_zero a 'static inline' function that tests up to three
> > bytes from the buffer before handing off to an unrolled loop. This
> > eliminates call overhead
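For illustration, the inline-wrapper idea described above might look roughly like the following. This is a sketch under stated assumptions, not the code from the patch: the names is_zero_sketch and is_zero_slow are hypothetical, the choice of first, middle and last byte as the sampled positions is an assumption, and the out-of-line routine is a plain byte loop standing in for the unrolled or SIMD implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the out-of-line routine (unrolled or SIMD). */
static bool is_zero_slow(const unsigned char *buf, size_t len)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= buf[i];
    }
    return acc == 0;
}

/* Inline wrapper: sample up to three bytes before paying for a call. */
static inline bool is_zero_sketch(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    if (len == 0) {
        return true;
    }
    /* Assumed sampling positions: first, middle and last byte. */
    if (buf[0] | buf[len / 2] | buf[len - 1]) {
        return false;
    }
    /* For len <= 3 the three samples already covered every byte. */
    if (len <= 3) {
        return true;
    }
    return is_zero_slow(buf, len);
}
```

With this shape, buffers that are visibly non-zero are rejected at the call site, and the out-of-line routine only ever sees len >= 4.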
On Tue, 6 Feb 2024, Elena Ufimtseva wrote:
> Hello Alexander
>
> On Tue, Feb 6, 2024 at 12:50 PM Alexander Monakov
> wrote:
>
> > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> > routines are invoked much more rarely in normal use when mo
ncy is not important, since it feeds only a conditional jump,
which terminates the dependency chain.
I never observed PTEST variants to be faster on real hardware.
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
util/bufferiszero.c | 29 -
1 file c
Take into account that the inline wrapper ensures len >= 4.
Use __attribute__((may_alias)) for accesses via non-char pointers.
Avoid using out-of-bounds pointers in loop boundary conditions by
reformulating the 'for' loop as 'if (...) do { ... } while (...)'.
Signed-of
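A rough sketch of the two techniques named in this commit message, assuming GCC or Clang. It is not the code from the patch: the function name, the len >= 8 precondition, and the use of memcpy for the unaligned head and tail loads are assumptions made for this example. The aligned loads go through a may_alias type, and the loop is written as 'if (...) do { ... } while (...)' with both bounds kept inside the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Accesses through this type are exempt from strict-aliasing assumptions. */
typedef uint64_t uint64_a __attribute__((may_alias));

/* Unaligned 8-byte load (assumption: memcpy is used for head and tail). */
static inline uint64_t load8(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical scalar routine; the caller must guarantee len >= 8. */
static bool is_zero_len8plus(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    /* Head and tail loads may overlap; together they cover both ends. */
    uint64_t t = load8(buf) | load8(buf + len - 8);

    /* p: first 8-aligned address after buf; lies in [buf + 1, buf + 8]. */
    const uint64_a *p =
        (const uint64_a *)(((uintptr_t)buf + 8) & ~(uintptr_t)7);
    /* e: last 8-aligned address at or before the final byte. */
    const uint64_a *e =
        (const uint64_a *)(((uintptr_t)buf + len - 1) & ~(uintptr_t)7);

    /* Both bounds point into the buffer, so no out-of-bounds pointer
       is ever formed; bytes from e onward are covered by the tail load. */
    if (p < e) {
        do {
            t |= *p++;
        } while (p < e);
    }
    return t == 0;
}
```

Because p and e are both 8-aligned, the loop pointer lands exactly on e, and every aligned load stays within the bytes the caller passed in.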
se in Qemu).
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
include/qemu/cutils.h | 28 +++-
util/bufferiszero.c | 76 ---
2 files changed, 47 insertions(+), 57 deletions(-)
diff --git a/include/qemu/cutils.h b/include/qem
performance, as described in
https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html
Signed-off-by: Mikhail Romanov
Signed-off-by: Alexander Monakov
---
util/bufferiszero.c | 36 ++--
1 file changed, 2 insertions(+), 34 deletions(-)
diff --git a/util
in loops that should be limited by load
port throughput rather than ALU throughput.
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
util/bufferiszero.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index c037d11d04
variant. Avoid use of
PTEST, which is not profitable there (like in the removed SSE4 variant).
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
util/bufferiszero.c | 108
1 file changed, 69 insertions(+), 39 deletions(-)
diff --git
.
Changed for v3:
- separate into 6 patches
- fix an oversight which would break the build on non-x86 hosts
- properly avoid out-of-bounds pointers in the scalar variant
Alexander Monakov (6):
util/bufferiszero: remove SSE4.1 variant
util/bufferiszero: introduce an inline wrapper
util
On Tue, 9 Jan 2024, Daniel P. Berrangé wrote:
> On Thu, Nov 09, 2023 at 03:52:38PM +0300, Alexander Monakov wrote:
> > I'd like to ping this patch on behalf of Mikhail.
> >
> > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
> >
> >
Ping^3.
On Thu, 14 Dec 2023, Alexander Monakov wrote:
> Ping^2.
>
> On Thu, 9 Nov 2023, Alexander Monakov wrote:
>
> > I'd like to ping this patch on behalf of Mikhail.
> >
> > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
> >
Ping^2.
On Thu, 9 Nov 2023, Alexander Monakov wrote:
> I'd like to ping this patch on behalf of Mikhail.
>
> https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
>
> If this needs to be split up a bit to ease review, please let us know.
>
> On
I'd like to ping this patch on behalf of Mikhail.
https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
If this needs to be split up a bit to ease review, please let us know.
On Fri, 27 Oct 2023, Mikhail Romanov wrote:
> Improve buffer_is_zero function which is often used in qem
Hello,
Is it feasible to consume a DTB file in Qemu itself to make the board match the
DeviceTree hardware description? For example on Arm there are quite a few .dts
files in Linux tree for various boards; having a "generic" Arm board in Qemu
that could [to what degree?] emulate any of those soun