Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-16, Richard Henderson
On 2/16/24 10:20, Alexander Monakov wrote:
FWIW, in situations like these I always recommend to run perf with fixed sampling rate, i.e. 'perf record -e cycles:P -c 10' or 'perf record -e cycles/period=10/P' to make sample counts between runs of different duration directly comparable (disp
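The reasoning behind the fixed-period advice quoted above can be written as a rough relation. This is an illustration, not part of the thread; C, c, f, T and N are symbols introduced here. With a fixed period of c cycles (perf's `-c` option or the `period=` event term), perf takes about one sample every c cycles. A run that spends C cycles therefore yields roughly N samples, however long it takes, and two runs compare directly. In perf's default frequency mode, the period is adjusted to reach a target rate f samples per second, so raw counts follow the run's wall time T instead.

```latex
\underbrace{N \approx \frac{C}{c}
  \;\Longrightarrow\;
  \frac{N_1}{N_2} \approx \frac{C_1}{C_2}}_{\text{fixed period } c}
\qquad\qquad
\underbrace{N \approx f\,T}_{\text{frequency mode, target rate } f}
```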

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-16, Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/15/24 13:37, Alexander Monakov wrote:
> > Ah, I guess you might be running at low perf_event_paranoid setting that
> > allows unprivileged sampling of kernel events? In our submissions the
> > percentage was for perf_event_paranoid=2, i.e. rel

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-16, Richard Henderson
On 2/15/24 13:37, Alexander Monakov wrote:
Ah, I guess you might be running at low perf_event_paranoid setting that allows unprivileged sampling of kernel events? In our submissions the percentage was for perf_event_paranoid=2, i.e. relative to Qemu only, excluding kernel time under syscalls. O

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15, Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote:
> > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1 and
> > v2,
> > are you saying they did not reach your inbox?
> > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmroma...@ispras.ru/
> > https://lore.kernel.org/qe

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15, Richard Henderson
On 2/15/24 11:36, Alexander Monakov wrote:
On Thu, 15 Feb 2024, Richard Henderson wrote:
On 2/14/24 22:57, Alexander Monakov wrote:
On Wed, 14 Feb 2024, Richard Henderson wrote:
v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/
Changes for v4:
  - Keep separate >= 2

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15, Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/14/24 22:57, Alexander Monakov wrote:
> >
> > On Wed, 14 Feb 2024, Richard Henderson wrote:
> >
> >> v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/
> >>
> >> Changes for v4:
> >>   - Keep separate >= 256 entry point,

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15, Richard Henderson
On 2/14/24 22:57, Alexander Monakov wrote:
On Wed, 14 Feb 2024, Richard Henderson wrote:
v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/
Changes for v4:
  - Keep separate >= 256 entry point, but only keep constant length
    check inline. This allows the indirect fun

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15, Alexander Monakov
On Wed, 14 Feb 2024, Richard Henderson wrote:
> v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/
>
> Changes for v4:
>   - Keep separate >= 256 entry point, but only keep constant length
>     check inline. This allows the indirect function call to be hidden
>     and opt

[PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15, Richard Henderson
v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/
Changes for v4:
  - Keep separate >= 256 entry point, but only keep constant length
    check inline. This allows the indirect function call to be hidden
    and optimized away when the pointer is constant.
  - Split out a >=
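One possible reading of the first v4 change is sketched below in C. This is not the code from the series. Callers see an inline wrapper that checks only one thing: whether the length is a compile-time constant of at least 256. If so, the call goes to a dedicated out-of-line entry point. Every other length goes to a general out-of-line function. The function pointer to the accelerated routine stays private to one file. All names are hypothetical, including `sketch_is_zero`, `sketch_is_zero_ge256`, `sketch_is_zero_ool` and `zero_accel`. The `__builtin_constant_p` check assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature shared by all accelerated variants; each one
 * assumes len >= 256. */
typedef bool (*zero_accel_fn)(const void *buf, size_t len);

/* Portable fallback: OR 8-byte words together (memcpy avoids unaligned
 * loads), then the tail bytes, and test the accumulator once at the end. */
static bool zero_accel_int(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;
    uint64_t acc = 0, w;
    size_t i;

    assert(len >= 256);
    for (i = 0; i + 8 <= len; i += 8) {
        memcpy(&w, p + i, sizeof w);
        acc |= w;
    }
    for (; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Accelerator pointer, private to this file. Assumption: a real build
 * would reassign it once at startup based on CPU features. */
static zero_accel_fn zero_accel = zero_accel_int;

/* Out-of-line entry for len >= 256: the indirect call is made here,
 * never in the callers' code. */
bool sketch_is_zero_ge256(const void *buf, size_t len)
{
    return zero_accel(buf, len);
}

/* Out-of-line entry for any length: short buffers are scanned bytewise. */
bool sketch_is_zero_ool(const void *vbuf, size_t len)
{
    const unsigned char *p = vbuf;

    if (len >= 256) {
        return zero_accel(vbuf, len);
    }
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Inline wrapper: only a compile-time-constant length is checked here;
 * runtime lengths are dispatched out of line. */
static inline bool sketch_is_zero(const void *buf, size_t len)
{
    return __builtin_constant_p(len) && len >= 256
           ? sketch_is_zero_ge256(buf, len)
           : sketch_is_zero_ool(buf, len);
}
```

In this sketch, `zero_accel` is `static` and never reassigned. The compiler is therefore free to treat it as a constant and turn the indirect call into a direct one. A build that picks an accelerator at startup would lose that folding, but the indirect call would still stay confined to the two out-of-line entry points.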