https://bugs.kde.org/show_bug.cgi?id=383010

--- Comment #17 from Tanya <tatyana.a.mine...@intel.com> ---
Hello Julian,

Sorry for the late reply.
Thank you very much for the comments. We have fixed most of these bugs, and
hope to finish adding and debugging KNL AVX-512 instructions in about a month.

Regarding the performance on AVX-2 code:
>> As a side note -- before this lands, I would want to do some performance 
>> runs to check that this doesn't impact performance (or correctness) of 
>> existing IA support.
On a few tiny AVX-2 benchmarks, Memcheck overhead is 0-1% higher than with a
"clean" Valgrind build. We will also measure it on larger AVX-2 benchmarks.
Are there any mandatory benchmarks for Valgrind correctness and performance?


Regarding the test files:
>> Is this intended to replace the existing AVX test?  Or is it a new test? 
>> This is unclear.
The attached tests are new tests, usable on AVX-512 machines only. They recheck
AVX and AVX-2 instructions on bigger vector registers, similar to how the
avx-2.c test rechecks avx-1 instructions on ymm registers. Would it be ok to
keep them as three separate test files for AVX-512, or should they be merged
into one avx-512.c test file?


Regarding the FMA instructions:
>> If I understand this right, that means the existing cases for serial vFMA 
>> insns are wrong, and also the VEX implementation is wrong. Is that 
>> correct? If so, shouldn't we just fix both the test case and 
>> implementation?
The issue was that, for serial (32- and 64-bit) FMA instructions, Valgrind used
to set bits [127:32] or [127:64] of the destination to zero, while they should
be left unchanged. We have fixed the implementation and added a new test,
because the none/tests/amd64/fma.c test seems to be designed to verify only one
float or one double value.
Would you prefer us to provide the non-AVX-512-related changes as separate
patches?
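
To illustrate the difference, here is a minimal C model of the old and the
fixed behaviour for the 32-bit case. This is only a sketch, not Valgrind code:
the Xmm type and all function names are made up for this example, and the
single rounding of a real fused multiply-add is not modelled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a 128-bit xmm register as four 32-bit float lanes.
   lane[0] holds bits [31:0], lane[3] holds bits [127:96]. */
typedef struct { float lane[4]; } Xmm;

/* Old behaviour: lane 0 gets a*b+dst, the upper lanes of the
   destination are cleared. */
Xmm fma_ss_zero_upper(Xmm dst, Xmm a, Xmm b)
{
    Xmm r;
    memset(&r, 0, sizeof r);
    r.lane[0] = a.lane[0] * b.lane[0] + dst.lane[0];
    return r;
}

/* Fixed behaviour: lane 0 gets a*b+dst, lanes 1..3 keep the old
   destination value. */
Xmm fma_ss_keep_upper(Xmm dst, Xmm a, Xmm b)
{
    Xmm r = dst;
    r.lane[0] = a.lane[0] * b.lane[0] + dst.lane[0];
    return r;
}

/* Usage: the upper lanes of dst survive only in the fixed variant. */
void fma_model_demo(void)
{
    Xmm dst = {{1.0f, 10.0f, 20.0f, 30.0f}};
    Xmm a   = {{2.0f, 0.0f, 0.0f, 0.0f}};
    Xmm b   = {{3.0f, 0.0f, 0.0f, 0.0f}};
    assert(fma_ss_keep_upper(dst, a, b).lane[1] == 10.0f);
    assert(fma_ss_zero_upper(dst, a, b).lane[1] == 0.0f);
}
```

A test that looks only at lane 0 cannot tell the two variants apart, which is
why a test comparing the whole register was needed.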


I also have a question on our implementation of translation of EVEX
instructions to IR. 
Currently, we use separate functions for VEX- and EVEX- prefixed instructions
(file VEX/priv/guest_amd64_toIR.c, functions, for example, dis_ESC_0F38__VEX
and dis_ESC_0F38__EVEX, respectively). 

However, looking ahead at the next Intel AVX-512 instruction sets, the VL
(Vector Length) set allows EVEX-prefixed instructions to run on xmm and ymm
registers, so it basically duplicates the VEX code (for example, EVEX-prefixed
"vmovpdd xmm1, xmm2" is equivalent to VEX-prefixed "vmovpdd xmm1, xmm2").

The easiest way to implement it would be to merge the EVEX- and VEX- translator
functions into something like "dis_ESC_0F38__VEX_EVEX". On the upside, there
would be less duplicated code. On the downside, the EVEX-related code would no
longer be contained in separate __EVEX functions, so it would probably be more
difficult to review.
An alternative approach would be to add the VL code (basically, a copy of the
__VEX translations) to the __EVEX functions. The downside is that it may be
bothersome to keep the __VEX and the EVEX VL implementations identical.

If we were to implement those instructions in the future, which approach would
be preferable?
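
To make the trade-off concrete, here is a rough C sketch of both options for a
single move-like instruction. It is only an illustration under simplifying
assumptions: the Insn and Plan types and the dis_mov_* functions are
hypothetical and do not exist in guest_amd64_toIR.c, and a "Plan" stands in
for the IR that a real translator would emit.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these names exist in Valgrind. */
typedef enum { ENC_VEX, ENC_EVEX } Encoding;

typedef struct {
    Encoding enc;
    int vl_bits;   /* vector length: 128, 256 or 512 */
    int opmask;    /* EVEX opmask register number, 0 = unmasked */
} Insn;

typedef struct {
    int ok;        /* 1 if the instruction was accepted */
    int lanes64;   /* number of 64-bit lanes to move */
    int masked;    /* 1 if per-lane masking is applied */
} Plan;

/* Option 1: one merged function. The xmm/ymm logic exists once, but the
   EVEX-only parts are interleaved with the shared code. */
Plan dis_mov_VEX_EVEX(Insn in)
{
    Plan p = {0, 0, 0};
    /* 512-bit vectors and opmasks are available only under EVEX. */
    if (in.enc == ENC_VEX && (in.vl_bits == 512 || in.opmask != 0))
        return p;
    if (in.vl_bits != 128 && in.vl_bits != 256 && in.vl_bits != 512)
        return p;
    p.ok = 1;
    p.lanes64 = in.vl_bits / 64;         /* shared by both encodings */
    if (in.enc == ENC_EVEX)
        p.masked = (in.opmask != 0);     /* EVEX-only part */
    return p;
}

/* Option 2: separate functions. The EVEX code stays in one place, but the
   xmm/ymm handling is a copy of the VEX logic. This VEX variant simply
   ignores the opmask field. */
Plan dis_mov_VEX(Insn in)
{
    Plan p = {0, 0, 0};
    if (in.vl_bits != 128 && in.vl_bits != 256)
        return p;
    p.ok = 1;
    p.lanes64 = in.vl_bits / 64;
    return p;
}

Plan dis_mov_EVEX(Insn in)
{
    Plan p = {0, 0, 0};
    if (in.vl_bits != 128 && in.vl_bits != 256 && in.vl_bits != 512)
        return p;
    p.ok = 1;
    p.lanes64 = in.vl_bits / 64;         /* duplicated from dis_mov_VEX */
    p.masked = (in.opmask != 0);
    return p;
}

/* With option 2, something has to keep checking that the copies agree. */
void check_options_agree(Insn in)
{
    Plan merged = dis_mov_VEX_EVEX(in);
    Plan split  = (in.enc == ENC_VEX) ? dis_mov_VEX(in) : dis_mov_EVEX(in);
    assert(merged.ok == split.ok && merged.lanes64 == split.lanes64);
}
```

In the sketch, option 2 needs something like check_options_agree to catch the
two copies drifting apart, while option 1 pays for the single copy with
encoding checks scattered through the shared function.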


Thank you,
Tanya
