https://bugs.kde.org/show_bug.cgi?id=383010
--- Comment #17 from Tanya <tatyana.a.mine...@intel.com> ---

Hello Julian,

Sorry for the late reply, and thank you very much for the comments. We have fixed most of the issues you pointed out, and we hope to finish adding and debugging the KNL AVX-512 instructions in about a month.

Regarding the performance on AVX-2 code:

>> As a side note -- before this lands, I would want to do some performance
>> runs to check that this doesn't impact performance (or correctness) of
>> existing IA support.

On a few tiny AVX-2 benchmarks, Memcheck overhead is 0-1% higher than with a "clean" Valgrind build. We will also run measurements on bigger AVX-2 benchmarks. Is there a required set of benchmarks for checking Valgrind correctness and performance?

Regarding the test files:

>> Is this intended to replace the existing AVX test? Or is it a new test?
>> This is unclear.

The attached tests are new tests, usable only on AVX-512 machines. They recheck AVX and AVX-2 instructions on the larger vector registers, similar to how the avx-2.c test rechecks AVX-1 instructions on ymm registers. Would it be OK to keep them as three separate AVX-512 test files, or should they be merged into a single avx-512.c test file?

Regarding the FMA instructions:

>> If I understand this right, that means the existing cases for serial vFMA
>> insns are wrong, and also the VEX implementation is wrong. Is that
>> correct? If so, shouldn't we just fix both the test case and
>> implementation?

The issue was that, for serial (32- and 64-bit) FMA instructions, Valgrind set bits [127:32] or [127:64] of the destination to zero, while they should be left unchanged. We have fixed the implementation and added a new test, because the none/tests/amd64/fma.c test seems to be designed to verify only a single float or double value.

Would you prefer us to provide the changes that are not related to AVX-512 as separate patches?

I also have a question about how we translate EVEX instructions to IR.
Currently, we use separate functions for VEX- and EVEX-prefixed instructions (in VEX/priv/guest_amd64_toIR.c; for example, dis_ESC_0F38__VEX and dis_ESC_0F38__EVEX, respectively). However, in the next Intel AVX-512 instruction sets, the VL (Vector Length) extension allows EVEX-prefixed instructions to operate on xmm and ymm registers, which largely duplicates the VEX code (for example, the EVEX-prefixed "vmovpdd xmm1, xmm2" is equivalent to the VEX-prefixed "vmovpdd xmm1, xmm2").

The easiest way to implement this would be to merge the EVEX and VEX translator functions into something like "dis_ESC_0F38__VEX_EVEX". On the upside, there would be less duplicated code. On the downside, the EVEX-related code would no longer be contained in separate __EVEX functions, so it would probably be more difficult to review.

An alternative approach would be to add the VL code (basically, copies of the __VEX translations) to the __EVEX functions. The downside is that it may be burdensome to keep the __VEX and EVEX VL implementations identical.

If we were to implement those instructions in the future, which approach would be preferable?

Thank you,
Tanya