On Fri, Feb 28, 2020 at 6:15 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Fri, Feb 28, 2020 at 4:16 PM Jeff Law <l...@redhat.com> wrote: > > > > On Thu, 2020-02-27 at 06:50 -0800, H.J. Lu wrote: > > > > > > How about this? If it looks OK, I will post the whole patch set. > > It's better. I'm guessing the two cases that were previously handled with > > vextract/vbroadcast aren't supposed to happen? They're caught here IIUC: > > > > > + /* NB: To move xmm16-xmm31/ymm16-ymm31 registers without AVX512VL, > > > + we can only use zmm register move without memory operand. */ > > > + if (evex_reg_p > > > + && !TARGET_AVX512VL > > > + && GET_MODE_SIZE (mode) < 64) > > > + { > > > + if (memory_operand (operands[0], mode) > > > + || memory_operand (operands[1], mode)) > > > + gcc_unreachable (); > > > > > > > If they truly can't happen, that's fine. My worry is I don't see changes to > > the operand predicates or constraints which would avoid this case. Is it > > prevented by the mode iterator on the operands? Again, just want to make > > sure > > I understand why the vextract/vbroadcast stuff isn't in the new code. > > There are no GCC testcases to show that they are actually ever used. That is > why I removed them and added gcc_unreachable ().
This is covered by the testcases I added: [hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c #include <x86intrin.h> extern __m128 d; void foo1 (__m128 x) { register __m128 xmm16 __asm ("xmm16") = x; asm volatile ("" : "+v" (xmm16)); d = xmm16; } [hjl@gnu-cfl-2 gcc]$ gcc -O2 -march=skylake-avx512 /tmp/x.c -S [hjl@gnu-cfl-2 gcc]$ gcc -O2 -march=skylake-avx512 -mno-avx512vl /tmp/x.c -S /tmp/x.c: In function ‘foo1’: /tmp/x.c:8:19: error: register specified for ‘xmm16’ isn’t suitable for data type 8 | register __m128 xmm16 __asm ("xmm16") = x; | ^~~~~ [hjl@gnu-cfl-2 gcc]$ GCC doesn't allow xmm16-xmm31/ymm16-ymm31 without AVX512VL since ix86_hard_regno_mode_ok has /* AVX512VL allows sse regs16+ for 128/256 bit modes. */ if (TARGET_AVX512VL && (mode == OImode || mode == TImode || VALID_AVX256_REG_MODE (mode) || VALID_AVX512VL_128_REG_MODE (mode))) return true; /* xmm16-xmm31 are only available for AVX-512. */ if (EXT_REX_SSE_REGNO_P (regno)) return false; The vextract/vbroadcast stuff is dead code. > > I'm doing a little assuming that the <ssescalarsize> bits in the old code > > are > > mapped correctly to the 32/64 suffixes on the opcodes in the new version. > > > > I'm also assuming that mapping of "size" in the argument to ix86_get_ssemov > > to > > the operand modifiers g, t, and x are right. I'm guessing the operand > > modifiers weren't needed in the original because we had the actual operand > > and > > could look at it to get the right modifier. In the evex, but not avx512vl > > case > > those are forced to a g modifier which seems to match the original. > > > > Are we going to need further refinements to > > ix86_output_ssemov/ix86_get_ssemov? > > If so, then I'd suggest the next patch be those patterns which don't require > > further refinements to ix86_output_ssemov. > > 4 patches don't require changes in ix86_output_ssemov/ix86_get_ssemov: > > https://gitlab.com/x86-gcc/gcc/-/commit/426f2464abb80b97b8533f9efa15bbe72e6aa888 > https://gitlab.com/x86-gcc/gcc/-/commit/ec5b40d77f7a4424935275f1a7ccedbce83b6f54 > https://gitlab.com/x86-gcc/gcc/-/commit/92fdd98234984f86b66fb5403dd828661cd7999f > https://gitlab.com/x86-gcc/gcc/-/commit/f8fa5e571caf6740b36d042d631b4ace11683cd7 > > I can combine them into a single patch. > > Other 5 patches contain a small change to ix86_output_ssemov: > > https://gitlab.com/x86-gcc/gcc/-/commit/b1746392e1d350d689a80fb71b2c72f909c20f30 > https://gitlab.com/x86-gcc/gcc/-/commit/14c3cbdbdcc36fa1edea4572b89a039726a4e2bc > https://gitlab.com/x86-gcc/gcc/-/commit/69c8c928b26242116cc261a9d2f6b1265218f1d3 > https://gitlab.com/x86-gcc/gcc/-/commit/04335f582f0b281d5f357185d154087997fd7cfd > https://gitlab.com/x86-gcc/gcc/-/commit/64f6a5d6d3405331d9c02aaae0faccf449d6647a > > Should I made the change and submit them for review? I am preparing the new patch set. > > If no further refinements to ix86_output_ssemov/ix86_get_ssemov are > > required, > > then I think you can just send the rest of the pattern changes in a single > > unit. > > > > jeff > > -- H.J.