On Wed, Jul 24, 2013 at 8:27 PM, Kirill Yukhin <kirill.yuk...@gmail.com> wrote:
> Hello,
> With this patch I am starting a series of patches towards enabling the
> Intel (R) AVX-512 and SHA extensions (see [1]) in GCC.
> I've already submitted corresponding patches to BinUtils (see [2], [3]).
>
> This patch adds command-line options for avx512* use and detection of the
> relevant cpuid bits.  Vector registers are now 512-bit wide, so support
> for new modes (e.g. V16SF) is added.  AVX512F introduces 16 new registers,
> zmm16-zmm31.  Some instructions now have an EVEX encoding and can use
> those new registers, while old instructions can't.  We introduce a new
> register class for them.  We also add a new constraint "v" which allows
> zmm0-zmm31.  We can't extend the "x" constraint because it is exposed in
> inline asm, so we might break some inline asm if we assigned e.g. xmm21 to
> a non-EVEX-encodable instruction.  The idea is to replace all uses of "x"
> for EVEX-encodable instructions with "v", and to allow only scalar and
> 512-bit modes for registers 16+ in ix86_hard_regno_mode_ok.  We update
> move instructions to use EVEX-encodable versions of instructions for
> AVX512F to allow usage of the new registers.  The main problem is with the
> vector mov<mode>_internal in sse.md.  In AVX512F we have some instructions
> reading/writing e.g. ymm16+ (for example vinsert64x4/vextract64x4), but
> there is no ymm mov instruction with an EVEX encoding, so we have to use
> insert/extract instead.
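To illustrate the new "v" constraint described in the quoted mail, its
declaration in config/i386/constraints.md would look roughly along the
following lines.  This is only a sketch based on the description above, not
a hunk from the patch; in particular the ALL_SSE_REGS class name is an
assumption:

(define_register_constraint "v" "TARGET_SSE ? ALL_SSE_REGS : NO_REGS"
 "Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).")

The existing "x" constraint keeps mapping to xmm0-xmm15 only, so
legacy-encoded instructions and existing inline asm that use "x" stay
unaffected.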
Some comments while going through the mega-patch:

+(define_insn "*movxi_internal_avx512f"
+  [(set (match_operand:XI 0 "nonimmediate_operand"  "=x,x ,m")
+        (match_operand:XI 1 "vector_move_operand"   "C ,xm,x"))]
+  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return standard_sse_constant_opcode (insn, operands[1]);
+    case 1:
+    case 2:
+      return "vmovdqu32\t{%1, %0|%0, %1}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "sselog1,ssemov,ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])

Even though the cost of an unaligned move is equal to that of an aligned
one, I would rather see these moves split into aligned and unaligned cases,
in the same way as movoi_internal_avx.

 (define_insn "*mov<mode>_internal"
-  [(set (match_operand:V16 0 "nonimmediate_operand"  "=x,x ,m")
-        (match_operand:V16 1 "nonimmediate_or_sse_const_operand" "C ,xm,x"))]
+  [(set (match_operand:V16 0 "nonimmediate_operand"  "=v,v ,m")
+        (match_operand:V16 1 "nonimmediate_or_sse_const_operand" "C ,vm,v"))]

...

+         /* Reg -> reg move is always aligned.  Just use wider move.  */
+         switch (mode)
+           {
+           case MODE_V8SF:
+           case MODE_V4SF:
+             return "vmovaps\t{%g1, %g0|%g0, %g1}";
+           case MODE_V4DF:
+           case MODE_V2DF:
+             return "%vmovapd\t{%g1, %g0|%g0, %g1}";

No need for the % prefix.  This is an AVX-only insn.

+           case MODE_OI:
+           case MODE_TI:
+             return "vmovdqu64\t{%g1, %g0|%g0, %g1}";

vmovdqa64, since we are operating on registers only.

+           case MODE_XI:
+             if (<MODE>mode == V8DImode)
+               return "vmovdqu64\t{%1, %0|%0, %1}";
+             else
+               return "vmovdqu32\t{%1, %0|%0, %1}";

Please put the mode calculations into a mode attribute.  It looks like
sseinsnmode is not working correctly here and needs to be updated to
correctly select OI and XI modes.

Uros.
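To make the mode-attribute suggestion concrete: a mode attribute can supply
the 32/64 suffix directly, so the output template needs no C-level mode
test.  The sketch below is illustrative only; the attribute name and mode
list are made up, and the proper fix may instead be to repair sseinsnmode
as noted above:

(define_mode_attr avx512f_mov_suffix
  [(V16SF "32") (V8DF "64") (V16SI "32") (V8DI "64")])

with the output template then reduced to

  return "vmovdqu<avx512f_mov_suffix>\t{%1, %0|%0, %1}";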